SAP IQ – Quietly Brilliant
Author – Mike Coe, Senior Analytics Consultant, NTT DATA Business Solutions UK&I
A benefits summary of the IQ technology that underpins SAP HANA Data Lake
Whilst SAP HANA has rightly taken the headlines over recent years with in-memory processing, SAP IQ has quietly continued to provide its customer base with fast, reliable database solutions.
With impressive compression rates, massively parallel processing and superior scalability, SAP IQ has a deserved reputation for providing large-scale columnar store database solutions. Not only this, but SAP IQ is ideal for modern big data and analytics workloads. With columnar storage, multiple optimisation techniques and in-memory caches, data is made available in near real-time for easy analysis.
SAP IQ has a long history of providing columnar storage solutions. Starting out as Expressway in the early 1990s, it was acquired by Sybase in 1995 and became an SAP product in 2010. Since then it has built an enviable reputation for stability and performance within its user community. According to the German-based E-3 magazine, SAP Sybase IQ is implemented at 45 of the top 50 banks and security firms and when version 16 was released in 2014, SAP advertised a 97 percent customer satisfaction rate.
Since then, SAP IQ has grown beyond transactional and analytical use cases and has been coupled with SAP HANA as Near Line Storage (NLS) to deliver a distributed in-memory analytics platform.
More recently, SAP IQ has moved into the cloud with a new name – SAP HANA Data Lake or SAP HDL for short but more on this later.
In this blog, I will summarise the benefits of the IQ technology to explain why it is the perfect choice to be partnered with SAP HANA Cloud or as a standalone analytics database.
Lightening read performance
SAP IQ has impressive response times and that is because read and write operations are directed to the memory caches, nothing goes directly to disk. For read optimisation, SAP IQ uses Distributed Query Processing and numerous indexing options to provide fantastic read performance.
For Distributed Query Processing, the read query is handled by a leader node which may then call upon numerous worker nodes to complete the execution. Any query capable of parallel processing will automatically be handled this way unless you have explicitly turned off this functionality. What this means is that your full CPU resources are being used to return your results.
This approach is combined with clever use of indexing. IQ offers multiple indexing options and by default it automatically indexes data on ingestion – whenever you load data into an IQ, you are actually building column indexes.
By default, IQ will apply the Fast Projection index to new tables which optimizes projections. In a typical analytic deployment comprising of dimension and fact tables, I would recommend the High Group (HG) index for integer surrogate key columns that form the basis of the joins. The HG index handles the “Group By” function more efficiently.
Importantly, SQP IQ can use its indexes more effectively than other relational databases – not only looking at the table/attributes referenced in the query but considering joins and search arguments.
This combination of Distributed Query Processing plus its effective use of multiple indexes is key to its fast read response.
Even faster write performance
If you think IQ query time is impressive, wait until you see how fast your data can be loaded.
When it comes to write performance, the multi-threaded bulk loader excels as it will use all available cores for the task – so making better use of your system resources. Wherever possible use the LOAD TABLE command or enable the bulk loader option on ETL tools such as SAP Data Services. SAP Data Services will then generate a file that is then loaded directly into IQ using LOAD TABLE.
In the more recent versions of IQ, an in-memory, write-optimized store is also available. This is known as the Row-Level Versioning (RLV) store. The RLV store enables concurrent updates to different rows of the same table that can make data available quickly for real-time analysis. The RLV store combined with the IQ main store allows concurrent read and write access to tables. The RLV Store Merge process is self-managing and will merge updates into the main store automatically.
In the latest version, SAP have introduced the Large Memory Cache which further turbo charges the write performance.
Disk. Memory and CPU
You will need a lot less storage space for IQ than other databases – including SAP HANA! You may be surprised to learn that the size of your data in IQ is actually less than the size of your raw data, that is because compression will kick in as soon as it is loaded. It is not uncommon to see a reduction to 20-50% of the original raw data size.
When you scale up hardware, you hope to realise those gains in database performance. With traditional databases, I/O is often the bottleneck and adding more cores may not give you the boost you were expecting. SAP IQ is CPU and memory hungry, the bulk loader will scale linearly as cores are added. For example, a load using 32 cores will be roughly twice as fast as a load using 16 cores.
Memory allocation also plays a key part in IQ’s performance particularly in relation to the role of the temporary cache and the use of Row Level Versioning. The temporary cache is the area that is used to rebuild indexes, which is essentially your data. Having the temporary cache the same size as your main cache makes perfect sense therefore to maintain efficient data access and the size of the RLV cache is dependent on the frequency and volume of data loads. If your IQ database is only updated by a daily batch update, you might even disable RLV for all tables.
Cloud based IQ – HANA Data Lake
It was only a matter of time before IQ moved into the cloud and in April 2020 SAP launched SAP HANA Data Lake (HDL).
The obvious benefits of cloud based HDL over on-premise IQ are its elasticity and scalability:-
- Elastic in that you can increase CPUs for peak periods of usage through the year, for example.
- Scalable in that you can add nodes and other components as your requirements grow.
The configuration of HDL is managed through the SAP HANA Cloud Central cockpit, which will be familiar to HANA Cloud customers. What makes it particularly powerful is that data stored within the SAP HDL can now include structured, semi-structured, and unstructured data files increasing its flexibility. Most importantly though is that there is close integration between SAP HANA Cloud and SAP HDL meaning that you can easily combine data within the two systems whilst benefiting from the strengths of both.
That said, the SAP IQ based HANA Data Lake can also be implemented as a standalone solution. This allows existing customers to transition easily from on-premise SAP IQ to SAP HDL. To help in that transition, SAP HDL also provides continued support of on-premise ETL tools such as SAP Data Services.
By offering HANA Data Lake as a standalone solution, existing IQ customers can plan a phased migration to the cloud. Once IQ customers have migrated to HANA Data Lake, evolving into a full SAP HANA Cloud solution that leverages the power of in-memory HANA becomes an easier and more logical step.
Whether you are considering SAP IQ or HANA Data Lake as a standalone database or as part of a distributed platform with SAP HANA Cloud, NTT DATA Business Solutions can help your organisation make the most of this technology.
Further reading and references
NTT DATA Business Solutions offer a range of Analytics Assessment services to help develop your strategy – from building the business case to conversion, landscape transformation and change management. To discover more, please get in touch here