Native Storage Extension – Giving HANA a Warm Glow
Author, Matt Rogers, Managing Consultant, NTT DATA Business Solutions UK
I am an Analytics consultant with NTT DATA Business Solutions UK, working with our customers to transform and enhance their data into valuable information using the SAP suite of data tools.
I have done a number of data warehousing and analytics projects with clients with HANA as the backbone of their solution and one of the things that you become acutely aware of when using HANA is the need to manage your memory.
Hardly surprising – HANA is THE in-memory database after all – and so having sufficient memory to support your analytical, transformational and operational needs is vital – not only when you are initially sizing your system, but also when thinking ahead to future growth and requirements.
So how do you make the most of your precious memory?
Memory management
As you develop in HANA, it soon becomes second nature to monitor your server’s memory status, checking rtedump.oom trace files and thinking hard about the potential memory usage implications of what you design and develop.
It becomes even more important when you start designing your ETL/ELT data flows and begin thinking about how to support multiple users running queries against your virtual data models. Better to be prepared to be efficient from the start than be searching for improvements when users are complaining and the client is looking increasingly concerned!
Since it is a question of when, and not if, you will get memory pressures though, you need to start thinking about how you will handle them. So, what are some of our options?
1. Tweak our code and design to be EVEN MORE EFFICIENT.
2. Buy more memory/licence so the problem goes away.
3. Manage our data more effectively so we hold less in memory.
Option 1 is always something to start with – but it can only go so far in terms of mitigating the problem. Option 2 is arguably the easiest option but is not necessarily the most cost-effective approach and does not address any fundamental issues you may have.
Options 1 and 2 are both valid and need to be explored – but Option 3 is something that every data warehouse project will need to some extent when you begin to put meaningful quantities of data into your system and start to run queries that are more challenging.
Data Management Options
As your data volumes grow, you will experience a variety of management challenges that can be handled in a variety of ways – partitioning, more sophisticated delta merging, not pre-loading tables etc. – but a fundamental one is recognising when you need to implement a data temperature strategy.
Data temperature refers to the fact that “not all data is equal”, so not all data needs or deserves to sit at the top of the HANA tree and be held in your expensive memory. Some data is less important and some business questions are less urgent. Where this is the case, we have the opportunity to move some of our data out of the expensive, high-performance “hot” memory layer and into “cooler” layers.
“Warm” data is not deemed time-critical or is accessed less frequently so can be stored on slower, lower cost storage tiers and “Cold” data is even less time-sensitive and so can be stored away on the lowest-cost tiers for limited and slower access.
The decision as to where your data gets located will depend upon a number of factors but choosing the right one will allow you to continue to expand the size of your Data Warehouse whilst still giving your users the performance they need.
Over the years, SAP HANA has had a number of ways to implement Data Temperature management strategies and so the landscape has become a little confusing. As we can see from the graphic below – the Hot Store is simple enough being essentially your RAM – but Warm and Cold have had a variety of different approaches that combined flexibility, cost-effectiveness, performance and management complexity to varying degrees of effectiveness.
But with the arrival of HANA 2.0 SPS 04, we now have a new option within the Warm tier offering – the Native Storage Extension (or NSE) – and it looks like it could be really rather important.
What is the Native Storage Extension?
In HANA, data is normally held completely in-memory for it to be available for processing and querying. Data is persisted on disk but loaded up into memory either in advance (pre-loading), or upon demand as it is requested by queries and other processes. Leaving aside how the underlying table is partitioned for the time being, fundamentally data is “column loadable” i.e. it is the entire table column that is loaded up into memory.
If perhaps you need to use all the data within a column for your query then this is great – but if not then it just sits there taking up valuable RAM until it’s used or it is unloaded by the system to make way for more data; and loading and unloading of unnecessary data is something you want to minimise.
NSE works differently in that data is still persisted on disk but you can define that some, or all, of this data is classed as “page loadable”. This means that when the system requires the data, it can selectively load just the data it needs into memory, page by page.
Page loadable data does not need to be held completely in-memory, unlike column loadable data.
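To make the difference concrete, here is a toy Python sketch. It is only a model of the idea, not how HANA actually stores or loads data, and the page size of 4 values and the sample column are assumptions made up for illustration. It counts how many values end up in memory when a query touches three rows of a 100-row column.

```python
# Toy model only: it mimics the idea of column loadable vs page loadable data,
# not HANA's real storage format. PAGE_SIZE is an arbitrary illustrative value.
PAGE_SIZE = 4  # values per page (hypothetical)


def split_into_pages(column, page_size=PAGE_SIZE):
    """Split a column (a list of values) into fixed-size pages, as if on disk."""
    return [column[i:i + page_size] for i in range(0, len(column), page_size)]


def values_loaded_column_loadable(column, row_ids):
    """Column loadable: any access brings the whole column into memory."""
    return len(column) if row_ids else 0


def values_loaded_page_loadable(column, row_ids, page_size=PAGE_SIZE):
    """Page loadable: only the pages containing the requested rows are loaded."""
    pages = split_into_pages(column, page_size)
    needed_pages = {row_id // page_size for row_id in row_ids}
    return sum(len(pages[p]) for p in needed_pages)


column = list(range(100))      # a 100-row column
requested_rows = [3, 5, 42]    # rows touched by a query

print(values_loaded_column_loadable(column, requested_rows))  # 100
print(values_loaded_page_loadable(column, requested_rows))    # 12
```

In this toy case the three rows sit on three different pages, so 12 values are loaded rather than all 100.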
To facilitate this loading, HANA requires a Buffer Cache. This is a dedicated memory block, which is used to transfer data pages between disk and memory. By default it is enabled and sized at 10% of HANA memory. You can alter the Buffer Cache size using the following parameters:
- max_size – specifies the upper limit explicitly, in MB.
- max_size_rel – specifies the upper limit as a percentage of the SAP HANA memory.
The full Buffer Cache is not allocated immediately but only as required by the processes that need NSE data, so if you are not using NSE then you will not suddenly lose 10% of your memory.
When NSE data is requested, only the necessary data gets loaded into the Buffer Cache and once in the Cache, it can be used by multiple queries.
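As a rough illustration of the sizing, the Python sketch below works out the upper limit implied by a percentage setting and builds the configuration statement as text. The parameter names max_size and max_size_rel come from the list above; the configuration file (indexserver.ini) and section name (buffer_cache_cs) are not stated in this article and are my assumption of where these parameters live, so please check them against the SAP documentation for your revision. The 512,000 MB figure is a made-up example.

```python
# Sketch only: it does the arithmetic and builds SQL text, nothing is executed.
# ASSUMPTION: the parameters live in indexserver.ini, section buffer_cache_cs.
CONFIG_FILE = "indexserver.ini"
CONFIG_SECTION = "buffer_cache_cs"


def buffer_cache_limit_mb(hana_memory_mb, max_size_rel=10):
    """Upper limit of the Buffer Cache in MB for a percentage setting."""
    return hana_memory_mb * max_size_rel // 100


def set_buffer_cache_sql(parameter, value):
    """Build the statement that sets max_size (MB) or max_size_rel (percent)."""
    if parameter not in ("max_size", "max_size_rel"):
        raise ValueError("parameter must be 'max_size' or 'max_size_rel'")
    return (
        f"ALTER SYSTEM ALTER CONFIGURATION ('{CONFIG_FILE}', 'SYSTEM') "
        f"SET ('{CONFIG_SECTION}', '{parameter}') = '{value}' WITH RECONFIGURE;"
    )


# Hypothetical system with 512,000 MB of HANA memory and the default 10%.
print(buffer_cache_limit_mb(512000))            # 51200
print(set_buffer_cache_sql("max_size_rel", 15))
```

Remember that this figure is only a ceiling, since the cache is allocated as it is needed rather than up front.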
The graphic below from the official SAP documentation illustrates how data flows between the disk and the Buffer Cache.
Why use NSE over the other Warm options?
As mentioned, HANA has other options for implementing Warm data tiers but NSE offers some very persuasive reasons for using it.
• Most importantly, NSE is incorporated into the core HANA architecture. It is integrated with the other HANA functional layers, such as the query optimizer, query execution engine, column store, and persistence layers, so it supports full HANA functionality and almost all HANA data types and models (see Considerations below). This means you do not need to modify any applications built over your existing data models.
• It is easy to implement. It requires minimal changes to the table and column definitions to make data page loadable. All you need to do is specify the “Load Unit” preference as “PAGE” within your Create or Alter table DDL statement and HANA will automatically begin making the data page loadable in a non-blocking manner, so there is no downtime. As well as tables, you can also make Indexes and Dictionaries page loadable, further reducing your memory usage.
• Thirdly, it can simplify your landscape – no need for additional physical nodes like with Extension Nodes or add-on functionality like Dynamic Tiering where an additional service (esserver) is added to the system. With NSE, it is all available immediately.
• Finally, SAP are making it even easier to see where you can implement NSE by providing an XS app called the NSE Advisor, which runs a heuristic algorithm over a cache of data access statistics. Based on the analysis, it will then advise whether you should consider changing the Load Unit of a table, partition or column based on the performance/memory use balance.
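To show how small the DDL change is, here is a short Python sketch that simply builds the statements as text. The table SALES_HISTORY and its columns are hypothetical names. The "PAGE LOADABLE" clause and the optional CASCADE keyword reflect my understanding of the HANA syntax rather than anything quoted in this article, so do confirm them in the SQL reference for your revision before running anything.

```python
# Sketch only: builds SQL text; it does not connect to a database.
# SALES_HISTORY and its columns are hypothetical names.
LOAD_UNITS = ("PAGE", "COLUMN")


def create_table_sql(table, columns, load_unit="PAGE"):
    """CREATE statement for a column table with a load unit preference."""
    if load_unit not in LOAD_UNITS:
        raise ValueError(f"unknown load unit: {load_unit}")
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE COLUMN TABLE {table} ({column_list}) {load_unit} LOADABLE;"


def alter_table_sql(table, load_unit="PAGE", cascade=False):
    """ALTER statement that changes the load unit of an existing table."""
    if load_unit not in LOAD_UNITS:
        raise ValueError(f"unknown load unit: {load_unit}")
    suffix = " CASCADE" if cascade else ""
    return f"ALTER TABLE {table} {load_unit} LOADABLE{suffix};"


print(create_table_sql("SALES_HISTORY", [("ID", "INTEGER"), ("AMOUNT", "DECIMAL(15,2)")]))
print(alter_table_sql("SALES_HISTORY", "PAGE", cascade=True))
```

Switching a table back is the same statement with COLUMN in place of PAGE, which makes it easy to experiment on a test system.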
Considerations
It all sounds promising, but nothing is perfect, so there are obviously some limitations to consider.
There is an overhead in terms of performance to moving data from disk to memory – even if you are able to do it selectively by page – but you can mitigate this in a couple of ways. Firstly, because NSE supports partitioning, you can set up your page-loadable tables more effectively for querying and take advantage of parallelisation, partition pruning and more efficient loading (including delta merge). Secondly, you can take advantage of SSDs to make the impact of storing and retrieving data on disk as small as possible.
In addition to the performance impact though, NSE also has some functional limitations. For these you need to refer to SAP Note 2771956, but in summary:
– NSE is limited to 10 TB per HANA Indexserver.
– NSE is only supported for heterogeneous partitions and specifically Unbalanced range or Unbalanced range-range.
– NSE does not support the following data types:
o TEXT
o ARRAY
o TimeSeries
o DocStore
– NSE does not support the following table types:
o Row store
o History
o No logging
o Temporary
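If you want to screen a list of candidate tables against the summary above, a simple check like the following Python sketch can help. It only encodes the data types and table types listed in this article, not the full content of the SAP Note, and the function name, the "COLUMN STORE" label and the sample table are all hypothetical.

```python
# Sketch only: the restriction lists are copied from the summary above and
# are not a complete or authoritative statement of SAP Note 2771956.
UNSUPPORTED_DATA_TYPES = {"TEXT", "ARRAY", "TIMESERIES", "DOCSTORE"}
UNSUPPORTED_TABLE_TYPES = {"ROW STORE", "HISTORY", "NO LOGGING", "TEMPORARY"}


def nse_blockers(table_type, column_types):
    """Return the reasons (if any) why a table conflicts with the summary."""
    blockers = []
    if table_type.upper() in UNSUPPORTED_TABLE_TYPES:
        blockers.append(f"table type {table_type}")
    for column, data_type in column_types.items():
        if data_type.upper() in UNSUPPORTED_DATA_TYPES:
            blockers.append(f"column {column} has type {data_type}")
    return blockers


# Hypothetical candidate table with one problematic column.
candidate = {"ID": "INTEGER", "NOTES": "TEXT"}
print(nse_blockers("COLUMN STORE", candidate))
```

An empty list means nothing in the summary rules the table out; it does not replace reading the Note itself.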
Conclusion
NSE appears to offer a very attractive addition to HANA’s already impressive capabilities. By allowing us to select which data is page loadable, we can choose the right balance of performance vs cost for our data to suit our use cases.
The fact that it is baked into HANA is also gratifying – it makes it easier to implement and leverage and the performance levels that I have experienced in my own testing suggest that it will be a more than acceptable option in many situations.
NSE could not be called revolutionary – pulling data off disk into memory on demand is hardly that. What it does represent is an evolution of HANA’s capabilities outside of pure in-memory processing, and one that further blurs the lines between where data sits and simplifies how users and processes interact with it.
According to SAP, it is complementary to the other Warm data tiering options but it is apparently the primary option and so it appears that SAP see a long-term future for this. At the recent TechEd in Las Vegas, one of the presentations around HANA Cloud services outlined the techniques for scaling HANA up in terms of data volumes. NSE was front and centre and that bodes well for its long-term validity.
A version of NSE is apparently already in use within S/4HANA with further development ahead and BW/4HANA is apparently considering supporting it in 2020. So again, further endorsement for the approach and its potential.
With this in mind, I would encourage you to investigate NSE further and see whether it can be incorporated into your native HANA solution landscape. It is simple to try and you never know, you might end up with a warm feeling inside.
If you have any questions on Native Storage Extension, please reach out to us via email.
Useful links:
SAP documentation –
SAP HANA Native Storage Extension
SAP HANA NSE Functional Restrictions
(To access SAP Note 2771956 you need to have login credentials as it is not public access)
Blogs
Data Tiering Options in SAP HANA Webcast Recap
SAP HANA Native Storage Extension: A Native Warm Data Tiering Solution