Row Vs Column store – What’s all the fuss about?
Welcome back readers to the next beginner’s guide to HANA where we try to understand what row store and column store mean in terms of data storage.
To all those excited folks who just want to see the development steps in one of the modeling tools, I understand your anxiety but trust me when I tell you by experience that you are chasing the wrong goals. Understanding how databases behave is really important in your well rounded growth as a back end developer and architect.
Patience is the key. Trust me that you’ll grow and learn everything you need as we progress – one step at a time.
Coming back to the topic, we all know what a table is. We have been drawing them since our school math classes and then in excel spreadsheets soon after.
The most common way of storing data in a database is in the form of a table. For example, have a look at the below representation of a table. This is what we call a logical representation – a way for regular folks to draw and represent rows and columns of data.
But have you ever asked yourself how this information is stored in a database? Disk memory can be envisioned as a continuous array of blocks where each block can store 1 cell/value of data (from a table in this case).
Row Store
1
Row Store
Traditional database storage of tables is generally row store.
Let’s understand this with the figure where you can see how the table above is stored in the database.
As you can see, it is stored sequentially as Row1, Row2, Row3 and so on.
Each row of data is stored one after the other in perfect sequence. From this image, you could start to see the evident problems to read data. Even if I want to read only one column of this dataset, my read skims through the entire data array. The picture of a table changes dramatically when looking at it in terms of memory, doesn’t it?
Column Store
Some databases support another additional form of storage called the column store. Each column in the table acts as an individual table, an
d gets stored separately.
Why would you do that? Excellent question. Thank you for asking.
When each column acts as an individual table, each of these individual mini-tables can be indexed (=sorted) and compressed (=process of removing duplicates).
This makes sure that each of these tables only contains a unique entry.
The below example illustrates how it works but does no real justice in portraying the real advantage. To realize this, imagine a table with a million rows. Most of the columns would have only a few hundred or thousand at most of unique values. Compression makes sure you save disk space and indexing makes sure you find things faster.
Note: Databases supporting column store entries also support row store.
Now at this point you might be wondering if the Column store is such a magical gift to mankind, why bother with a Row store at all? Before you decide to kill it with fire, let’s analyze the pros and cons.
Major Differences between Row store and Column store
Property | Row Store | Column Store | Reason |
---|---|---|---|
Memory usage | Higher | Lower | Compression |
Transactions | Faster | Slower | Modifications require updates to multiple columnar tables |
Analytics | Slower even if indexed | Faster | Smaller dataset to scan, inherent indexing |
Now there are deeper concepts to Row stores and Column stores. For example the Delta merge process for column store wherein a column store consists of a main store, a primary delta store and a secondary delta store which is actually a row store. Did I lose you there? Don’t lose sleep on these concepts for now. I would come back to these eventually once you are a bit more mature on this technological journey.
By the end of this post you would have now:
Gained a basic understanding how row and columnar databases work.
Help this website grow by sharing this document on social media by using the icons below.
Stay tuned for the next tutorials. Happy Learning!
Sponsored Ad:
Whether you are using Ubuntu, Chrome, Mac, iOS or Android operating system on your device, you can be productive on a Windows Cloud Desktop And can also get Integrated SAP HANA with SharePoint From Apps4Rent.com
Great Work Shyam..! Keep doing and Growing
Thanks Shakthi!
Really well written with simple language. Kudos Shyam.
Thanks for the kind words. Glad this was helpful to you.
Wonderful explanation. Thanks for your initiative to share your knowledge.
Thank you Venkatesh for your kind words. Please support this site by sharing the documents you like on social media.
One of the best dude… keep posting.. this is how things shoud be explained.. Damn good
Still confused about the column store after compression, how do DB rebuild the row , or say how is a query find me revenue of sports category work . Can you give me an example.
i think there must be a number added after compression say housing , 2 after compression which then denotes 2 rows with housing are compressed.
I want to know the difference between native and non-native hana ?
hana and s/4 hana ?
Native HANA is a term loosely used for HANA development using SAP HANA studio directly on the SAP HANA database. This means that you are not using any application like SAP BW or SAP ECC on top of SAP HANA.
There is no specific term called non-native HANA.
S/4 HANA is the latest version of ECC on SAP HANA. It is an application running on top of SAP HANA.
That means I can built directly reports on top of calculated view without loading again into BW for reporting
Very well explained.. Thanks a lot Shyam!!
Thanks! Please share a tutorial to help us grow.
Thank you soooo much for these amazing articles.I feel like keep reading and reading and reading..Simple language ,to the point topics ..Short and sweet 🙂
Thank you so much for providing simple but excellent reading materials 🙂
Thank you very much for appreciating my work. Glad you found it helpful
Nice work Shyam. You have explained it in such a easy way. Keep sharing knowledge. Appreciate your efforts..!!!
Excellent way of explaining. Keep up the great work!
I am new in SAP HANA environment but want to know that It is hard requirement to learn SAP ABAP before learning HANA or we can avoid it.
If I avoid ABAP what are the things I am leaving behind ?
ABAP is not required for learning enterprise HANA. The database language is HANA SQL and most of the operations can be done without any coding at all.
The MIT paper in this link is not working
http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf
Thank you for letting me know. It’s been 7 years so most of the external URLs are dead. I just removed it from the blog. It’s best to google Delta merge in SAP and you’d get detailed articles on this. I won’t link a new one since that would expire again someday. Best to google it. Thanks for letting me know.