Row Vs Column store – What’s all the fuss about?
Welcome back readers to the next beginner’s guide to HANA where we try to understand what row store and column store mean in terms of data storage.
To all those excited folks who just want to see the development steps in one of the modeling tools, I understand your anxiety but trust me when I tell you by experience that you are chasing the wrong goals. Understanding how databases behave is really important in your well rounded growth as a back end developer and architect.
Patience is the key. Trust me that you’ll grow and learn everything you need as we progress – one step at a time.
Coming back to the topic, we all know what a table is. We have been drawing them since our school math classes and then in excel spreadsheets soon after.
The most common way of storing data in a database is in the form of a table. For example, have a look at the below representation of a table. This is what we call a logical representation – a way for regular folks to draw and represent rows and columns of data.
But have you ever asked yourself how this information is stored in a database? Disk memory can be envisioned as a continuous array of blocks where each block can store 1 cell/value of data (from a table in this case).
Row Store
1
Row Store
Traditional database storage of tables is generally row store.
Let’s understand this with the figure where you can see how the table above is stored in the database.
As you can see, it is stored sequentially as Row1, Row2, Row3 and so on.
Each row of data is stored one after the other in perfect sequence. From this image, you could start to see the evident problems to read data. Even if I want to read only one column of this dataset, my read skims through the entire data array. The picture of a table changes dramatically when looking at it in terms of memory, doesn’t it?
Column Store
d gets stored separately.
Why would you do that? Excellent question. Thank you for asking.
When each column acts as an individual table, each of these individual mini-tables can be indexed (=sorted) and compressed (=process of removing duplicates).
This makes sure that each of these tables only contains a unique entry.
The below example illustrates how it works but does no real justice in portraying the real advantage. To realize this, imagine a table with a million rows. Most of the columns would have only a few hundred or thousand at most of unique values. Compression makes sure you save disk space and indexing makes sure you find things faster.
Note: Databases supporting column store entries also support row store.
Now at this point you might be wondering if the Column store is such a magical gift to mankind, why bother with a Row store at all? Before you decide to kill it with fire, let’s analyze the pros and cons.
Major Differences between Row store and Column store
Property | Row Store | Column Store | Reason |
---|---|---|---|
Memory usage | Higher | Lower | Compression |
Transactions | Faster | Slower | Modifications require updates to multiple columnar tables |
Analytics | Slower even if indexed | Faster | Smaller dataset to scan, inherent indexing |
Now there are deeper concepts to Row stores and Column stores. For example the Delta merge process for column store wherein a column store consists of a main store, a primary delta store and a secondary delta store which is actually a row store. Did I lose you there? Don’t lose sleep on these concepts for now. I would come back to these eventually once you are a bit more mature on this technological journey.
By the end of this post you would have now:
Gained a basic understanding how row and columnar databases work.
Help this website grow by sharing this document on social media by using the icons below.
Stay tuned for the next tutorials. Happy Learning!
Sponsored Ad:
Whether you are using Ubuntu, Chrome, Mac, iOS or Android operating system on your device, you can be productive on a Windows Cloud Desktop And can also get Integrated SAP HANA with SharePoint From Apps4Rent.com