Row Vs Column store – What’s all the fuss about?

Welcome back readers to the next beginner’s guide to HANA where we try to understand what row store and column store mean in terms of data storage.


To all those excited folks who just want to see the development steps in one of the modeling tools, I understand your anxiety but trust me when I tell you by experience that you are chasing the wrong goals. Understanding how databases behave is really important in your well rounded growth as a back end developer and architect.

Patience is the key. Trust me that you’ll grow and learn everything you need as we progress – one step at a time.

Coming back to the topic, we all know what a table is. We have been drawing them since our school math classes and then in excel spreadsheets soon after.

ROW STORE COLUMN STORE

The most common way of storing data in a database is in the form of a table. For example, have a look at the below representation of a table. This is what we call a logical representation – a way for regular folks to draw and represent rows and columns of data.

But have you ever asked yourself how this information is stored in a database? Disk memory can be envisioned as a continuous array of blocks where each block can store 1 cell/value of data (from a table in this case).

Row Store

1

Row Store

ROW STORE COLUMN STORE

Traditional database storage of tables is generally row store.

Let’s understand this with the figure where you can see how the table above is stored in the database.

As you can see, it is stored sequentially as Row1, Row2, Row3 and so on.

Each row of data is stored one after the other in perfect sequence. From this image, you could start to see the evident problems to read data. Even if I want to read only one column of this dataset, my read skims through the entire data array. The picture of a table changes dramatically when looking at it in terms of memory, doesn’t it?

Column Store

ROW STORE COLUMN STORESome databases support another additional form of storage called the column store. Each column in the table acts as an individual table, an
d gets stored separately.

Why would you do that? Excellent question. Thank you for asking.

When each column acts as an individual table, each of these individual mini-tables can be indexed (=sorted) and compressed (=process of removing duplicates).

This makes sure that each of these tables only contains a unique entry.
The below example illustrates how it works but does no real justice in portraying the real advantage. To realize this, imagine a table with a million rows. Most of the columns would have only a few hundred or thousand at most of unique values. Compression makes sure you save disk space and indexing makes sure you find things faster.

Note: Databases supporting column store entries also support row store.

Now at this point you might be wondering if the Column store is such a magical gift to mankind, why bother with a Row store at all? Before you decide to kill it with fire, let’s analyze the pros and cons.

Major Differences between Row store and Column store

Property Row Store Column Store Reason
Memory usage Higher Lower Compression
Transactions Faster Slower Modifications require updates to multiple columnar tables
Analytics Slower even if indexed Faster Smaller dataset to scan, inherent indexing



Now there are deeper concepts to Row stores and Column stores. For example the Delta merge process for column store wherein a column store consists of a main store, a primary delta store and a secondary delta store which is actually a row store. Did I lose you there? Don’t lose sleep on these concepts for now. I would come back to these eventually once you are a bit more mature on this technological journey.

For those curious technical nerds out there, if you don’t want to wait for my explanation, I would recommend reading the below:

  1. Delta Merge Concepts
  2. An excellent MIT paper elaborating how row store and column store work in detail

By the end of this post you would have now:

Gained a basic understanding how row and columnar databases work.

Help this website grow by sharing this document on social media by using the icons below.

Stay tuned for the next tutorials. Happy Learning!

Sponsored Ad:

Whether you are using Ubuntu, Chrome, Mac, iOS or Android operating system on your device, you can be productive on a Windows Cloud Desktop And can also get Integrated SAP HANA with SharePoint From Apps4Rent.com

<<Previous Tutorial                                                                                                             Next Tutorial>>

Content Protection by DMCA.com
Tagged , , , , , , .

16 Comments

    • Thank you Venkatesh for your kind words. Please support this site by sharing the documents you like on social media.

  1. Still confused about the column store after compression, how do DB rebuild the row , or say how is a query find me revenue of sports category work . Can you give me an example.
    i think there must be a number added after compression say housing , 2 after compression which then denotes 2 rows with housing are compressed.

    • Native HANA is a term loosely used for HANA development using SAP HANA studio directly on the SAP HANA database. This means that you are not using any application like SAP BW or SAP ECC on top of SAP HANA.
      There is no specific term called non-native HANA.
      S/4 HANA is the latest version of ECC on SAP HANA. It is an application running on top of SAP HANA.

  2. Thank you soooo much for these amazing articles.I feel like keep reading and reading and reading..Simple language ,to the point topics ..Short and sweet 🙂

Leave a Reply