SAP HANA SQL script concepts- SQL JOIN, UNION, UNION ALL
Welcome to the final installment of this SAP HANA SQL Scripts core concepts section where we learn how to pick up related data from different tables using SQL JOIN. We also learn the concept of combining data sets using the SQL UNION whilst understanding the difference between UNION and UNION ALL.
Before we get started on this section, please read the article on SQL JOIN explaining joins and join types in SAP HANA.
Joins are created when the information we need is split across different tables. For example, if there is a requirement to provide the employee ID, first name, email and age of an employee, we know that the data exists but exists in two separate tables, namely EMP_NAMES and EMP_MASTER.
The basic template for an SQL JOIN is:
SELECT <field_names> from
Note: You can add WHERE, GROUP BY, HAVING and ORDER BY clause as per the requirement just like a regular SELECT condition.
As you might have read in my document on JOINS, the order of tables matters a lot in case of LEFT and RIGHT OUTER JOINS but not in INNER JOIN. The LEFT table dictates the list of key field values for which corresponding values must be picked up from the table on the right side in a LEFT OUTER JOIN. The opposite is true for a RIGHT OUTER JOIN.
When you have two or more tables, using aliases are important for both readability and coding. Each time you write a field name from a table, use it with the alias name as <alias>.<field_name> so that SAP HANA knows which table it comes from.
After the ON keyword, specify the join condition which would be the relationship between the fields that connect these two tables.
So the first thing to do is to decide the type of SQL join you wish to use. Most times, a join would be LEFT OUTER JOINs or INNER JOINS. In our case, both tables contain the same key entries – 50 values of employee ID so it doesn’t matter which one we keep on the left or right. Even if we do a LEFT OUTER or an INNER JOIN, both would return the same values in this case. Now, the decision boils down to performance. Which one of these joins would be faster? The answer is a LEFT OUTER JOIN. As discussed in our SQL JOIN document, LEFT OUTER JOIN only has to scan the key values on the left side table and picks up the corresponding entries on the right table. An INNER JOIN on the other hand scans both sides.
Using this logic, we draft the below code. As you can see below, I give aliases to the two tables involved as ename and emast (you can call them anything but it should look relevant to your table name). We know that EMP_ID exists in both tables. As a thumb rule, try to always pick up common fields from the left table. After the ON keyword, we specify the join condition as ename.EMP_ID = emast.EMP_ID which means that the connecting fields of these two tables is the EMP_ID.
On execution, we see the output as if all this data was coming in from a single table.
As explained earlier, you can add more conditions to this statement like any other SELECT statement that we learnt till now. Below, we see an example where we display only the earlier statement with values for which age is more than or equal to 25 and then we order it by employee ID in descending format.
The output filters and sorts as below.
If you are starting to find this complicated, please try this on your system and try changing the clauses, play around using your creativity to modify these select statements and their outputs to get a better feel of how things work. Don’t worry with some practice, you will find these really easy.
Now, in practical real time project scenarios, there are often more than two tables involved in joins. For example, if we had a requirement to enhance the above join with the salary field from EMP_SALARY table, we would need to add another join condition below the first one but before the WHERE clause.
After the first join executes, the result executes as the left side of the next join. In this way you can add as many joins as required. All their WHERE conditions should come in at the end. This new draft would look like the below. Leave a comment in the comment section below if you feel you have questions on this topic.
The output now looks as the below.
You can add more formulas to this statement like the BONUS and FULL_NAME calculations we did earlier.
We have the new fields as shown below, You can change the position of these columns by simply moving it to the position you like in the SELECT statement.
DIFFERENCE BETWEEN UNION AND UNION ALL IN SAP HANA
Another important operation when in data modelling is UNION. When you have two or more separate lists of data having the same or similar fields, you might want to combine them both vertically into a unified single list. For example, let’s say you need to pick the first five entries of employee IDs from EMP_SALARY table where country code is in the list IN, CA or US. Also you need a similar list for country codes DE, FR or CH. When you have these lists, it is required to combine these five from each result set into one single set of 10. In such a case, we write the below code.
Result is as shown below. We have the first 5 records from each of the two result sets.
Now we change the code slightly and also include IN as a filter in the second select statement. This now means that IN records may occur in both result sets. Let’s see the execution results.
As seen below, five and five don’t always equal ten. There were some records that were common between the first five and the next five and their duplicates got removed/aggregated. That is how a UNION works.
Now what if you don’t want this aggregation to happen. What do you do?
The answer comes in the form of UNION ALL. It does the same work as UNION whilst not removing any duplicates/aggregation operation. Let’s now see the same example with UNION ALL. Try executing it.
In the result we see the two entries that have duplicates. Row 1 and 2 are the same as row 6 and 7. These were aggregated during the UNION operation earlier but remain in the output in case of UNION ALL.
Now you might be thinking as to why would someone preserve duplicates. What’s the point of doing so. The answer is that usually we wouldn’t want to have duplicates in result set and should be using UNION. But whenever you are sure that the two datasets involved in the UNION will have distinct datasets and have no duplicates, always do UNION ALL since it doesn’t waste time and system resources to remove duplicates and hence is faster in terms of performance.
I hope this was easy to understand and follow. Stay tuned for the next set of HANA SQL Tutorials.
Make sure to share this content on social media using the share buttons below to show your support to this website and to keep it alive.