Learning Objective
The objective of this SQL Server tutorial is to teach you how to use CUBE to aggregate different groups and the complete details about them.
What is CUBE in SQL Server?
CUBE is an extension of GROUPING SETS. It is almost the same as ROLLUP with one subtle difference. The difference is that it does an exhaustive grouping of the specified columns using all possible permutations.
For e.g. if we specify 3 table columns or fields (X, Y, Z) in rollup it will group as below. The total number of groups created and reported will be n+1 where n is the number of columns specified in the ROLLUP clause.
(X, Y, Z), (X, Y), (X), ()
But CUBE will group as below. The total number of groups created and reported in the resultset will be 2n where n is the number of columns specified in the CUBE clause.
(X, Y, Z), (X, Z), (X, Y), (X), (Y), ()
So, we can see it produces a larger resultset containing more rows of information.
SQL Server CUBE Syntax
The basic syntax of the CUBE is given below.
SELECT column1, column2, aggregate_function (column 3) FROM table GROUP BY CUBE (column1, column2);
In this syntax,
- column – a column from the table in the SELECT list.
- aggregate_function (column) – column on which aggregate function (i.e. SUM, COUNT, AVG etc.) is used.
- GROUP BY – SQL keyword combination to specify a column or multiple columns to create a group on which the to apply the aggregate.
- CUBE – SQL keyword combination which creates an exhaustive grouping of the specified columns and aggregates them.
SQL Server CUBE with Example
Let us try to understand when and how to use the CUBE operator in a SQL query. Suppose we have an employee table containing the information about employees in the company. The table is represented below. We will use this sample table for reference in our examples.
id | name | gender | salary | department |
1 | David Jackson | Male | 5000 | IT |
2 | Jim Jameson | Female | 6000 | HR |
3 | Kate Johnson | Female | 7500 | IT |
4 | Will Ray | Male | 6500 | Marketing |
5 | Shane Mathews | Female | 5500 | Finance |
6 | Shed Price | Male | 8000 | Marketing |
7 | Viktor Smith | Male | 7200 | HR |
8 | Vincent Smithson | Female | 6600 | IT |
9 | Janice Streep | Female | 5400 | Marketing |
10 | Laura Wells | Female | 6300 | Finance |
11 | Mac Bull | Male | 5700 | Marketing |
12 | Patrick Patterson | Male | 7000 | HR |
13 | Julie Orbison | Female | 7100 | IT |
14 | Elice Hemingway | Female | 6800 | Marketing |
15 | Wayne Johnson | Male | 5000 | Finance |
Let us start by doing a ROLLUP on department and gender for employee salary. The following query does the same. The sum of employee salaries is aliased as ‘employee cost’.
SELECT department, gender, sum(salary) AS 'employee cost' FROM employee GROUP BY ROLLUP (department, gender);
The resultset of the above query is the following. It has 13 rows of information or records. They include the output of the following set of groups – (department, gender) with 8 records, (department) with 4 records and () 1 record which is the total cost of company of all employees of all genders and all departments i.e. 2+1=3 as mentioned in introduction above.
department | gender | employee cost |
Finance | Female | 11800 |
Finance | Male | 5000 |
Finance | NULL | 16800 |
HR | Female | 6000 |
HR | Male | 14200 |
HR | NULL | 20200 |
IT | Female | 21200 |
IT | Male | 5000 |
IT | NULL | 26200 |
Marketing | Female | 12200 |
Marketing | Male | 20200 |
Marketing | NULL | 32400 |
NULL | NULL | 95600 |
Now we will apply the CUBE operator on the same query and see the difference. The following query does the same.
SELECT department, gender, sum(salary) AS 'employee cost' FROM employee GROUP BY CUBE (department, gender);
The resultset of the above query is the following. It has 15 records i.e. 2 more than the output from ROLLUP. The additional records are highlighted in pink and it is the output of the (gender) group which was missing in the ROLLUP resultset i.e. the total cost to company of the male employees and female employees separately. The total number of groups created and reported is 2n as mentioned above in the introduction i.e. (department, gender), (gender), (department) and ().
department | gender | employee cost |
Finance | Female | 11800 |
HR | Female | 6000 |
IT | Female | 21200 |
Marketing | Female | 12200 |
NULL | Female | 51200 |
Finance | Male | 5000 |
HR | Male | 14200 |
IT | Male | 5000 |
Marketing | Male | 20200 |
NULL | Male | 44400 |
NULL | NULL | 95600 |
Finance | NULL | 16800 |
HR | NULL | 20200 |
IT | NULL | 26200 |
Marketing | NULL | 32400 |
The output can be formatted with the help of coalesce to address gap in values i.e. NULL values. The following query does the same making the resultset more readable and comprehensible.
SELECT coalesce (department, 'All Department') AS department, coalesce (gender, 'Both Genders') AS gender, sum(salary) AS 'employee cost' FROM employee GROUP BY CUBE (department, gender);
The formatted output is the following. The NULL values have been replaced with meaningful values.
department | gender | employee cost |
Finance | Female | 11800 |
HR | Female | 6000 |
IT | Female | 21200 |
Marketing | Female | 12200 |
All Department | Female | 51200 |
Finance | Male | 5000 |
HR | Male | 14200 |
IT | Male | 5000 |
Marketing | Male | 20200 |
All Department | Male | 44400 |
All Department | Both Genders | 95600 |
Finance | Both Genders | 16800 |
HR | Both Genders | 20200 |
IT | Both Genders | 26200 |
Marketing | Both Genders | 32400 |