Systems Engineering and RDBMS

GROUP BY and CUBE(), ROLLUP(), GROUPING() and GROUPING_ID() functions

Posted by decipherinfosys on November 2, 2007

You must already be aware of the GROUP BY clause. It is used for grouping the rows by a selected set of attributes and is typically done to get some aggregated information for that group of columns.

/*********************************************************************************************************
Creation of a TEST table and population of some dummy data
**********************************************************************************************************/
CREATE TABLE TEST_TBL (COL1 INT, COL2 VARCHAR(10), COL3 INT)
GO
INSERT INTO TEST_TBL VALUES (1, ‘A’, 10)
INSERT INTO TEST_TBL VALUES (1, ‘A’, 20)
INSERT INTO TEST_TBL VALUES (1, ‘A’, 30)
INSERT INTO TEST_TBL VALUES (1, ‘B’, 90)
INSERT INTO TEST_TBL VALUES (2, ‘A’, 30)
INSERT INTO TEST_TBL VALUES (2, ‘A’, 100)
INSERT INTO TEST_TBL VALUES (3, ‘C’, 110)
INSERT INTO TEST_TBL VALUES (3, ‘C’, 120)
GO

/*********************************************************************************************************
Before starting, let’s take a look at the data that exists in the table
**********************************************************************************************************/

SELECT * FROM TEST_TBL
GO

COL1        COL2       COL3
----------- ---------- -----------
1           A          10
1           A          20
1           A          30
1           B          90
2           A          30
2           A          100
3           C          110
3           C          120

/*********************************************************************************************************
Simple Example of a GROUP BY Clause. The grouping is being done over COL1 and COL2
and the aggregate functions are used to display the total, the average, the max and min values, etc.
**********************************************************************************************************/

SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL, AVG(COL3) AVG_VAL, MAX(COL3) MAX_VAL, MIN(COL3) MIN_VAL
FROM TEST_TBL
GROUP BY COL1, COL2
GO

COL1        COL2       CNT_RECORD  TOTAL_VAL   AVG_VAL     MAX_VAL     MIN_VAL
----------- ---------- ----------- ----------- ----------- ----------- -----------
1           A          3           60          20          30          10
2           A          2           130         65          100         30
1           B          1           90          90          90          90
3           C          2           230         115         120         110

/*********************************************************************************************************
Now, let us take the same SQL and use the RollUP() function in addition to the GROUP BY clause:
**********************************************************************************************************/
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

Usage of the ROLLUP() function generates the GROUP BY aggregate rows PLUS super-aggregate (cumulative) rows and a final grand total row as well.  If you see below, one row with a sub-total is generated for each unique combination of values of (COL1, COL2), and (COL1). As should be clear from the preceding statement, the order of the columns in the ROLLUP() function can change the output as well as the number of rows in the final result set.

COL1        COL2       TOTAL_VAL
----------- ---------- -----------
1           A          60        ==> Same as in the representation above
1           B          90        ==> Same as in the representation above
1           NULL       150       ==> Aggregation of the records from above
2           A          130       ==> Same as in the representation above
2           NULL       130       ==> Aggregation of the record from above
3           C          230       ==> Same as in the representation above
3           NULL       230       ==> Aggregation of the record from above
NULL        NULL       510       ==> The grand total

The above SQL can also be written as:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH ROLLUP

/*********************************************************************************************************
Now, let us take the same SQL and use the CUBE() function in addition to the GROUP BY clause:
**********************************************************************************************************/
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY CUBE (COL1, COL2)
GO

A CUBE() as the name suggests generates data for the grouping of all permutations of expressions i.e. since we have 2 columns in our example that we are using the CUBE() function on, we have 2^2 which means 4 grouping sets:

COL1, COL2
COL2, COL1
COL1
COL2

Thus, one row will be produced for each unique grouping set from above and in addition, a sub-total row is generated for each row and an aggregated grand total row is produced with NULL values in all the other columns. You can see the output below:

COL1        COL2       TOTAL_VAL
----------- ---------- -----------
1           A          60
2           A          130
NULL        A          190
1           B          90
NULL        B          90
3           C          230
NULL        C          230
NULL        NULL       510
1           NULL       150
2           NULL       130
3           NULL       230

And as is obvious from the output, the order of the columns has no bearing on the output in the case of the CUBE() function.

The above SQL can also be written as:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH CUBE

In order to distinguish the NULL values that are returned because of the usage of the ROLLUP() and/or CUBE() functions versus actual NULL values in the tables, one can make use of the GROUPING function. Example:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL2) AS GRP_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

COL1        COL2       TOTAL_VAL   GRP_VALUE
----------- ---------- ----------- ---------
1           A          60          0
1           B          90          0
1           NULL       150         1
2           A          130         0
2           NULL       130         1
3           C          230         0
3           NULL       230         1
NULL        NULL       510         1

As seen from above, whereever the GRP_VALUE is marked as 1, those are the records that were generated because of the ROLLUP() function usage.  Since the GROUPING() function takes in only one argument, we made usage of the outermost column in order to filter out all those records where the NULL value was being generated because of the usage of the function. In the case of the CUBE() function, you can use the GROUPING() function multiple times to filter out those records.

Another function to be aware of is the GROUPING_ID() function. This function can be used to compute the level of grouping. So, if there are two columns like we have in our example in this post, the GROUPING_ID() will be computed as:

Column(s)        GROUPING_ID(COL1, COL2) = GROUPING(COL1) + GROUPING(COL2)        GROUPING_ID() Output
COL1             10                                                                2
COL2             01                                                                1
COL1, COL2       11                                                                3
COL2, COL1       11                                                                3

Now, let us see this with the actual SQL execution:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL1) AS C1, GROUPING(COL2) AS C2, GROUPING_ID(COL1, COL2) AS GRP_ID_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

COL1        COL2       TOTAL_VAL   C1   C2   GRP_ID_VALUE
----------- ---------- ----------- ---- ---- ------------
1           A          60          0    0    0
1           B          90          0    0    0
1           NULL       150         0    1    1
2           A          130         0    0    0
2           NULL       130         0    1    1
3           C          230         0    0    0
3           NULL       230         0    1    1
NULL        NULL       510         1    1    3

In the next post, we will talk about the new feature of GROUPING SETS that has been introduced in SQL Server 2008. Using the GROUPING SETS, one can aggregate only the specified groups instead of the full set of aggregations as generated by the CUBE() or ROLLUP() functions.

2 Responses to “GROUP BY and CUBE(), ROLLUP(), GROUPING() and GROUPING_ID() functions”

  1. […] GROUP BY and CUBE(), ROLLUP(), GROUPING() and GROUPING_ID() functions […]

  2. […] If you recall from a previous posts in which we had covered some other T-SQL features of SQL 2008 (here and here), this table has three columns: COL1, COL2 and COL3 and this set of […]

Sorry, the comment form is closed at this time.

 
%d bloggers like this: