Business Intelligence and Transact-SQL: Extensions of GROUP BY

Transact-SQL extends the GROUP BY clause with the following operators and functions:

  • CUBE
  • ROLLUP
  • Grouping functions
  • Grouping sets

The following sections describe these operators and functions.

1. CUBE Operator

This section looks at the differences between grouping using the GROUP BY clause alone and grouping using GROUP BY in combination with the CUBE and ROLLUP operators. The main difference is that the GROUP BY clause defines one or more columns as a group such that all rows within any group have the same values for those columns. CUBE and ROLLUP provide additional summary rows for grouped data. These summary rows are also called multidimensional summaries.

The following two examples demonstrate these differences. Example 24.9 applies the GROUP BY clause to group the rows of the project_dept table using two criteria: dept_name and emp_cnt.

Example 24.9

Using GROUP BY, group the rows of the project_dept table that belong to the Accounting and Research departments using the dept_name and emp_cnt columns and additionally calculate the sum of the budgets:

USE sample;

SELECT dept_name, emp_cnt, SUM(budget) sum_of_budgets

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’)

GROUP BY dept_name, emp_cnt;

The result is

Example 24.10 and its result set shows the difference when you additionally use the CUBE operator.

Example 24.10

Group the rows of the project_dept table that belong to the Accounting and Research departments using the dept_name and emp_cnt columns and additionally display all possible summary rows:

USE sample;

SELECT dept_name, emp_cnt, SUM(budget) sum_of_budgets

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’)

GROUP BY CUBE (dept_name, emp_cnt);

The result is

The main difference between the previous two examples is that the result set of Example 24.9 displays only the values in relation to the grouping, while the result set of Example 24.10 contains, additionally, all possible summary rows. (Because the CUBE operator displays every possible combination of groups and summary rows, the number of rows is the same, regardless of the order of columns in the GROUP BY clause.) The placeholder for the values in the unneeded columns of summary rows is displayed as NULL. For example, the following row from the result set:

NULL                         NULL              305000

shows the grand total (that is, the sum of all budgets of all existing projects in the table), while the row:

NULL                         5                         125000

shows the sum of all budgets for all projects that employ exactly five employees.

NOTE The syntax of the CUBE operator in Example 24.10 corresponds to the standardized syntax of that operator. Because of its backward compatibility, the Database Engine also supports the old-style syntax:

USE sample;

SELECT dept_name, emp_cnt, SUM(budget) sum_of_budgets

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’)

GROUP BY dept_name, emp_cnt

WITH CUBE;

2. ROLLUP Operator

In contrast to CUBE, which returns every possible combination of groups and summary rows, the group hierarchy using ROLLUP is determined by the order in which the grouping columns are specified. Example 24.11 shows the use of the ROLLUP operator.

Example 24.11

Group the rows of the project_dept table that belong to the Accounting and Research departments using the dept_name and emp_cnt columns and additionally display summary rows for the dept_name column:

USE sample;

SELECT dept_name, emp_cnt, SUM(budget) sum_of_budgets

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’)

GROUP BY ROLLUP (dept_name, emp_cnt);

The result is

As you can see from the result of Example 24.11, the number of retrieved rows in this example is smaller than the number of displayed rows in the example with the CUBE operator. The reason is that the summary rows are displayed only for the first column in the GROUP BY ROLLUP clause.

NOTE The syntax used in Example 24.11 is the standardized syntax. The old-style syntax for ROLLUP is similar to the syntax for CUBE, which is shown in the second part of Example 24.10.

3. Grouping Functions

As you already know, NULL is used in combination with CUBE and ROLLUP to specify the placeholder for the values in the unneeded columns. In such a case, it isn’t possible to distinguish NULL in relation to CUBE and ROLLUP from the NULL value. Transact-SQL supports the following two standardized grouping functions that allow you to resolve the problem with the ambiguity of NULL:

  • GROUPING
  • GROUPING_ID

The following subsections describe in detail these two functions.

3.1. GROUPING Function

The GROUPING function returns 1 if the NULL in the result set is in relation to CUBE or ROLLUP, and 0 if it represents the group of NULL values.

Example 24.12 shows the use of the GROUPING function.

Example 24.12

Using the GROUPING function, clarify which NULL values in the result of the following SELECT statement display summary rows:

USE sample;

SELECT dept_name, emp_cnt, SUM(budget) sum_b, GROUPING(emp_cnt) gr

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Marketing’)

GROUP BY ROLLUP (dept_name, emp_cnt);

The result is

If you take a look at the grouping column (gr), you will see that some values are 0 and some are 1. The value 1 indicates that the corresponding NULL in the emp_cnt column specifies a summary value, while the value 0 (in the fifth row) indicates that NULL stands for itself (i.e., it is the NULL value).

3.2. GROUPING_ID Function

The GROUPING_ID function computes the level of grouping. GROUPING_ID can be used only in the SELECT list, HAVING clause, or ORDER BY clause when GROUP BY is specified. Example 24.13 shows the use of the GROUPING_ID function.

Example 24.13

USE sample;

SELECT dept_name, YEAR(date_month), SUM(budget),

GROUPING_ID (dept_name, YEAR(date_month)) AS gr_dept

FROM project_dept

GROUP BY ROLLUP (dept_name, YEAR(date_month));

The result is

The GROUPING_ID function is similar to the GROUPING function, but becomes very useful for determining the summarization of multiple columns, as is the case in Example 24.13. The function returns an integer that, when converted to binary, is a concatenation of the 1s and 0s representing the summarization of each column passed as the parameter of the function. For example, the value 3 of the gr_dept column in the last row of the result means that summarization is done over both the dept_name and date_month columns. The binary value (11)2 is equivalent to the value 3 in the decimal system.

4. Grouping Sets

Grouping sets are an extension to the GROUP BY clause that lets users define several groups in the same query. You use the GROUPING SETS operator to implement grouping sets. Example 24.14 shows the use of this operator.

Example 24.14

Calculate the sum of budgets for the Accounting and Research departments using the combination of values of the dept_name and emp_cnt columns first, and after that using the values of the single column dept_name:

USE sample;

SELECT dept_name, emp_cnt, SUM(budget) sum_budgets

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’)

GROUP BY GROUPING SETS ((dept_name, emp_cnt),(dept_name));

The result is

As you can see from the result set of Example 24.14, the query uses two different groupings to calculate the sum of budgets: first using the combination of values of the dept_name and emp_cnt columns, and second using the values of the single column dept_name. The first three rows of the result set display the sum of budgets for three different groupings of the first two columns (Accounting, 5; Accounting, 6; and Accounting, 10). The fourth row displays the sum of budgets for all Accounting departments. The last three rows display the similar results for the Research department.

You can use the series of grouping sets to replace the ROLLUP and CUBE operators. For instance, the following series of grouping sets:

GROUP BY GROUPING SETS ((dept_name, emp_cnt), (dept_name), ())

is equivalent, except for several duplicated rows, to the following ROLLUP clause:

GROUP BY ROLLUP (dept_name, emp_cnt)

Also,

GROUP BY GROUPING SETS ((dept_name, emp_cnt), (emp_cnt, dept_name), (dept_name), (emp_cnt), ())

is equivalent to the following CUBE clause:

GROUP BY CUBE (dept_name, emp_cnt)

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *