Business Intelligence and Transact-SQL: Window Construct

A window (in relation to SQL/OLAP) defines a partitioned set of rows to which a function is applied. The number of rows that belong to a window is dynamically determined in relation to the user’s specifications. The window construct is specified using the OVER clause.

The standardized window construct has three main parts:

  • Partitioning
  • Ordering and framing
  • Aggregation grouping

NOTE The Database Engine doesn’t support aggregation grouping yet, so this feature is not discussed here.

Before you delve into the window construct and its parts, take a look at the table that will be used for the examples in this chapter. Example 24.1 creates the project_dept table, shown in Table 24-1, which is used in this chapter to demonstrate Transact-SQL extensions concerning SQL/OLAP.

Example 24.1

USE sample;

CREATE TABLE project_dept

( dept_name CHAR( 20 ) NOT NULL,

emp_cnt INT,

budget FLOAT,

date_month DATE );

The project_dept table contains several departments and their employee counts as well as budgets of projects that are controlled by each department. Example 24.2 shows the INSERT statements that are used to insert the rows shown in Table 24-1.

Example 24.2

USE sample;

INSERT INTO project_dept VALUES

(‘Research’, 5, 50000, ‘01.01.2017’);

INSERT INTO project_dept VALUES

(‘Research’, 10, 70000, ‘02.01.2017’);

INSERT INTO project_dept VALUES

(‘Research’, 5, 65000, ‘07.01.2017’);

INSERT INTO project_dept VALUES

(‘Accounting’, 5, 10000, ‘07.01.2017’);

INSERT INTO project_dept VALUES

(‘Accounting’, 10, 40000,’02.01.2017′);

INSERT INTO project_dept VALUES

(‘Accounting’, 6, 30000, ‘01.01.2017’);

INSERT INTO project_dept VALUES

(‘Accounting’, 6, 40000,’02.01.2018′);

INSERT INTO project_dept VALUES

(‘Marketing’, 6, 100000, ‘01.01.2018’);

INSERT INTO project_dept VALUES

(‘Marketing’, 10, 180000,’02.01.2018′);

INSERT INTO project_dept VALUES

(‘Marketing’, 3, 100000,’07.01.2018′);

INSERT INTO project_dept VALUES

(‘Marketing’, NULL, 120000,’01.01.2018′);

1. Partitioning

Partitioning allows you to divide the result set of a query into groups, so that each row from a partition will be displayed separately. If no partitioning is specified, the entire set of rows comprises a single partition. Although the partitioning looks like a grouping using the GROUP BY clause, it is not the same thing. The GROUP BY clause collapses the rows in a partition into a single row, whereas the partitioning within the window construct simply organizes the rows into groups without collapsing them.

The following two examples show the difference between partitioning using the window construct and grouping using the GROUP BY clause. Suppose that you want to calculate several different aggregates concerning employees in each department. Example 24.3 shows how the OVER clause with the PARTITION BY clause can be used to build partitions.

Example 24.3

Using the window construct, build partitions according to the values in the dept_name column and calculate the sum and the average for the Accounting and Research departments:

USE sample;

SELECT dept_name, budget,

SUM( emp_cnt ) OVER( PARTITION BY dept_name ) AS emp_cnt_sum,

AVG( budget ) OVER( PARTITION BY dept_name ) AS budget_avg

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’);

The result is

Example 24.3 uses the OVER clause to define the corresponding window construct. Inside it, the PARTITION BY option is used to specify partitions. (Both partitions in Example 24.3 are grouped using the values in the dept_name column.) Finally, an aggregate function is applied to the partitions. (Example 24.3 calculates two aggregates, the sum of the values in the emp_cnt column and the average value of budgets.) Again, as you can see from the result of the example, the partitioning organizes the rows into groups without collapsing them.

Example 24.4 shows a similar query that uses the GROUP BY clause.

Example 24.4

Group the values in the dept_name column for the Accounting and Research departments and calculate the sum and the average for these two groups:

USE sample;

SELECT dept_name, SUM(emp_cnt) AS cnt, AVG( budget ) AS budget_avg

FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’)

GROUP BY dept_name;

The result is

As already stated, when you use the GROUP BY clause, each group collapses down to one row.

NOTE There is another significant difference between the OVER clause and the GROUP BY clause. As can be seen from Example 24.3, when you use the OVER clause, the corresponding SELECT list can contain any column name from the table. This is obvious, because partitioning organizes the rows into groups without collapsing them. (If you add the budget column in the SELECT list of Example 24.4, you will get an error.)

2. Ordering and Framing

The ordering within the window construct is like the ordering in a query. First, you use the ORDER BY clause to specify the particular order of the rows in the result set. Second, it includes a list of sort keys and indicates whether they should be sorted in ascending or descending order. The most important difference is that ordering inside a window is applied only within each partition.

The Database Engine also supports ordering inside a window construct for aggregate functions. In other words, the OVER clause for aggregate functions can contain the ORDER BY clause, too. Example 24.5 shows this.

Example 24.5

Using the window construct, partition the rows of the project_dept table using the values in the dept_name column, sort the rows in each partition using the values in the budget column, and additionally display the sum of budgets:

USE sample;

SELECT dept_name, budget, emp_cnt,

SUM(budget) OVER(PARTITION BY dept_name ORDER BY budget) AS sum_dept

FROM project_dept;

The query in Example 24.5, which is generally called “cumulative aggregations,” or in this case “cumulative sums,” uses the ORDER BY clause to specify ordering within the particular partition. This functionality can be extended using framing. Framing means that the result can be further narrowed using two boundary points that restrict the set of rows to a subset, as shown in Example 24.6.

Example 24.6

USE sample;

SELECT dept_name, budget, emp_cnt,

SUM(budget) OVER(PARTITION BY dept_name ORDER BY budget

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

AS sum_dept

FROM project_dept;

Example 24.6 uses two clauses, UNBOUNDED PRECEDING and CURRENT ROW, to specify the boundary points of the selected rows. For the query in the example, this means that based on the order of the budget values, the displayed subset of rows includes those prior to the current row that have no low boundary point. (The result set contains 11 rows.)

The frame bounds used in Example 24.6 are not the only ones you can use. The UNBOUNDED FOLLOWING clause means that the specified frame does not have an upper boundary point. Also, both boundary points can be specified using an offset from the current row. In other words, you can use the n PRECEDING or n FOLLOWING clauses to specify n rows before or n rows after the current one, respectively. Therefore, the following frame specifies all together three rows—the current row, the previous one, and the next one:

ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING

The Database Engine also supports two functions related to framing: LEAD and LAG. LEAD has the ability to compute an expression on the next rows (rows that are going to come after the current row). In other words, the LEAD function returns the next nth row value in an order. The function has three parameters: The first one specifies the name of the column to compute the leading row, the second one is the index of the leading row relative to the current row, and the last one is the value to return if the offset points to a row outside of the partition range. (The semantics of the LAG function is similar: it returns the previous nth row value in an order.)

Example 24.6 uses the ROWS clause to limit the rows within a partition by physically specifying the number of rows preceding or following the current row. Alternatively, you can use the RANGE clause, which logically limits the rows within a partition. In other words, when you use the ROWS clause, the exact number of rows will be specified based on the defined frame. On the other hand, the RANGE clause does not define the exact number of rows, because the specified frame can contain duplicates, too.

You can use several columns from a table to build different partitioning schemas in a query, as shown in Example 24.7.

Example 24.7

Using the window construct, build two partitions for the Accounting and Research departments: one using the values of the budget column and the other using the values of the dept_name column. Calculate the sums for the former partition and the averages for the latter partition.

USE sample;

SELECT dept_name, CAST( budget AS INT ) AS budget,

SUM( emp_cnt ) OVER( PARTITION BY budget ) AS emp_cnt_sum,

AVG( budget ) OVER( PARTITION BY dept_name ) AS budget_avg FROM project_dept

WHERE dept_name IN (‘Accounting’, ‘Research’);

The result is

The query in Example 24.7 has two different partitioning schemas: one over the values of the budget column and one over the values of the dept_name column. The former is used to calculate the number of employees in relation to the departments with the same budget. The latter is used to calculate the average value of budgets of departments grouped by their names.

Example 24.8 shows how you can use the NEXT VALUE FOR expression of the CREATE SEQUENCE statement to control the order in which the values are generated using the OVER clause. (For the description of the CREATE SEQUENCE statement, see Chapter 6.)

Example 24.8

USE sample;

CREATE SEQUENCE Seq START WITH 1 INCREMENT BY 1;

GO

CREATE TABLE T1 (col1 CHAR(10), col2 CHAR(10));

GO

INSERT INTO dbo.T1(col1, col2)

SELECT NEXT VALUE FOR Seq OVER(ORDER BY dept_name ASC), budget

FROM (SELECT dept_name, budget

FROM project_dept

ORDER BY budget, dept_name DESC

OFFSET 0 ROWS FETCH FIRST 5 ROWS ONLY) AS D;

The content of the T1 table is as follows:

The first two statements create the Seq sequence and the auxiliary table T1. The following INSERT statement uses a subquery to filter the five departments with the highest budget, and generates sequence values for them. This is done using OFFSET/FETCH, which is described in Chapter 6. (You can find a few other examples using OFFSET/FETCH in the subsection with the same name later in this chapter.)

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *