SQL Server: Columnstore Indices: Performance

As you already know from the first section of this chapter, the use of columnstore indices can reduce I/O significantly, because only a small part of the data has to be transferred from disk into memory. (The Microsoft website states that the columnstore index can achieve query performance gains up to ten times greater than traditional index, as well as ten times the data compression rate over the uncompressed data size.)

In the following two subsections we first take a closer look at a performance comparison of the columnstore index and a corresponding traditional index. After that, a set-at-time execution, called Batch Mode, will be introduced. The aim of this mode is to execute a query that processes millions and billions of rows more effectively.

1. Columnstore Indices vs. Rowstore Indices

Generally, columnstore indices should have significant performance benefits in relation to traditional (B-tree) indices. Example 27.6 uses the AdventureworksDW2016_EXT sample database to compare the CPU and execution time of two identical queries.

NOTE The AdventureworksDW2016_EXT database is another Microsoft sample database. This database is similar to the AdventureworksDW database, with significant extensions in relation to a number of tables as well as the cardinality of existing ones. You can download this database from the following site: https://github.com/Microsoft/sql-server-samples/releases/tag/ adventureworks

Example 27.6

USE AdventureworksDW2016_EXT

GO

DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT s.SalesTerritoryRegion, SUM(f.SalesAmount) ‘Total_Sales’,

COUNT(distinct f.Resellerkey) as ‘Resellers’

FROM FactResellerSalesXL_PageCompressed f

INNER JOIN DimDate d ON f.OrderDateKey= d.Datekey

INNER JOIN DimSalesTerritory s on s.SalesTerritoryKey=f.SalesTerritoryKey

INNER JOIN DimEmployee e on e.EmployeeKey=f.EmployeeKey

WHERE FullDateAlternateKey between ‘1/1/2015’ and ‘1/1/2017’

GROUP BY s.SalesTerritoryRegion

ORDER BY Total_Sales

SET STATISTICS IO OFF

SET STATISTICS TIME OFF

The result is

We use the AdventureworksDW_EXT sample database in Examples 27.6 and 27.7 because it contains two tables: FactResellerSalesXL_PageCompressed (Example 27.6) and FactResellerSalesXL_CCI (Example 27.7). The only difference is that the second table has the clustered columnstore index. Therefore, Example 27.6 uses row store to retrieve results, while Example 27.7 displays the same result set using a columnstore index.

Example 27.7

USE AdventureworksDW2016_EXT

DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON

SET STATISTICS TIME ON

SELECT s.SalesTerritoryRegion, SUM(f.SalesAmount) ‘Total_Sales’,

COUNT(distinct f.Resellerkey) as ‘Resellers’

FROM FactResellerSalesXL_CCI f

INNER JOIN dbo.DimDate d ON f.OrderDateKey= d.Datekey

INNER JOIN dbo.DimSalesTerritory s on s.SalesTerritoryKey=f.SalesTerritoryKey

INNER JOIN dbo.DimEmployee e on e.EmployeeKey=f.EmployeeKey

WHERE FullDateAlternateKey between ‘1/1/2015’ and ‘1/1/2017’

GROUP BY s.SalesTerritoryRegion

ORDER BY Total_Sales

SET STATISTICS IO OFF

SET STATISTICS TIME OFF

The result set is the same as for Example 27.6. The execution time is

SQL Server Execution Times:

CPU time = 250 ms, elapsed time = 1922 ms.

As you can see from the performance parameters of Examples 27.6 and 27.7, the execution time of the query using a columnstore index is eight times faster than the corresponding execution time of the same query using row store.

NOTE You cannot expect that each query will have such huge performance advantages if you use the column store. Generally, using a columnstore index benefits a wide group of queries, but typically the execution time is not several times faster.

2. Batch Mode on Columnstore

When you run a query in the Database Engine, the Query Processor plans how it should actually get the optimal result. This plan is expressed as a tree with operators, where the operators build the nodes of the tree. Each operator with its properties represents a certain processing action applied to a portion of data.

The query optimizer generally uses record-at-time execution, meaning that each query plan operator processes one row at a time. Prior to SQL Server 2012, a unit of data transferred through a tree was a row. With the introduction of columnstore indices, a set-at-time execution, called Batch Mode, was designed. The aim of this mode is to execute a query that processes millions and billions of rows more effectively.

Therefore, the columnstore index allows the system to choose between two modes: Row Mode, which is the conventional execution mode, and Batch Mode, the new execution mode for analytical queries with columnstore indices.

A batch is a structure of 64KB, allocated for a bunch of rows, that contains column vectors and qualifying rows vector. Depending on the number of columns, it may contain up to 1000 rows.

The benefits of Batch Mode are

  • Allows the use of algorithms that are optimized for multicore CPUs
  • Increased memory throughput
  • Significant reduction of database accesses

NOTE The downside of Batch Mode is that not all operators of the execution plan can be executed in that mode. In other words, if there is just one unsupported operator in a particular query plan, the whole query will be executed in row-mode processing.

The following examples show you how a query is executed in Batch Mode. Example 27.8 creates the tables that are necessary for the query.

Example 27.8

USE sample;

SELECT * INTO FactInternetSales

FROM AdventureWorksDW.dbo.FactInternetSales;

GO

INSERT INTO FactInternetSales

SELECT * FROM AdventureWorksDW.dbo.FactInternetSales;

GO 6

The SELECT statement in Example 27.8 creates the FactInternetSales table in the sample database and loads all rows from the FactInternetSales table of the AdventureWorksDW database into it. The subsequent INSERT statement loads the same load six more times. That way, we created a new table in the sample database with 422,786 rows.

Example 27.9 repeats the process of creating and loading two other tables from the AdventureWorksDW database.

Example 27.9

USE sample;

SELECT * INTO DimCustomer

FROM AdventureWorksDW.dbo.DimCustomer;

GO

SELECT * INTO DimDate

FROM AdventureWorksDW.dbo.DimDate;

After the execution of the queries in Example 27.9, the sample database contains two new tables, DimCustomer and DimDate, with the same structure and content as the tables with the same names in the AdventureWorksDW database.

Example 27.10 creates a nonclustered columnstore index.

Example 27.10

USE sample;

CREATE NONCLUSTERED COLUMNSTORE INDEX CLI_CS_IFactInternetSales

ON dbo.FactInternetSales(OrderDateKey, CustomerKey, SalesAmount)

The creation of a columnstore index is a condition for which the query optimizer can choose Batch Mode on Columnstore. (The creation of a columnstore index does not mean that the query optimizer will automatically choose to use Batch Mode for the execution of a query.)

As previously noted, Batch Mode processing uses algorithms that are optimized for multicore CPUs. This means the query optimizer will not choose Batch Mode if the number of processors is one, which is the default value. Therefore, Example 27.11 increases the number of processors to four, using the max degree of parallelism (MAXDOP) configuration option. This is an advanced configuration option that controls the number of processors that are used for the execution of a query in a parallel plan.

Example 27.11

EXEC sp_configure ‘show advanced options’, 1;

GO

RECONFIGURE WITH OVERRIDE;

GO

EXEC sp_configure ‘max degree of parallelism’, 4; GO

RECONFIGURE WITH OVERRIDE;

GO

Example 27.12 shows the query that is processed in Batch Mode using a columnstore index.

Example 27.12

USE sample;

SELECT c.CommuteDistance, d.CalendarYear,

SUM(f.SalesAmount) TotalSales

FROM dbo.FactInternetSales as f

INNER JOIN dbo.DimCustomer as c ON

f.CustomerKey = c.CustomerKey

INNER JOIN dbo.DimDate d ON

d.DateKey = f.OrderDateKey

GROUP BY c.CommuteDistance, d.CalendarYear;

Figure 27-1 shows the execution plan of the query in Example 27.12 in general and the properties of the Columnstore Index Scan operator in particular. Two properties are important in relation to Batch Mode on Columnstore: Estimated Execution Mode and Storage. The former has the value Batch and the latter the value ColumnStore. This means that the query is processed in Batch Mode using the columnstore index called CLI_CS_IFactInternetSales.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *