The Database Engine supports two types of columnstore indices: clustered and nonclustered. The following subsections explain them.
1. Clustered Columnstore Index
This section describes how you can create a clustered columnstore index, load data into it, and modify already loaded data.
1.1. Creation of Clustered Columnstore Index
To create a clustered columnstore index, first create a row store table as a heap or clustered index, and then use the CREATE CLUSTERED COLUMNSTORE INDEX statement to convert the table to a clustered columnstore index.
Example 27.1 shows the creation of a clustered columnstore index. (The syntax of CREATE CLUSTERED COLUMNSTORE INDEX does not include the list of indexed columns, while each such index contains all columns of the corresponding table.)
Example 27.1
USE sample;
SELECT * INTO FactInternetSales
FROM AdventureWorksDW.dbo.FactInternetSales;
GO
CREATE CLUSTERED COLUMNSTORE INDEX
cl_factinternetsales ON FactInternetSales
WITH ( DATA_COMPRESSION = COLUMNSTORE);
Example 27.1 uses the FactInternetSales table from the AdventureWorksDW database and copies its structure and data to the table with the same name in the sample database by using the SELECT … INTO statement. After that, Example 27.1 creates the clustered columnstore index called cl_factinternetsales. As you can see from Example 27.1, the creation of a clustered columnstore index has the same syntax as the traditional CREATE INDEX statement except that it includes the additional CLUSTERED COLUMNSTORE clause.
The DATA_COMPRESSION clause specifies the data compression option for the specified table, partition number, or range of partitions. The options are COLUMNSTORE (the default value) and COLUMN ST ORE_ARCHIVE.
NOTE An important property of a clustered columnstore index is that it can have one or more nonclustered row store indices.
1.2. Clustered Columnstore Index and Data Modification
Column storage significantly improves read operations, but data in that storage is very expensive to update directly, because it is compressed. For this reason, the Database Engine does not update data in compressed row groups during data modification operations. The modification operations are handled by two components in relation to clustered columnstore indices: delta stores and delete bitmaps.
NOTE As you will see in the next sections, both delta stores and delete bitmaps are row store constructs. Therefore, all update operations on them are row store operations, which are significantly faster than the same operations on highly compressed column structures.
Delta Stores If you execute DML statements, new and updated rows are inserted into a delta store, which is a traditional B-tree structure. Delta stores are included in any scan of the clustered columnstore index.
A delta store contains the same columns as the corresponding clustered columnstore index. The B-tree key is a unique integer row ID generated by the system. (A clustered columnstore index does not have unique keys.) A clustered columnstore index can have zero, one, or more delta stores. New delta stores are created automatically as needed to accept inserted rows. A delta store is either open or closed. An open delta store can accept rows to be inserted. A delta store is closed when the number of rows it contains reaches a predefined limit. Every delta store has a state column value of either 1 (open) or 2 (closed).
The Database Engine automatically checks in the background for closed delta stores and converts them into segments. This background process is called Tuple Mover and by default is executed every 5 minutes. Tuple Mover is designed not to block any read operations on data. By contrast, concurrent delete and update operations have to wait until the compression process completes.
NOTE The implementation of delta stores for disk-based tables is done as a set of internal B-tree tables.
The large bulk insert operations are handled differently by the system. During a large bulk load, the rows are stored directly in the clustered columnstore index without passing through a delta store. The operation itself is very efficient, since it stores all inserted rows in memory and applies compression in-memory. After the process is terminated, it stores that data on the disk.
In case of DELETE, an additional structure called a delete bitmap is used, which will be explained next.
Delete Bitmap Each clustered columnstore index has an associated delete bitmap that is consulted during scans to disqualify rows that have been deleted. In other words, a delete bitmap is a storage component that contains information about the deleted rows inside segments. A delete bitmap has two different representations, depending on whether it is in memory or on disk. In memory, it is a bitmap, but on disk, it is stored in the following way: a record of a deleted row is inserted into the B-tree of the corresponding delete bitmap. (This is true only if a row is in a columnstore segment. If the row to be deleted is in a delta store, the row is simply deleted.)
NOTE The implementation of delete bitmaps for disk-based tables is done as a set of internal B-tree tables.
Having discussed delta stores and delete bitmaps, we now can examine exactly what happens when an INSERT, UPDATE, or DELETE statement is executed: First, inserted data is simply added to one of the currently open delta stores. Second, if the deleted row is found inside of a segment, then the deleted bitmap information is updated with the row ID of the respective row. On the other hand, if the deleted row is actually inside of a delta store, then the direct process of removal is executed on the corresponding B-tree structure. Finally, data updates are basically represented as deletes and inserts. In other words, an update operation triggers the insertion of the old version of the row into the delete bitmap and insertion of a new version into the delta store.
2. Nonclustered Columnstore Index
Generally, a nonclustered columnstore index is stored in the same way as a clustered one. Only one columnstore index can be created in this situation. In other words, if a clustered (or nonclustered) columnstore index exists, it must be dropped before the new index can be created. The columnstore index requires extra storage since it contains a copy of the data in the regular (“row store”) table.
2.1. Creation of a Nonclustered Columnstore Index
Generally, the CREATE NONCLUSTERED COLUMNSTORE INDEX statement is used to create a nonclustered columnstore index on a table. The underlying table can be a row store table with or without a clustered index. In all cases, creating a nonclustered columnstore index on a table stores a second copy of the data for the columns in the index.
NOTE SQL Server does not allow you to create clustered and nonclustered columnstore indices at the same time. You have to drop the clustered index in Example 27.1 if you want to create cs_index1 in Example 27.2.
Example 27.2
USE sample;
GO
DROP INDEX
cl_factinternetsales ON factinternetsales;
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX cs_index1
ON FactInternetSales (OrderDateKey, ShipDateKey, UnitPrice);
The CREATE INDEX statement in Example 27.2 creates a columnstore index for three columns of the FactInternetSales table: OrderDateKey, ShipDateKey, and UnitPrice. This means that all values of each of the three columns will be grouped and stored separately.
2.2. Filtered Index
The nonclustered columnstore index definition supports the use of a filter. A filtered index is an index that is specified for a condition in the WHERE clause of the SELECT statement. (You can define a filtered index both for regular indices and for columnstore indices.) Example 27.3 shows creation of such an index. (Again, only one index can be specified for a columnstore. For this reason, drop the index from Example 27.2 before creating the index in Example 27.3.)
Example 27.3
USE sample;
CREATE NONCLUSTERED COLUMNSTORE INDEX i1 ON
FactInternetSales (UnitPrice)
WHERE OrderDateKey IS NULL;
Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.