SQL Server: Disk Storage

The storage architecture of the Database Engine contains several units for storing database objects:

  • Page
  • Extent
  • File
  • Filegroup

NOTE Files and filegroups will not be discussed in this chapter. They are described in Chapter 5.

The main unit of data storage is the page. The size of a page is always 8KB. Each page has a 96-byte header used to store the system information. Data rows are placed on the page immediately after the header.

The Database Engine supports different page types. The most important are

  • Data pages
  • Index pages

NOTE Data and index pages are actually physical parts of a database where the corresponding tables and indices are stored. The content of a database is stored in one or more files, and each file is divided into page units. Therefore, each table or index page (as a database physical unit) can be uniquely identified using a database ID, database file ID, and a page number.

When you create a table or index, the system allocates a fixed amount of space to contain the data belonging to the table or index. When the space fills, the space for additional storage must be allocated. The physical unit of storage in which space is allocated to a table (index) is called an extent. An extent comprises eight contiguous pages, or 64KB. There are two types of extents:

  • Uniform extents Owned by a single table or index
  • Mixed extents Shared by up to eight tables or indices

The system always allocates pages from mixed extents first. After that, if the size of the table (index) is greater than eight pages, it switches to uniform extents.

1. Properties of Data Pages

All types of data pages have a fixed size (8KB) and consist of the following three parts:

  • Page header
  • Space reserved for data
  • Row offset table

NOTE This chapter does not include a separate discussion of the properties of index pages because index pages are almost identical to data pages.

The following sections describe these parts.

1.1. Page Header

Each page has a 96-byte page header used to store the system information, such as page ID, the ID of the database object to which the page belongs, and the previous page and next page in a page chain. As you may have already guessed, the page header is stored at the beginning of each page. Table 15-1 shows the information stored in the page header.

1.2. Space Reserved for Data

The part of the page reserved for data has a variable length that depends on the number and length of rows stored on the page. For each row stored on the page, there is an entry in the space reserved for data and an entry in the row offset table at the end of the page. [A data row cannot span two or more pages, except for values of VARCHAR(MAX) and VARBINARY(MAX) data that are stored in their own specific pages.] Each row is stored subsequently after already-stored rows, until the page is filled. If there is not enough space for a new row of the same table, it is stored on the next page in the chain of pages.

For all tables that have only fixed-length columns, the same number of rows is stored at each page. If a table has at least one variable-length column (a VARCHAR column, for instance), the number of rows per page may differ and the system then stores as many rows per page as will fit on it.

1.3. Row Offset Table

The last part of a page is tightly connected to a space reserved for data, because each row stored on a page has a corresponding entry in the row offset table (see Figure 15-1). The row offset table contains 2-byte entries consisting of the row number and the offset byte address of the row on the page. (The entries in the row offset table are in reverse order from the sequence of the rows on the page.) Suppose that each row of a table is fixed-length, 36 bytes in length.

The first table row is stored at byte offset 96 of a page (because of the page header). The corresponding entry in the row offset table is written in the last 2 bytes of a page, indicating the row number (in the first byte) and the row offset (in the second byte). The next row is stored subsequently in the next 36 bytes of the page. Therefore, the corresponding entry in the row offset table is stored in the third- and fourth-to-last bytes of the page, indicating again the row number (1) and the row offset (132).

2. Types of Data Pages

Data pages are used to store data of a table. There are two types of data pages, each of which is used to store data in a different format:

  • In-row data pages
  • Row-overflow data pages

2.1. In-Row Data Pages

There is nothing special to say about in-row data pages: they are pages in which it is convenient to store data and index information. All data that doesn’t belong to large objects is always stored in-row. Also, VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), and XML values can be stored in-row, if the large value types out of row option of the sp_tableoption system procedure is set to 0. In this case, all such values are stored directly in the data row, up to a limit of 8000 bytes and as long as the value can fit in the record. If the value does not fit in the record, a pointer is stored in-row and the rest is stored out of row in the storage space for large objects.

2.2. Row-Overflow Data Pages

Values of the VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) columns can be stored outside of the actual data page. As you already know, 8KB is the maximum size of a row on a data page, but you can exceed this size limit if you use columns of such large data types. In this case, the system stores the values of these columns in extra pages, which are called row-overflow pages.

The storage in row-overflow pages is done only under certain circumstances. The primary factor is the length of the row: if the row needs more than 8060 bytes, some of the column’s values will be stored on overflow pages. (A value of a column cannot be split between the actual data page and a row-overflow page.)

As an example of how content of a table with large values is stored, Example 15.3 creates such a table and inserts a row into it.

Example 15.3

USE sample;

CREATE TABLE mytable

(col1 VARCHAR(1000),

col2 VARCHAR(3000),

col3 VARCHAR(3000),

col4 VARCHAR(3000));

INSERT INTO mytable

SELECT REPLICATE(‘a’, 1000), REPLICATE(‘b’, 3000),

REPLICATE(‘c’, 3000), REPLICATE(‘d’, 3000);

The CREATE TABLE statement in Example 15.3 creates the mytable table. The subsequent INSERT statement inserts a new row in the table. The length of the inserted row is 10,000 bytes. For this reason, the row doesn’t fit in a page.

The query in Example 15.4 uses several catalog views to display information concerning page type description.

Example 15.4

USE sample;

SELECT p.rows, a.type_desc AS page_type, a.total_pages AS pages

FROM sys.partitions p JOIN sys.allocation_units a ON

p.partition_id = a.container_id

WHERE p.object_id = object_id(‘mytable’);

The result is

As you can see from Example 15.4, the two columns in the SELECT list (type_desc and total_pages) are from the sys.allocation_units catalog view. (This view contains a row for each allocation unit in the database.) Different allocation units can be displayed using the type_desc column of the sys.allocation_units catalog view. (A set of pages of one particular data page type is called an allocation unit.)

The only reason use of the sys.partition view is required is to specify mytable as the corresponding table using the object_id column of the view. The sys.partition view contains one row for each partition of each table or index. (Nonpartitioned tables, such as mytable, have only one partition unit.)

The result of Example 15.4 shows that for the single row of the mytable table, nine data pages plus nine row-overflow pages are allocated by the system.

NOTE The performance of a system can significantly degrade if your queries access many row-overflow data pages.

3. Editing Information Concerning Disk Storage

In earlier versions of SQL Server, the only way to inspect data or index pages of a table has been to use two DBCC commands: DBCC PAGE and DBCC IND. The main problem with these commands is that they were (and still are) undocumented, meaning that the functionality has been added to the system but there is no official support for it.

In SQL Server 2012, Microsoft released a new dynamic management function (DMF), called sys.dm_db_database_page_allocations. The purpose of this DMF is to replace DBCC PAGE and DBCC IND. We will use an example to introduce this DMF.

In Example 15.5, we inspect data and index pages of the employee table.

Example 15.5

USE sample;

SELECT allocated_page_page_id AS page, index_id AS ind,

page_type_desc AS description

FROM sys.dm_db_database_page_allocations

(DB_ID(‘sample’), OBJECT_ID(’employee’),NULL/ NULL, ‘DETAILED’)

WHERE is_allocated = 1;

The result is

As you can see from Example 15.5, the sys.dm_db_database_page_allocations function has five parameters. The first four parameters—@databaseid, @tableid, @indexid, and @partitionid—detail the object whose pages you are interested in. Therefore, in Example 15.5, we analyze the employee table (@tableid) of the sample database (@databaseid) in relation to all indexes (@indexid = NULL) and all partitions (@partitionid = NULL).

@tableid, @indexid, and @partitionid can be NULL, but @databaseid has to be specified. If, for example, you specify the @tableid and @indexid parameters of a particular index but you pass in NULL for the @partitionid parameter, the function will return all pages in all partitions for that particular index. If only the @databaseid parameter is valued, all (table-related) pages in that database are returned.

The last parameter of this DMF, @mode, specifies the amount of information that is displayed and can be either ‘LIMITED’ or ‘DETAILED’. ‘LIMITED’ returns less information.

On the other hand, ‘DETAILED’ uses significantly more resources. Therefore, the value ‘DETAILED’ should be used only on small sets of pages.

The is_allocated column of the DMF in the WHERE clause of the SELECT statement restricts the result set only to the pages that are allocated.

NOTE Before the execution of the DMF in Example 15.5, I created an index for the emp_no column of the employee table to display the information of index pages, too.

There are two problems with the sys.dm_db_database_page_allocations DMF. First, it is undocumented. The other problem is more significant and concerns performance. The function reads the information of all the pages for the entire table or index, even when the filter in the WHERE clause is restricted to a single page. For this reason, the bigger the table (index), the worse the performance of this DMF.

Therefore, Microsoft introduced in SQL Server 2019 a new and improved dynamic management function called sys.dm_db_page_info. This function takes a similar set of arguments as sys.dm_db_database_page_allocations, still allowing you to identify an object by its page, but performs more efficiently. Example 15.6 shows the use of this DMF.

Example 15.6

USE sample;

DECLARE @db_id CHAR(10) = db_id(‘sample’)

SELECT OBJECT_NAME(object_id) table_name, page_id

FROM sys.dm_db_page_info (@db_id,1, 21320, N’LIMITED’);

The result is

The sys.dm_db_page_info DMF has four parameters: @DatabaseId, @FileId, @PageId, and @Mode. The first and the last parameter have the same meaning as the parameters with the same name for the sys.dm_db_database_page_allocations DMF. @FileId is the ID of the file, while @PageId is the ID of the corresponding page. (The page_ID = 21320, which we use as the fourth parameter of the sys.dm_db_page_info view, has been determined in Example 15.5.)

4. Parallel Processing of Tasks

The Database Engine can execute different database tasks in parallel. The following tasks can be parallelized:

  • Bulk load
  • Backup
  • Query execution
  • Indices

The Database Engine allows data to be loaded in parallel using the bcp utility. (For the description of the bcp utility, see the next section.) The table into which the data is loaded must not have any indices, and the load operation must not be logged. (Only applications using the ODBC or OLE DB-based APIs can perform parallel data loads into a single table.)

The Database Engine can back up databases or transaction logs to multiple devices using parallel striped backup. In this case, database pages are read by multiple threads one extent at a time (see also Chapter 16).

The Database Engine provides parallel queries to enhance the query execution. With this feature, the independent parts of a SELECT statement can be executed using several native threads on a computer. Each query that is planned for the parallel execution contains an exchange operator in its query execution plan. (An exchange operator is an operator in a query execution plan that provides process management, data redistribution, and flow control.) For such a query, the Database Engine generates a parallel query execution plan. Parallel queries significantly improve the performance of the SELECT statements that process very large amounts of data.

On computers with multiple processors, the Database Engine automatically uses more processors to perform index operations, such as creation and rebuilding of an index. The number of processors employed to execute a single index statement is determined by the configuration option max degree of parallelism as well as the current workload. If the Database Engine detects that the system is busy, the degree of parallelism is automatically reduced before the statement is executed.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *