The easiest and most natural way to design an entity is to use a single table. Also, if all instances of an entity belong to a table, you don’t need to decide where to store its rows physically, because the database system does this for you. For this reason there is no need for you to do any administrative tasks concerning storage of table data, if you don’t want to.
On the other hand, one of the most frequent causes of poor performance in relational database systems is contention for data that resides on a single I/O device. This is especially true if you have one or more very large tables with millions of rows. In that case, on a system with multiple CPUs, partitioning the table can lead to better performance through parallel operations.
By using data partitioning, you can divide very large tables (and indices too) into smaller parts that are easier to manage. This allows many operations to be performed in parallel, such as loading data and query processing.
Partitioning also improves the availability of the entire table. By placing each partition on its own disk, you can still access one part of the entire data even if one or more disks are unavailable. In that case, all data in the available partitions can be used for read and write operations. The same is true for maintenance operations.
If a table is partitioned, the query optimizer can recognize when the search condition in a query references only rows in certain partitions and therefore can limit its search to those partitions. That way, you can achieve significant performance gains, because the query optimizer has to analyze only a fraction of data from the partitioned table.
NOTE In this chapter, I discuss only horizontal table partitioning. Vertical partitioning is also an issue, but it does not have the same significance as horizontal partitioning.
1. How the Database Engine Partitions Data
A table can be partitioned using any column of the table. Such a column is called the partition key. (It is also possible to use a group of columns for the particular partition key.) The values of the partition key are used to partition table rows to different filegroups.
Two other important notions in relation to partitioning are the partition scheme and partition function. The partition scheme maps the table rows to one or more filegroups. The partition function defines how this mapping is done. In other words, the partition function defines the algorithm that is used to direct the rows to their physical location.
The Database Engine supports only one form of partitioning, called range partitioning. Range partitioning divides table rows into different partitions based on the value of the partition key. Hence, by applying range partitioning you will always know in which partition a particular row will be stored.
NOTE Besides range partitioning, there are several other forms of horizontal partitioning. One of them is called hash partitioning. In contrast to range partitioning, hash partitioning places rows one after another in partitions by applying a hashing function to the partition key. Hash partitioning is not supported by the Database Engine.
The steps for creating partitioned tables using range partitioning are described next.
2. Steps for Creating Partitioned Tables
Before you start to partition database tables, you have to complete the following steps:
- Set partition goals.
- Determine the partition key and number of partitions.
- Create a filegroup for each partition.
- Create the partition function and partition scheme.
- Create partitioned indices (optionally).
All of these steps will be explained in the following sections.
2.1. Set Partition Goals
Partition goals depend on the type of applications that access the table that should be partitioned. There are several different partition goals, each of which could be a single reason to partition a table:
- Improved performance for individual queries
- Reduced contention
- Improved data availability
If your primary goal of table partitioning is to improve performance for individual queries, then distribute all table rows evenly. That way, the database system doesn’t have to wait for data retrieval from a partition that has more rows than other partitions. Also, if such queries access data by performing a table scan against significant portions of a table, then you should partition the table rows only. (Partitioning the corresponding index will just add overhead in such a case.)
Data partitioning can reduce contention when many simultaneous queries perform an index scan to return just a few rows from a table. In this case, you should partition the table and index with a partition scheme that allows each query to eliminate unneeded partitions from its scan. To reach this goal, start by investigating which queries access which parts of the table. Then partition table rows so that different queries access different partitions.
Partitioning improves the availability of the database. By placing each partition on its own filegroup and locating each filegroup on its own disk, you can increase the data availability, because if one disk fails and is no longer accessible, only the data in that partition is unavailable. While the system administrator services the corrupted disk, limited access still exists to other partitions of the table.
2.2. Determine the Partition Key and Number of Partitions
A table can be partitioned using any table column. The values of the partition key are used to partition table rows to different filegroups. For the best performance, each partition should be stored in a separate filegroup, and each filegroup should be stored on a separate disk device.
By spreading the data across several disk devices, you can balance the I/O and improve query performance, availability, and maintenance.
You should partition the data of a table using a column that does not frequently change.
If you use a column that is often modified, any update operation of that column can force the system to move the modified rows from one partition to the other, and this could be time consuming.
2.3. Create a Filegroup for Each Partition
To achieve better performance, higher data availability, and easier maintenance, use different filegroups to separate table data. The number of filegroups to use depends mostly on the hardware you have. When you have multiple CPUs, partition your data so that each CPU can access data on one disk device. If the Database Engine can process multiple partitions in parallel, the processing time of your application will be significantly reduced.
Each data partition must map to a filegroup. To create a filegroup, you use either the CREATE DATABASE statement or ALTER DATABASE statement. Example 26.1 shows the creation of a database called test_partitioned with one primary filegroup and two other filegroups.
NOTE Before you create the test_partitioned database, you have to change the physical addresses of all .mdf and .ndf files in Example 26.1 according to the file system you have on your computer.
Example 26.1
USE master;
CREATE DATABASE test_partitioned
ON PRIMARY
( NAME=’MyDB_Primary’,
FILENAME=
‘d:\mssql\PT_Test_Partitioned_Range_df.mdf’,
SIZE=2000,
MAXSIZE=5000,
FILEGROWTH=1 ) ,
FILEGROUP MyDB_FG1
( NAME = ‘FirstFileGroup’,
FILENAME =
‘d:\mssql\MyDB_FG1.ndf’,
SIZE = 1000MB,
MAXSIZE=2500,
FILEGROWTH=1 ) ,
FILEGROUP MyDB_FG2
( NAME = ‘SecondFileGroup’,
FILENAME =
‘f:\mssql\MyDB_FG2.ndf’,
SIZE = 1000MB,
MAXSIZE=2500,
FILEGROWTH=1 );
Example 26.1 creates a database called test_partitioned, which contains a primary filegroup, MyDB_Primary, and two other filegroups, MyDB_FG1 and MyDB_FG2. The MyDB_FG1 filegroup is stored on the D: drive, while the MyDB_FG2 filegroup is stored on the F: drive.
If you want to add filegroups to an existing database, use the ALTER DATABASE statement. Example 26.2 shows how to create another filegroup for the test_partitioned database.
Example 26.2
USE master;
ALTER DATABASE test_partitioned
ADD FILEGROUP MyDB_FG3
GO
ALTER DATABASE test_partitioned
ADD FILE
( NAME = ‘ThirdFileGroup’,
FILENAME =
‘G:\mssql\MyDB_FG3.ndf’,
SIZE = 1000MB,
MAXSIZE=2500,
FILEGROWTH=1)
TO FILEGROUP MyDB_FG3;
Example 26.2 uses the ALTER DATABASE statement to create an additional filegroup called MyDB_FG3 on the G: drive. The second ALTER DATABASE statement adds a new file to the created filegroup. Notice that the TO FILEGROUP option specifies the name of the filegroup to which the new file will be added.
2.4. Create the Partition Function and Partition Scheme
The next step after creating filegroups is to create the partition function, using the CREATE PARTITION FUNCTION statement. The syntax of the CREATE PARTITION FUNCTION is as follows:
CREATE PARTITION FUNCTION function_name(param_type)
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ([ boundary_value [ ,…n ] )
function_name defines the name of the partition function, while param_type specifies the data type of the partition key .boundary_value specifies one or more boundary values for each partition of a partitioned table or index that uses the partition function.
The CREATE PARTITION FUNCTION statement supports two forms of the RANGE option: RANGE LEFT and RANGE RIGHT. RANGE LEFT determines that the boundary condition is the upper boundary in the first partition. According to this, RANGE RIGHT specifies that the boundary condition is the lower boundary in the last partition (see Example 26.3). If not specified, RANGE LEFT is the default.
Example 26.3 shows the definition of the partition function.
Example 26.3
USE test_partitioned;
CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (500000);
The myRangePF1 partition function specifies that there will be two partitions and that the boundary value is 500,000. This means that all values of the partition key that are less than or equal to 500,000 will be placed in the first partition, while all values greater than 500,000 will be stored in the second partition. (Note that the boundary value is related to the values in the partition key.)
The created partition function is useless if you don’t associate it with specific filegroups. As mentioned earlier in the chapter, you make this association via a partition scheme, and you use the CREATE PARTITION SCHEME statement to specify the association between a partition function and the corresponding filegroups. Example 26.4 shows the creation of the partition scheme for the partition function in Example 26.3.
Example 26.4
USE test_partitioned;
CREATE PARTITION SCHEME myRangePSl
AS PARTITION myRangePF1
TO (MyDB_FG1, MyDB_FG2);
Example 26.4 creates the partition scheme called myRangePS1. According to this scheme, all values to the left of the boundary value (i.e., all values < 500,000) will be stored in the MyDB_FG1 filegroup. Also, all values to the right of the boundary value will be stored in the MyDB_FG2 filegroup.
NOTE When you define a partition scheme, you must be sure to specify a filegroup for each partition, even if multiple partitions will be stored on the same filegroup.
The creation of a partitioned table is slightly different from the creation of a nonpartitioned table. As you might guess, the CREATE TABLE statement must contain the name of the partition scheme and the name of the table column that will be used as the partition key. Example 26.5 shows the enhanced form of the CREATE TABLE statement that is used to define partitioning of the orders table.
Example 26.5
USE test_partitioned;
CREATE TABLE orders
(orderid INTEGER NOT NULL,
orderdate DATETIME,
shippeddate DATETIME,
freight money)
ON myRangePS1 (orderid);
The ON clause at the end of the CREATE TABLE statement is used to specify the already-defined partition scheme (see Example 26.4). The specified partition scheme ties together the column of the table (orderid) with the partition function where the data type (INT) of the partition key is specified (see Example 26.3).
The batch in Example 26.6 loads a million rows into the orders table. You can use the sys.partitions view to edit the information concerning partitions in the orders table.
Example 26.6
USE test_partitioned;
declare @i int , @order_id integer
declare @orderdate datetime
declare @shipped_date datetime
declare @freight money
set @i = 1
set @orderdate = getdate()
set @shipped_date = getdate()
set @freight = 100.00
while @i < 1000001
begin
insert into orders (orderid, orderdate, shippeddate, freight)
values( @i, @orderdate, @shipped_date, @freight)
set @i = @i+1
end
2.5. Create Partitioned Indices
When you partition table data, you can partition the indices that are associated with that table, too. You can partition table indices using the existing partition scheme for that table or a different one. When both the indices and the table use the same partition function and the same partitioning columns (in the same order), the table and index are said to be aligned. When a table and its indices are aligned, the database system can move partitions in and out of partitioned tables very effectively, because the partitioning of both database objects is done with the same algorithm. For this reason, in the most practical cases it is recommended that you use aligned indices.
Example 26.7 shows the creation of a clustered index for the orders table. This index is aligned because it is partitioned using the partition scheme of the orders table.
Example 26.7
USE test_partitioned;
CREATE UNIQUE CLUSTERED INDEX CI_orders
ON orders(orderid)
ON myRangePS1(orderid);
As you can see from Example 26.7, the creation of the partitioned index for the orders table is done using the enhanced form of the CREATE INDEX statement. This form of the CREATE INDEX statement contains an additional ON clause that specifies the partition scheme. If you want to align the index with the table, specify the same partition scheme as for the corresponding table. (The first ON clause is part of the standard syntax of the CREATE INDEX statement and specifies the column for indexing.)
3. Partitioning Techniques for Increasing System Performance
The following partitioning techniques can significantly increase performance of your system:
- Table collocation
- Partition-aware seek operation
- Parallel execution of queries
3.1. Table Collocation
Besides partitioning a table together with the corresponding indices, the Database Engine also supports the partitioning of two tables using the same partition function. This partition form means that rows of both tables that have the same value for the partition key are stored together at a specific location on the disk. This concept of data partitioning is called collocation.
Suppose that, besides the orders table (see Example 26.3), there is also an order_details table, which contains zero, one, or more rows for each unique order ID in the orders table.
If you partition both tables using the same partition function on the join columns orders .orderid and order_details.orderid, the rows of both tables with the same value for the orderid columns will be physically stored together. Suppose that there is a unique order with the identification number 49031 in the orders table and five corresponding rows in the order_details table. In the case of collocation, all six rows will be stored side by side on the disk. (The same procedure will be applied to all rows of these tables with the same value for the orderid columns.)
This technique has significant performance benefits when, accessing more than one table, the data to be joined is located at the same partition. In that case the system doesn’t have to move data between different data partitions.
3.2. Partition-Aware Seek Operation
The internal representation of a partitioned table appears to the query processor as a composite (multicolumn) index with an internal column as the leading column. (This column, called partitionedID, is a hidden computed column used internally by the system to represent the ID of the partition containing a specific row.)
For example, suppose there is a tab table with three columns, col1, col2, and col3. (coll is used to partition the table, while col2 has a clustered index.) The Database Engine treats internally such a table as a nonpartitioned table with the schema tab (partitionID, coll, col2, col3) and with a clustered index on the composite key (partitionedID, col2). This allows the query optimizer to perform seek operations based on the computed column partitionID on any partitioned table or index. That way, the performance of a significant number of queries on partitioned tables can be improved because the partition elimination is done earlier.
3.3. Parallel Execution of Queries
The Database Engine supports execution of parallel threads. In relation to this feature, the system provides two query execution strategies on partitioned objects:
- Single-thread-per-partition strategy The query optimizer assigns one thread per partition to execute a parallel query plan that accesses multiple partitions. One partition is not shared between multiple threads, but multiple partitions can be processed in parallel.
- Multiple-threads-per-partition strategy The query optimizer assigns multiple threads per partition regardless of the number of partitions to be accessed. In other words, all available threads start at the first partition to be accessed and scan forward. As each thread reaches the end of the partition, it moves to the next partition and begins scanning forward. The thread does not wait for the other threads to finish before moving to the next partition.
Which strategy the query optimizer chooses depends on your environment. It chooses the single-thread-per-partition strategy if queries are I/O-bound and include more partitions than the degree of parallelism. It chooses the multiple-threads-per-partition strategy in the following cases:
- Partitions are striped evenly across many disks.
- Your queries use fewer partitions than the number of available threads.
- Partition sizes differ significantly within a single table.
4. Editing Information Concerning Partitioning
You can use the following catalog views to display information concerning partitioning:
- partitions
- partition_schemes
- partition_functions
The sys.partitions view contains a row for each partition of all the tables and some types of indices in the particular database. (All tables and indices in the Database Engine contain at least one partition, whether or not they are explicitly partitioned.)
The following are the most important columns of the sys.partitions catalog view:
- partition_id Specifies the partition ID, which is unique within the current database.
- object_id Defines the ID of the object to which this partition belongs.
- index_id Indicates the ID of the index within that object.
- hobt_id Indicates the ID of the data heap or B-tree that contains the rows for this partition.
- partition_number Indicates a 1-based partition number within the owning index or heap. For nonpartitioned tables and indices, the value of the partition_number column is 1.
Example 26.8 displays a list of all partitioned tables in the test_partitioned database.
Example 26.8
USE test_partitioned;
SELECT DISTINCT t.name
FROM sys.partitions p INNER JOIN sys.tables t
ON p.object_id = t.object_id
where p.partition_number <> 1;
The result is
name
—–
orders
In Example 26.8 the sys.partitions catalog view is joined with sys.tables to get the list of all partitioned tables. Note that if you are looking specifically for partitioned tables, then you will have to filter your query with the condition
sys.partitions.partition_number <> 1
because for nonpartitioned tables, the value of the partition_number column is always 1.
The sys.partition_schemes catalog view contains a row for each data space with type = PS (“partition scheme”). (Generally, the data space can be a filegroup, partition scheme, or FILESTREAM data filegroup.) This view inherits the columns from the sys.data_spaces catalog view.
The sys.partition_functions catalog view contains a row for each partition function belonging to an instance of the Database Engine. The most important columns are name and function_id. The name column specifies the name of the partition function. (This name must be unique within the database.) The function_id column defines the ID of the corresponding partition function. This value is unique within the database.
Example 26.9 shows the use of the sys.partition_functions catalog view.
Example 26.9
For the orders table of the test_partitioned database, find the name of the corresponding partition scheme as well as the name of the partition function used by that scheme. Additionally, display names of all filegroups associated with that partition function.
SELECT ps.name PartScheme,pf.name PartFunc,fg.name FileGroupName
FROM sys.indexes i
JOIN sys.partitions p ON i.object_id=p.object_id
AND i.index_id=p.index_id
JOIN sys.partition_schemes ps on ps.data_space_id=i.data_space_id
JOIN sys.partition_functions pf on pf.function_id=ps.function_id
JOIN sys.allocation_units au ON au.container_id=p.hobt_id
JOIN sys.filegroups fg ON fg.data_space_id=au.data_space_id
WHERE i.object_id = object_id(‘orders’);
The result is
Example 26.9 joins six catalog views to obtain the desired information: sys.indexes, sys.partitions, sys.partition_schemes, sys.partition_functions, sys.allocation_units, and sys.filegroups. To join sys.indexes with sys.partitions, we use the combination of the values of the object_id and index_id columns. Also, the join operation between sys.indexes and sys.partition_schemes is done using the data_space_id column in both tables. (data_space_id is an ID value that uniquely specifies the corresponding data space.)
sys.partition_functions and sys.partition_schemes are connected using the function_id column in both tables. sys.filegroups and sys.allocation_units are joined together using the already mentioned column data_space_id. Finally, to join the sys.partitions and sys.allocation_units views, we use the hobt_id column from the former and the container_id column from the latter. (For the description of the hobt_id column, see the definition of the sys.partitions view earlier in this section.)
5. Guidelines for Partitioning Tables and Indices
The following suggestions are guidelines for partitioning tables and indices:
- Do not partition every table. Partition only those tables that are accessed most frequently.
- Consider partitioning a table if it is a huge one, meaning it contains at least several hundred thousand rows.
- For best performance, use partitioned indices to reduce contention between sessions.
- Balance the number of partitions with the number of processors on your system. If it is not possible for you to establish a 1:1 relationship between the number of partitions and the number of processors, specify the number of partitions as a multiple factor of the number of processors. (For instance, if your computer has four processors, the number of partitions should be divisible by four.)
- Do not partition the data of a table using a column that changes frequently. If you use a column that changes often, any update operation of that column can force the system to move the modified rows from one partition to another, and this could be very time consuming.
- For optimal performance, partition the tables to increase parallelism, but do not partition their indices. Place the indices in a separate filegroup.
Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.