SLQ Server: Tools for Editing the Optimizer Strategy

The Database Engine supports several tools that enable you to edit what the query optimizer is doing. You can use the following tools, among others:

  • SET statement (to display textual or XML execution plans)
  • SQL Server Management Studio (to display graphical execution plans)
  • Dynamic management views and functions
  • SQL Server Profiler (discussed in detail in Chapter 20)

The following sections describe the first three tools.

NOTE Almost all examples in this chapter use the AdventureWorks database. If your system doesn’t contain this database, the introductory part of the book describes how you can download it.

1. SET Statement

To understand the different options of the SET statement, you have to know that there are three different forms for how the execution plan of a query can be displayed:

  • Textual form
  • Using XML
  • Graphical form

The first two forms use the SET statement, so these two forms are discussed in the following subsections. (The graphical form of execution plans is discussed a bit later, in the section “SQL Server Management Studio and Graphical Execution Plans”)

1.1. Textual Execution Plan

The phrase “textual execution plan” means that the execution plan of a query is displayed in text form. Therefore, the output of a textual execution plan is returned in the form of rows. The Database Engine uses vertical bars to show the dependencies between the operations taking place. Textual execution plans can be displayed using the following options of the SET statement:

  • SHOWPLAN_TEXT
  • SHOWPLAN_ALL

Users running a query can display the textual execution plan for the query by activating (setting the option to ON) either SHOWPLAN_TEXT or SHOWPLAN_ALL, before they enter the corresponding SELECT statement. The SHOWPLAN_ALL option displays the same detailed information about the selected execution plan for the query as SHOWPLAN_TEXT with the addition of an estimate of the resource requirements for that statement.

Example 19.1 shows the use of the SET SHOWPLAN_TEXT option.

NOTE Once you activate the SHOWPLAN_TEXT option, all consecutive Transact-SQL statements will not be executed until you deactivate this option with SET SHOWPLAN_TEXT OFF.

Example 19.1

SET SHOWPLAN_TEXT ON;

GO

USE AdventureWorks;

SELECT * FROM HumanResources.Employee e JOIN Person.Address a

ON e.BusinessEntityID = a.AddressID AND e.BusinessEntityID = 10;

GO

SET SHOWPLAN_TEXT OFF;

The following textual plan shows the output of Example 19.1:

|–Nested Loops(Inner Join)

|–Clustered Index Seek

(OBJECT:([AdventureWorks].[Person].[Address].[PK_Address_AddressID] AS [a]),

SEEK:([a].[AddressID]=(10)) ORDERED FORWARD)

|–Compute Scalar

(DEFINE:([e].[OrganizationLevel]=[AdventureWorks].[HumanResources]

.[Employee].[OrganizationLevel] as [e].[OrganizationLevel]))

|–Compute Scalar

(DEFINE:([e].[OrganizationLevel]=[AdventureWorks].[HumanResources]

.[Employee].[OrganizationNode] as [e].[OrganizationNode].GetLevel()))

|–Clustered Index Seek

(OBJECT:([AdventureWorks].[HumanResources].[Employee].[PK_Employee_

BusinessEntityID] AS [e]), SEEK:([e].[BusinessEntityID]=(10)) ORDERED FORWARD)

Before we discuss the execution plan of Example 19.1, you need to understand how to read its textual form. The indentation of operators determines the operator execution: the operator that is indented the furthest is executed first. If two or more operators have the same indentation, they are processed from the top downward. Now, if you take a look at the output of Example 19.1, you will see that there are three operators: Nested Loops, Clustered Index Seek (appears twice), and Compute Scalar (appears twice). (All operators are preceded by a bar, |.)

NOTE The Compute Scalar operator evaluates an expression to produce a computed scalar value. This may then be referenced elsewhere in the query, as in this example.

Further, the Clustered Index Seek operator for the Employee table is executed first. After that, the Compute Scalar operator is applied twice to the Employee table and the Clustered Index Seek operator is applied to the Address table. At the end, both tables (Employee and Address) are joined using the nested loop technique.

NOTE Generally, index access has two forms: index scan and index seek. Index scan processes the entire leaf level of an index tree, while index seek returns index values (or rows) from one or more ranges of an index. (Index seek is described later in the chapter.)

1.2. XML Execution Plan

The phrase “XML execution plan” means that the execution plan of a query is displayed as an XML document. (For more information about XML, see Microsoft Docs.) The most important advantage of using XML execution plans is that such plans can be ported from one system to another, allowing you to use them in another environment. (How to save execution plans in a file is explained a bit later.)

The SET statement has two options in relation to XML:

  • SHOWPLAN_XML
  • STATISTICS XML

The SHOWPLAN_XML option is very similar to the SHOWPLAN_TEXT option, but returns information as a set of XML documents. In other words, if you activate this option, the Database Engine returns detailed information about how the statements are going to be executed in the form of a well-formed XML document, without executing them. Each statement is reflected in the output by a single document. Each document contains the text of the statement, followed by the details of the execution steps.

NOTE Once you activate the SHOWPLAN_XML option, all consecutive Transact-SQL statements will not be executed until you deactivate this option with SET SHOWPLAN_XML OFF.

The main difference between the SHOWPLAN_XML and STATISTICS XML options is that the output of the latter is generated at run time. For this reason, STATISTICS XML includes the result of the SHOWPLAN_XML option as well as additional run-time information.

To save an XML execution plan in a file, in the Results pane, right-click the SQL Server XML Showplan that contains the query plan and choose Save Results As. In the Save <Grid or Text> Results dialog box, in the Save As Type box, click All Files (*.*). In the File Name box, provide a name with the .sqlplan suffix, and then click Save. (You can open a saved XML execution plan by double-clicking it.)

1.3. Other Options of the SET Statement

The SET statement has many other options, which are used in relation to locking, transaction, and date/time statements. Concerning statistics, the Database Engine supports the following three options of the SET statement:

  • STATISTICS IO
  • STATISTICS TIME
  • STATISTICS PROFILE

The STATISTICS IO option causes the system to display statistical information concerning the amount of disk activity generated by the query—for example, the number of read and write I/O operations processed with the query. The STATISTICS TIME option causes the system to display the processing, optimization, and execution time of the query.

When the STATISTICS PROFILE option is activated, each executed query returns its regular result set, followed by an additional result set that shows the profile of the query execution.

2. SQL Server Management Studio and Graphical Execution Plans

A graphical execution plan is the best way to display the execution plan of a query if you are a beginner or want to take a look at different plans in a short time. This form of display uses icons to represent operators in the query plan.

As an example of how graphical execution plans can be initiated and what they look like, Figure 19-2 shows the graphical execution plan for the query in Example 19.1. To display an execution plan in the graphical form, write the query in Query Editor of Management Studio and click the Display Estimated Execution Plan button in the toolbar of SQL Server Management Studio. (The alternative way is to choose Query | Display Estimated Execution Plan.)

If you take a look at Figure 19-2, you will see that there is one icon for each operator of the execution plan. If you move the mouse over one of the icons, its detailed information appears, including the estimated I/O and CPU costs, estimated number of rows and their size, and the cost of the operator. The arrows between the icons represent the data flow. (You can also click an arrow, in which case the related information, such as estimated number of rows and estimated row size, will be displayed.)

To “read” the graphical execution plan of a query, you have to follow the flow of information from right to left and from top to bottom. (It is analogous to the flow of information in a textual execution plan.)

NOTE The query optimizer of the Database Engine uses operators to build a query execution plan. Each operator has its own unique icon, which is used in the graphical display of execution plans. As you can see from Figure 19-2, there are several operators, such as Clustered Index Seek and Compute Scalar, each represented with the corresponding icon.

As its name suggests, clicking the Display Estimated Execution Plan button displays the estimated plan of the query, without executing it. There is another button, Include Actual Execution Plan, that executes the query and additionally displays its execution plan. The actual execution plan contains additional information in relation to the estimated one, such as the actual number of processed rows and the actual number of executions for each operator.

3. Examples of Execution Plans

This section presents several queries related to the AdventureWorks database, together with their execution plans. These examples demonstrate the topics already discussed, enabling you to see how the query optimizer works in practice.

Example 19.2 introduces a new table (new_addresses) in the sample database.

Example 19.2

USE sample;

SELECT * into new_addresses

FROM AdventureWorks.Person.Address;

GO

CREATE INDEX i_stateprov on new_addresses(StateProvinceID)

Example 19.2 copies the content of the Address table from the Person schema of the AdventureWorks database into the new_addresses table of the sample database. This is necessary because the former table contains several indices, which hinders the direct use of the Address table of the AdventureWorks database to show specific properties of the query optimizer. (Besides that, the example creates an index on the StateProvincelD column of that table.)

Example 19.3 shows a query with high selectivity and shows the textual plan that the optimizer chooses in such a case.

Example 19.3

— high selectivity

USE sample;

SELECT * FROM new_addresses a

WHERE a.StateProvinceID = 32;

The graphical output of Example 19.3 is shown in Figure 19-3.

The filter in Example 19.3 selects only one row from the new_addresses table. (The total number of rows in this table is 19614.) For this reason, the selectivity of the expression in the WHERE clause is very high (1/19614). In such a case, as you can see from Figure 19-3, the existing index on the StateProvincelD column is used by the optimizer.

Example 19.4 shows the same query as in Example 19.3, but with another filter.

Example 19.4

— low selectivity

USE sample;

SELECT * FROM new_addresses a

WHERE a.StateProvinceID = 9;

The graphical plan of Example 19.4 is given in Figure 19-4.

Although the query in Example 19.4 differs from the query in Example 19.3 only by a value on the right side of the condition in the WHERE clause, the execution plan that the optimizer chooses differs significantly. In this case, the existing index won’t be used, because the selectivity of the filter is low. (The ratio of the number of rows satisfying the condition to the total number of rows in the table is 4564/19614 = 0.23, or 23 percent.)

Example 19.5 shows the use of the clustered index.

Example 19.5

USE AdventureWorks;

SELECT * FROM HumanResources.Employee

WHERE HumanResources.Employee.BusinessEntityID = 10;

The graphical output of Example 19.5 is given in Figure 19-5.

The query in Example 19.5 uses the PK_Employee_BusinessEntityID clustered index. This clustered index is created implicitly by the system because the BusinessEntityID column is the primary key of the Employee table.

Example 19.6 shows the use of the nested loop technique.

Example 19.6

USE AdventureWorks;

SELECT * FROM HumanResources.Employee e JOIN

Person.Address a

ON e.BusinessEntityID = a.AddressID

AND e.BusinessEntityID = 10;

The graphical output of Example 19.6 is given in Figure 19-6.

The query in Example 19.6 uses the nested loop technique even though the join columns of the tables are at the same time their primary keys. For this reason, one could expect that the merge join technique would be used. The query optimizer decides to use the nested loop technique because there is an additional filter (e.EmployeeID = 10) that reduces the result set of the query to a single row.

Example 19.7 shows the use of the hash join technique.

Example 19.7

USE AdventureWorks;

SELECT * FROM Person.Address a JOIN Person.StateProvince s

ON a.StateProvinceID = s.StateProvinceID;

The graphical output of the query in Example 19.7 is given in Figure 19-7.

Although both join columns in the ON clause are primary keys of the particular tables (Address and StateProvince), the query optimizer doesn’t choose the merge join method. The reason is that all (19,614) rows of the Address table belong to the result set. In such a case, the use of the hash join method is more beneficial than the other two join processing techniques.

3.1. Examples of Execution Plans with Filtered Indices

Filtered indices are exactly what the name suggests: indices specified with a filter in the WHERE clause of the SELECT statement. This feature provides benefits when you frequently retrieve a specific subset of tables’ rows. The first benefit is that by creating a filtered (instead of a regular) index, you create only a subset of index rows of the underlying table, hence saving a significant amount of disk space. Second, rebuilding or reorganizing such indices will be faster.

Example 19.8 shows the creation of a filtered index and its advantage in relation to the corresponding regular index. The corresponding execution plan is displayed in Figure 19-8.

Example 19.8

USE AdventureWorks;

CREATE INDEX i_unitprice ON Sales.SalesOrderDetail(UnitPrice)

WHERE UnitPrice > 1000 ;

SELECT SalesOrderDetailID, UnitPrice

FROM Sales.SalesOrderDetail

WHERE UnitPrice > 2000 ;

Example 19.8 creates a nonclustered index called i_unitprice on the UnitPrice column of the SalesOrderDetail table. This index is a filtered index, because the definition of that index contains the filter (WHERE UnitPrice > 1000). If you take a closer look on the Index Seek operator in Figure 19-8, you will see that the query used the i_unitprice index. The query will benefit from the i_unitprice index, because the index itself restricts the search of the subsequent query (WHERE UnitPrice > 1000 is the condition in the filtered index and WHERE UnitPrice > 2000 is the corresponding condition in the query).

4. Dynamic Management Views and Query Optimizer

There are many dynamic management views (and functions) that are directly related to query optimization. In this section, the following DMVs are discussed:

  • sys.dm_exec_query_optimizer_info
  • sys.dm_exec_query_plan
  • sys.dm_exec_cached_plans
  • sys.dm_exec_query_stats
  • sys.dm_exec_sql_text
  • sys.dm_exec_text_query_plan
  • sys.dm_db_stats_histogram

4.1. sys.dm_exec_query_optimizer_info

The sys.dm_exec_query_optimizer_info view is probably the most important DMV in relation to the work of the query optimizer because it returns detailed statistics about its operation. You can use this view when tuning a workload to identify query optimization problems or improvements.

The sys.dm_exec_query_optimizer_info view contains exactly three columns: counter, occurrence, and value. The counter column specifies the name of the optimizer event, while the occurrence column displays the cumulative number of occurrences of these events. The value of the value column contains additional information concerning events. (Not all events deliver a value value.)

Using this view, you can, for example, display the total number of optimizations, the elapsed time value, and the final cost value to compare the query optimizations of the current workload and any changes observed during the tuning process.

Example 19.9 shows the use of the sys.dm_exec_query_optimizer_info view.

Example 19.9

USE sample;

SELECT counter, occurrence, value

FROM sys.dm_exec_query_optimizer_info

WHERE value IS NOT NULL

AND counter LIKE ‘search 1%’;

The result is

The counter column displays the phases of the optimization process. Therefore, Example 19.9 investigates how many times optimization Phase 1 is executed.

NOTE Because of its complexity, the optimization process is broken into three phases. The first phase (Phase 0) considers only nonparallel execution plans. If the cost of Phase 0 isn’t optimal, Phase 1 will be executed, in which both nonparallel plans and parallel plans are considered. Phase 2 takes into account only parallel plans.

4.2. sys.dm_exec_query_plan

As you already know, execution plans for batches and Transact-SQL statements are placed in the cache. That way, they can be used anytime by the optimizer. You can examine the cache using several DMVs. One of these is the sys.dm_exec_query_plan view, which returns all execution plans that are stored in the cache of your system. (The execution plans are displayed in XML format.)

Each query plan stored in the cache is identified by a unique identifier called a plan handle. The sys.dm_exec_query_plan view requires a plan handle to retrieve the execution plan for a particular Transact-SQL query or batch.

4.3. sys.dm_exec_cached_plans

This view returns a row for each query plan that is cached by the Database Engine for faster query execution. You can use this view to find cached query plans, query text of the cached plans, the amount of memory taken by cached plans, and the reuse count of the cached plans.

The most important columns of this view are cacheobjtype, which specifies the type of object in the cache, and usecount, which determines the number of times this cache object has been used since its inception.

4.4. sys.dm_exec_query_stats

The sys.dm_exec_query_stats view returns aggregate performance statistics for cached query plans. The view contains one row per query statement within the cached plan, and the lifetime of the rows is tied to the plan itself.

Example 19.10 shows the use of the sys.dm_exec_query_stats and sys.dm_exec_cached_ plans views.

Example 19.10

SELECT ecp.objtype AS Object_Type ,

(SELECT t.text FROM sys.dm_exec_sql_text(qs.sql_handle) AS t) AS

Adhoc_Batch ,qs.execution_count AS Counts ,

qs.total_worker_time AS Total_Worker_Time ,

(qs.total_physical_reads / qs.execution_count ) AS Avg_Physical_Reads ,

(qs.total_logical_writes / qs.execution_count ) AS Avg_Logical_Writes ,

(qs.total_logical_reads / qs.execution_count ) AS Avg_Logical_Reads ,

qs.total_elapsed_time AS Total_Elapsed_Time,

(qs.total_elapsed_time / qs.execution_count ) AS Avg_Elapsed_Time ,

qs.last_execution_time AS Last_Exec_Time,

qs.creation_time AS Creation_Time

FROM sys.dm_exec_query_stats AS qs

JOIN sys.dm_exec_cached_plans ecp ON qs.plan_handle =

ecp.plan_handle

ORDER BY Counts DESC;

Example 19.10 joins the sys.dm_exec_query_stats and sys.dm_exec_cached_plans views to return information for execution plans for all cached plans, which are ordered by the count of their execution times. (The displayed information from the different columns is self-explanatory.)

4.5. sys.dm_exec_sql_text and sys.dm_exec_text_query_plan

The previous view, sys.dm_exec_query_stats, can be used with several other DMVs to display different properties of queries. In other words, each DMV that needs the plan handle to identify the query will be “joined” with the sys.dm_exec_query_stats view to display the required information. One such view is sys.dm_exec_sql_text. This view returns the text of the SQL batch that is identified by the specified handle. Example 19.11 shows the use of the sys.dm_exec_sql_text DMV.

Example 19.11

SELECT TOP 5 total_worker_time/execution_count AS [Avg CPU Time],

SUBSTRING(st.text, (qs.statement_start_offset/2)+1,

((CASE qs.statement_end_offset

WHEN -1 THEN DATALENGTH(st.text)

ELSE qs.statement_end_offset

END – qs.statement_start_offset)/2) + 1) AS statement_text

FROM sys.dm_exec_query_stats AS qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st

ORDER BY total_worker_time/execution_count DESC;

The first column in the SELECT list of Example 19.11 displays the average time of the top five queries. The second column displays the selected execution plan of the query in XML format.

In contrast to the sys.dm_exec_sql_text view, sys.dm_exec_text_query_plan returns the execution plan of the batch in XML format. Similar to the previous views, this one is specified by the plan handle. (The plan specified by the plan handle can either be cached or currently executing.)

4.6. sys.dm_db_stats_histogram

The sys.dm_db_stats_histogram view returns the statistics histogram for the specified database object (table or indexed view) in the current database. Example 19.12 creates statistics for the State_Name column of the State table.

Example 19.12

USE sample;

CREATE TABLE State

(State_ID int IDENTITY PRIMARY KEY,

State_name varchar(120) NOT NULL);

INSERT State (State_name)

VALUES (‘Idaho’), (‘Iowa’), (‘Indiana’), (‘Texas’);

GO

CREATE STATISTICS State_Stats

ON State (State_Name) ;

GO

SELECT object_id,stats_id,range_high_key,range_rows,equal_rows

FROM sys.dm_db_stats_histogram(OBJECT_ID(‘State’), 2);

The result is

Example 19.12 creates a State table and inserts a row into it. The next statement, CREATE STATISTICS, creates query optimization statistics on one or more columns of a table, an indexed view, or an external table. Therefore, Example 19.12 creates statistics for the State_ Name column of the State table.

NOTE For most queries, the query optimizer already generates the necessary statistics for a execution plan; in a few cases, you need to create additional statistics with CREATE STATISTICS or modify the query design to improve query performance. In Example 19.12 the execution of the CREATE STATISTICS statement is required because the State table just has been created and populated.

After that, the DMV is executed using the value 2 for the stats_id column. The primary key occupies stats_id number 1. For this reason, we call the DMV for stats_id number 2 to get the histogram for the State table.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *