Intelligent Query Processing in SQL Server: Adaptive Query Processing

NOTE The term Adaptive Query Processing in SQL Server 2017 describes the group of techniques that you can use to improve performance of queries in the Database Engine. In SQL Server 2019, Adaptive Query Processing has the same meaning but now is considered a subset of the wider concept called Intelligent Query Processing.

As you already know from Chapter 19, the query optimizer generates a set of execution plans for a given query. During this process, the query optimizer estimates the cost of the various plan options and chooses the plan with the lowest estimated cost. The query execution process takes the plan chosen by the query optimizer and executes it. (All other plans are deleted and cannot be seen by the user.)

Sometimes the execution plan chosen by the query optimizer of the Database Engine is not the best one. Many factors can cause the optimizer to choose a plan that isn’t the best one. For instance, the estimated number of input rows (called cardinality) may be incorrect. This factor is one of most important factors used to determine which execution plan is selected for execution of the particular query.

All the Adaptive Query Processing features allow the query optimizer to generate more accurate query plans with better cardinality. There are altogether three features for adapting to application workload characteristics:

  • Memory Grant Feedback
  • Adaptive Join
  • Interleaved Execution

The following subsections describe these techniques.

1. Memory Grant Feedback

The ideal memory grant size and the minimum required memory needed for execution are two parameters that are used in an execution plan to calculate how all rows should fit in memory. If the required memory needed for execution is incorrectly sized, performance will suffer. Also, insufficient memory grants may cause memory overflow. The Memory Grant Feedback feature attempts to right-size memory grants for queries, correcting for both overestimation and underestimation. By addressing repeating workloads, Memory Grant Feedback recalculates the actual memory required for a query and then updates the grant value for the cached plan. That way, the query optimizer can use the modified values when an identical query statement is executed again.

The recalculation of the granted memory happens when the assigned memory is too large and there is not enough memory. First, if the granted memory is more than two times the size of the actual used memory, Memory Grant Feedback recalculates the memory grant and updates the cached plan. (Plans with memory grants under 1MB will not be recalculated.) Second, for an insufficiently sized memory grant condition, Memory Grant Feedback triggers a recalculation of the memory grant.

1.1. Memory Grant Feedback Caching

By default, the Memory Grant Feedback is stored in the cached plan and consecutive executions of that statement can benefit from updated values. This means that only the values in the cached plan are changed and that other components, such as Query Store, cannot capture these changes. (As you will see later in this chapter, this is a significant limitation that has been removed in SQL Server 2019.)

There are some special cases where the Memory Grant Feedback is not stored in the cached plan. For instance, a SELECT statement using OPTION (RECOMPILE) creates a new plan and does not cache it. Since it is not cached, no Memory Grant Feedback is produced and it is not stored for that compilation and execution. On the other hand, another statement without OPTION (RECOMPILE) using the same query hash will benefit, when re-executed, from the Memory Grant Feedback.

1.2. Tracking Memory Grant Feedback

Memory Grant Feedback can be tracked using several extended events. The two most important events in relation to Memory Grant Feedback are memory_grant_updated by_feedback, which occurs when the memory grant is updated by feedback, and additional_ memory_grant, which occurs when a query tries to get more memory during execution.

For these two extended events, we will use SQL Server Management Studio to create an extended events session called Grant_Memory and create an event file in which to store the messages from both events. (Refer to Chapter 20 for a detailed description of how to create an extended events session.)

On the General page in SQL Server Management Studio, specify the session name Grant_ Memory. Next, on the Events page, you have to select the two extended events mentioned previously, memory_grant_updated by_feedback and additional_memory_grant. To do so, type the phrase memory_grant in the Event Library field and move both events from the left pane to the right pane by selecting each event in turn and clicking the right-pointing arrow between the left and right panes. Move to the Data Storage page and enter the name of the file in which the captured data will be stored. Click OK.

After creation of the Grant_Memory session, you have to start capturing data. To do this, right-click the sessions name (under Management | Extended Events) and click Start Session. All subsequent statements will be tracked in relation to the memory_grant_updated by_ feedback and additional_memory_grant extended events. To end the capturing of data, click the Stop Session button.

Figure 28-1 shows the details of the session after executing two queries in relation to the AdventureWorks database.

As you can see from the session details, nine events are displayed with the corresponding timestamps and the additional memory required. All events are related to the additional_ memory_grant event. This means that the other event has not been fired in the time period between the start and end of the session.

1.3. Enabling/Disabling Memory Grant Feedback

By setting the compatibility level to 150, as shown in the Note at the beginning of this chapter, you enable all features belonging to Adaptive Query Processing. If you want to enable/disable solely Memory Grant Feedback, you can use either the ALTER DATABASE statement or the query hint called DISABLE_BATCH_MODE_MEMORY_GRANT_FEEDBACK. (The value ON disables and the value OFF enables Memory Grant Feedback.)

Example 28.1 shows how you can use the ALTER DATABASE statement to disable only Memory Grant Feedback. (The statement modifies the mode of the current database.)

Example 28.1

ALTER DATABASE SCOPED CONFIGURATION SET

BATCH_MODE_MEMORY_GRANT_FEEDBACK = ON;

2. Adaptive Join

The Adaptive Join feature enables the choice between the Hash Join operator and the Nested Loops operator to be deferred until after the first input has been scanned. In other words, the feature allows the query optimizer to choose dynamically between these two join processing techniques.

In a query execution plan, Adaptive Join is implemented as a new operator with the same name. This operator is an additional operator belonging to the group of join processing technique operators. The three already existing join operators are Nested Loops, usually used for joining a small data amount with a simple input; Merge Join, usually used for joining data streams that are already sorted by the clustered indices; and Hash Join, usually used for joining large, unsorted sets of rows.

The Adaptive Join operator has three inputs. The first input is an outer (build) input, the second is an input if the hash join technique is selected, and the third is an input if the nested loop method is picked. Additionally, the Adaptive Join operator defines a threshold that is used to decide when to switch to a query execution plan with nested loops. Your plan can therefore dynamically switch to a better join strategy during execution. Here is how it works:

  • If the number of selected rows of the join input is small enough, the nested loop join technique would be more optimal than the hash join technique and your execution plan switches to the former technique.
  • If the join input exceeds a specific row count threshold, no switch occurs and your plan continues with the hash join technique.

Example 28.2 provides a good starting point to explain the Adaptive Join feature and to introduce the Adaptive Join operator.

Example 28.2

USE AdventureWorks;

SELECT product.Name, COUNT(history.ProductlD) AS Cnt,

SUM(history.Quantity) AS Sum,

AVG(history.ActualCost) AS Avg

FROM Production.TransactionHistory AS history

JOIN Production.Product AS product

ON product.ProductID = history.ProductID

GROUP BY history.ProductID, product.Name;

Example 28.2 joins two tables, TransactionHistory and Product, from the Production schema of the AdventureWorks database and calculates three different aggregate functions (SUM, COUNT, and AVG) for different columns of the TransactionHistory table. As you can see from the execution plan of the query given in Figure 28-2, the query optimizer uses the hash join technique to join both tables. (The SQL Server’s Hash Match operator in Figure 28-2 corresponds to the hash join technique.)

To enable Adaptive Join, you have to

  1. Create a corresponding nonclustered columnstore index (see Example 28.3).
  2. Enable compatibility level 140 or higher. (Compatibility level 140 specifies SQL Server 2017, while 150 is the compatibility level of SQL Server 2019.)
  3. Enable Live Query Statistics by clicking the icon with the same name in the toolbar of SSMS. (Live Query Statistics is described in Chapter 20.)

Example 28.3 creates the columnstore index, which is necessary to enable the Adaptive Join feature.

Example 28.3

USE AdventureWorks;

CREATE NONCLUSTERED COLUMNSTORE INDEX csi_history

ON Production.TransactionHistory(ProductID,Quantity,ActualCost);

Now, if we execute the query in Example 28.2 again, we get the execution plan with the Adaptive Join operator, as shown in Figure 28-3.

2.1. Adaptive Join: Benefits and Disadvantages

Workloads with frequent oscillations between small and large join input scans will significantly benefit from the Adaptive Join feature. Additionally, if the Adaptive Join operator switches to Nested Loops, it uses the rows already read by Hash Join. In other words, the Adaptive Join operator does not reread the rows from the outer table.

On the other hand, the Adaptive Join operator introduces a higher memory requirement than an equivalent execution plan based on Nested Loops. The additional memory is requested as if the nested loops technique was Hash Join. There is also overhead for the build phase in relation to the corresponding Nested Loops operator.

NOTE You can disable Adaptive Join in a similar way as you disable Memory Grant Feedback, using the BATCH_MODE_ADAPTIVE_JOINS option. (For details, see Example 28.1.)

2.2. Interleaved Execution

As you already know from Chapter 8, there are two types of user-defined functions (UDFs) in the Database Engine:

  • Scalar functions
  • Table-valued functions

Table-valued functions have two forms: inline table-valued functions and multistatement table-valued functions (MSTVFs). An inline table-valued function can contain only one root SELECT statement that is used to describe its result. This is good from an optimization perspective, because the query optimizer may inline a function’s text into a query and can optimize a query as a whole.

MSTVFs can contain procedural language elements and may implement more complex logic. The benefit of MSTVFs is that they are more powerful for code reuse. On the other hand, they are more challenging in relation to the optimization process, because it is hard or impossible to estimate how many rows a function will return (i.e., estimate cardinality) before the query optimizer starts to execute the user-defined function.

Interleaved Execution is intended to correct cardinality estimation in MSTVFs. While selecting data from a table variable does not inhibit parallelism on its own, the low row estimates would often contribute to low query costs, where parallelism wouldn’t be considered.

With Interleaved Execution, cardinality estimation is paused, the subtree for the MSTVF is executed, and optimization is resumed with a more accurate cardinality estimate.

NOTE In SQL Server 2012 and earlier, each MSTVF had, during its execution, a fixed estimate of one row. Starting from SQL Server 2016, the estimate is still fixed but is increased up to 100 rows.

To show how Interleaved Execution works, Example 28.4 introduces an MSTVF.

Example 28.4

USE AdventureWorks;

GO

CREATE FUNCTION GetLastShipped()

RETURNS @CustomerOrder TABLE

(SaleOrderID  INT        NOT   NULL,

CustomerID    INT        NOT   NULL,

OrderDate    DATETIME    NOT   NULL,

OrderQty      INT        NOT   NULL)

AS

BEGIN

INSERT @CustomerOrder

SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty

FROM Sales.SalesOrderHeader a

INNER JOIN Sales.SalesOrderDetail b

ON a.SalesOrderID = b.SalesOrderID

INNER JOIN Production.Product c

ON b.ProductID = c.ProductID

WHERE a.OrderDate = ( Select Max(SH1.OrderDate)

FROM Sales.SalesOrderHeader As SH1

WHERE SH1.CustomerID = A.CustomerId)

RETURN

END

The GetLastShipped function created in Example 28.4 is an MSTVF because the body of the function is a complex INSERT … SELECT statement. The function inserts several rows in the table variable called @CustomerOrder. All rows of the new table are generated from the SalesOrderHeader and SalesOrderDetail tables of the Sales schema as well as the Product table of the Production schema of the AdventureWorks database.

Example 28.5 executes the GetLastShipped function using a query and retrieves the number of rows from the result set.

Example 28.5

USE AdventureWorks;

SELECT C = COUNT_BIG(*)

FROM GetLastShipped() C

Figure 28-4 shows the corresponding execution plan of the SELECT statement in Example 28.5. Take a close look at the Table Valued Function operator and its properties in the execution plan and you will see that the value of Estimated Number of Rows is 54750. This means that Interleaved Execution has been performed and the correct cardinality of rows in the result set has been estimated.

2.3. Benefits and Disadvantages of Interleaved Execution

The benefits of the Interleaved Execution feature are twofold. First, estimating the correct cardinality is one of the most important parameters for the query optimizer to generate an optimal execution plan for the particular query. Second, in versions of SQL Server prior to SQL Server 2016, the query optimizer of the Database Engine used a fixed estimate of rows (one), which usually led to inefficient query plans. (See the Note before Example 28.4.)

Using the ALTER DATABASE statement, I set the compatibility level to 130, which is the level of SQL Server 2016, and executed the query in Example 28.5 again. Figure 28-5 shows that, in this case, Estimated Number of Rows is set to the fixed value (100).

On the other hand, there is only one (minor) disadvantage of using the Interleaved Execution feature: as with any plan affecting changes, some plans could change such that with better cardinality for the subtree of the corresponding execution plan tree, the result is a worse plan for the query overall.

2.4. Enabling/Disabling Interleaved Execution

If you want to enable/disable only the Interleaved Execution feature, you can use either the ALTER DATABASE statement or the DISABLE_INTERLEAVED_EXECUTION_TV query hint. Example 28.6 shows how you can use the ALTER DATABASE statement to disable solely Interleaved Execution. (The statement modifies the mode of the current database.)

Example 28.6

ALTER DATABASE SCOPED CONFIGURATION SET

INTERLEAVED_EXECUTION_TV = ON;

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *