Intelligent Query Processing in SQL Server: Approximate Query Processing

The APPROX_COUNT_DISTINCT function returns the approximate number of rows that contain distinct values of the given parameter, which is usually a column name. This function provides an alternative to the COUNT(DISTINCT) function, which returns the exact number of rows that contain distinct values for the column. The APPROX_COUNT_DISTINCT function processes large amounts of data significantly faster than COUNT(DISTINCT), with deviation from the exact result up to 3 percent.

Executing the query in Example 28.10 shows the differences between the execution of the APPROX_COUNT_DISTINCT function and the execution of COUNT(DISTINCT).

Example 28.10

USE AdventureWorksDW;

SET STATISTICS TIME ON;

SELECT count(DISTINCT(SalesOrderNumber))

FROM FactInternetSales;

SELECT APPROX_COUNT_DISTINCT(SalesOrderNumber)

FROM FactInternetSales;

Results for COUNT(DISTINCT):

27659 rows

CPU time = 31 ms, elapsed time = 196 ms.

Results for APPROX_COUNT_DISTINCT:

27990 rows

CPU time = 63 ms, elapsed time = 60 ms.

As you can see from the elapsed times for executing the two functions, APPROX_COUNT DISTINCT executes more than three times faster than COUNT(DISTINCT). On the other hand, the deviation from the exact result is about 1 percent.

NOTE The idea behind the APPROX_COUNT_DISTINCT function is to provide aggregations across large data sets where response time is more critical than precision. This idea can be extended to the other aggregate functions as well. Therefore, we can hope that Microsoft will implement similar aggregation functions (APPROX_SUM and APPROX_COUNT, for instance) in a future version of SQL Server.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *