Optimizing Techniques for Data Warehousing with SQL Server: Star Join Optimization

As you already know from Chapter 22, the star schema is a general form for structuring data in a data warehouse. A star schema usually has one fact table, which is connected to several dimension tables. The fact table can have 100 million rows or more, while dimension tables are fairly small relative to the size of the corresponding fact table. Generally, in decision support queries, several dimension tables are joined with the corresponding fact table. The convenient way for the query optimizer to execute such queries is to join each of the dimension tables used in the query with the fact table, using the primary/foreign key relationship. Although this technique is the best one for numerous queries, significant performance gains can be achieved if the query optimizer uses special techniques for particular groups of queries. One such specific technique is called star join optimization.

Before you start to explore this technique, take a look at how the query optimizer executes a query in the traditional way, as shown in Example 26.10.

Example 26.10

USE AdventureWorksDW;

SELECT ProductAlternateKey

FROM FactInternetSales f JOIN DimDate t ON f.OrderDateKey = t.DateKey

JOIN DimProduct d ON d.ProductKey = f.ProductKey

WHERE CalendarYear BETWEEN 2013 AND 2014

AND ProductAlternateKey LIKE ‘BK%’

GROUP BY ProductAlternateKey, CalendarYear;

The execution plan of Example 26.10 is shown in Figure 26-1.

As you can see from the execution plan in Figure 26-1, the query joins first the FactlnternetSales fact table with the DimDate dimension table using the relationship between the primary key in the dimension table (DateKey) and the foreign key in the fact table (DateKey). After that, the second dimension table, DimProduct, is joined with the result of the previous join operation. Both join operations use the hash join method.

The use of the star join optimization technique will be explained using Example 26.11.

Example 26.11

USE AdventureWorksDW;

GO

SELECT F.ProductKey, F.CurrencyKey, D1.CurrencyName, D2.EndDate

FROM dbo.FactInternetSales AS F

JOIN dbo.DimCurrency AS D1 ON F.CurrencyKey = D1.CurrencyKey

JOIN dbo.DimProduct D2 ON F.ProductKey = D2.ProductKey

WHERE D1.CurrencyKey <= 12 AND D2.ListPrice > 50

OPTION (MAXDOP 32);

The query optimizer uses the star join optimization technique only when the fact table is very large in relation to the corresponding dimension tables. To ensure that the query optimizer would apply the star join optimization technique, I significantly enhanced the FactlnternetSales fact table from the AdventureWorksDW database. The original size of this table is approximately 64,000 rows, but for this example I created an additional 500,000 rows by generating random values for the ProductKey, SalesOrderNumber, and SalesOrderLineNo columns. Figure 26-2 shows the execution plan of Example 26.11.

The query optimizer detects that the star join optimization technique can be applied and evaluates the use of bitmap filters. (A bitmap filter is a small to midsize set of values that is used to filter data. Bitmap filters are always stored in memory.)

As you can see from the execution plan in Figure 26-2, the fact table is first scanned using the corresponding clustered index. After that the bitmap filters for both dimension tables are applied. (Their task is to filter out the rows from the fact table.) This has been done using the hash join technique. At the end, the significantly reduced sets of rows from both streams are joined together.

NOTE Do not confuse bitmap filters with bitmap indices! Bitmap indices are persistent structures used in BI as an alternative to B+-tree structures. The Database Engine doesn’t support bitmap indices.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *