SQL Server Analysis Services: Multidimensional Model

This section shows you how to create and process a multidimensional cube. First, though, it is important to understand the terminology related to the Multidimensional model and to be familiar with the properties used to create and process a cube. These topics are addressed in turn next.

1. Multidimensional Model Terminology

The following are the most important terms in relation to the Multidimensional model:

  • Cube
  • Dimension
  • Member
  • Hierarchy
  • Cell
  • Level
  • Measure group
  • Partition

A cube is a multidimensional structure that contains all or part of the data from a data warehouse. Although the term “cube” implies three dimensions, a multidimensional cube generally can have many more dimensions. Each cube contains all other components in the preceding list.

A dimension is a set of logically related attributes (stored together in a dimension table) that closely describes measures (stored in the fact table). For instance, Time, Product, and Customer are the typical dimensions that are part of many BI applications. These three dimensions from the AdventureWorksDW database are used in the example in the following section that demonstrates how to create and process a multidimensional cube.

NOTE One important dimension of a cube is the Measures dimension, which includes all measures defined in the fact table.

Each discrete value in a dimension is called a member. For instance, the members of a Product dimension could be Computers, Disks, and CPUs. Each member can be calculated, meaning that its value is calculated at run time using an expression that is specified during the definition of the member. (Because calculated members are not stored on the disk, they allow you to add new members without increasing the size of a corresponding cube.)

Hierarchies specify groupings of multiple members within each dimension. They are used to refine queries concerning data analysis.

Cells are parts of a multidimensional cube that are identified by coordinates (x-, y-, and z-coordinates, if the cube is three-dimensional). This means that a cell is a set containing members from each dimension. For instance, consider the three-dimensional cube in Chapter 22 (see Figure 22-3) that represents car sales for a single region within a quarter. The cells with the following coordinates belong, among others, to the cube:

  • First quarter, South America, Falcon
  • Third quarter, Asia, Eagle

When you define hierarchies, you define them in terms of their levels. In other words, levels describe the hierarchy from the highest (most summarized) level to the lowest (most detailed) level of data. The following list displays the possible hierarchy levels for the Time dimension:

  • Quarter (Q1, Q2, Q3, Q4)
  • Month (January, February, …)
  • Day (Dayl, Day2, …)

As you already know from Chapter 22, measures are numerical values, such as price or quantity, that appear in a fact table but do not build its primary key. A measure group is a set of measures that together build a logical unit for business purposes. Each measure group is built on the fly, using corresponding metadata information.

A cube can be divided into one or more partitions. Partitions are used by SSAS to manage and store data and aggregations for a measure group in a cube. Every measure group has at least one partition, which is created when the measure group is defined. Partitions are a powerful and flexible means of managing large cubes.

2. Creating a New Project Using Visual Studio 2019

As previously noted, with Visual Studio 2019 (VS 2019), the required functionality to enable Analysis Services projects has moved into the respective Visual Studio extensions. Therefore, you need Visual Studio to create and process multidimensional cubes.

If you have a license for Visual Studio 2019, just install the product. If you don’t already have a license, you can install Visual Studio Community from https://visualstudio.microsoft.com/ downloads. (Visual Studio Community is the most basic edition of Visual Studio and is available free of charge.)

After installation of Visual Studio 2019, you need to install SQL Server Analysis Services extensions for VS 2019. To do so, go to https://marketplace.visualstudio.com/items? itemName=ProBITools.MicrosoftAnalysisServicesModelingProjects.

Click Download to download the corresponding MS Visual Studio extension file (.vsix). Click the file to start the VSIX installer.

Now you have all necessary tools to create Analysis Services projects. Start Visual Studio 2019 and click Create a New Project (see Figure 23-2). On the Create a New Project page, choose Analysis Services Multidimensional and Data Mining Project (see Figure 23-3) and click Next.

The Configure Your New Project page appears. In the Name text box, type the name of the project. (I named my project SSAS_Project1.) In the Location drop-down list box, type or select the folder in which to store the files for the project, or click the Browse button (…) to select a folder. In the Solution Name text box, type the name of your solution (the default name of the solution is identical to the name of the created project).

NOTE The new project is always created in a solution. A solution is the largest management unit and comprises one or more projects.

Click Create, and the Visual Studio initial window for the new project opens. In the right pane, you can see the folders for the tasks, which are discussed in the next section.

3. Creating and Processing of a Multidimensional Cube

Now that you know how to create a BI project and configure it, I will show you how to perform the following steps that are necessary to create and process a multidimensional cube using Visual Studio:

  1. Identify data sources.
  2. Specify data source views.
  3. Create a cube.
  4. Design storage aggregation.
  5. Deploy and process the cube.
  6. Browse the cube.

The following subsections describe these steps.

3.1. Identifying Data Sources

To identify data sources in Visual Studio 2019, right-click the Data Sources folder in the right pane under your project and choose New Data Source. The Data Source Wizard appears, which guides you through the process of creating a data source. (This example uses the AdventureWorksDW database as the data source.) Click Next on the Welcome to the Data Source Wizard page.

First, on the Select How to Define the Connection page, make sure that the Create a Data Source Based on an Existing or New Connection radio button is activated and click New. In the Connection Manager dialog box (see Figure 23-4), select Native OLE DB/SQL Server Native Client 11.0 in the Provider drop-down list, select the name of your database server in the Server Name drop-down list, and select Windows Authentication in the Authentication drop-down list. In the Connect to a Database section, with the Select or Enter a Database Name radio button enabled, choose the AdventureWorksDW database from the drop-down list. Click the Test Connection button to test the connection to the database. If the test is successful, click OK.

The next dialog box that appears could be either Select How to Define the Connection or Impersonation Information. The Select How to Define the Connection dialog box appears only when you connect to the data source for the first time. After selection, click Next.

The next step of the wizard is the Impersonation Information page. These settings determine which user account SSAS uses when connecting to the underlying source of data using Windows authentication. Which setting is appropriate depends on how this data source is being used. Click the Use a Specific Windows User Name and Password radio button and type your username and password in the corresponding fields. Click Next.

Finally, on the Completing the Wizard page, give the new data source a name (for this example, call it BI_Source) and click Finish. The new data source appears in the Solution Explorer pane in the Data Sources folder.

After identifying data sources in general, you have to determine which data in particular you want to select from the data source. In our example, it means that you have to select tables from the AdventureWorksDW database, which will be used to build a cube. This step involves specifying data source views, discussed next.

3.2. Specifying Data Source Views

To define data source views, right-click the Data Source Views folder under your project and choose New Data Source View. The Data Source View Wizard guides you through the steps that are necessary to create a data source view. (This example creates a view called BI_View1, which is based on the Customer and Project tables of the AdventureWorksDW database.) Click Next.

First, on the Select a Data Source page, select the BI_Source data source and click Next. (The name of this data source should appear in the left pane.) On the next wizard page, Select Tables and Views, choose tables that belong to your cube either as dimension tables or fact tables. To choose a table, select its name in the Available Objects pane and click the > button to move it to the Included Objects pane. For this example, choose the tables for customers and products (DimCustomer and DimProduct, respectively) in the AdventureWorksDW database. These tables will be used to build cube dimensions. They build the set of dimension tables used for your star schema.

Next, on the same wizard page, you need to specify one or more fact tables that correspond to the already selected dimension tables. (One fact table, together with the corresponding dimension tables, creates a star schema.) To do so, click the Add Related Tables button below the Included Objects pane. This instructs the system to find tables that are related to the DimCustomer and DimProduct tables. (To find related tables, the system searches all primary key/foreign key relationships that exist in the database for these two dimension tables.)

The system finds several fact tables and adds them to the Included Objects pane. Of these tables, you need only one, FactInternetSales, to build the star schema. Besides the corresponding fact tables, the system also searches for other tables that are created separately for a hierarchy level of the corresponding dimension. One such table to keep is DimProductSubcategory, which incarnates a hierarchy level called Subcategory of the Product dimension. Also keep the DimDate table, because the Time dimension is almost always a part of a cube. (The name of the schema is BI_View1.)

Thus, for the BI_View1 star schema, you need the following five tables (as shown in Figure 23-5):

  • DimCustomer
  • DimProduct
  • FactInternetSales
  • DimProductSubcategory
  • DimDate

Exclude all other system-chosen tables that appear in the right pane by selecting them and clicking the < button. After restructuring the tables, click Next. On the Completing the Wizard page, choose the name for the new data source view and click Finish.

After you click Finish, double-click the BI_View1 view in the right pane to display a graphical representation of the tables in the data schema you have defined, as shown in Figure 23-6. (The graphical representation is done with a tool called Data Source View Designer.)

NOTE Using drag and drop, I changed the design of the data source view in Figure 23-6 so that the tables have the convenient form of the star schema. Notice that the fact table is in the middle and the corresponding dimension tables build the circle around it. (Figure 23-6 actually has the form of a snowflake schema, because the DimProductSubcategory table presents the hierarchy level of the Product dimension.)

Data Source View Designer offers several useful functions. One of them is to view the data in individual tables. To do this, right-click a table and choose Explore Data. The content of the table appears in a separate window.

3.3. Creating a Cube

Before you create a cube, you must specify one or more data sources and create a data source view, as previously described. After that, you can use the Cube Wizard to create a cube.

To create a cube, right-click the Cubes folder under your project and choose New Cube. The welcome page of the Cube Wizard appears. Click Next. On the Select Creation Method page, choose Use Existing Tables, because the data source view exists and can be used to build a cube. Click Next.

On the Select Measure Group Tables page, you select measures from the fact table(s). Therefore, select only the fact table, Fact Internet Sales, and click Next. The wizard chooses all possible measures from the selected fact table and presents them on the Select Measures page. Check only the Total Product Cost column of the Fact Internet Sales table as the single measure (see Figure 23-7). Click Next.

On the Select New Dimensions page, select all three dimensions (Dim Date, Dim Product, and Dim Customer) to be created, based on the available tables. (If the Fact Internet Sales table was also selected, unselect that table.) The final page, Completing the Wizard, shows the summary of all selected measures and dimensions. Click Finish to finish creating the cube called BI_Cube.

3.4. Designing Storage Aggregation

As you already know from Chapter 22, basic data from the fact table can be summarized in advance and stored in persistent tables. This process is called aggregation, and it can significantly enhance the response time of queries, because scanning millions of rows to calculate the aggregation on the fly can take a very long time.

There is a tradeoff between storage requirements and the percentage of possible aggregations that are calculated and stored. Creating all possible aggregations in a cube and storing all of them on the disk results in the fastest possible response time for all queries, because the response to each query is almost immediate. The disadvantage of this approach is that the storage and processing time required for the aggregations can be substantial.

On the other hand, if no aggregations are calculated and stored, you do not need any additional disk storage, but response time for queries concerning aggregate functions will be slow because each aggregate has to be calculated on the fly.

SSAS provides the Aggregation Design Wizard to help you design aggregations optimally. To start the wizard, you have first to start the Cube Designer. (The Cube Designer is used to edit various properties of an existing cube, including the measure groups and measures, cube dimensions, and dimension relationships.) To start it, right-click the cube in Solution Explorer and select View Designer from the context menu. Now, click the Aggregations tab in the main menu of the Cube Designer. In the table that appears in the Cube Designer (Fact Internet Sales), right-click the cell under the Aggregations column and choose Design Aggregations. That starts the Aggregation Design Wizard.

In the first step of the wizard, Review Aggregation Usage, you review aggregation usage settings. In this step, you can include or exclude the attributes that appear on the page. Leave the settings as they are and click Next.

The next step is to specify the number of members in each attribute. You do this on the Specify Object Counts page. For each selected cube object, you have to enter the estimated count value or partition count value, before the wizard starts to create and store the selected aggregations. If you click the Count button, the wizard automatically performs the object counts and displays the obtained counts. Click Next.

In the second-to-last step, the Set Aggregation Options page, choose one of the four options to specify up to what point (or not at all) aggregations should be designed:

  • Estimated storage reaches_MB Specifies the maximum amount of disk storage that should be used for precomputed aggregations. The larger the amount, the more precomputed aggregations that will be created.
  • Performance gain reaches_% Specifies the performance gain that you want to achieve. The higher the percentage of precomputed aggregations, the better the performance.
  • I click Stop Enables you to decide when to stop the design process.
  • Do not design aggregation (0%) Specifies that no precomputed aggregations should be created.

NOTE Generally, you should choose one of the first two alternatives. I prefer the second one, because it is very difficult to estimate the amount of storage for different star schemas and different sets of queries. A value between 65 percent and 90 percent is optimal in most cases.

3.5. Deploying and Processing the Cube

You have to complete two additional steps before you can use a cube that you have created. The first step is to deploy the cube to the Analysis Services database. The second step is to process already deployed structures in the database. The following two subsections give the detailed description of both steps.

Deploying the Cube As you already know, you can create a cube either offline or online. Creating a cube offline means that all steps necessary for its creation are done without a connection to the Analysis Services server. (All steps described up to this point in the chapter can be executed offline.)

Deploying a cube means that all elements of the cube are moved from a “front-end” environment (in this case Visual Studio) to the server environment. The elements of the cube reside on the Analysis Services server as a part of a database. The name of the database is specified in the project definition. If the specified database does not exist, the Analysis Services server creates it.

Although the cube structures are copied to the server during the deployment process, the contents of dimensions and measures are not stored on the server side. To do this, you have to process the cube, which is explained next.

NOTE You don’t have to start the deployment process explicitly. When you process your cube, the system will ask you whether the cube should be deployed.

Processing the Cube If you chose the Save the Aggregations But Do Not Process Them option as the final step in the preceding section and deploy the cube to the SSAS database, you have still to process the cube.

NOTE A cube must be processed when you first create it and each time you modify it. If a cube has a lot of data and precomputed aggregations, processing the cube can be very time consuming.

To process the cube, right-click the name of your cube in the Cubes folder of Solution Explorer and select Process. The system starts processing the cube and displays the progress of this activity. To end the activity, just close the active dialog box(es).

3.6. Browsing the Cube

To browse a cube, right-click the cube name (in the Cubes folder of Solution Explorer) and choose Browse. The Browse view appears. You can add any dimension to the query by right­clicking the dimension name in the left pane and choosing Add to Query. You can also add a measure from that pane in the same way. (Adding measures first is recommended.) Figure 23-9 shows the tabular representation of the total product costs for Internet sales for different customers and products.

The approach is different if you want to calculate values of measures for particular dimensions and their hierarchies. For example, suppose that you want to deliver for customers with customer IDs 11008 and 11015 total product costs for all products they have ordered. In this case, you first drag and drop the measure (Total Product Cost) from the left pane into the editing pane, and then you choose values in the pane above it to restrict the conditions for each dimension (see Figure 23-10). First, in the Dimension column, choose the Dim Customer table, and in the Hierarchy column, choose the primary key of this table (Customer Key). In the Operator column, choose Equal, and in the Filter Expression column, choose both values 11008 and 11015, one after the other.

In the same way, choose the conditions for the Dim Product dimension table. The only difference is that all product values should be included. For this reason, in the Filter Expression column you should choose the root of the dimension. (The root of each dimension is specified by All.)

4. Delivering Data from the Multidimensional Model

Now that you have seen how to build and browse multidimensional cubes using Analysis Services, you are ready to learn how to retrieve data from a cube and deliver it to users using other interfaces. (The primary goal of Visual Studio is to develop BI projects, not to retrieve and deliver data to users.) For this task, there are many other APIs, such as:

  • Power Pivot for Excel
  • Multidimensional Expressions (MDX)
  • OLE DB for OLAP
  • NET
  • Third-party tools

MDX is the only interface that I will discuss in detail, in the following subsection; the other interfaces in the list are briefly described here. (The reason I give more attention to MDX than the other interfaces is that MDX is a tool that is used by many third-party SSAS solutions.)

Power Pivot for Excel is a tool that allows you to analyze data using what is probably the most popular Microsoft tool for such purpose: Microsoft Excel. It is a user-friendly way to perform data analysis using features such as PivotTable, PivotChart views, and slices. The use of Power Pivot for Excel for the Multidimensional model is similar to the use of the same “front end” for the Tabular model, as you will learn in the section “Power Pivot for Excel” later in this chapter.

OLE DB for OLAP is an industry standard for multidimensional data processing, published by Microsoft. It is a set of entities and interfaces that extends the ability of OLE DB to provide access to multidimensional data stores. OLE DB for OLAP enables users to perform data analysis through interactive access to a variety of possible views of the underlying data. Many independent software vendors use the specification of OLE DB for OLAP to implement different interfaces that allow users to access cubes created by SSAS. Additionally, using OLE DB for OLAP, vendors can implement OLAP applications that can uniformly access both relational and nonrelational data stored in diverse information sources, regardless of location or type.

ADOMD (ActiveX Data Objects Multidimensional) is a Microsoft .NET Framework data provider that is designed to communicate with SSAS. With this interface, you can access and manipulate objects in a multidimensional cube, enabling web-based OLAP application development. This interface uses the XML for Analysis protocol to communicate with analytical data sources. Commands are usually sent in MDX. By using ADOMD.NET, you can also view and work with metadata.

After the description of MDX, I will describe Power BI Desktop, which comprises several Microsoft tools, such as Power Pivot for Excel.

4.1. Multidimensional Expressions (MDX)

NOTE The material in this section is complex. Therefore, you might want to skip it on the first reading of the book.

Multidimensional Expressions (MDX) is a language that you can use to query multidimensional data stored in OLAP cubes. (MDX can also be used to create cubes.) In MDX, the SELECT statement specifies a result set that contains a subset of multidimensional data that has been returned from a cube. To specify a result set, an MDX query must contain the following information:

  • One or more axes that you use to specify the result set. (An axis in the SELECT statement corresponds to a dimension of the cube.) You can specify up to 128 axes in an MDX query. You use the ON COLUMNS clause to specify the first axis and the ON ROWS clause to specify the second axis. If you have more than two axes, the alternative syntax is to use numbers: ON AXIS(0) for the first axis, ON AXIS(1) for the second one, and so on.
  • The set of members or tuples to include on each axis of the MDX query. This is written in the SELECT list.
  • The name of the cube that sets the context of the MDX query, specified in the FROM clause of the query.
  • The set of members or tuples to include on the “slicer axis” specified in the WHERE clause (see Examples 23.1 and 23.2).

NOTE The semantic meaning of the WHERE clause in SQL is different from its semantic meaning in MDX. In SQL it means filtering of rows by specified criteria. The WHERE clause in MDX means slicing the cube. While these concepts are somewhat similar, they are not equivalent.

Example 23.1 will be used to explain the syntax of the language. You can execute your MDX queries directly in SQL Server Management Studio.

To establish the connection to the server, click the database (in this case, the BI_Cube cube) under the corresponding SSAS server instance. Click the MDX icon in the menu of Management Studio. After that, the MDX Query Editor opens, which you use to design and execute statements and scripts written in the MDX language. First, type scripts in the Query Editor pane. Then, to execute the scripts, press F5 or click Execute on the toolbar.

Example 23.1

Display for each customer total product costs that are due on May 15, 2013:

SELECT [Measures].MEMBERS ON COLUMNS,

[Dim Customer].[Customer Key].MEMBERS ON ROWS

FROM BI_Cube

WHERE ([Due Date].[Date Key].[20130515])

Example 23.1 queries data from the BI_Cube cube. The SELECT list of the first query axis displays all members of the Measures dimension. In other words, it displays the values of the Total Product Costs column, because the only existing measure in this cube is Total Product Costs. The second query axis displays all members of the Customer Key column of the Dim Customer dimension.

The FROM clause indicates that the data source in this case is the BI_Cube cube. The WHERE clause “slices” through the Due Date dimension according to the key values using the single date value 2013/05/15.

Example 23.2 shows another MDX query.

Example 23.2

Calculate for the customer with the customer key 11005 and for the product with the product key 562 the total product costs that are due in the period between 2013/05/13 and 2013/05/15:

SELECT [Measures].MEMBERS ON COLUMNS

FROM BI_Cube

WHERE ({[Due Date].[Date Key].[20130513]:[Due Date].[Date key].[20130515]},

[Dim Customer].[Customer Key].[11005],

[Dim Product].[Product Key].[562])

The SELECT list in the query in Example 23.2 contains only the members of the Measures dimension. For this reason, the query displays the value of the Total Product Costs column. The WHERE clause in Example 23.2 is more complex than in Example 23.1. First, there are three slices, which are separated using commas. Only one member of the Customer dimension and one member of the Product dimension are used for slicing, while from the Due Date dimension, the dates from 2013/05/13 through 2013/05/15 are sliced. (As you can see from the query, a colon is used to specify a range of dates.)

NOTE This section provides only a concise description of MDX. Refer to Microsoft Docs to learn more about this language.

4.2. Power BI Desktop

Power BI Desktop is an environment that comprises several Microsoft tools that enable you to build advanced queries, models, and reports that visualize data. With Power BI Desktop, you can build data models, create reports, and share your work by publishing to the Power BI service. Power BI Desktop is a free download.

NOTE This section gives you just a concise description of Power BI Desktop. See the section “Data Visualization in R” in Chapter 32 for a demonstration of how you can use this environment.

Microsoft developed Power BI Desktop by integrating the following components:

  • Power Pivot for Excel An environment whose main benefit is that it makes design easy.
  • Power View An add-in to Excel that enables you to visualize data. The main benefit of Power View is that you can create presentation-ready visualizations from the beginning.
  • Power Query Provides a way to retrieve data from a variety of data sources and cleanse the data before the load process.

Therefore, Power BI Desktop allows users to create a personal BI environment by gathering data from different sources, loading the data in the data model, and visualizing the data in the same workspace.

You can publish the content created in Power BI Desktop to PowerBI.com. First, you have to create an account on that site. After that, you can explore the content in many different ways. For instance, you can use existing visualizations to choose the one that best suits your purposes. Also, you can manipulate the model and create new visualizations.

NOTE Access to PowerBI.com is free, but there are certain limitations, published on the PowerBI.com site.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *