SQL Server Machine Learning Services: Data Visualization

As previously mentioned, one of the most important advantages of the R language is that it enables you to visualize output data in many different (graphical) forms. In other words, the R language has powerful built-in functions, which are grouped together in different packages, to help you to create visualizations of your data in a form that best suits your needs.

There are two general ways to integrate R graphics with SQL Server:

  • Output a dataset by using the sp_execute_external_script system stored procedure and apply one of many R packages that support data visualization. In other words, you do data visualization. In other words, you do data visualization inside R.
  • Integrate R into a SQL Server tool, such as Power BI, and use its functions to visualize data.
  • The following sections describe data visualization in R and the integration of R in Power BI.

1. Data Visualization in R

The previous section discussed how data frames can be used to pass input values, store them in a data frame, manipulate them inside the script using different R functions, and finally return the output data in tabular form. In this section, Example 32.5 builds on Example 32.4 to demonstrate how to visualize the output data using a bar chart.

NOTE When executing Example 32.5, you might get an error similar to this:

Msg 39004, Level 16, State 20, Line 448 A ‘R’ script error occurred during execution of ‘sp_execute_external_script’ with HRESULT.

This error is related to the SQL Server Launchpad service. This service is used to start Advanced Analytics Extensions processes, which are necessary for integration of the R system and Python with the Database Engine. The program in Example 32.5 writes the file in the C:\temp directory, which is by default not accessible by the Launchpad service. You can either grant access to this directory, or use another directory; for instance, C:\RScripts.

To grant access to a directory, right-click that directory, select Grant Access To, and click Add Everyone.

Example 32.5

USE AdventureWorks;

GO

DECLARE @rscript NVARCHAR(MAX);

SET @rscript = N’

# Step 1: Import R packages

library(scales)

library(ggplot2)

# Step 2: Specify report file

file <- “C:\\temp\\Figure32_1.tif”

tiff(filename=file, width=1000, height=600)

# Step 3: Specify data frame

purchase <- InputDataSet

cl <- levels(purchase$Units)

c2 <- round(tapply(purchase$total, purchase$Units, sum))

purchasedf <- data.frame(c1, c2)

names(purchasedf) <- c(“Total”, “Units”)

# Step 4: Generate bar chart

barchart <- ggplot(purchasedf, aes(y=Units, x=Total)) +

labs(title=”Total Purchases per Unit Code”, x=”Unit Measure Codes”,

y=”Purchase Amounts”) +

geom_bar(stat=”identity”, color=”blue”, size=1, fill=”lightblue”)

print(barchart)

dev.off()’;

DECLARE @select NVARCHAR(MAX);

SET @select = N’

SELECT h.subtotal AS Total, v.UnitMeasureCode AS Units

FROM Purchasing.PurchaseOrderHeader h

INNER JOIN Purchasing.PurchaseOrderDetail d

ON h.PurchaseOrderID = d.PurchaseOrderID

INNER JOIN Purchasing.ProductVendor v

ON d.ProductID = v.ProductID’;

EXEC sp_execute_external_script

@language = N’R’,

@script = @rscript,

@input_data_1 = @select

WITH RESULT SETS NONE;

As you can see in Example 32.5, the visualization process involves four steps:

  1. Import R packages.
  2. Specify the report file.
  3. Specify the data frame.
  4. Generate a bar chart.

These steps and their corresponding code in Example 32.5 are discussed in the following sections, but first, Example 32.6 shows how you can check whether or not a particular R package is installed on your system, if you are not sure.

Example 32.6

EXEC sp_execute_external_script

@language = N’R’,

@script = N’library(scales)’

Example 32.6 checks whether the scales package is installed on your system. If the package is already installed, you get the message, “Commands completed successfully.” Otherwise, if some of the dependent packages fail, the system displays the name of the package you should install.

1.1. Step 1: Import R Packages

Generally, R packages are collections of functions in the R Services library. During the installation of R Services, SQL Server setup program adds a number of common R packages to your system, but you can install additional packages at any time. Once a package has been installed in the library, you can import it into your program using the library function.

There are several ways to install R packages to an instance of SQL Server. Which method you should choose depends on which version of SQL Server you have and whether or not your server has an Internet connection. The following three approaches are the most convenient ones:

  • Use conventional R package managers You can use standard R tools to install new packages on an instance of SQL Server. The only requirement is that you have administrator rights. The most popular standard tools are RGui, Rterm, and Rcmd.
  • Use RevoScaleR RevoScaleR is a machine learning package in R created by Microsoft. It is available as part of Machine Learning Services in SQL Server. The package contains functions for creating machine learning techniques, such as linear models, decision trees, and K-means clustering, in addition to functions for visualizing data.
  • Use the SQL statement CREATE EXTERNAL LIBRARY This statement makes it possible to add a package or set of packages to an instance or a specific database without running R code directly. In contrast to other methods, this method requires package preparation and additional database permissions.

This section describes the use of the R package manager called RGui. You need to have administrator rights to use the RGui package manager.

First, determine the location of the instance library and navigate to that location. The path for the SQL Server 2019 default instance library directory is as follows:

C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin\x64

Right-click Rgui.exe and select Run As Administrator. In the new console window that opens, type the following at the command prompt:

install.packages(“pacA:age-name”)

As indicated in Example 32.5, you’ll be installing the scales and ggplot2 packages. You can get both packages by installing the ggplot2 package. In other words, when you install this package, R Services downloads several additional packages, including the scales package. Therefore, replace package-name with ggplot2 as the parameter of the install.packages command.

After successful installation of this package, import the scales and ggplot2 packages into your script. As shown in Example 32.5, you do so with the library function:

library(scales)

library(ggplot2)

1.2. Step 2: Specify the Report File

For the bar chart that will be generated, you need to specify the image file that will hold it. As shown in Example 32.5, you do this in your program with the following code:

file <- “C:\\temp\\Figure32_1.tif”

tiff(filename=file, width=1000, height=600)

The first line specifies a string variable to hold the directory and filename of the image file and then uses the <- operator to assign the value to the file variable. (Note that you have to escape the backslashes in the path by doubling them.) The second line invokes the tiff device, which is used to create the .tif file for the corresponding bar chart. (The R language also allows you to create charts in other formats, such as .bmp and .png. For these formats, the corresponding devices are supported, too.) The first argument of the tiff device is filename, which contains the name of the file specified in the previous line of code. Additionally, you specify the width and height of the file, in pixels.

1.3. Step 3: Specify the Data Frame

Concerning the definition of the data frame, the only new line of code in Example 32.5 in relation to Example 32.4 is

names(purchasedf) <- c(“Total”, “Units”)

This statement uses the names function to assign names to the parameters of the purchasedf data frame. These names will be then used in the next step.

1.4. Step 4: Generate a Bar Chart

Several functions from the scales and ggplot2 packages are used to generate bar charts. As you will see in a moment, the main part of the code in this step is related to the bar chart definition. For this purpose, you use the barchart variable and assign different properties of the chart to it.

NOTE Microsoft R Services supports a lot of different possibilities to create graph types, such as histograms, box plots, and scatter plots. The description of all possibilities to plot results is out of the scope of this book and can be found in Microsoft Docs.

As shown in Example 32.5, the definition of all properties of the bar chart is made up of several elements, which are connected using the plus (+) sign. The first element uses the ggplot function to create the foundation for the bar chart:

ggplot(purchasedf, aes(y=Units, x=Total))

This function has two arguments. The first one, purchasedf, is the name of the dataset and delivers the data for the bar chart. The second argument uses the R aes function. This function specifies the aesthetic mappings. In other words, it describes how variables in the data are mapped to visual properties (aesthetics) of elements. The default values for aesthetic mappings are set in the ggplot function, but can be overwritten within a script. In our code, the aes function specifies the chart’s coordinates that are identical to the columns in the purchasedf dataset.

The second element in the definition of the bar chart properties uses the labs function to provide labels for the title and each axis of the bar chart. The meaning of the three parameters of the labs function is straightforward. The first parameter, title, specifies the name of the chart (“Total Purchases per Unit Code”), which is positioned above the chart area. The other two parameters, x and y, specify the titles of the X axis (“Unit Measure Codes”) and Y axis (“Purchase Amounts”). By using the labs function, you can override the default labels that are specified in the ggplot function.

The third element in the definition uses the geom_bar function to specify the intention to create a bar chart:

geom_bar(stat=”identity”, color=”blue”, size=1, fill=”lightblue”)

The function takes four arguments. The stat argument with the “identity” value ensures that data values map correctly to the chart points. The color argument sets the bar outlines to blue, the size argument sets the bar outlines to 1 point, and the fill argument sets the color of the bars to light blue.

NOTE Example 32.5 specifies the values of only a few arguments of the ggplot function. For all other arguments that are not specified, the system uses their default values.

The definition of the bar chart is now complete. The print function sends the bar chart to the .tif file and the dev.off function closes the tiff device. (Note that the last statement in Example 32.5, WITH RESULT SETS NONE, is necessary because there is no result set. In other words, the output is sent to a bar chart and not to a table.) Figure 32-1 shows the output of the R script in Example 32.5.

 

2. Integrate R in Power BI Desktop

Power BI Desktop allows users to create a personal BI environment by gathering data from different sources, loading that data in the data model, and visualizing the data in the same workspace. If you have not installed Power BI Desktop, go to https://powerbi.microsoft.com and download the .exe file. The installation is simple and straightforward.

NOTE Power BI Desktop does not include, deploy, or install the R engine. To run R scripts in Power BI Desktop, you must separately install R on your local computer or use the R engine installed as a part of SQL Server ML Services. See the Chapter 23 section “Power BI” for more information about Power BI Desktop.

To enable R visuals, make sure your local R installation is specified. To do this, open Power BI Desktop and select File | Options and Settings | Options. In the Options window, select R Scripting under Global to open the R Script Options window (see Figure 32-2). Select Other in the Detected R Home Directories field. In the Set an R Home Directory field, specify the path where R_Services is stored. As you can see from Figure 32-2, I use the R installation that is part of SQL Server Machine Learning Services. On my computer, it is stored under C:\Program Files\Microsoft SQL Server\MSSQL15.SQLServer\R_Services. Click OK.

In the next step, you load the data. In the main menu of Power BI Desktop, click the Get Data button. From the list of all possible data sources, choose SQL Server Database. The Connection window for SQL Server appears. Type the name of your server instance and the database name. Click OK. In the Impersonation page, choose the Use My Current Credentials radio button and click Connect.

After that, click Advanced Options to provide the Transact-SQL query. To visualize the output data of the same query of the AdventureWorks database used in Example 32.5, type or copy the following statement in the SQL Statement field, as shown in Figure 32-3:

Click OK to see the preview of the query. Then click Load on the preview window. Now, the system creates the result set called Query1. (You can change the name, if you wish.) The table name and the names of selected columns, Total and Units, appear in the Fields pane in the right part the window.

From the Visualizations pane on the right, click the R Script icon. The R visualization box appears (see Figure 32-4). Drag and drop the Total and Units columns from the Fields pane into the Values box.

Then, scroll down to the R Script Editor located in the lower half of the Power BI screen (the title of which is shown at the bottom of Figure 32-4) and enter the following R code:

# Paste or type your script code here:

library(ggplot2);

a <- qplot(y = Total, x = Units, data = dataset,

color = Units, facets = -Units,

main = “Total Purchases”);

a + scale_x_discrete(“Units”);

a + scale_y_continuous(label = scales::dollar);

Click the Run Script button (>) located on the right of the R Script Editor bar. Now, your Power BI report should look like Figure 32-5.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *