SQL Server Machine Learning Services: Data Visualization with Python

As mentioned earlier in the chapter, one of the advantages of Python is that it supports a large number of open source libraries, which you can use for many different purposes. One of these libraries, matplotlib, provides extensive support to display data in different graphic forms, such as histograms, bar charts, and pie charts.

As with the R language, there are two general ways to integrate Python graphics with SQL Server:

  • Output a dataset by using the sp_execute_external_script system stored procedure and apply one of many Python packages that support data visualization. In other words, you do data visualization inside Python.
  • Integrate Python into a SQL Server tool, such as Power BI, and use its functions to visualize data.

NOTE Data visualization using Power BI in relation to Python is very similar to the same process using the R language. Please read the section “Integrate R in Power BI Desktop” in Chapter 32 for full details.

Example 33.5 uses the result set generated in Example 33.4 and displays it graphically using a bar chart.

NOTE When executing Example 32.5, you might get an error similar to this:

Msg 39004, Level 16, State 20, Line 448 A ‘R’ script error occurred during execution of ‘sp_ execute_external_script’ with HRESULT.

This error is related to the SQL Server Launchpad service. This service is used to start Advanced Analytics Extensions processes, which are necessary for integration of the R system and Python with the Database Engine. The program in Example 33.5 writes the file in the C:\temp directory, which is by default not accessible by the Launchpad service. You can either grant access to this directory or use another directory; for instance, C:\PythonScripts. To grant access to a directory, right-click that directory, select Grant Access To, and click Add Everyone. This also applies later to Example 33.8.

Example 33.5

Use AdventureWorks;

DECLARE @pscript NVARCHAR(MAX);

SET @pscript = N’

import matplotlib

matplotlib.use(“PDF”)

import matplotlib.pyplot as plt

df1 = InputDataSet

df2 = df1.groupby(“Units”, as_index=True).sum()

pt = df2.plot.barh()

# Set title

pt.set_title (label= “Total Purchases per Unit Code”, y=1.1)

# Set labels for x and y axes

pt.set_xlabel(“Purchase Amounts”)

pt.set_ylabel(“Unit Measure Codes”)

# Set names for all items of Unit Code

pt.set_yticklabels (labels=df2.index, fontsize=8, color= “green”)

# Save bar chart to .pdf file

plt.savefig(“c:\\temp\\Figure33_1.pdf”, bbox_inches= “tight”)’;

DECLARE @sql NVARCHAR(MAX);

SET @sql = N’

SELECT CAST(h.subtotal AS FLOAT) AS Total, v.UnitMeasureCode AS Units

FROM Purchasing.PurchaseOrderHeader h

INNER JOIN Purchasing.PurchaseOrderDetail d

ON h.PurchaseOrderID = d.PurchaseOrderID

INNER JOIN Purchasing.ProductVendor v

ON d.ProductID=v.ProductID’;

EXEC sp_execute_external_script

@language = N’Python’,

@script = @pscript,

@input_data_1 = @sql;

GO

First, the following lines of code from Example 33.5 are related to the visualization of data:

import matplotlib

matplotlib.use(“PDF”)

import matplotlib.pyplot as plt

The first line imports the matplotlib library into the script. The second line applies the use function to the library to specify the format of the output file. This example uses the .pdf format to store the bar chart, but in practice you can choose among the many formats that are supported. The third line of the code imports the pyplot plotting framework that is included in the matplotlib library. You can use this framework when you intend to do simple plotting.

After executing the preceding three lines of code, all necessary information regarding the libraries is provided, and you can start setting up the data frame used in the script, as previously described in relation to Example 33.4.

After setup of the data frame, the following lines of code describe several properties of the bar chart:

pt = df2.plot.barh()

pt.set_title (label= “Total Purchases per Unit Code”, y=1.1)

pt.set_xlabel(“Purchase Amounts”)

pt.set_ylabel(“Unit Measure Codes”)

First, the barh function uses the modified input data and generates a graphic that contains a bar chart, which is then saved to the pt variable. After that, several properties of the bar chart are specified using subplots. One of the important features of the matplotlib library is that you can add multiple plots, called subplots, within a graphic. These subplots are helpful when you want to show different data presentations in a single view. So, the final three lines of code are regarded as the definition of subplots: the set_title function specifies the title of the bar chart, while set_xlabel and set_ylabel define the names of the X and Y coordinates, respectively. The second parameter of the set_title function specifies the font size.

After the properties of the bar chart are configured, the following line of code sets the tick labels in the bar chart:

pt.set_yticklabels (labels=df2.index, fontsize = 8, color= “green”)

The set_yticklabels function is used to adjust the tick labels in relation to the Y axis. The first argument of the function, labels, specifies which column will be used as the index of the result. Therefore, labels = df2.index specifies that the values of the UnitMeasureCode column will be used as the names of the tick labels on the Y axis in the bar chart.

NOTE The labels parameter is related to the as_index option of the groupby function (see Example 33.3). The use of the labels parameter implicitly specifies that an existing column will be the index of the output. Therefore, this is equivalent to the specification of as_index=True.

The next line of code in Example 33,

plt.savefig(“c:\\temp\\Figure33_1.pdf”, bbox_inches= “tight”)’;

uses the savefig function to store the graphic on a disk. (This function belongs to the pyplot framework.) The first parameter of the function specifies the file in which the corresponding bar chart will be stored. (Note that you have to escape the backslashes in the file path by doubling them and, as mentioned earlier, change the path name if your system does not have the C:\temp directory.)

The second argument of the savefig function, bbox_inches, is used to control how many whitespace characters are generated around the displayed graphics. The default value for bbox_inches is None, meaning that no action will take place. (The other meaningful value is tight, which removes unnecessary whitespace characters.) Figure 33-1 shows the output of Example 33.5.

NOTE If you want to create a visualization other than a bar chart, simply call another function by specifying the data frame and the name of the new function. For instance, if you want to display the data in Example 33.5 as a histogram, use the hist function instead of the barh function. 

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *