SQL Server Graph Databases: Querying Graph Data Using Relational Queries

This section demonstrates how Transact-SQL can be used to write queries over graphs. Graph query languages in general, and the query language of SQL Server Graph Databases in particular, are better suited to query connected data than Transact-SQL. In other words, query latency in a graph database is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored, which is the most important issue of Transact-SQL. The next five examples help to explain this.

First, Example 31.27 creates a relational table called Employee1, which has the same structure as the Employee node table (see Example 31.1).

Example 31.27

USE graph_db;

CREATE TABLE dbo.Employee1 (

ID INT NOT NULL PRIMARY KEY,

name VARCHAR (100) NULL,

sex char (10) NULL);

INSERT INTO Employee1 VALUES (1,’Matthew Smith’,’Male’);

INSERT INTO Employee1 VALUES (2,’Ann Jones’,’Female’);

INSERT INTO Employee1 VALUES (3,’John Barrimore’,’Male’);

INSERT INTO Employee1 VALUES (4,’James James’,’Male’);

INSERT INTO Employee1 VALUES (5,’Elsa Bertoni’,’Female’);

INSERT INTO Employee1 VALUES (6,’Elke Hansel’,’Female’);

The only syntactical difference between the Employee1 and the Employee table is that the former does not have the AS NODE option. (The six INSERT statements in Example 31.27 are identical to the corresponding statements for the Employee table in Example 31.2.)

To understand performance issues of performing queries over graphs in a relational database system such as the Database Engine, we will look at several examples concerning the Is_Liked relationship, introduced in the earlier section “Recursive Relationships.” To implement such a recursive relationship using relational tables, you have to create a new table, as shown in Example 31.28.

Example 31.28

USE graph_db;

CREATE TABLE Employee1_Friend

(EmployeelD INT NOT NULL,

FriendID INT);

INSERT INTO Employee1_Friend VALUES (1,2)

INSERT INTO Employee1_Friend VALUES (1,3)

INSERT INTO Employee1_Friend VALUES (1,4)

INSERT INTO Employee1_Friend VALUES (2,3)

INSERT INTO Employee1_Friend VALUES (2,5)

INSERT INTO Employee1_Friend VALUES (2,6)

INSERT INTO Employee1_Friend VALUES (3,4)

INSERT INTO Employee1_Friend VALUES (4,5)

INSERT INTO Employee1_Friend VALUES (4,6)

INSERT INTO Employee1_Friend VALUES (5,6)

The Employee1_Friend table has two columns. The first, EmployeelD, represents the ID of an employee, while the second, FriendID, is the ID of another employee who likes the employee listed in the EmployeelD column. Therefore, the first INSERT statement in Example 31.28 can be interpreted as the employee with ID = 2 (Ann Jones) likes the employee with ID = 1 (Matthew Smith).

The next three examples show how you can implement the queries from Examples 31.15, 31.16, and 31.17 using T-SQL. Example 31.29 implements the query from Example 31.15.

Example 31.29

— Corresponds to Example 31.15

USE graph_db;

SELECT E1.name

FROM Employee1 E1 JOIN Employee1_Friend

ON Employee1_Friend.FriendID = E1.ID

JOIN Employee1 E2

ON Employee1_Friend.EmployeeID = E2.ID

WHERE E2.name = ‘Matthew Smith’;

Concerning performance, the query in Example 31.29 is rather inexpensive because the number of qualified rows is restricted with the condition in the WHERE clause (E2.name = ‘Matthew Smith’). If you create an index for the name column of the Employeel table, the query in Example 31.29 will be executed fairly quickly.

As you already know from Example 31.15, a recursive relationship such as Is_Liked must not be reflexive, meaning that an employee can be liked by another one, but the former must not share the same feelings. For this reason, Examples 31.15 and 31.16 display the different result sets.

The relational implementation of Example 31.16 is given in Example 31.30 and the performance of this query is similar to the performance of Example 31.29. (The result of Example 31.30 does not contain any rows.)

Example 31.30

— Corresponds to Example 31.16

USE graph_db;

SELECT E1.name

FROM Employee1 E1 JOIN Employee1_Friend

ON Employee1_Friend.EmployeeID = E1.ID

JOIN Employee1 E2

ON Employee1_Friend.FriendID = E2.ID

WHERE E2.name = ‘Matthew Smith’;

Example 31.31, which answers the query “Find employees who like the employees who like Ann Jones,” is syntactically and computationally complex.

Example 31.31

— Corresponds to Example 31.17

USE graph_db;

SELECT E1.name AS EMP_Name, E2.name AS FriendOfFriend

FROM Employee1_Friend Ef1 JOIN Employee1 E1

ON Ef1.EmployeeID = E1.ID

JOIN Employee1_Friend Ef2 ON Ef2.EmployeeID = Ef1.FriendID

JOIN Employee1 E2 ON Ef2.FriendID = E2.ID

WHERE E1.name = ‘Ann Jones’ AND Ef2.FriendID <> E1.ID;

In Example 31.31, the number of JOIN operations is high and the WHERE clause contains the inequality (Ef2.FriendID <> E1.ID). As you already know from the section “Query Analysis” in Chapter 19, an expression with the NOT (<>) operator cannot be used by the optimizer as a search argument, and the only access the optimizer uses in this case is the table scan. Therefore, the execution of this query will be slow.

Things get more complex if you dig deeper into the graph with the Is_Liked relationship. Though it is possible to get an answer to the query given in Example 31.31 in a fairly short period of time, queries that extend to more degrees of liking will have very poor performance. Therefore, the higher the degree of liking, the poorer the performance.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *