SQL Server Graph Databases: Creating Node Tables and Edge Tables

As you will see in the following examples, the creation of node tables and edge tables is straightforward. In other words, if you know the syntax of the CREATE TABLE statement for regular relational tables, it will be very easy for you to create node tables and edge tables. All examples that follow use the model of a graph shown in Figure 31-1.

The graph in Figure 31-1 contains three nodes: Employee, Company, and City. The Employee entity has four attributes (properties): ID, Name, Age, and Sex. The Company entity has four attributes: ID, Name, Sector, and City, while City has three attributes: ID, Cityname, and Statename.

The graph contains three edges: WorksIn, LocatedIn, and LivesIn. WorksIn has a property called Starts that specifies the year in which the particular employee started to work for the specified company. Similarly, the LivesIn edge has the Since property that specifies the date on which an employee moved to a particular city. (The third edge does not have any additional properties.) All three relationships are nonrecursive, meaning that they all connect two different nodes.

1. Creating Node Tables

A node table can be created in any user-defined database. The creation of such a table is very similar to creating a regular relational table, with one extension. Example 31.1 shows the creation of three node tables, corresponding to the entities shown in Figure 31-1.

Example 31.1

CREATE DATABASE graph_db;

GO;

USE graph_db;

CREATE TABLE dbo.Company (

ID INT NOT NULL PRIMARY KEY,

name VARCHAR (100) NULL,

sector VARCHAR(25) NULL,

city VARCHAR (100) NULL) AS NODE;

CREATE TABLE dbo.Employee (

ID INT NOT NULL PRIMARY KEY,

name VARCHAR (100) NULL,

age INT NULL,

sex char (10) NULL) AS NODE;

CREATE TABLE dbo.City (

ID INT NOT NULL PRIMARY KEY,

name VARCHAR(100) NULL,

stateName VARCHAR(100) NULL) AS NODE; 

The most important extension concerning node tables is the AS NODE clause, written at the end of the CREATE TABLE statement. This clause defines the corresponding table as a node table. When you specify this clause, the system adds two new columns of the BIT data type to the sys.tables catalog view: is_node and is_edge. For a node table, the value of is_node is set to 1, and the value of is_edge is set to 0. (A detailed description of metadata concerning graph databases is provided in the section “Editing Information Concerning Graph Databases” at the end of this chapter.)

Whenever you create a node table, along with the user-defined columns, an implicit $node id column is created, which uniquely identifies each instance of the corresponding node table. The values in the $node_id column are automatically generated and are a combination of the object_id value of that node table and an internally generated value of the BIGINT data type. (When you display the values of the $node_id column, the corresponding computed values are displayed as JSON strings.) Also, $node_id is a pseudo column that maps to an internal name with a hex string appended to it. In other words, when you select $node_id from the table, the column name appears as $node_id_\hex_string.

After creation of node tables, you have to load data into them. As Example 31.2 shows, inserting rows into node tables works the same way as for any other regular table.

Example 31.2

USE graph_db;

INSERT INTO Employee (ID,Name,Sex)

VALUES (1,’Matthew Smith’,’Male’);

INSERT INTO Employee (ID,Name,Sex)

VALUES (2,’Ann Jones’,’Female’);

INSERT INTO Employee (ID,Name,Sex)

VALUES (3,’John Barrimore’,’Male’);

INSERT INTO Employee (ID,Name,Sex)

VALUES (4,’James James’,’Male’);

INSERT INTO Employee (ID,Name,Sex)

VALUES (5,’Elsa Bertoni’,’Female’);

INSERT INTO Employee (ID,Name,Sex)

VALUES (6,’Elke Hansel’,’Female’);

 

INSERT INTO Company VALUES (1,’Comp_A’,’Pharma’,’Kansas City’);

INSERT INTO Company VALUES (2,’Comp_B’,’Manufacturing’,’Hoboken’);

INSERT INTO Company VALUES (3,’Comp_C’,’Pharma’,’Indianopolis’);

INSERT INTO Company VALUES (4,’Comp_D’,’IT’,’Lexington’);

INSERT INTO Company VALUES (5,’Comp_E’,’IT’,’Madison’);

 

INSERT INTO City VALUES (1,’Kansas City’,’Kansas’);

INSERT INTO City VALUES (2,’Hoboken’,’New Jersey’);

INSERT INTO City VALUES (3,’Indianopolis’,’Indiana’);

INSERT INTO City VALUES (4,’Lexington’,’Kentucky’);

INSERT INTO City VALUES (5,’Minneapolis’,’Wisconsin’);

INSERT INTO City VALUES (6,’Madison’,’Wisconsin’);

2. Creating Edge Tables

An edge table represents a relationship between two graph nodes. Therefore, each row of an edge table contains instances of the corresponding relationship. An edge table has, among others, three hidden columns: $edge_id, $from_id, and $to_id. The values of the $edge_id column specify the unique IDs and are stored as JSON documents. (For the description of JSON, see Chapter 29.) The other two columns represent the references between the instances of the relationship—that is, the rows of the edge table to the instances of both connected node tables. The $from_id column stores the $node_id values of the nodes from which the edges originate, and the $to_id column stores the $node_id values of the nodes at which the edges terminate.

NOTE A hidden column is a column that exists in the table but cannot be selected. Besides the three hidden columns specified, there are several other hidden columns in relation to SQL Server Graph Databases. One of these columns is graph_id, which is used internally by the SQL Server system to manage graph data in the proper way (and is described later in the chapter, in the section “Graph Databases: System Functions”).

Similar to node tables, for each edge table, the system adds two new columns to the sys .tables catalog view: is_node and is_edge. As expected, in contrast to node tables, the values of the is_node and is_edge columns are 0 and 1, respectively.

NOTE Generally, graphs can be undirected or directed. In an undirected graph, all the edges are bidirectional. In a directed graph, all the edges point in one direction. Single edges can be thought of in the same way: an undirected edge is bidirectional, while a directed edge points in a specified direction. Referring to Figure 31-1 earlier in the chapter, the edge LivesIn is directed, while the WorksIn and LocatedIn edges are undirected.

Example 31.3 shows the creation of the three edge tables: WorksIn, LocatedIn, and LivesIn.

Example 31.3

USE graph_db;

CREATE TABLE WorksIn (starts INT) AS EDGE;

CREATE TABLE LocatedIn AS EDGE;

CREATE TABLE LivesIn (Since DATE NULL

CONSTRAINT Emp_to_City CONNECTION (Employee TO City)) AS EDGE;

The creation of the first two edge tables in Example 31.3 is straightforward. You just create a table in the regular way and append the AS EDGE clause. The creation of the LivesIn edge is different. The reason is that the first two edges (WorksIn and LocatedIn) are undirected edges, meaning that you can traverse them in both directions. For instance, in the case of the LocatedIn edge, you can traverse from the Company node to the City node and vice versa. In the case of the LivesIn edge, you can traverse only from the Employee node to the City node, because this edge is directed.

The creation of directed edges is possible in SQL Server 2019 by using edge constraints. With this feature, you can apply restrictions during the creation of an edge by using the CONNECTION clause. This clause is used in Example 31.3 to specify that you can traverse the LivesIn edge only from the Employee node to the City node. (This is explained further in the discussion after Example 31.6 in the next section.)

After creating graph objects, you can examine them using Object Explorer. You will see that there is a new subfolder called Graph in the Tables folder. All graph objects will be inside this subfolder. Note that names of auto-generated fields include a GUID, but you can reference these fields with their short names. (A short name is a pseudo column and you can use it in queries.)

3. Inserting Data into Edge Tables

In contrast to the creation of edge tables, which looks like the creation of regular relational tables (extended with the AS EDGE clause), loading data into edge tables is different than loading data into relational tables. Remember that an edge table represents a relationship between two nodes in a graph. For this reason, each INSERT statement, which loads an instance of such a relationship, must specify both the instance of the node where the relationship originates and the instance of the node where the relationship terminates. To do this, you use the already mentioned $node_id values from the $from_id and $to_id columns. (As you will see in Example 31.4, we use subqueries to solve this problem.)

Example 31.4 shows the insertion of rows into the WorksIn edge table.

Example 31.4

–To insert data into an edge table we need to provide the reference for

— the $from_id and $to_id as a reference point to both nodes

USE graph_db;

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 1),

(SELECT $node_id FROM Company WHERE id = 1), 2015);

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 2),

(SELECT $node_id FROM Company WHERE id = 2), 2018);

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 3),

(SELECT $node_id FROM Company WHERE id = 3), 2015);

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 4),

(SELECT $node_id FROM Company WHERE id = 3), 2016);

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 5),

(SELECT $node_id FROM Company WHERE id = 3), 2017);

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 6),

(SELECT $node_id FROM Company WHERE id = 4), 2018);

The first INSERT statement in Example 31.4 defines a relationship in the WorksIn edge table between the instance of the Employee node (entity) with the ID value 1 and the instance of the Company node (entity) with the ID value 1. To add data to the $from_id column, you must specify the $node_id value associated with the Employee entity. One way to get this value is to include a subquery that targets the entity, using its primary key value. You can take the same approach for the $to_id column in relation to the Company entity.

NOTE Insertion of the data in this way demonstrates why it is useful to add primary keys to each node table (see Example 31.1), but not necessary for the edge tables. The primary keys on the node tables make it much easier to provide the $node_id value to the INSERT statements in Example 31.4.

Examples 31.5 and 31.6 show the insertion of rows in the other two edge tables.

Example 31.5

USE graph_db;

INSERT INTO LocatedIn VALUES ((SELECT $node_id FROM Company WHERE id = 1),

(SELECT $node_id FROM City WHERE id=2))

INSERT INTO LocatedIn VALUES ((SELECT $node_id FROM Company WHERE id = 2),

(SELECT $node_id FROM City WHERE id=1));

INSERT INTO LocatedIn VALUES ((SELECT $node_id FROM Company WHERE id = 3),

(SELECT $node_id FROM City WHERE id=3));

INSERT INTO LocatedIn VALUES ((SELECT $node_id FROM Company WHERE id = 4),

(SELECT $node_id FROM City WHERE id=2));

Example 31.6

USE graph_db;

INSERT INTO LivesIn VALUES ((SELECT $node_id FROM Employee WHERE id = 1),

(SELECT $node_id FROM City WHERE id=6), ‘1.1.2018’);

INSERT INTO LivesIn VALUES ((SELECT $node_id FROM Employee WHERE id = 2),

(SELECT $node_id FROM City WHERE id=5), ‘2.1.2018’);

INSERT INTO LivesIn VALUES ((select $node_id FROM Employee WHERE id = 3),

(SELECT $node_id FROM City WHERE id=4), ‘3.1.2018’);

INSERT INTO LivesIn VALUES ((SELECT $node_id FROM Employee WHERE id = 4),

(SELECT $node_id FROM City WHERE id=2), ‘4.1.2018’);

INSERT INTO LivesIn VALUES ((SELECT $node_id FROM Employee WHERE id = 5),

(SELECT $node_id FROM City WHERE id=3), ‘5.1.2018’)

INSERT INTO LivesIn VALUES ((SELECT $node_id FROM Employee WHERE id = 6),

(SELECT $node_id FROM City WHERE id=1), ‘6.1.2018’);

All the INSERT statements in Example 31.6 succeed because they all insert edge instances that connect employees to cities. However, if you try to insert an edge instance the other way around (from the City node to the Employee node), like the following example:

INSERT INTO LivesIn VALUES ((SELECT $node_id FROM City WHERE id = 1),

(SELECT $node_id FROM Employee WHERE id=6),’6.1.2018′);

the INSERT statement fails, because traversing from the City node to the Employee node is forbidden in the specification of the LivesIn table (see Example 31.3), and you get the following error message:

Msg 547, Level 16, State 0, Line 1

The INSERT statement conflicted with the EDGE constraint “Emp_to_City”.

The conflict occurred in database “graph_db”, table “dbo.LivesIn”.

NOTE The CONNECTION constraint is similar to all other constraints that you can specify for a regular table. For this reason, you can drop such a constraint using the DROP CONSTRAINT clause of the ALTER TABLE statement (see Example 5.26 in Chapter 5).

3.1. Some Remarks to Directed and Undirected Relationships

As you already know, there are two different forms of relationships: undirected and directed.

An undirected relationship specifies that the instances of two corresponding nodes can be connected in both directions, while in the case of directed edges the connection is possible only in one direction, which is specified with the corresponding edge constraint.

The implementation of edge constraint for SQL Server Graph Databases has been very important, because that way the main requirements in relation to directed relationships have been fulfilled.

This is not true for undirected relationships. Note that edges are actually binary relationships between two particular nodes. If you take a look at Example 31.3, you will see that the creation of the WorksIn edge table does not have any reference to the corresponding tables (Employee and Company), which are part of this relationship. Therefore, you can write any syntactically correct, but semantically wrong, INSERT statement and both of them will be accepted by the system. In other words, the first INSERT statement in Example 31.5

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 1),

(SELECT $node_id FROM Company WHERE id = 1), 2015);

is semantically and syntactically correct. But, if you change “Company” to “City” such as shown here,

INSERT INTO WorksIn VALUES ((SELECT $node_id FROM Employee WHERE id = 1),

(SELECT $node_id FROM City WHERE id = 1), 2015);

the system will insert this statement even though it is semantically wrong. This makes a burden for the programmer, who has to worry about the correct names of the node tables during insertion of rows for a particular edge table.

For instance, the code to create an edge of a graph in the Cypher language, which is an alter ego of the query language of SQL Server Graph Databases, contains explicitly the names of the nodes:

CREATE (p:Person)-[:LIKES]->(t:Technology)

Therefore, in this case, the system takes care whether the logically correct names of nodes are specified and can use these names to check the semantic meaning of the INSERT statements used during insertion of rows. (Cypher is a declarative query language implemented to efficiently query graphs. It is a query language of another graph database system called neo4J.)

We can hope that the design of edge tables will be improved in one of the next versions of SQL Server to consider this issue.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *