SQL Server Graph Databases: A General Introduction

Generally, a graph is a collection of nodes and edges, or, in less formal language, a set of entities and the relationships that connect them. Therefore, nodes of a graph are entities, such as Employee, Product, and Customer, and edges are relationships. How the nodes relate to the world is specified by edges. For instance, employees sell products, and products are bought by customers. Using graphs, you can model a wide variety of different scenarios from the real world, anything from the design of social networks to the construction of space shuttles.

1. Graph Databases: Models

A graph database is a database management system that supports the usual properties of DBMSs (see Chapter 1) using a graph data model as its underlying model. Graph databases are generally built for use with online transaction processing (OLTP) systems. Two main properties of graph database technology are how the data is stored and how the data is processed.

The underlying storage of a graph database can be native or nonnative. Native storage is optimized and designed for storing and managing graphs. A graph database is classified as nonnative when the storage comes from a “non-graph” database model, such as a relational, columnar, or object-oriented model. (The nonnative approach can lead to performance problems, because the storage engine of non-graph database models is not optimized for graphs.)

The processing engine of a graph database doesn’t necessarily need to support native graph processing. The key element of graph technology in relation to data processing is whether or not index-free adjacency is supported. (Index-free adjacency means that every node contains a direct pointer to its adjacent node. That way, creation of indices is not necessary, at least for queries that reference a node and its adjacent ones.)

1.1. Property Graph Model

The most popular model in relation to graphs is called the property graph model. Besides this model, there are two others: Resource Description Framework (RDF) and hypergraphs. The discussion of these two models is outside the scope of this book.

A property graph has the following characteristic: it contains nodes and edges. Each node and each edge has a unique identifier. Also, each node has a collection of properties (key-value pairs). Finally, a node has a set of outgoing and incoming edges.

Edges are named and always have a start and an end node. Each edge has a label that denotes its connection type between two nodes. Edges, like nodes, can also contain properties.

1.2. Advantages of Graph Databases

The power of graph databases becomes apparent in relation to two issues:

  • Performance
  • Flexibility

It is well known that SQL does not deal very well with connected data. The reason is that the number of join operations (inner joins and self-referencing joins) increases significantly as the dataset gets bigger and the traversal of a graph goes deeper and deeper. (This is discussed further in the section “Querying Graph Data Using Relational Queries” at the end of this chapter.) In contrast to relational database systems, the performance of a graph database system remains constant, even when the dataset significantly grows. This is because queries are localized to a particular subgraph of the entire graph. As a result, the execution time for each query is proportional only to the size of the subgraph traversed to satisfy that query.

One of the main advantages of graphs is that they are additive, meaning that you can add new substructures (nodes, relationships, and subgraphs) to an existing graph without disturbing existing user applications in their functioning. This flexibility has positive implications for developers, because they can significantly increase their productivity. Also, because of the graph model’s flexibility, you don’t have to model your domain in depth, making it significantly easier to change business requirements later, if necessary.

2. SQL Server Graph Databases: An Introduction

As previously introduced, a graph is a collection of nodes and relationships (edges) between nodes. A graph in SQL Server Graph Databases is a collection of node tables and edge tables, where nodes are stored in node tables and relationships between nodes are stored in edge tables. Using these two forms of tables, SQL Server Graph Databases supports several general features of graph databases:

  • The nodes and edges of a graph are first-class citizens and can have attributes or properties associated with them. (An object is considered a “first-class citizen” of a system if it can be stored in variables and data structures and can be passed as a parameter to a subroutine.)
  • Pattern matching can be expressed easily. (As you will see shortly, pattern matching in SQL Server Graph Databases is expressed using the MATCH function.)
  • Edges can be specified as directed or undirected. (Directed edges are specified using edge constraints.)
  • Polymorphic queries can be expressed easily. (Polymorphic queries in relation to graphs are queries that return instances of a particular node as well as instances of all subgraphs of that node.) Note that node instances are logically the same as instances of entities, as explained in Chapter 1.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *