An introduction to graphs: Definitions

A graph G is usually defined in terms of two sets:

A set of vertices V: independent, distinct entities that can appear in any multiplicity. A graph can have 1, 2, 100, or any number of vertices but, in general, graphs don’t support duplicate vertices.
A set of edges E connecting vertices: an edge is defined by a pair of vertices, the first one usually denoted as the source vertex, and the second one called the destination vertex.

So, we write G=(V,E) to make it clear that the graph is made of certain sets of vertices and edges; for instance, the graph in figure 14.2 is formally written as

G = ([V₁, V₂ , V₃, V₄], [(V₁,V₂),(V₁,V₃),(V₂,V₄)])

An edge whose source and destination are the same is called a loop (see figure 14.3). Simple graphs can’t have any loops, nor can they have multiple edges between the same pair of nodes. Conversely, multigraphs can have any number of edges between two distinct vertices. Both simple graphs and multigraphs can be extended to permit loops.

We won’t bother with multigraphs in this book; instead we’ll focus on simple graphs, usually without loops.

We can express the previous definitions more formally.

Given the set of edges E

For simple graphs, E ⊆ {(x, y) | (x, y) ∈ V² ∧ x # y}
For simple graphs supporting loops, E ⊆ V²

It’s also possible to associate a weight to each edge. In this case, the graph is called a weighted graph or, equivalently, a network, where each edge becomes a triplet, and the set of graph’s edges becomes

For simple graphs, E ⊆ {(x,y,w) | (x,y) ∈ V² ∧ w ∈ R ∧ x # y}
For simple graphs supporting loops, E ⊆ V²* R

Figure 14.3 shows an example of a weighted graph with loops.

1. Implementing graphs

The previous section formally defines graphs; however, when we move from theory to practice, we often have to face new issues and cope with constraints.

As much as mathematical notation makes it clear how we should represent graphs on paper, we need to decide, for instance, what’s the best way to store them into a data structure.

There are several questions to answer, depending on the context: Should we store labels for vertices and edges, or should we just assign an index to vertices using natural numbers, and enumerate edges following the natural ordering of pairs of indices?

While storing vertices is relatively easy (using lists, and possibly a dictionary to associate each vertex to its label), there is also another question that goes beyond any context: How should we store edges?

That question is not as trivial as it might seem: the caveat is that at some point we will want to check to see if there is an edge between two vertices, or maybe find all outgoing edges of a certain vertex. If we just store all edges in a single list, either sorted or unsorted, then we will have to scan the whole list to find out our answers.

Even with a sorted list, that means accessing O( log (|E|)) elements for the operations in the previous paragraph, and 0(|E|) for listing all edges going into a vertex.

No surprise, it turns out we can do better. There are two main strategies to store a graph’s edges:

Adjacency lists—For each vertex v, we store a list of the edges (v, u), where v is the source and u, the destination, is another vertex in G.
Adjacency matrix—It’s a |V|x|V| matrix, where the generic cell (i,j) contains the weight of the edge going from the i-th vertex to the j-th vertex (or true/ false, or 1/0, in case of un-weighted graphs, to state the presence/absence of an unweighted edge between those two vertices).

Before examining pros and cons of both strategies, let’s illustrate them with an example. Given the graph in figure 14.2, its adjacency list representation is the following dictionary mapping vertices to lists of edges:

1 -> [(1,2), (1,3)]

2 -> [(2,4)]

3 -> []

4 -> []

The adjacency matrix representation is the following:

As you can see, they are very different. One aspect that stands out immediately is that in the adjacency matrix, most cells are filled with 0s. This stems from the fact that the graph in figure 14.2 only has a few edges, of the many possible.

We know that because edges are pairs of vertices, in a simple graph the maximum number of edges is 0(|V|²). What’s the minimum number, though?

It can be anything; a graph can even (hypothetically) have no edges at all. A connected graph, however, must have at least |v|-1 edges.

Keeping this in mind, we’ll now provide a definition, one that will be handy later in this section:

A graph G=(V,E) is said to be sparse if |e|=0(|V|); G is said to be dense when |E|=O(|V|²).

In other words, sparse graphs have a number of edges comparable to the number of vertices, which are therefore loosely connected to each other, while in dense graphs, each vertex is connected to most of the other vertices.

Table 14.1 summarizes the pros and cons of the two different representations of graphs. In a few words, the adjacency list representation is better for sparse graphs because it requires a lot less memory, and because for sparse graphs |e| * |v| , and thus most operations can be performed efficiently.

For dense graphs, conversely, since the number of edges is close to the maximum possible, the adjacency matrix representation is more compact and efficient. Moreover, this representation allows more efficient implementation of some algorithms on graphs, such as the search of connected components or transitive closure.

In general, when no assumption can be made on the graph, and unless it is otherwise required by the context, the adjacency list representation is preferred because it is more flexible and it supports adding new vertices more easily.

2. Graphs as algebraic types

There is another aspect of graph representation that is orthogonal to the way we store edges: consistency.

To be fair, inconsistencies are much more likely to happen with the adjacency list representation, but they are still possible in certain situations, even using the adjacency matrix.

The problem is the following. Regardless of its representation, consider the following graph: G=([1,2], [(1,2), (1,3), (2,2)]).

The graph G has two vertices, [1,2], but it has an edge whose destination is the vertex “3”. This can happen for any reason; for instance, sloppiness in deleting vertex “3”, or an error while adding edges.

Moreover, the graph has a loop, the edge (2,2). What if G was supposed to be a simple graph without loops?

Of course, we can add validation to our Graph class’ methods to prevent these situations, but the data structure itself can’t guarantee that these errors won’t happen.

To overcome these limitations, we can define our graphs as an algebraic type; this way, we define graphs as one of the following:

The empty graphs.
A singleton, a single vertex with no edges.
The connection between two graphs G and G’. We define one or more edges whose source is in G and destination is in G’.
The union of two graphs G=(V,E) and G’ = (V’,E’). We just compute the union of both the vertices and edges set to obtain G”=(V ∪ V’, E ∪ E’).

This representation prevents the inconsistencies we talked about and guarantees that we won’t get malformed graphs, but also allows us to formally define algorithms as transformations on graphs and mathematically prove their correctness.

Trying to understand graphs as an algebraic type is a useful exercise to help you gain a deeper understanding of this data structure. At the same time, we need to acknowledge that considering the overhead for these operations, practical uses are limited and mostly relegated to functional languages providing pattern matching on types, such as Scala, Haskell, or Clojure.^[2]

3. Pseudo-code

To complete the discussion, listing 14.1 provides an overview of the class that we’ll use for graphs in this book. It uses adjacency lists and models edges and vertices as classes, allowing us to implement different types of graphs by changing the details of these models (for instance, allowing weighted edges).

As for concrete implementation, a Java version can be found on the book repo on GitHub,^[3] and a JavaScript version is provided by the JsGraphs library; the latter will also implement the algorithms described in the next sections.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Data structures and algorithms