Disjoint sets: Applications

Applications for disjoint set are ubiquitous, and the reason they have been studied at length is exactly due to the number of cases where they prove useful.

1. Graphs: Connected components

For undirected graphs, there is a simple algorithm that uses disjoint sets to keep track of their connected components, that is, areas of the graph that are interconnected.

While connected components are usually computed using Depth First Search (DFS), we can use a disjoint set to keep track of the components while we scan all the graph’s edges. An example is shown in listing 5.11.

At the end, each partition of vertices in disjointSet will be a connected component.

It’s worth noting that this algorithm can not be used for directed graphs and strongly connected components.

2. Graphs: Kruskal’s algorithm for minimum spanning tree

A spanning tree for a connected undirected graph G is a tree whose nodes are the vertex of the graphs, and whose edges are a subset of G’s edges. If G is connected, then it cer­tainly has at least one spanning tree—possibly many, if it also has cycles (see figure 5.13). Among all possible spanning trees, a minimum spanning tree (MST) is the one for which the sum of edges’ weights is minimal.

Kruskal’s algorithm is beyond the scope of this book. Here it suffices to say that it constructs the MST for a graph by

  1. Starting with each vertex in a difference set.
  2. Keeping a disjoint set of the graph vertices.
  3. Going through the graph’s edges in order of increasing weight.
  4. For each edge, if its extremes are not in the same partition, merge their components.
  5. If all vertices belong to the same component, stop.

The MST will be defined by the list of edges that triggers, at point (4), merge calls for the disjoint set.

3. Clustering

Clustering is the most-used unsupervised learning algorithm. The problem here is that we would like to get a partitioning of a set of points into a few, usually disjoint sub­sets, as shown in figure 5.14.

There are several types of clustering algorithms. Although a taxonomy of cluster­ing is beyond the scope of this chapter (we will devote chapter 12 to this topic), here we will mention one particular class of these algorithms.

Agglomerative hierarchical clustering starts with each point in its own cluster (partition) and continuously merges two points (and their clusters) until all clusters are merged into one; figure 5.15 shows an example of how it works. The algorithm keeps a history of this process, and it is possible to get a snapshot of the clusters created at any of the steps. The exact point where this snapshot is taken is controlled by a few parameters and determines the result of the algorithm.

The description of the algorithm should ring a bell: at each step we need to find two points belonging to two different clusters. You can easily imagine what the best data structure is to compute and find that information. In chapter 13, we’ll see a prac­tical application of disjoint set as part of a distributed clustering algorithm.

4. Unification

Unification is the process of solving equations between symbolic expressions. One of the ways of solving such an equation is finding terms on both sides that are equivalent, and removing them from the equation.

Of course, solution strategies depend on which expressions (terms) can appear in the equation and how you can compare them or when they are considered to be equal. For instance, they might be evaluated and considered equal if they have the same value, or they might be symbolic and considered equal if they are equivalent, possibly net of some variable substitution.

As you can imagine, the disjoint set data structure is perfect for high-performance algorithms solving this problem.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *