Disjoint sets: Naive solution

The most immediate solution for our problem is to represent each partition with a list (or array), as illustrated in figure 5.4. For each element, we need to keep track of the pointer to the head of the list.

To find out if two elements are in the same partition, we need to retrieve the list for one ele­ment, and check if it is the same list as for the other.

To merge two partitions, P1 and P2, modeled with two lists, L1 and L2, we need to update the next pointer of the last element in Li so that it points to the head of L2 (or vice versa, with the last element of L2 pointing to the head of L1).

This operation, which is shown in figure 5.5, can be done in constant time by keeping an extra (constant space) pointer to the tail of each list.

Unfortunately, though, we aren’t done just yet: for every element in L2, we also need to update its list pointer in our map to point to the head of the new merged list.

This operation requires linear time in the worst case, because we might have to update up to n-1 elements (where n is the total number of elements in the Uni­verse U).

There is one way to slightly improve the expected number of assignments we have to perform: by always appending the shortest of the two lists, we will make sure that we won’t have to update more than n/2 elements’ pointers. Unfortunately, this does not improve the asymptotic execution time.

Let’s delve into code to better explain how this works.

1. Implementing naive solution

Let’s start with the pseudo-code for the constructor, and the class definition (see listing 5.1). All methods in the next sections will be assumed to be class methods for DisjointSet.

Initialization is simple: we will check that the list passed as argument contains no duplicates and initialize the disjoint set with its elements.

In a real implementation, you should worry about how elements are compared. Depending on the programming language, it can use referential equality, an equality operator, or a method defined on the elements’ class. The following code is only meant to illustrate how a basic solution works, so we won’t worry about the details.

First, we initialize the associative array that is going to index the elements and map them to the partition they belong to (line #2).

Next, wejust go through initialSet’s elements, one by one, check they are defined and that there is no duplicate, and then initialize their partition to the singleton con­taining the element itself (initially each element is disjoint from every other element).

Now that we have taken care of initializing our disjoint set, we can provide a couple of useful methods. For example, we can add a size public property, simply defined as the number of entries stored in the local partitions map.

You can find examples of such methods implemented in our repo. Here, instead, we will focus on the main API methods, starting with the add method, illustrated in figure 5.6, whose pseudo-code is shown in listing 5.2.

This method is used to allow the Universe to grow, with new (unique) elements that can be added at any time. Every time we add a new element, we just add a brand-new partition containing that element alone. But, of course, we need to check that the argument passed to add is well-defined, so that it’s not null and not a duplicate of another element already in our Universe.

Now we get to the really interesting stuff: first and foremost, the findPartition method, shown in listing 5.3.

In this basic implementation, the method is particularly trivial: after the usual valida­tion (including checking that the element is actually stored in the disjoint set), we just need to return the partition containing elem.

As we mentioned, this implementation of the findPartition method only requires constant time (assuming that the hash for elem can be computed in constant time).

Another easy-to-implement method, shown in listing 5.4, is the one checking if two elements belong to the same partition.

We just need to reuse findPartition, call it for both elements, and check whether both calls return the same partition. Note that by reusing findPartition, we can make sure that the implementation of areDisjoint won’t need to change, no matter how our elements are stored or findPartition is implemented (as long as its inter­face remains the same, and partitions can be compared with the inequality operator).

Moreover, we decided to implement a check for elements belonging to different partitions rather than for elements belonging to the same one: this is because of how disjoint sets are normally used. We are normally interested in checking whether two elements don’t belong to the same partition, and if that’s the case, we merge the two partitions. But depending on the way you are going to use this container, it is possible that the other way around is more convenient, and there is nothing preventing you from defining a samePartition method instead.

All the methods we have seen so far run in constant time with respect to the size of the container. Now, it’s time to implement the method merging two partitions, shown in listing 5.5 (and illustrated in figure 5.5). As we have seen, the merge method requires O (n) assignments in the worst case.

This method is more complex than the previous ones. And yet, by reusing findParti- tion, it still looks quite simple.

We first check if elements belong to the same partition by calling findPartition on both and checking the result. Those calls also take care of validating the input.

Once we’ve established that we actually need to perform an action, we proceed and merge the two sets, correcting the pointers in the partitions map when needed. If the partitions were implemented with linked lists instead of Set, we could have just appended the head of a list to the tail of the other. Sets, instead, force us to actually add elements one by one. An extra linear number of assignments is needed (worst case), but this doesn’t change the order of the function’s runtime; we still need to update the references for elements of one of the two lists (that is, sets) anyway.

Here, we show the simplest code, always pouring the first partition’s elements into the second one. On our repo on GitHub you can find a slightly better version that checks which set is smaller and adds its elements to the larger sets; however, this is just a constant-time improvement on the simplest version and the running time remains linear in the minimum of the sets’ sizes.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *