Treap Applications: Randomized treaps

So, we are now able to implement our inventory, keep track of the products in stock, and extract the ones closest to running out of stock. That would certainly impress everyone at the next family reunion!

Hopefully, our example helped you understand how treaps work, but . . . I have a confession to make: treaps are not really used as a way to index multidimensional data.

We’ll see in the next chapters, and in particular in chapter 7, when we talk about cache, that there are better ways to address problems equivalent to the example we presented in this chapter.

Let me be clear: using treaps as both trees and heaps is possible, perfectly legal, and can even offer decent performance, under certain conditions, although in the general case we have seen that keeping data organized by both criteria will likely pro­duce an unbalanced tree (which means linear-time operations).

But that’s not why treaps were invented, nor is it the main way they are used today. In the end, the point is that there are better ways to index multidimensional data and better ways to use treaps. Instead, we’ll see that we can use treaps as a building block to implement a different, and efficient, data structure.

1. Balanced trees

One aspect that we stressed is that unbalanced treaps tend to have long paths, whose length can be, in the worst-case scenario, in the order of O(n) nodes.

Conversely, when discussing heaps, we saw that balanced trees, like heaps, have logarithmic height, making all operations particularly convenient to run.

With heaps, however, the catch is that we trade the benefit of balanced trees in exchange for restricting to a limited set of operations. We can’t efficiently search a heap for an element’s key, or retrieve the maximum or minimum key,[1] or delete or update a random element (without knowing beforehand its position in the heap) in sublinear running time.

Nevertheless, there are, in algorithm literature, many other examples of balanced trees, data structures that guarantee that the height of the tree will be logarithmic, even in the worst case. Some examples that we mentioned in section 3.2 are 2-3 trees[2] (shown in figure 3.12) and red-black trees (figure 3.13).

The algorithms to maintain the constraints for these trees, however, tend to be quite complicated, so much so that many textbooks on algorithms, for instance, omit the delete method altogether.

Turns out, quite surprisingly, that we can use treaps, which seems quite unbal­anced, to obtain tendentially balanced11 binary search trees using a set of easier and cleaner algorithms (in comparison to red-black trees and the like).

Figure 3.12 A 2-3 tree containing the keys we used in our grocery store example. Nodes in 2-3 trees can contain one or two keys, sorted ascendingly, and respectively 2 or 3 links. Besides left and right children, 3-nodes also have a middle child. All keys K in the subtree pointed to by a middle link must obey this condition: K > K >= K2, where K and K2 are the first and second key, respectively, of the 3-node. 2-3 trees are guaranteed to be balanced by the way insertion is performed: keys are added to the leaves, and when a leaf grows to three elements, it’s split and the middle element is bubbled up to the parent node (which is also possibly recursively split). It is guaranteed that the height of a 2-3 tree with n keys is between log2 (n) and log3 (n).

As we saw in the introduction to this chapter, plain BSTs also suffer this same problem, having their structure depend on the order in which elements are inserted.

And if we go back to the last section, we saw treaps can be skewed if the particular com­bination of keys and priorities, and the order in which elements are inserted, is particularly unlucky, because rotations can cause the tree to get even more unbalanced (see figure 3.9).

So here is the idea: we can use rotations to rebalance the tree. If we strip priorities from their meaning (in our example, forget about the units in stock for each prod­uct), we can, in theory, update the priority value of each node so that fixing the heap’s invariants will produce a more balanced tree.

Figure 3.14 Rebalancing a treap by carefully updating its priorities. If we change the priority of the node with key “Milk” to a value smaller than its parent (but larger than the root, in this case), we can fix the heap invariants with a right rotation. Incidentally, by doing so we will get a perfectly balanced tree.

We need to be clear now: By dis­carding the meaning of the prior­ity field, we are implementing something different than what we had in section 3.3. In particular, this new data structure will no lon­ger adhere to the priority queue public interface, and it will not offer any top or peek method.

Instead, it will just be a binary search tree that internally uses the concepts we developed for treaps to maintain its structure balanced. Table 3.3 shows the methods and contract for BSTs. The data structure we are about to introduce will adhere to this API.

2. Introducing randomization

If getting better results using simpler algorithms sounds too good to be true to you . . well, you might be partially right, meaning that there is a price to pay for simplicity.

Updating priorities to keep the tree balanced seemed easy on our small example, but doing it systematically on a large tree becomes difficult and expensive.

Difficult because it becomes like solving a puzzle: every time we rotate an internal node, we potentially cause lower levels in the subtrees pushed down to become more unbalanced, so it’s not trivial to come up with the right order of rotations to obtain the best possible tree structure.

Expensive, because we need to keep track of the height of each subtree, and because coming up with a sequence of rotations requires extra work.

In the previous section, we used the term tendentially balanced to describe the result we can get. This has probably already revealed a key point to the eyes of the most observant readers: we are talking about introducing a randomized element in our data structures.

Randomness will be a constant factor in this first part of the book. We’ll see several data structures leveraging it, including Bloom filters. To help all readers be comfort­able with the topic, we prepared a short introduction to randomized algorithms in appendix F; feel free to take a look before delving into this section.

In the original work by Aragon and Raimund, treaps were introduced as a means to obtain “randomized balanced search trees.” They used the same idea we described in section 3.4.1, leveraging priorities to force a balanced tree structure, but they avoided all the complexity of manually setting these values by using a uniform ran­dom numbers generator to choose the values for the nodes’ priorities.

Figure 3.15 shows a more balanced version of the tree produced at the end of fig­ure 3.9, by replacing those priorities with randomly generated real numbers. It’s also possible to use random integers for priorities, but using real numbers reduces the pos­sibility of ties and improves the final result.

We will see in section 3.5 that, if the priorities are drawn from a uniform distribution, the expected height of the tree is logarithmic in the number of nodes.

The best part is that we have already written almost all the code needed to imple­ment this new data structure. We can internally use a treap (listing 3.9) and all its API methods will just be wrappers for treap’s methods, except for insert; that’s the only one for which we need to write an extra line of code.

Listing 3.9 Class RandomizedTreap

class RandomizedTreap

#type Treap

treap

#type RandomNumberGenerator

randomGenerator

function RandomizedTreap()

this.treap ← new Treap

As it’s shown in listing 3.10, all we need to do is generate a random priority when we insert a new key in our Randomized Treap.

You can find a Java implementation for RTs on the book’s repo on GitHub.

3. Applications of Randomized Treaps

As we saw in the last section, Randomized Treaps are the main application for treaps. Now the question is, what are the most common applications for Randomized Treaps?

In general, we can use a Randomized Treap everywhere we would use a BST. In particular, they are indicated for situations where we need a balanced tree, but we are fine with having guarantees on the average case only, not on the worst case.

Another aspect to consider is that, as always when randomization and “average case” bounds are involved, the guarantees are more likely to hold when the number of entries is larger, while for smaller trees, it’s easier to obtain skewed structures. (For small trees, however, the performance difference between slightly skewed and bal­anced trees will also be less relevant, obviously.)

BSTs, in turn, are often used to implement dictionaries and sets. We will see more about these structures in chapter 4.

Other examples include keeping data read from a stream sorted, counting the number of elements smaller (larger) than any given element of a dynamic set, and in general all applications where we need to keep a dynamic set of elements in sorted order, while supporting fast search, insertion, and removal.

Practical examples of real-world code using BSTs are, for instance, managing a set of virtual memory areas (VMAs) in operating system kernels and keeping track of packet IP verification IDs. For the latter, a hash table would be faster, but it would also be vulnerable to worst-case input attacks, where an attacker could send packets from IPs hashing to the same value: this would degenerate the hash table into an unsorted list (if concatenation is used), transforming the hash table in a bottleneck and possi­bly slowing down packet resolution or the whole kernel.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)