Treap: Using randomization to balance binary search trees

. . . between a tree and a heap!

Treap is just the portmanteau of tree and heap. Binary search trees, in fact, offer the best average performance across all standard operations: insert, remove, and search (and also min and max).

Heaps, on the other hand, allow us to efficiently keep track of priorities using a tree-like structure. Since binary heaps are also binary trees, the two structures seem compatible; we only need to find a way to make them co-exist in the same structure, and we could get the best of both.

It’s easier said than done, however! If we have a set of unidimensional data, we can’t enforce BST’s and heap’s invariants at the same time:

  • Either we add a “horizontal” constraint (given a node N, with two children L, its left child, and R, its right child, then all keys in the left subtree—rooted at L— must be smaller than N’s key, and all keys in the right subtree—rooted at R— must be larger than n’s key).
  • Or we add a “vertical” constraint: the key in the root of any subtree must be the smallest of the subtree.

Anyway, we are in luck, because each of our entries has two values: its name and the stock inventory. The idea, therefore, is to enforce BST’s constraints on the names, and heap’s constraint on the quantities, obtaining something like figure 3.2.

In this example, product names are treated as the keys of a binary search tree, so they define a total ordering (from left to right in the figure).

The inventory quantities, instead, are treated as priorities of a heap, so they define a partial ordering from top to bottom. For priorities, like all heaps, we have a partial ordering, meaning that only nodes on the same path from root to leaves are ordered with respect to their priority. In figure 3.2 you can see that children nodes always have a higher stock count than their parents, but there is no ordering between siblings.

Figure 3.2 An example of a treap, with strings as BST keys and integers as heap priorities. Note that the heap, in this case, is a min-heap, so smaller numbers go on top. For a few links close to the root, we also show the range of keys that can be hosted in the tree’s branch rooted at the node they point to.

This kind of tree offers an easy way to query entries by key (by the product names, in the example). While we can’t easily run a query on priorities, we can efficiently locate the element with the highest priority.[3] It will always be at the root of the tree!

Extracting the top element however . . . it’s going to be more complicated than with heaps! We can’t just replace it with a heap’s leaf and push it down, because we need to take into account the BST’s constraints as well.

Likewise, when we insert (or delete) a node, we can’tjust use the simple BST algo­rithm. If we just search for the position that the new key would hold in the tree and add it as a leaf, as shown in figure 3.3, the BST constraint will still be abided by, but the new node’s priority might violate the heap’s invariants.

Listing 3.1 introduces a possible implementation for the treap’s main structure. We will use an auxiliary class to model the tree’s nodes, and this will be instrumental in our implementation. You might have noticed we are using explicit links to the node’s children, differently from what we did with heaps in chapter 2. We’ll go back to dis­cussing this choice in more detail in section 3.3.2.

In this implementation, the Treap class is mostly a wrapper for the root of the actual tree; each node of the tree holds two attributes for a key (that can be of any type, as long as there is a total ordering defined on the possible values) and a priority, that we’ll assume to be a double precision number in this case. (An integer or any type with a total ordering could work too, but we’ll see in the next section that a double works better.)

Moreover, nodes will hold pointers (or references) to two children, left and right, and their parent.

The constructor for a node will just set the key and priority attributes from its argu­ments, and initialize left and right pointers to null, effectively creating a leaf. The two branches can then be set after construction, or, alternatively, an overloaded version of the constructor also taking the two children can be provided.

1. Rotations

How do we get out of the impasse? There is one operation on binary search trees that can help us: rotations. Figure 3.4 illustrates how rotations can heal (or break!) the constraints on a treap. Rotations are common operations on many versions of BSTs, such as red-black trees or 2-3 trees.

A rotation, in a binary search tree, is a transformation whose goal is inverting the parent-child relation between two nodes of the tree, y and X in figure 3.4. We want the child node to become the parent node and vice versa, but we can’t just swap those two nodes: otherwise, in the general case where the keys of the two nodes are different, we would end up violating the ordering of keys.

Instead, what we need to do is remove the whole subtree rooted at the parent, replace it with the (smaller) subtree rooted at the child, and then find a way to plug back in the removed nodes in this new subtree.

How are we going to do that? As you can see in figure 3.4, first we need to distin­guish two cases, depending on whether the child node is a left or a right child. The two cases are symmetrical, so we’ll mainly focus on the former.

Listings 3.2 and 3.3 show the pseudocode for right and left rotations, explicating the details of the operations we described a few lines prior. Figure 3.5 illustrates the steps needed for right rotations, where the child node X, the pivot of the rotation, is the left child of its parent y.

We need to remove Y from the tree, update Y’s parent P (lines #4-11), replacing Y with node X as its child (either left or right, see lines #8-#11); at this point, Y is discon­nected from the tree, and with Y its whole right subtree.

Y’s left subtree, instead, is empty because we disconnected X and moved it. We can then move X’s right subtree and assign it to Y’s left child (line #14), as shown in the lower-left section of figure 3.5. This certainly doesn’t violate the key ordering, because (assuming there was no violation before the rotation) key[Y]>=key[Y.left] and key[Y]>=key[Y.left.right]. In other words, since X was the left child of node Y, then the right subtree of node X is still in y’s left subtree, and all keys in a node’s left subtree are smaller, or at most equal, to the node’s own key. You can also use figure 3.2 as a reference.

All that’s left to do now is reconnect y to the main tree: we can assign it to X’s right child (line #15), and we won’t have any violation. In fact, we already know that Y (and its right sub-tree) have larger keys than X’s, and for what concerns y’s left subtree, it was constructed using the former right subtree of X, and by definition all those keys too are larger than X’s.

We saw how to perform a rotation. It’s nothing fancy; it’s just about updating a few links in the tree. The only mystery at this point might be, why is it called a rotation?

Figure 3.6 tries to answer this question, interpreting the steps we saw in listing 3.2 and figure 3.5 from a different point of view. Let me remark that this is just an infor­mal way to illustrate how a rotation works. When you are going to implement this method, you’d better refer to listing 3.2 and figure 3.5.

First, let’s assume we call rotate on node X, and node Y is x’s parent. Once again, we analyze right rotation, so X is a left child of Y.

If we consider the subtree rooted at Y, we can visually “rotate” it clockwise (hence “right rotation”), pivoting on node X, until X looks like the root of this tree; hence, all other nodes appear to be below X, as shown in figure 3.6.

The result should look something like the top-right quadrant of figure 3.6. Of course, in order for this to be a valid BST rooted at node X, we need to make a few changes. For instance, there seems to be an edge from a child to its parent, from Y to X: but that’s not allowed in trees, so we need to revert the direction of the edge. If we just did that, though, X would end up with three children, and that’s also not allowed in a binary tree; to fix it, we can transfer the link between X and its right child to y. Both changes are shown in the bottom-left portion of figure 3.6.

At this point, the subtree is structurally fixed, and as a last step we can just enhance its visual representation to make it also look a little better.

You can imagine the tree structure like some kind of bolt and strings dangling structure, and then the whole operation can be described as grabbing the tree by node X and letting all the other nodes dangle from it, with the caveat that we need to also move X’s right child to node Y.

Before closing the discussion on rotations, it’s important to note that rotations always preserve BST constraints, but they do not preserve heap’s invariants. Rotations, in fact, can be used to fix broken treaps, but if applied to a valid tree, they will break the priority constraints on the node to which they are applied.

2. A few design questions

Treaps are heaps, which in turn are special trees with a dual array representation. As we saw in chapter 2, we can implement a heap using an array, a more space-efficient representation that also exploits locality of reference.

Can we also implement a treap using an array? I encourage you to take a minute and think about this question, before moving on and reading the answer. What would be the pros and cons of using an array versus a tree, and what could be the pain points of using a tree?

The issue with the array representation is that it’s not particularly flexible. It works well if we only swap random elements and remove/add only from the array’s tail; if, instead, we need to move elements around, it’s a disaster! For instance, even inserting a new element in the middle of the array causes all the elements after it to be moved, for an average O(n) swaps (see figure 3.7).

The key point with heaps is that they are complete, balanced, and left-aligned trees, which is possible because heaps don’t keep a total ordering on keys, so we can add and remove elements from the tail of the array, and then bubble up/push down a sin­gle element of the heap to reinstate the heap’s properties (see chapter 2).

Treaps, on the other hand, are also binary search trees, which do keep a total ordering on keys. That’s why we need rotations when we insert or delete new elements from a treap. As we described in section 3.2.1, a rotation implies moving a whole sub­tree from the right subtree of a node X to the left subtree of its parent Y (or vice versa). As you can imagine, this is the kind of operation that is easily performed in constant time when using pointers on a tree’s nodes, but it can become excruciatingly painful on arrays (like, linear-time painful).

And that’s why the array representation is not used for treaps (or for BSTs).

Another design question you might ask (and also should ask) before getting on with the implementation concerns the branching factor for the heap. We saw in chapter 2 that heaps can have branching factors other than 2, and in section 2.10 we also saw that a heap with a branching factor of 4 or higher sensibly outperforms a binary heap (at least in our example application). Can we also implement a treap with a generic branching factor greater than 2?

Unfortunately, it’s not that simple. First and foremost, we are using binary search trees, so a tree with a branching factor of 2: if the heap’s branching factor didn’t match the BST’s, it would be a mess!

Then you might suggest using ternary search trees, or their generalization; how­ever, that would make the rotation operations much more complicated, which means the code of the implementation would become terribly complicated and unclean (which likely also means slower!). Moreover, we would have a harder time keeping the tree balanced, unless we use something like a 2-3 tree, but that’s already guaranteed to be a balanced tree in the first place.

3. Implementing search

Now that we have a better idea of how a treap is going to be represented in memory and how rotations work, we can move to the implementation of the main API’s meth­ods. You can also find a Java implementation of treaps in the book’s repo on GitHub.[4]

We can start from the search method that’s the easiest to describe. In fact, it’s just the plain search method implemented in binary search trees: we traverse the tree from the root until we find the key we are looking for or reach a leaf without find­ing it.

As with plain BSTs, we only traverse one branch of each subtree, going left or right depending on how the target key compares to the current node’s key.

Listing 3.4 shows the implementation of the internal method taking a node as input and traversing its subtree; this version uses recursion (a technique described in appendix E). It’s worth repeating that although recursion often results in cleaner code when applied to iterative data structures such as trees, recursive methods can cause stack overflow if the depth of the recursion is significant. In this particular case, some programming languages’ compilers will be able to apply tail call optimization and transform recursion into an explicit loop, while translating the code into machine language.[5] Generally, however, it might be worth considering directly writing the explicit loop equivalent even in the higher level language, especially if you are not sure about your compiler support for tail recursion optimization, or the conditions where it can be applied.

The API method contains for the Treap class just calls method search on the root and returns false or true depending on whether the result is null or not.

4. Insert

While searching a key in a treap is relatively easy, inserting a new entry is a completely different story. As we mentioned in section 3.3.1, using BST insertion won’t work in the general case, because while the new entry’s key would end up in the right place in the tree, its priority might violate the heap’s invariants, being larger than its parent (figure 3.8).

There is no reason to despair, though! We have a way to fix the heap’s invariants, and we have actually already seen the solution: performing a rotation on the node vio­lating the priority constraints.

At a high level, the insert method has just two steps: insert the new node as a leaf, and then check if its priority is higher than its parent. If that’s the case, we need to bubble the new node up, but we can’t just swap it with its parent, like we would in a heap.

Using figure 3.6 as a reference, we need to take the subtree rooted in the new node’s parent and then rotate it so that the new node becomes the root of this subtree (because it’s certainly going to be the node with the highest priority).

Listing 3.5 describes the pseudocode for the insertion method, while figures 3.8 and 3.9 illustrate an example of inserting a new entry to add “Beer” to the inventory with 20 units in stock.

First, we need to find the right place to insert the new entry in our existing inventory. This is done with a traversal of the tree, exactly like what happens with search, only keeping track of the parent of the current node in order to be able to add the new leaf. Notice that we implemented this traversal using an explicit loop here, instead of recursion, to show to our readers how this approach works.

As we can see in the top half of figure 3.8, the first step is traversing the tree to search the right spot where we can add the new node as a leaf. We go left when we tra­verse “Flour” and “Butter,” then right at “Bacon” (lines #5-10 of listing 3.5).

Notice that for brevity we used a contracted naming notation in the figure. The newly added node, corresponding to variable newNode in listing 3.5, is denoted as X in the figures, and its parent with P.

At this point, when we exit the while loop, the temporary variable parent points to the node with key “Bacon”; therefore the conditions at lines #11 and #14 are false, and we add the new node as a right child of parent, as shown in the bottom half of fig­ure 3.8.

Looking at the example, we can also notice how the new node has a higher priority (a lower number of units in stock) than its parent; therefore, we enter the loop at line #19, and perform a left rotation. After the first iteration of the loop and the left rota­tion, the “Beer” node still has higher priority than its new parent, “Butter,” as shown in the top half of figure 3.9. Therefore, we enter a second iteration of the loop, this time performing a right rotation, because node X is now a left child of P.

Since now (bottom half of figure 3.9) no invariant is violated anymore, we can exit the loop. And since the new node wasn’t bubbled up all the way to the root, the check at line #24 fails, and we don’t need to do anything else.

What’s the running time for insert? Adding a new leaf requires O(h), because we need to traverse the tree from its root to a leaf. Then we can bubble up the new node at most to the root, and at each rotation we move the node one level up, so we can perform at most h rotations. Each rotation requires a constant number of pointers to be updated, so that bubbling up the new node and the whole method finally requires O(h) steps.

5. Delete

Deleting a key from a treap is a conceptually simpler operation, although it requires a completely different approach with respect to BSTs. In binary search trees, we replace the removed node with its successor (or predecessor), but this approach wouldn’t work in treaps, because this replacement could have a smaller priority than its new children, and in that case, it would need to be pushed down. Moreover, in the general case for BSTs, the successor of a node is not a leaf, and so it needs to be recursively removed.

A simpler approach consists of preemptively pushing down the node to be removed, all the way until it reaches a leaf. As a leaf, it can then be disconnected from the tree without any effect.

Conceptually, it’s like assigning the lowest possible priority to the node to be removed, and fixing the heap’s invariants by pushing down the node. The operation won’t stop until the node with an infinite (negative) priority reaches a leaf.

This is illustrated in figure 3.10 and described in listing 3.6.

In listing 3.6, in particular, we see why it was useful to have method search return the node where the key was found. We can reuse it now to write the remove method, whose first step is indeed searching the key to remove and then, if it was found, this method will take over from the node that needs to be pushed down.

Special care, as always, needs to be paid to be sure we remove the root.

Let’s follow how the algorithm works using our example. Suppose we want to remove “Butter” from our inventory (for instance, because we won’t sell it anymore or we sold all of it).

The first step, shown in figure 3.10 (A), is searching for the key “Butter” in the tree (line #2 in listing 3.6). Once we find the node (that’s obviously not null, line #3), as usual marked with X in the figures, we verify that it’s neither the root nor a leaf (hence, the check at line #5 returns false), so we can enter the while loop at line #8.

At line #9, we choose X’s left child, denoted with l in the figure, as its highest-priority child, so we perform a right rotation (line #10), which produces the tree shown in fig­ure 3.10 (B).

Here in the figure we changed the priority of the node being pushed down to +ot,[6] but in the code we don’t actually need to do that; we can just push down the node without checking priorities, until it becomes a leaf.

At this point X is not yet a leaf, although it just has one child, its (also former) right child R; therefore, we will enter another iteration of the while loop, and this time per­form a left rotation, producing the tree shown in figure 3.10 (C). One more left rota­tion, and X finally becomes a leaf.

At this point we exit the while loop, and at line #15 we are sure that node is not the root (otherwise, we would have caught this case at line #5), so it will have a non-null parent. We still need to disconnect the node from the tree by removing the pointer from its parent, and to do so we need to check whether it was a left or right child.

Once the right link has been set to null, we are done and the key was successfully removed.

If we compare this method with the plain BST’s version, the positive aspect is that we don’t have to call remove recursively on the successor (or predecessor) of the node that will be removed. We just perform one removal, although possibly with several rotations. And that’s actually one negative aspect: if we delete a node close to the root, we will need to push it down for several layers until it reaches a leaf.

The worst-case running time for the remove algorithm is, in other words, O(h), where h is the height of the treap. As a consequence, it becomes particularly import­ant, as you can imagine, that the height of the tree is kept as small as possible.

As you can see from our examples, using the treap for storing both keys and mean­ingful priorities might tend to produce an unbalanced tree, and removing a node might make the tree even more unbalanced, because of the many rotations starting from an already bad situation.

6. Top,peek, and update

The remaining methods in class Treap’s API are easier to implement. Method peek is trivial to implement; it’s exactly the same as for regular heaps, the only difference being in how we access the heap’s root.

If we also need to implement method top, to make sure our treap can seamlessly replace a heap, we can leverage the remove method and write almost a one-liner, as shown in listing 3.7.

Besides validating the treap’s status, checking that it’s not empty, we just need to retrieve the key stored in the root and then remove it from the treap.

Similarly, if we need to update the priority associated with a key, we can follow the same logic as for plain heaps, bubbling up the updated node (when increasing prior­ity, or pushing down, when we lower priority). The only difference is that instead of just swapping nodes we need to perform rotations to move the updated node. Imple­mentation of this method is left as an exercise (or you can check it out on the book’s repo).

7. Min,max

The last methods left to implement in our API are min and max, returning the mini­mum and maximum key stored in the treap.

These keys are stored respectively in the left-most and right-most nodes of the tree. Be careful, though; these nodes are not necessarily going to be leaves, as shown in fig­ure 3.11.

Listing 3.8 shows a possible implementation of method min. Exactly as in BSTs, we just traverse the tree, always taking the left branch until we reach a node whose left child is null. Method max is symmetric; you just need to replace node.left with node.right.

8. Performance recap

This concludes our discussion on the implementation of treaps. In the next sections, we’ll discuss applications of treaps and analyze them in more detail.

For now, let’s recap the running time of the treap’s methods, shown in table 3.2. Notice that

  • All operations only depend on the height of the tree, rather than on the num­ber of elements. Of course, in the worst case, O(h)=O(n) for skewed trees.
  • We omitted the space analysis, because all these methods only require constant extra space.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *