Treap: Performance analysis and profiling

As we saw in section 3.3, all API methods on Randomized Treaps require time proportional to the height of the tree. As we know (see the introduction to this chapter and appendix C), in the worst case the height of a binary tree can be linear in the number of elements; in fact, one of the problems with BSTs is that there are specific sequences of insertions that will certainly cause a tree to be skewed. This issue makes binary search trees particularly vulnerable to attacks when used as dictionaries, because all an attacker needs to do to degrade the data structure's performance is to send an ordered sequence of keys, which will cause the tree to degenerate into a linked list, with a single path from the root to a leaf containing all the elements.

Randomized Treaps offer a two-fold improvement. First and foremost, introducing randomness in the assignment of priorities prevents[1] attackers from being able to exploit known sequences. Moreover, as we promised in section 3.4, it gives us, on average, a more balanced tree than plain binary search trees.

What does “on average” mean? And how much of an improvement can we get? There are two ways to answer these questions. From a theoretical point of view, we can analyze the expected height of a Randomized Treap, and mathematically prove that, on average, the height will be logarithmic.

But also, from a practical angle, we can just run a simulation to verify that what we expect is true and compare the height of a BST versus a Randomized Treap with the same elements.

1. Theory: Expected height

To analyze the expected height of a random data structure, we need to introduce some concepts about statistics.

First and foremost, we will need the concept of expected value for a random variable V. We can informally define it as the mean value (not the most likely value) that the variable will assume over a large set of occurrences.

More formally, if V can assume values in a finite, countable set $\{v_1, v_2, \ldots, v_M\}$, each with probability $\{p_1, p_2, \ldots, p_M\}$, then we define the expected value of V as

$$E[V] = \sum_{i=1}^{M} p_i \cdot v_i$$

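As a quick sanity check (a standard textbook example, not from the original text), consider a fair six-sided die, where each outcome $v_i = i$ has probability $p_i = 1/6$:

$$E[V] = \sum_{i=1}^{6} \frac{i}{6} = \frac{21}{6} = 3.5$$

Note that 3.5 is a value the die can never actually show, which illustrates why the expected value is a mean and not the most likely outcome.
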
For our Randomized Treap, we will define a random variable Dk for the depth of a given node Nk, where the index k ∈ {0, . . ., n-1} denotes the index of the node's key in the sorted set: Nk has the k-th smallest key in the tree.

In simple terms, Dk counts how many ancestors the node holding the k-th smallest key, Nk, has. Yet another way to see this number: it's the number of nodes in the path from the tree's root down to Nk, excluding Nk itself.

Formally:

$$D_k = \big|\{\, i \neq k : N_i \text{ is an ancestor of } N_k \,\}\big|$$

We can denote the event "Ni is an ancestor of Nk" with the indicator (a binary variable) Aki. This way, given any pair of nodes Ni and Nk, Aki == 1 means that Ni is in the path between Nk and the root, while Aki == 0 means that either they are in different branches or, at most, Ni is a descendant of Nk.

Then, since the expected value of an indicator is just the probability of the corresponding event, by linearity of expectation the expected value of Dk becomes (taking Akk = 0, as a node is not its own ancestor)

$$E[D_k] = E\left[\sum_{i=0}^{n-1} A_{ki}\right] = \sum_{i=0}^{n-1} P(A_{ki} = 1)$$

To compute the probability P(Aki = 1), we need to introduce a new variable and a lemma (an intermediate result).

We define N(i,k) = N(k,i) = {Ni, Ni+1, . . ., Nk-1, Nk} as the subset of treap nodes whose keys are between the i-th and the k-th smallest[2] of the whole tree.

Obviously, N(0,n-1) = N(n-1,0) contains all the nodes in the treap. Figure 3.16 shows a few examples of these subsets to help you visualize them.

Notice that the successor and predecessor of any node n always lie either on the path between n and the root, or on a path between n and a leaf. In other words, to find the predecessor (or successor) of a node n, you only need to look either in the subtree rooted at n, where the predecessor will be the right-most node of n's left subtree (and the successor the left-most node of n's right subtree), or on the path between n and the root.

It can be proved that the following lemma holds:

For all i ≠ k, with 0 ≤ i, k ≤ n-1, Ni is an ancestor of Nk if and only if Ni has the smallest priority among all nodes in N(i,k).

We will not prove this lemma here. If you’d like to take the challenge, it can be easily proven by induction.

So, armed with the lemma, we can compute the probability that the node with the i-th smallest key is an ancestor of the node with the k-th smallest key. Since we assume that the priorities are drawn from a uniform continuous distribution, such as the real numbers between 0 and 1, ties happen with probability 0, and each node in a subset of nodes is equally likely to hold the smallest priority.
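
In code, this uniform drawing of priorities is straightforward. Here is a minimal sketch (illustrative names, not the book repo's actual classes) of a treap node that draws its priority uniformly at random when it's created:

```java
import java.util.concurrent.ThreadLocalRandom;

// Minimal treap-node sketch: each node draws its priority uniformly at
// random in [0, 1) on creation, so within any subset of nodes each one
// is equally likely to hold the smallest priority.
class TreapNode<K extends Comparable<K>> {
    K key;
    final double priority;
    TreapNode<K> left, right;

    TreapNode(K key) {
        this.key = key;
        this.priority = ThreadLocalRandom.current().nextDouble();
    }
}
```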

Therefore, for each i ≠ k, we can write the probability of Ni being an ancestor of Nk as

$$P(A_{ki} = 1) = \frac{1}{|k - i| + 1}$$

because N(i,k) contains exactly |k-i|+1 nodes, each equally likely to hold the smallest priority,

while for i=k, instead, the probability is simply 0 (a node can't be its own ancestor). Replacing these values in the formula for the expected value of Dk, we get

$$E[D_k] = \sum_{i=0}^{k-1} \frac{1}{k-i+1} \,+\, 0 \,+\, \sum_{i=k+1}^{n-1} \frac{1}{i-k+1}$$

The middle term of the sum obviously evaluates to 0, while for the first term of the sum we notice that when i=0 the denominator is equal to k+1, and it decreases by 1 as i increases until, for i=k-1, it becomes equal to 2.

Similar considerations can be made for the last term of the formula, whose denominator ranges from 2 (for i=k+1) up to n-k (for i=n-1), giving us

$$E[D_k] = \sum_{j=2}^{k+1} \frac{1}{j} \,+\, \sum_{j=2}^{n-k} \frac{1}{j}$$

The two summations in the previous formula are, in fact, both partial sums of the harmonic series, denoted as Hn; since each of them omits the series' first term (which equals 1), and Hm ≤ ln(m) + 1 for every m, each partial sum is bounded by the natural logarithm of its last index. We finally obtain

$$E[D_k] \leq \ln(k+1) + \ln(n-k) < 2\cdot\ln(n) = O(\log n)$$

This result guarantees that, over a large number of attempts, the mean value of the height of a Randomized Treap is O(log(n)), logarithmic in the number of keys stored (independently of the order in which keys are added or removed, and of the distribution of the keys).

2. Profiling height

You could object that this is just a guarantee on the average over several attempts. What if we get really unlucky on a crucial run? To get better insight into the actual performance of this data structure, we can run a little profiling, just like we did in section 2.10 for d-heaps. This time, we'll use a profiling tool for Java, JProfiler.[16]

We could even do without a profiling tool here, because our tests are going to compare an implementation of a plain BST versus a Randomized Treap: after performing the same sequence of operations on instances of those two containers, they'll check how the heights of the two trees compare.

This measure will give us the gist of the asymptotic improvement of balanced trees over BSTs, because, as we discussed, operations on binary trees (balanced or not) require a number of steps proportional to the height of the tree.
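
Measuring the height for this comparison is straightforward. Here is a minimal recursive sketch (the Node class and method name are illustrative, not taken from the book's repo):

```java
// A hypothetical minimal binary-tree node, shared by the sketches below.
class Node<K> {
    K key;
    Node<K> left, right;
}

// Recursive height of a binary tree: an empty tree has height 0;
// otherwise it's 1 plus the height of the taller subtree.
static <K> int height(Node<K> node) {
    if (node == null) {
        return 0;
    }
    return 1 + Math.max(height(node.left), height(node.right));
}
```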

We also know, however, that asymptotic analysis discards the constant coefficients, hiding the code complexity that usually comes with more advanced algorithms; therefore, having an indication of the actual running time will provide more thorough information that can help us choose the best implementation for our applications.

In the tests that you can find on the book’s GitHub repo, we try three different scenarios:

  • We create large trees of increasing (random) size, whose keys are random integers, with an initial sequence of insertions, followed by random removals and insertions (with a rate of 1:1).
  • We proceed similarly to the previous step, but the possible key values are limited to a small subset (for instance, just 0 . . . 100, forcing several duplicates in the tree).
  • We insert an ordered sequence of numbers into the trees.

The results of the first test we ran are shown in figure 3.17. You can see that the height of both trees grows logarithmically. At first this looks rather discouraging, as there seems to be no improvement from using Randomized Treaps over plain BSTs.

We need, however, to make a couple of considerations.

First, let's clarify the rules of the game: given a target size n for the tree, we add (to both containers) the same n integers, chosen at random and without any restriction on their range. After these insertions, we perform another n operations, each of which either removes an existing key (randomly chosen) or adds a new random integer. Obviously, we repeat this test for growing sizes.
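
A sketch of such a test run might look like the following (hypothetical code: Bst and RandomizedTreap are placeholders for the two containers under test, both assumed to expose add, remove, and a height accessor):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the first scenario: n random insertions, then n operations
// that are an even (1:1) mix of removals of existing keys and new
// insertions. Bst and RandomizedTreap are placeholders for the two
// containers under test.
static void runScenario(int n, Random rnd) {
    Bst<Integer> bst = new Bst<>();
    RandomizedTreap<Integer> treap = new RandomizedTreap<>();
    List<Integer> keys = new ArrayList<>();

    for (int i = 0; i < n; i++) {
        int key = rnd.nextInt();        // unbounded range: few duplicates
        bst.add(key);
        treap.add(key);
        keys.add(key);
    }
    for (int i = 0; i < n; i++) {
        if (rnd.nextBoolean() && !keys.isEmpty()) {
            // remove a randomly chosen existing key from both containers
            int key = keys.remove(rnd.nextInt(keys.size()));
            bst.remove(key);
            treap.remove(key);
        } else {
            int key = rnd.nextInt();
            bst.add(key);
            treap.add(key);
            keys.add(key);
        }
    }
    // compare the heights after the same sequence of operations
    System.out.printf("n=%d bstHeight=%d treapHeight=%d%n",
            n, bst.height(), treap.height());
}
```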

We used an efficient implementation of BSTs (you can find it on the book’s repo[5]) that limits the skewing effect of removing elements.[6] This expedient already improves the balance of BSTs and reduces the gap with balanced trees.

Finally, there is one consideration to make. In our experiment, we add completely random keys; since the range of values is so large compared to the number of elements in the container, the expected number of duplicates is negligible, and we can assume that all sequences of keys to insert are equally likely to be drawn. In this scenario, it's unlikely that an adversarial sequence, causing the height to be super-logarithmic, will be chosen (the chances are already beyond slim for n ≈ 100).

Basically, we are using the same idea behind Randomized Treaps, moving the randomness into the generator of the sequences of keys to insert; unfortunately, however, we don't always get to choose the data to add to our containers!

And so, in order to verify this hypothesis, we can run a different experiment, reducing the influence of the random generator by limiting the set of possible keys. The results of limiting the keys to values in the range 0 . . . 1000 are shown in figure 3.18. Now we can immediately notice a difference between the two data structures: the BST's height grows linearly, with a slope of approximately 10⁻³, while the Randomized Treap still shows logarithmic height.

There are two reasons for this difference:

  • Having a high duplicate rate (larger and larger as n grows) produces sequences of insertions with longer streaks of sorted data.
  • BSTs are naturally left-leaning when duplicates are allowed. As you can also see in listing 3.5, we need to break ties when inserting new keys, and we decided to go left every time we find a duplicate (see the sketch after this list). Unfortunately, in this case we need a deterministic decision; we can't use the same workaround as for deletions, and so when input sequences contain a large number of duplicates, as in this test, BSTs become tangibly skewed.
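
To make the tie-breaking concrete, here is a minimal BST-insertion sketch that always goes left on duplicates (illustrative code in the spirit of listing 3.5, not the listing itself; it reuses the Node sketch from the height example):

```java
// Minimal BST-insertion sketch: ties, i.e. duplicate keys,
// deterministically go left, which is what skews the tree when the
// input contains many duplicates.
static <K extends Comparable<K>> Node<K> insert(Node<K> root, K key) {
    if (root == null) {
        Node<K> leaf = new Node<>();
        leaf.key = key;
        return leaf;
    }
    if (key.compareTo(root.key) <= 0) {   // duplicates (==) go left
        root.left = insert(root.left, key);
    } else {
        root.right = insert(root.right, key);
    }
    return root;
}
```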

This is already a nice result for treaps, because real-world inputs can easily contain several duplicates in many applications.

But what about those applications that don't allow duplicates? Are we safe from adversarial sequences in those situations? Truth be told, we haven't clarified how important the role of ordering is. What better way to do it than trying the worst possible case, a totally ordered sequence? Figure 3.19 shows the results for this test. Here we stripped out all randomness and just added to the containers all integers from 0 to n-1, with no calls to remove performed.

As expected, for BSTs the height is not just linear in the number of nodes, but exactly equal to it (this time the slope is exactly 1), because the tree degenerates into a linked list.

Randomized Treaps, instead, keep performing well with logarithmic height, just like for the other tests, even in the most unfavorable case.

We can then conclude that if the parameter to improve is the height of the tree, Randomized Treaps do have an advantage in comparison to BSTs, and it is true that they keep a logarithmic height in all situations; therefore, their performance is comparable to more complex data structures such as red-black trees.

3. Profiling running time

The height of the tree, however, is not the only criterion we are interested in. We want to know if there is a catch, and what price we have to pay in terms of running time and memory usage (figure 3.20).

To find this out, we run a proper profiling of the first test (with unbounded integers as keys, so with few or no duplicates) using JProfiler and recording the CPU time. The profiling used the implementations of BSTs and Randomized Treaps provided in the book's repo. It's worth noting that this profiling only gives us information on the implementations we examine, but we could get different results on optimized or differently designed software.

The results for our profiling run are shown in figure 3.20, where we can see that for insertion (method add) the cumulative running time spent in RandomizedTreap::add is almost twice as much as in BST::add; for the method remove, instead, the ratio becomes 3.5.

From this first test, it seems that we pay a high price, in the general case, for the overhead due to the greater complexity of Randomized Treaps. This result is somewhat expected, because when the height of the trees is approximately the same, the code for treaps is substantially more complex than for BSTs.

Should we just throw away our RandomizedTreap class? Well, not so fast. Let's see what happens when we profile the second test case introduced in section 3.5.2, the one where we still add random integers to the containers, but limited to the range [0, 1000].

In the previous section, we saw that, in this case, BST’s height grows linearly, while for Randomized Treaps we still have a logarithmic growth.

Figure 3.21 shows the result of this profiling. We can immediately see that the situation has radically changed, and BSTs now perform tremendously worse. So much worse that BST::add takes 8 times the running time of RandomizedTreap::add. For the method remove the ratio is even worse; we are talking about almost a 15-fold speed-up using Randomized Treaps. That's exactly the situation where we would want to use our new, fancy, balanced data structure!

For the sake of completeness, let's also take a look at the worst-case scenario for BSTs. Figure 3.22 shows the profiling of the last test case introduced in section 3.5.2, where we insert an ordered sequence into our containers. In this case, the results hardly need any comment, because we are talking about thousands of seconds versus microseconds (we had to test on a smaller set because BSTs' performance was so degraded as to be unbearably slow).

All things considered, these results suggest that if we are not sure how uniform and duplicate-free the data we have to hold will be, we should consider using Randomized Treaps. If, instead, we are sure the data will have lots of duplicates or may be close to sorted, then we definitely want to avoid using plain BSTs and resort to a balanced tree.

4. Profiling memory usage

So much for CPU usage, you might say, but what about memory usage? Because maybe Randomized Treaps are faster in some situations, but they require so much space that you won’t be able to store them in memory for large datasets.

First, we make a consideration: memory usage will be approximately the same for all the test cases we have introduced in the previous sections (when comparing containers of the same size, of course). This is because the number of nodes in both trees won't change with their height; these trees do not support compression, and balanced and skewed trees will always need n nodes to store n keys.

Once we have established that, we can be content with profiling memory allocation for the most generic case, where the two trees are both approximately balanced. Figure 3.23 shows the cumulative memory allocated over the whole test for instances of the two classes.

We can see that RandomizedTreap requires slightly more than twice the memory of BST. This is obviously not ideal, but is to be expected, considering that each node of a treap will hold a key (an Integer, in this test), plus a Double for the priority.
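
A minimal sketch of the two node layouts (illustrative, not the repo's actual classes) makes the source of the overhead visible:

```java
// Illustrative node layouts: the treap node stores an extra 8-byte
// double per node, which roughly explains the ~2x allocation when keys
// are as small as Integer.
class BstNode<K> {
    K key;
    BstNode<K> left, right;
}

class PrioritizedNode<K> {
    K key;
    double priority;                      // the extra field per node
    PrioritizedNode<K> left, right;
}
```

With a key as small as Integer, that extra field is a significant fraction of each node's footprint; with larger keys, as we'll see next, the relative overhead shrinks.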

If we try a different kind of data for the keys—for instance, String—we can see that the difference becomes much smaller, as shown in figure 3.24. It’s just a ratio of 1.25 when storing strings between four and ten characters as keys.

5. Conclusions

The analysis of the comparative performance and height of BSTs and Randomized Treaps suggests that, while the latter requires slightly more memory and can be slower in the generic case, when we don't have any guarantee on the uniform distribution of keys or on the order of the operations, using BSTs carries a far greater risk of becoming a bottleneck.

If you remember, when we introduced data structures in chapter 1 we made it clear: knowing the right data structure to use is more about avoiding the wrong choices than finding the perfect data structure. This is exactly the same case. We (as developers) need to be aware of the situations where we need to use a balanced tree to avoid attacks or just degraded performance.

It's worth reiterating that the first part of the analysis, focusing on the height of the trees, has general value[8] and is independent of the programming language used. The analysis of running time and memory usage, instead, only has value for this implementation, programming language, design choices, and so on. All these aspects can, in theory, be optimized for your application's specific requirements.

My advice, as always, is to carefully analyze requirements, understand what’s critical in your software and where you need certain guarantees about time and memory, and then test and profile the critical sections. Avoid wasting time on non-critical sections; usually you’ll find that the Pareto principle holds for software, and you can get an 80% performance gain by optimizing 20% of your code. Although the exact ratio may vary, the overall principle that you can get a significant improvement by optimizing the most critical parts of your application will likely hold.

Try to get a balance between clean code, time used to develop it, and efficiency. "Premature optimization is the root of all evil," as stated by Donald Knuth, because trying to optimize all of your code will likely distract your team from finding the critical issues, and produce less-clean, less-readable, and less-maintainable code.

Always make sure to try to write clean code first, and then optimize the bottlenecks and the critical sections, especially those on which you have a service-level agreement, with requirements about time/memory used.

To give you a concrete example, the Java implementation we provide on the book's repo makes heavy use of the Optional class (to avoid using null, and provide a nicer interface and a better way to handle unsuccessful searches/operations), and consequently also a lot of lambda functions.

If we profile the memory usage in more detail, disabling the filter on the package (to speed up profiling, you usually want to avoid recording standard libraries, etc.), the final result is quite surprising, as we can see in figure 3.25.

You can see that most space is used by instances of Optional and lambdas (implicitly created in Optional::map, etc.).
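
As a hypothetical sketch in the same spirit (not the repo's actual code), consider how an Optional-based search API allocates a wrapper on every call, and how chaining Optional::map adds a lambda on top (reusing the Node sketch from earlier):

```java
import java.util.Optional;

// Hypothetical sketch of an Optional-based search: every call returns
// an Optional wrapper, and each map() call allocates a new Optional
// plus a lambda instance; those are the kinds of allocations that
// dominate figure 3.25.
static <K extends Comparable<K>> Optional<K> search(Node<K> node, K key) {
    if (node == null) {
        return Optional.empty();
    }
    int cmp = key.compareTo(node.key);
    if (cmp == 0) {
        return Optional.of(node.key);
    }
    return search(cmp < 0 ? node.left : node.right, key);
}

// Usage: the map() call implicitly creates a lambda instance.
// Optional<String> label = search(root, key).map(k -> "found: " + k);
```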

A complementary example for performance could be supporting multi-threading. If your application does not involve (ever) sharing these containers among different threads, you can avoid making your implementation thread-safe and save the over­head needed to create and synchronize locks.
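
As an illustration, making a container thread-safe could be as simple as the following synchronized wrapper (a sketch reusing the hypothetical RandomizedTreap placeholder from before; the boolean return type of remove is also an assumption). Every operation then pays for lock acquisition, which is pure overhead if the container is never shared across threads:

```java
// Sketch of a thread-safe wrapper: each public method acquires the
// object's monitor, serializing all access to the underlying treap.
class SynchronizedTreap<K extends Comparable<K>> {
    private final RandomizedTreap<K> treap = new RandomizedTreap<>();

    public synchronized void add(K key) {
        treap.add(key);
    }

    public synchronized boolean remove(K key) {
        return treap.remove(key);
    }
}
```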

Source: La Rocca, Marcello (2021), Advanced Algorithms and Data Structures, Manning Publications.
