The first implementation of StringContainer that we will illustrate is the trie; incidentally, all the other data structures that we will show in the next sections are based on tries, so we couldn’t choose to start anywhere else.

The first thing you should know about trie is that it’s actually pronounced “try.” Its author, Rene de la Briandais, chose this term because it was similar to tree, but also because it’s part of the word retrieval, which is the main purpose of this container; its peculiar pronunciation is partly meant as an indented pun, and partly meant to avoid confusion with “tree.”

Tries were originally developed as a compact, efficient way to search strings in files; the idea behind this data structure is, as we saw in the previous section, providing a way to reduce redundancy by storing common prefixes of strings just once.

This couldn’t be achieved using a plain binary search tree, or with just a binary tree, so a paradigm shift was needed: de la Briandais then used n-ary trees, where edges are marked with all the characters in an alphabet, and nodes just connect paths.

Nodes also have a small but crucial function: they store a tiny bit of information stating if the path from root to current node corresponds to a key stored in the trie.

Let’s take a look at figure 6.2 before moving to a more formal description. It shows the structure of a typical trie, containing the words “a”, “an”, “at”, and “I”.

If you feel that figure 6.2 is a bit confusing, you are right. In their classic implementation, a trie’s nodes have one edge for each possible character in the alphabet used: some of these edges point to other nodes, but most of them (especially in the lower levels of the tree) are null references.^{[2]}

Nevertheless, this specific representation of tries looks terrible on paper: too many links and too many nodes that result in chaos.

That’s why we will use an alternative representation, shown in figure 6.3. We’ll only show links to actual trie nodes, omitting links to null.

Formally, given an alphabet Σ with |Σ |=k symbols, a trie is a k-ary tree where each node has (at most) k children, each one marked with a different character in X; links to children can point to another node, or to null.

Unlike k-ary search trees, though, no node in the tree actually stores the key associated with it. Instead, the characters held in the trie are actually stored in the edges between a node and its children.

Listing 6.1 illustrates a possible implementation of a Trie class (in object-oriented pseudo-code); you can also take a look at a full implementation on the book’s repo on GitHub.

Listing 6.1 Class Trie

**#type** Char[] Alphabet

class Node

**#type** boolean keyNode

**#type** HashMap<Char, Node>

children

**function** Node(storesKey)

** for** char in Alphabet **do**

**this**.children[char] ← **null **

**this**.keyNode ← storesKey

**class** Trie

**#type** Node

root

**function** Trie()

root ← **new** Node(**false**)

In the simplest version, trie’s nodes can only hold a tiny piece of information, just true or false. When marked with true, a node N is called a *key node* because it means that the sequence of characters in the edges of the path from the root to N corresponds to a word that is actually stored in the trie. Nodes holding false are called *intermediate nodes* because they correspond to intermediate characters of one or more words stored in the trie.

As you can see, tries go beyond the usual duality between leaves and inner nodes, introducing another (orthogonal) distinction. It turns out, though, that all leaves in a well-formed, minimal trie are key nodes: a leaf would make no sense as an intermediate node, as we will see.

The fact that words are stored in paths means that all the descendants of a node share a common prefix: the path from the root to their common parent. For instance, if we look at figure 6.3, we can see that all nodes share the prefix “an”, and all but one node share “ant”. These two words, incidentally, are also stored in the trie, because the nodes at the end of their path are key nodes.

The root is, to all extents, an intermediate node associated with the empty string; it would be a key node only if the empty string belonged to the corpus contained in the trie.

While for spell checkers storing a Boolean in each node could be enough (after all, we only need to know whether a word is in a dictionary), tries are often used to store or index words in texts. If that’s the case, we often need to know either how many occurrences of a word appear in the text or the positions where they appear. In the former situation we can store a counter in each node and only key nodes will have a positive value. In the latter we will instead use a list of positions, storing the index in the text where each occurrence starts.

### 1. Why is it better again?

Let’s address space first: why is the trie in figure 6.3 better than the binary search tree in figure 6.1?

Let’s do some quick math! But first, we need to make some assumptions:

- We only consider ASCII strings and characters, so we have to account for 1 byte for each char (Unicode wouldn’t change much; rather, it would make the savings obtained by using the cheapest option even greater) plus, in BSTs, 1 byte for each string’s terminator.
- We only explicitly store links to actual nodes in tries and account for a fixed number of bytes for null-pointers in BSTs (the same space taken by non-null references).
- As we mentioned, each node in a trie has |Σ| links, where |Σ| is the alphabet size. This means that in tries, and especially in nodes in the lower levels, most links are null, and indeed in listing 6.1 it’s also possible to see how all those links are initialized in Node’s
- For each node in a trie, we account for a fixed amount of space to store the children list (we can imagine that we use a hash table to store the link to children), plus a variable amount depending on the number of actual children.
- Each link in a tree will require 8 bytes (64 bit references), and each link in a trie will require 9 bytes (8 for the reference plus 1 for the character to which it’s associated).
- Each node in the BST will require as many bytes as the number of characters in its key, plus 4 bytes for the Node object itself.
- Each node in the trie requires 1 bit (to hold the Boolean) plus the same constant amount as for the BST; let’s round up to 5 bytes.

Given these premises, we note that the BST in figure 6.1 has 9 nodes (whose keys are strings) and consequently 2*9=18 links, while the trie in figure 6.3 has 19 nodes and 18 links. For the BST, the root node contains the key “anthem” and requires 27 bytes (4 for the Node itself, 7 for the string, 2*8 for the links). Likewise, its left child, with key “an”, requires 23 bytes. You see how it’s computed; it’s 21 bytes per node, plus the length of the string. For the whole tree, considering it has 9 nodes and requires a total of 47 bytes for the keys, we need 227 bytes.

Let’s now check the trie: each node requires 5 bytes, and each link 9 bytes—a total of 257 bytes.

So, in practice, this trie might require a little more memory than the corresponding BST. These quantities depend on many factors; first of all, the overhead for objects. Since this trie has more nodes, the more this overhead is, the larger the delta will be.

Obviously, however, the shape of the tree and the actual number of nodes also play a big role. In the example in figure 6.3, only a short prefix is shared among the keys. It turns out that tries are more efficient when holding keys with a large shared prefix. The example in figure 6.4 shows how the balance can be in favor of tries in these cases. As you can see, in figure 6.4 most trie nodes are black (key nodes), and when the ratio of key nodes versus intermediate nodes is higher, intuitively it means that the efficiency of the trie is also higher, because when a path has more than one key node, we are storing at least two words in a single path (one of which is a prefix of the other). In a BST they would require two BST nodes storing both of the strings separately.

Another sign of more-efficient storage is when there are deep nodes branching out. In that case, the trie is “compressing” the space needed for two strings with a common prefix by storing the common prefix only once.

In the end, with just nine words stored in this second example, using the same assumptions as shown previously, the difference becomes 220 bytes versus 134 bytes, with tries saving almost 40% of the space. If we consider an 8-byte overhead for nodes, the difference would be 256 versus 178 bytes, and the savings would be around 30%, which is still impressive. For large trees containing dictionaries or indexing large texts, we would be talking about hundreds of megabytes.

Figure 6.4 shows the best-case scenario for tries, but obviously things are not always looking this good. Figure 6.5 shows a different edge case, close to the worst-case scenario, where the longest prefix in a (degenerate) trie is the Figure 6.5 A degenerate trie, where no empty string. In cases like this, information string shares a prefix with another string ends up being stored very inefficiently; luckily, however, these edge cases are incredibly unlikely in real-world applications.

So much for space consumption. At worst, tries can be considered comparable to binary search trees. What about their running time? We will answer this question while looking at the individual methods in order to develop an understanding of how these results are derived.

### 2. Search

Let’s start with search. Assuming we have built a proper trie, how do we check whether it contains a certain key?

Turns out, it’s not too difficult, compared to a BST. The main difference is that we need to walk the tree one character (of the searched key) at a time, following the link that is marked precisely with that character.

Both strings and tries are recursive structures, whose unit of iteration is the single character; each string, in fact, can be described as either

- The empty string, “”
- The concatenation of a character c and a string s’: s=c+s’, where s’ is a string one character shorter than s and can possibly be the empty string

**NOTE** In most programming languages single quote characters wouldn’t be allowed in variable names, so we use tail in listing 6.2 as a substitute name for s’ , which denotes the tail of current string s in the figures.

For instance, the string “home” is made of the character ‘h’ concatenated to the string “ome”, which in turn is ‘o’ + “me”, and so on, until we get to “e”, which can be written as the character ‘e’ concatenated to the empty string “” .

A trie, on the other end, stores strings as paths from the root to key nodes. We can describe a trie T as a root node connected to (at most) |∑| shorter tries. If a sub-trie T’ is connected to the root by an edge marked with character c (c e ∑), then for all strings s in T’, c + s belongs to T.

For instance, in figure 6.3, the root has only one outgoing edge, marked with ‘a’; considering T’ as the only sub-trie of the root, T’ contains the word “n”, and this means that T contains ‘a’ + “n” = “an”.

Since both strings and tries are recursive, it’s natural to define the search method recursively. We can consider just the case where we search the first character of a string s=c+s’ starting from the root R of the trie T. If c, the first character of s, matches an outgoing of R, then we can search s’ in the (sub)trie T’. We assume the root of the sub-trie is not null (as we’ll see shortly, it’s a reasonable assumption).

If at any point s is the empty string, then we have traversed the whole path in the trie corresponding to s. We then need to check the current node to verify whether our string s is stored in the tree.

If, instead, at some point the current node doesn’t have an outgoing edge matching current character c, then we are sure string s is not stored in the trie.

We’ll illustrate these examples in figures 6.6 and 6.7, but first, let’s take a look at the implementation of the search method that we will then use as a reference while describing these examples.

In listing 6.2 we show a recursive implementation of the search method. At each call, we need to check if the first character in the substring searched matches an outgoing edge of the current node, and then recursively search the tail of the current string in the subtree referenced by that edge. Figures 6.6 and 6.7 shows the parallel between moving “right” in the string searched and traversing down the trie.

The same method can obviously be implemented using explicit loops. This implementation of the search method is likely suitable for a compiler’s tail-call optimization,^{[6]} but as discussed in appendix E and chapter 3, if you are not comfortable with recursion or are not sure that your compiler will apply tail-call optimization, my advice is to write the iterative version of these methods to avoid stack overflow.

Listing 6.3 shows the trie’s API counterpart, which in turn will call the method in listing 6.2. We will omit these wrapper methods for the other operations when they are as trivial as for search.

As already mentioned, there are two possible cases of unsuccessful search in tries: in the first one, shown in figure 6.6, we traverse the trie until we get to a node that doesn’t have an outgoing edge for the next character. In the example, a call to trie.search(“any”), this happens when we get to the key’s last character, ‘y’ (as shown in the right diagram in figure 6.6). In listing 6.2, this corresponds to the condition in the if at line #5 returning true.

The other possible case for an unsuccessful search is that we always find a suitable outgoing edge until we recursively call search on the empty string. When this happens, it means we’ve reached the trie’s node whose path from the root spells out the searched key. For instance, in figure 6.7 we traverse the tree link by link and check the string key character by character, until the condition at line #2 of listing 6.2 returns true and we go to line #3.

The result of line #3 is the only difference between a successful and an unsuccessful search. Looking at the example in figure 6.7, a successful search for the word “ant” would follow the exact same steps, with the only difference being that the final node (denoted with T in the rightmost diagram) would have to be a key node.

Notice that in listing 6.3, we can avoid checking for empty tries or handling the root as a special case, because in the trie constructor in listing 6.1, we create the root as an empty node. This (together with careful implementation of all the methods) will also support our assumption that the node argument in search (and all the other methods) is never null.

The search method is the most important method for tries, because all the other methods will be based on it. Search is so crucial to those implementations that we provide (in listing 6.4) a variant, searchNode, that returns the node found, rather than just true or false. We’ll see in the next sections how this can be used to implement remove.

Performance-wise, how fast is search? The number of recursive calls we make is limited by the smaller of two values: the maximum height of the trie and the length of the search string. The latter is usually shorter than the former, but either way, for a string of length m we can be sure that no more than O(m) calls are going to be made, regardless of the number of keys stored in the trie.

The key is, then, how long each step takes. It turns out there are three factors influencing this cost:

- The cost of comparing two characters: this can be assumed to be O(1).
- The cost of finding the next node: for an alphabet Σ of size k, this can be, depending on the implementation:
- Constant (amortized or worst-case), O(1), using hash tables for edges
- Logarithmic worst-case O (log(k)), using a balanced tree
- Linear worst-case O(k), using plain arrays

Amortized constant time can reasonably be assumed in most cases.

- The cost of following a link and of splitting a string into head+tail. This point is the one where we need to be really careful. The naive approach of extracting a substring for each node would be a performance disaster in most programming languages. Strings are usually implemented as immutable objects, so extracting a substring would require linear time and extra space, O(m), for each call. Luckily, the solution is simple: we can pass to the recursive call a reference to the beginning of the string and the index of the next character. This way, even this operation can be considered O(1) .

Since each call can be implemented in such a way as to require amortized constant time, the whole search takes O(m) amortized running time.

### 3. Insert

Like search, insert can be better defined recursively. We can identify two different scenarios for this method:

- The trie already has a path corresponding to the key to be inserted. If that’s the case, we only have to change the last node in the path, making it a key node (or, if we are indexing a text, adding a new entry to the list of indices for the word).
- The trie only has a path for a substring of the key. In this case, we will have to add new nodes to the trie.

Figure 6.8 illustrates an example of the former situation, where a call to insert (“anthem”) on the trie in figure 6.7 will result in adding a new branch to one of the leaves of the tree.

The method, as shown in listing 6.5, mainly consists of two steps. In the first step, we traverse the tree, following links corresponding to the next character in the key to add, until we either have consumed the whole input string, or we get to a node where there is no outgoing edge matching the key’s next character.

Using listing 6.5 as a reference, we can see that the first step of the algorithm is implemented in lines #1 to #7, where we keep traversing the tree using the characters in the key to insert to choose the next branch to traverse. This is exactly the same as for search, with one difference: if we consume all characters in the input string (meaning that we traversed the whole path from the root, and reached the target trie node) we just have to set the node at the end of the path to a key node. This corresponds to the first scenario described at the beginning of the section (not shown in figure 6.8).

When we reach the status shown in the last diagram of the left half of figure 6.8, it means that the condition at line #6 of listing 6.5 has become false. If that’s the case, we need to jump to line #9 and add a brand new branch to the tree for the remaining characters in the string: by doing so, we are adding a suffix to the string matching the path from root to current node (in the example, we will add the suffix “hem” to the string “ant” already in the trie).

This last operation is implemented in a different utility method that is shown in listing 6.6 using (surprise!) recursion.

The definition of the method is quite straightforward. As shown in the right half of figure 6.8, we just consume one character of the remaining string, create a new edge marked with this character and a brand new, empty node N at the other side of the edge, and then recursively add the tail of the string to the tree T rooted at N.

Similarly to what we did for search, we can prove that insert also takes O(m) amortized time, if the creation of a new node can be performed in constant time (which is the case if we use hash tables for edges, but not if we use plain arrays).

### 4. Remove

When it comes to removing a key from a trie, we are in the position to choose. We can go for an easier, cheaper algorithm that will cause the tree to grow beyond what’s necessary, or we can implement the full method, more complicated and possibly slower in practice, but with the lowest impact on memory.

The difference between the two alternatives is that the first one simply unmarks a key node, making it an intermediate node, and doesn’t worry about the tree structure. This can be easily implemented reusing the searchNode method in listing 6.4, shown in listing 6.7.

What’s the issue with this method? Take a look at figure 6.8. If we just transform a key node N into an intermediate node, we can have two possible situations:

- N is an inner node, so it will have children storing keys that can only be reached by passing through N itself.
- N is a leaf, which means that there is no key stored in the trie that can only be reached through N.

The case where N is a leaf is illustrated in figure 6.9. After “unmarking” the key node at the end of the path, we can see that the trie has a “dangling” branch that contains no key. While it is perfectly fine to leave such a branch, because all the methods for manipulating tries will still work, if data is dynamic and there is a large ratio of deletions on the trie, the amount of memory wasted in dangling branches can become significant.

The solution, in these cases, is implementing a pruning method that will remove dangling nodes while backtracking the path traversed during search. If we have the guarantee that the trie is “clean,” meaning there were no dangling branches before removing the current node, there are just two conditions for which we would stop pruning while backtracking:

- We reach a key node. Obviously, we can’t remove a node holding a key.
- We reach an inner intermediate node. After removing the last edge in the path being backtracked, if the current node becomes a leaf, it can be removed; otherwise, if this node has other children, then all its sub-branches will hold at least a key (because of our premise) and therefore the current node corresponds to an intermediate character in one or more strings stored in the trie, and it can’t be deleted.

Listing 6.8 shows the implementation of this method performing deletion + pruning. Using figure 6.9 as a reference, you can see that, after turning it to an intermediate node (line #4), the node at the end of the path “anti” becomes a worthless leaf. We backtrack to its parent and remove the edge marked with ‘i’ (lines #9 to #12 in listing 6.8), and then the node at the end of the path for “ant”, which also is an intermediate node, becomes a leaf too, and can thus be removed.

When we backtrack once more, we can see that its parent is both a key node, and has another child, so we can’t prune the tree anymore.

This version of removal obviously needs a different implementation of the trie method than the naive one shown in listing 6.7. In this case, though, the API method is even simpler than the one shown in listing 6.7; basically, itjust becomes a wrapper.

Performance-wise, the same considerations made for search and insert apply for remove. If we implement the method with pruning, the number of operations on the tree is at worst 2*m, twice as much as for the naive version without pruning, for which at most m edges are traversed. Execution time is also probably going to be more than two-fold, because, as you can see from the code, the delta in code complexity is relevant.

The tradeoff for faster running time, however, is that the tree can grow significantly if we don’t prune dangling branches; the best choice depends on your requirements and context. If you have a dynamic set and expect many calls to delete, you’d better use pruning. If, instead, the ratio of insertion/removal is low, or you expect to have to add back deleted strings frequently or shortly after, then you are better off with the faster (though messier) removal shown in listing 6.7.

### 5. Longest prefix

With remove, we completed our overview of the classic operations on containers. As we mentioned, though, tries can also provide two new kind of operations, which are the most exciting parts of this data structure.

In this section we’ll focus on the method that returns the longest prefix of the searched string; given an input string s, we traverse the trie following the path corresponding to characters in s (as long as we can), and return the longest key we found.

Sometimes, even if a key wasn’t stored in our trie, we are just interested in getting its longest prefix. We’ll see an example in the applications section.

The search for the longest prefix is almost entirely the same as the search method. The only difference is that in a recursive implementation, when we backtrack we need to check whether we have already found a key, and if we haven’t and the current node is a key node, we’ll have to return the current node’s key. This also means that we have to return a string, not just true or false, and at each call keep track of the path traversed, because we need to know what we can return. Since backtracking walks the path backward, the first key node we find while backtracking will hold the longest prefix.

Listing 6.9 clarifies these concepts by showing the implementation of this method. If you recall how insert works, it can be rethought as a two-step operation: finding the longest common prefix of the key to be inserted already in the trie, and then adding a branch with the remaining characters. As an exercise, try to rewrite the pseudocode for insert by leveraging method longestPrefix.

As with the other methods we have described so far, this operation is also linear in the length of the searched string: O(m), if |s|==m.

### 6. Keys matching a prefix

The last method we are going to describe returns all the keys matching a certain prefix.

If you stop and think about the definition of a trie, the implementation for this method will flow almost naturally. Even the alias for this data structure, prefix tree, suggests a solution. We have seen, in fact, that tries compactly store strings sharing the same prefix, because each string is translated in a path from the root to a key node, and strings sharing the same prefix will result in sharing the same path in the trie.

For instance, in figure 6.3, all the strings “and”, “ant”, “anthem” share a portion of their path, corresponding to the common prefix “an”.

Listing 6.10 shows the implementation of this method. Not surprisingly, it’s one more method that leverages the searchNode method defined in listing 6.4.

Clearly, there is a new method that we still need to define: allKeys, the method that traverses a (sub)trie and collect all its keys; this method, shown in listing 6.11, is a traversal of the whole subtree. We traverse all branches for each node, and we only stop following a path when we reach a leaf. We also need to pass the (string corresponding to the) path traversed so far, up to node, as the second argument, because we will need that to know which key we should return.

When we run the asymptotic analysis for this method, we need to be especially careful with line #6. Depending on the programming language and data type used, concatenating lists can be quite expensive, if it’s not done right.

The most efficient way to accumulate the keys found would be to pass a third parameter to the method, an accumulator, to which we would add each key only once and in one place, line #4.

Under this assumption, the running time for method allKeys is O(j), for a trie with j nodes, and therefore the worst-case upper bound for the method keysStartingWith is O(m+j), for a trie with j nodes and a prefix with m characters.

The caveat is that it’s hard to know or even estimate how many nodes a trie will have based on the number of keys it stores. If, however, we know it contains n keys whose maximum length is M, then the worst-case (loose) bound for a non-empty string is O(m + n* (M-m)), corresponding to a degenerate trie where all words share exactly the prefix searched, and no further character.

In the example shown in figure 6.10, we search for all keys matching prefix “ant”, so n=6, m=3 and M=8 (the length of the longest key, “antidote”).

If, instead, we search all keys starting whose prefixes include the empty string, this will return all keys in the tree, and the running time will be O (n*M), which would also be the worst-case upper bound for the method.

### 7. When should we use tries?

Now that we have described all the main methods on tries, it feels like taking a moment to recap would be a good idea. Table 6.2 shows the performance of tries on these methods, compared to the equivalent methods for balanced BSTs.

Table 6.2 helps answer the question about performance that we put on hold in section 6.2.1. We saw when a trie would require less memory than a BST; now we also know that it would almost always be faster.

Remember that while in general we express the running time for BSTs in terms of n, the number of entries stored, in this case we can’t assume that the cost to compare two keys is O(1), but it’s rather O(m), depending on the length m of the shortest of the two keys.

The third column in table 6.2 shows the results for a particular variant of BSTs where we store a hash of the string, together with the key itself, in each node. This approach, which requires an extra O(n) memory to store these fields, allows for a fast two-pass comparison. Given a search string w, we compute h (w) before starting the search. Then for each node we first check whether h (w) matches the node’s hash (which requires constant time), and only when it does do we perform a proper strings comparison.

Before moving on, let’s also recap the pros and cons of using tries, and when we should prefer a trie over a BST.

On the pros side, compared to using BSTs or hash tables

- The search time only depends on the length of the searched string.
- Search misses only involve examining a few characters (in particular, just the longest common prefix between the search string and the corpus stored in the tree).
- There are no collisions of unique keys in a trie.
- There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
- A trie can provide an alphabetical ordering of the entries by key.

As appalling as this list looks, as we have repeated many times, unfortunately there is no perfect data structure. So even tries do have some downsides:

- Tries can be slower than hash tables at looking up data whenever a container is too big to fit in memory. Hash tables would need fewer disk accesses, even down to a single access, while a trie would require O(m) disk reads for a string of length m.
- Hash tables are usually allocated in a single big and contiguous chunk of memory, while trie nodes can span the whole heap. So, the former would better exploit the principle of locality.
- A trie’s ideal use case is storing text strings. We could, in theory, stringify any value, from numbers to objects, and store it. Yet, if we were to store floating point numbers, for instance, there are some edge cases that can produce long meaningless paths, such as periodic or transcendent numbers, or results of certain floating points operations such as 0.1+0.2, due to issues with double pre-cision representation.
- Tries have memory overhead for nodes and references. As we have seen, some implementations require each node to store an array of | X | edges, where X is the alphabet used—even if the node has few or no children at all.

In summary, the advice could be to use tries when you have to frequently perform prefix searches (longestPrefix or keysWithPrefix). Use hash tables when data is stored on slow supports like disk or whenever memory locality is important. In all intermediate cases, profiling can help you make the best decision.

Tries offer extremely good performance for many string-based operations. Due to their structure, though, they are meant to store an array of children for each node. This can quickly become expensive. The total number of edges for a trie with n elements can swing anywhere between |X |*n and |X |*n*m, where m is the average word length, depending on the degree of overlap of common prefixes.

We have seen that we can use associative arrays, dictionaries in particular, to implement nodes, only storing edges that are not null. Of course, this solution comes at a cost: not only the cost to access each edge (that can be the cost of hashing the character plus the cost of resolving key conflicts), but also the cost of resizing the dictionary when new edges are added.

Source: Rocca Marcello La (2021), *Advanced Algorithms and Data Structures*, Manning Publications (2021)