Use case: Introducing synchronization

So far, we have always assumed that our cache objects were used in a single-threaded environment.

What do you think would happen if we used code in listing 7.6 in a concurrent environment? If you are familiar with concurrent code, you can already imagine that we might have issues with race conditions.

Let’s take another look at that code, copied over to listing 7.8 for your convenience, with some minimal simplifications.

Listing 7.8 Simplified version of LFU cache set

function LFUCache::set(key, value)
    if this.hashTable.contains(key) then                 #1
        node ← this.hashTable.get(key)                   #2
        node.setValue(value)                             #3
        return node.updatePriority(node.getCounter())    #4
    elsif this.getSize() >= this.maxSize then            #5
        evictOneEntry()                                  #6
    newNode ← this.elements.add(key, value, 1)           #7
    hashTable.set(key, newNode)                          #8
    return true

Let’s imagine we have two concurrent calls to add two new entries (neither of them is in the cache yet) and our cache already contains maxSize-1 elements. In other words, the next element will fill the cache to its maximum capacity.

Now those two requests will proceed in parallel and suppose call A is the first one to get to line 5 in listing 7.8. The condition is false; then it moves to line 7. But before call A executes line 7, call B gets to line 5, and—guess what?—the if condition is still false, so call B will move on to line 7 as well and execute it and line 8.

As a result, the cache will hold one element more than allowed. Figure 7.12 shows this situation. The good news is that on line 5 we already do things the right way and check whether the current size is greater than or equal to the maximum allowed size. If we were just checking for equality, this race condition would have caused the cache to grow indefinitely, likely until it overflowed your system’s heap.

Figure 7.12 A shared cache used in a multithreaded environment, without implementing synchronization. (A) Two threads concurrently try to add a new element to a cache with only one free spot. When the LFUCache::set method running in each thread concurrently checks the cache size (line 5 in listing 7.8), both threads see that it’s not full. (B) Without synchronization, both threads add an element to the cache (line 7), without evicting. Depending on the implementation of the cache (and of elements.add), this could lead to overflow of the cache or just to one of the two values being overwritten by the other. It’s important to understand that even if the cache handles purging internally, as it should (the app should only worry about reading/writing elements; only the cache should decide when purging is necessary), if the cache’s methods are not synchronized, race conditions can and do happen inside those methods.
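To make the interleaving concrete, here is a minimal toy sketch, not the book’s cache code: a plain HashMap stands in for the cache storage, MAX_SIZE plays the role of maxSize, and the class and method names are ours. With an unlucky schedule, both threads pass the size check before either inserts, and the map ends up holding one element more than allowed.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy reproduction of the check-then-act race in listing 7.8 (illustrative names only).
public class CheckThenActRace {
    static final int MAX_SIZE = 1;                        // the cache already holds maxSize - 1 = 0 entries
    static final Map<String, String> cache = new HashMap<>();  // not thread-safe either, which makes things worse

    static void set(String key, String value) {
        if (cache.size() >= MAX_SIZE) {                   // line 5: both threads can read size() == 0 here...
            evictOneEntry();                              // line 6: ...so neither of them evicts...
        }
        cache.put(key, value);                            // lines 7-8: ...and both insert
    }

    static void evictOneEntry() {
        Iterator<String> it = cache.keySet().iterator();
        if (it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> set("A", "1"));
        Thread b = new Thread(() -> set("B", "2"));
        a.start(); b.start();
        a.join(); b.join();
        // With the interleaving described above, this prints 2, that is, MAX_SIZE + 1.
        System.out.println(cache.size());
    }
}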

Likewise, line 1 could lead to race conditions if two parallel calls to set the same key happen simultaneously.

It also gets worse than this. For instance, think about how we might want to fix this problem. You might be tempted to add another check right before line 7, but would that work? It would somewhat lower the odds that things go berserk, but it wouldn’t really solve the problem.

The issue here is that the set operation is not atomic: while we are in the process of setting a new entry (or updating an old one), at the same time we might be performing the same operation in another thread for another entry, the same operation for the same entry, or even an entirely different operation, such as evicting an element from the cache.

When we operate in a single-threaded environment, calling the cache’s methods synchronously, things go smoothly because each method call is completed before anyone else can change the cache, so we have the illusion of atomicity.

In a multi-threaded environment, we need to be careful and explicitly regulate the execution of these methods and access to the shared resources, so that all methods that mutate the state of an object are executed simulating full isolation (so no one else can modify the resource, and perhaps also no one else can read from it). Figure 7.13 can give you an idea of how we can correct the workflow to prevent the race conditions previously discussed.

For composite data structures such as caches, we need to be twice as careful, because we also need to make sure that the basic data structures upon which they’re built are thread-safe.

And thus, for LFU, we need to ensure that both the priority queue and the hash table support concurrent execution.

Modern programming languages provide plenty of mechanisms to ensure thread-safety, from locks to semaphores to latches.

An extensive coverage of this topic would require a book of its own, so unfortunately we will have to make do with an example. We’ll show how to solve this problem in Java, providing the full code for a thread-safe LRU/LFU in our repo on GitHub.

1. Solving concurrency (in Java)

If we are making another exception in this chapter, showing some code in a specific programming language rather than pseudo-code, it’s because we feel that these concepts can only be tackled effectively within the concrete context of a real language.

As many programming languages support the same concepts, it should be easy to port this logic to your favorite environment.

We actually need to start from the class constructor, implemented in listing 7.9, because it introduces a few new elements.

Listing 7.9 Java implementation, LFU cache constructor

LFUCache(int maxSize) {
    this.maxSize = maxSize;                                                   #1
    this.hashTable = new ConcurrentHashMap<Key, PriorityQueueNode>(maxSize);  #2
    this.elements = new ConcurrentHeap<Pair<Key, Value>, Integer>();          #3
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();               #4
    this.readLock = lock.readLock();                                          #5
    this.writeLock = lock.writeLock();
}

There are a few noticeable changes with respect to listing 7.5, besides, of course, switching to Java syntax.

At lines 2 and 3, notice how we used a ConcurrentHashMap instead of a simple HashMap, as you might have expected, and the same goes for the ConcurrentHeap class. We want our internal data structures to be synchronized as well, to guarantee atomicity and isolation. Technically, if we handle the synchronization of LFUCache’s methods correctly, there will be no concurrent access at all to its internal fields, so we could also be fine with the regular, single-threaded data structures. If we do use their concurrent versions, though, we need to be super-careful about how their internal synchronization interacts with our own locks, because combining the two carelessly could lead to bugs and deadlocks.
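For reference, here is a sketch of how the fields used by this constructor might be declared. This is our assumption, not code taken from the book’s repository; PriorityQueueNode, ConcurrentHeap, and Pair are the repository’s classes, so this skeleton only compiles alongside them.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LFUCache<Key, Value> {
    // Maximum number of entries the cache is allowed to hold
    private final int maxSize;
    // Maps each key to its node in the priority queue, for constant-time lookup
    private final ConcurrentHashMap<Key, PriorityQueueNode> hashTable;
    // Thread-safe heap ordered by the access counter (ConcurrentHeap comes from the book's repo)
    private final ConcurrentHeap<Pair<Key, Value>, Integer> elements;
    // The two halves of a single ReentrantReadWriteLock, assigned in the constructor (listing 7.9)
    private final Lock readLock;
    private final Lock writeLock;

    // ... constructor from listing 7.9 and the methods from listings 7.10 and 7.11 go here
}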

2. Introducing locks

When we get to line 4, though, we see a new entry, a brand-new attribute for our class: an object of type ReentrantReadWriteLock. Before explaining what a reentrant lock is, we should make one thing clear: this is the mechanism that we are going to use to handle concurrent access to LFUCache’s methods. Java provides several different ways to do so, from declaring methods as synchronized to using semaphores, latches, and other mechanisms. Which one is the best depends—as always—on the context, and sometimes more than one method could work. In this case, I prefer using a reentrant read/write lock rather than declaring methods as synchronized, because it gives us more flexibility in deciding when we should use a lock on read and when on write. As we’ll see, there might be a big impact on performance if we don’t get this right.

But first things first: we still need to define what a lock is, not to mention what reentrant means.

A lock is a concurrency mechanism whose name is pretty self-explanatory. You probably have already heard of database locks, and the principle is exactly the same for code. If an object (an instance of a class) has some code wrapped inside a lock, then for that instance, no matter how many concurrent calls reach that code, the locked portion can only be executed by one call at a time—all the others will have to wait for their turn.

In our example, it means that of the two calls to set, while the first is executed, the second is left waiting.

As you can see, this is a powerful mechanism, but it’s also dangerous, because failing to release a lock can leave all other calls hanging forever (deadlock), and even just using locks excessively can degrade performance noticeably, introducing unnecessarily long delays.

Locks basically move execution from parallel to synchronous (hence the term synchronization) by using waiting queues to regulate access to a shared resource.
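As a minimal illustration (a toy example of ours, unrelated to the cache), here is java.util.concurrent.locks.ReentrantLock protecting a shared counter. Without the lock, the two threads’ read-modify-write sequences would interleave and some increments would be lost.

import java.util.concurrent.locks.ReentrantLock;

// Toy example: only one thread at a time can execute the section between lock() and unlock().
public class LockBasics {
    private static final ReentrantLock lock = new ReentrantLock();
    private static int counter = 0;

    static void increment() {
        lock.lock();            // callers queue up here; one at a time proceeds
        try {
            counter++;          // the critical section: the read-modify-write is now effectively atomic
        } finally {
            lock.unlock();      // always release, or every later caller hangs forever
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { for (int i = 0; i < 100_000; i++) increment(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 100_000; i++) increment(); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter);  // always 200000; without the lock, often less
    }
}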

Explaining in depth how a lock works and how deadlocks can happen and be prevented usually takes a couple of chapters in a book on operating systems, so it’s out of our scope. We encourage interested readers to read more on the subject, because it’s becoming more and more important for modern programming and web applications, but also for compute-intensive applications that run in parallel on GPUs.

Now, what does reentrant mean? As the word suggests, with a reentrant lock a thread can enter (or lock) a resource more than once, and every time it does, a counter is incremented by one; when it unlocks the resource, the counter is decremented by one, and the lock is actually released when this counter gets to zero. Why is it important to have a reentrant lock? Because with a non-reentrant lock, if the same thread tried to acquire the same resource more than once, it would deadlock waiting on itself.

No, seriously, why is it important to us? You will have to wait a few paragraphs until we show you in practice.
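In the meantime, the counter mechanics themselves are easy to see on Java’s ReentrantLock (a toy snippet of ours; getHoldCount is the standard accessor for that counter):

import java.util.concurrent.locks.ReentrantLock;

public class ReentrancyDemo {
    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();                              // counter: 0 -> 1
        lock.lock();                              // same thread re-enters: 1 -> 2 (no deadlock)
        System.out.println(lock.getHoldCount());  // prints 2
        lock.unlock();                            // 2 -> 1: the lock is still held
        lock.unlock();                            // 1 -> 0: only now is it actually released
    }
}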

3. Acquiring a lock

So, now that we have an idea of what a lock is, we are ready to take a look at the modified version of the set method in listing 7.10.

As you can see, as soon as we enter the method, we lock our resource (the whole cache) for writing. What does that mean? When a write lock is set on a resource, the thread holding the lock is the only one that can write to or read from that resource. All other threads, whether trying to read from or write to the resource, will have to wait until the current lock holder releases it.

The second instruction here is a try, with a finally block at the end of the method (the lines marked with #4 and #5), where we release the lock. This is necessary to prevent deadlock: if any of the operations inside the try block fails, we still release the lock before exiting the function, so that other threads can acquire it and are not blocked forever.

Listing 7.10 Java concurrent LFU cache set

public boolean set(Key key, Value value) {
    writeLock.lock();                                              #1
    try {                                                          #2
        if (this.hashTable.containsKey(key)) {
            PriorityQueueNode node = this.hashTable.get(key);
            node.setValue(value);
            return node.updatePriority(node.getCounter());
        } else if (this.getSize() >= this.maxSize) {
            this.evictOneEntry();                                  #3
        }
        PriorityQueueNode newNode = this.elements.add(key, value, 1);
        this.hashTable.put(key, newNode);
        return true;
    } finally {                                                    #4
        writeLock.unlock();                                        #5
    }
}

In these examples, we are still using the hash table to store references to the priority queue nodes, the way we were storing references to the linked list nodes for LRU. While for LRU this makes sense in order to keep the constant running time for all operations, as explained in section 7.4.2, for the LFU cache we could store just the values in the hash table. This makes the implementation simpler, as shown in our repo, but possibly at the cost of worsening the running time of the operations.

Take a minute to think about it before reading the explanation. The reason, as you may have figured out, is simple: both in set and get we call updatePriority on the priority queue. If we have a reference to the node to update, pushDown and bubbleUp will require logarithmic time, as shown in chapter 2, sections 2.6.1 and 2.6.2.

Finding an element in a heap, however, would normally require linear time. Alternatively, in section 2.6.8 we showed that it is possible to use an auxiliary hash table inside our priority queue and have search performed in amortized constant time, so that updatePriority would then run in amortized logarithmic time.
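As a refresher of that idea, here is a simplified, single-threaded sketch of our own (not the book’s priority queue implementation): a binary min-heap on the counter, with an auxiliary map from each element to its position in the heap array. The key point is that every swap also updates the side map, so locating an element is amortized constant time and updatePriority only pays the logarithmic bubbleUp/pushDown cost.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch: smallest counter at the root (the eviction candidate),
// plus a side map from element to its index in the array.
class IndexedMinHeap<E> {
    private final List<E> heap = new ArrayList<>();
    private final List<Integer> priority = new ArrayList<>();
    private final Map<E, Integer> indexOf = new HashMap<>();   // element -> position in heap

    void add(E element, int p) {
        heap.add(element);
        priority.add(p);
        indexOf.put(element, heap.size() - 1);
        bubbleUp(heap.size() - 1);
    }

    // Amortized O(1) to find the element, then O(log n) to restore the heap property.
    void updatePriority(E element, int newPriority) {
        int i = indexOf.get(element);
        int old = priority.get(i);
        priority.set(i, newPriority);
        if (newPriority < old) bubbleUp(i); else pushDown(i);
    }

    E top() {                                                  // peek at the eviction candidate
        return heap.isEmpty() ? null : heap.get(0);
    }

    private void bubbleUp(int i) {
        while (i > 0 && priority.get((i - 1) / 2) > priority.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void pushDown(int i) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < heap.size() && priority.get(left) < priority.get(smallest)) smallest = left;
            if (right < heap.size() && priority.get(right) < priority.get(smallest)) smallest = right;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {                          // every swap keeps the side map in sync
        E ei = heap.get(i), ej = heap.get(j);
        heap.set(i, ej);
        heap.set(j, ei);
        int pi = priority.get(i);
        priority.set(i, priority.get(j));
        priority.set(j, pi);
        indexOf.put(ej, i);
        indexOf.put(ei, j);
    }
}

This is, in essence, what lets updatePriority in listings 7.10 and 7.11 run in logarithmic rather than linear time.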

We previously hinted at the fact that we should go easy on using locks, so the question arises: Could we acquire the lock later in the function?

That depends. We need to make sure that while we check the hash table for a key, no other thread is writing to it. If we use a concurrent structure for the hash table, we could move the write lock acquisition inside the first branch of the if, but then we would also need to change the rest of the function (because evicting one entry and adding the new one should all happen as an atomic operation) and be careful with how we use the hash table, since we would introduce a second resource on which we lock, and that might lead to deadlock.

So, the short answer is, let’s keep it simple (also because we wouldn’t have much of an advantage anyway).

4. Reentrant locks

Now we can finally explain how a reentrant lock prevents deadlock in our case. At the line marked with #3 in listing 7.10, the current thread calls evictOneEntry to remove one item from the cache. We decided to keep the method private in our example, for good reason, but let’s now imagine that we need to make it a public method, giving clients the option to free space in the cache. (Maybe because of memory or garbage collection issues we need to shrink the cache dynamically—we certainly would rather not restart it altogether and lose all of its content.)

In that case, we need to make evictOneEntry synchronized as well, and hence acquire a write lock inside it. With a non-reentrant lock, evictOneEntry would try to acquire the write lock, but set has already acquired it and can’t free it until evictOneEntry finishes. Long story short, our thread is stuck waiting for a resource that will never be freed, and all other threads will soon be stuck as well. That is basically a recipe for deadlock, and it’s that easy to get yourself into such a situation. That’s why you should be careful with synchronization mechanisms.

With reentrant locks, instead, since both evictOneEntry and set run in the same thread, evictOneEntry is able to acquire, on its first line, the write lock that is already held by set and can go on with its execution. Deadlock avoided, at least this time!
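To make this concrete, a public, synchronized evictOneEntry might look like the following sketch. This is our assumption, not the repository’s code, and the method names top and getKey on the heap and its node are illustrative. Its writeLock.lock() on the first line succeeds even when called from inside set, precisely because the lock is reentrant.

public boolean evictOneEntry() {
    writeLock.lock();            // reentrant: if this thread already holds the write lock
                                 // (e.g., inside set), this only increments the hold count
    try {
        PriorityQueueNode node = this.elements.top();   // least-frequently-used entry (illustrative API)
        if (node == null) {
            return false;                               // nothing to evict
        }
        this.hashTable.remove(node.getKey());
        return true;
    } finally {
        writeLock.unlock();      // decrements the count; the lock is truly released only at zero
    }
}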

5. Read locks

We haven’t yet talked about read locks. We mentioned how important they are for performance, but why? Let’s revisit the get method in listing 7.11.

We acquire a read lock before reading data from the cache’s internal fields and release it as the very last step before exiting the function.

It seems exactly like what we do with set, right? The only change is that we use a read lock instead, so the difference must be there—and it actually is.

Listing 7.11 LFU cache get, Java version

public Value get(Key key) {
    readLock.lock();                                       #1
    try {
        PriorityQueueNode node = this.hashTable.get(key);
        if (node == null) {
            return null;
        } else {
            node.updatePriority(node.getCounter() + 1);    #2
            return node.getValue();
        }
    } finally {
        readLock.unlock();                                 #3
    }
}

Only one thread can hold a write lock at any given time, but a read lock can be held by many threads at the same time, provided no thread is holding a write lock.

When we read from a data structure (or a database) we don’t modify it, so if 1, 2, 10, or a million threads are reading from the same resource at the same time, they will always get consistent results (if the invariant is implemented correctly, and we don’t have side effects that mutate the resource).

When we write to a resource, we change it, and while the process is ongoing, any other read or write would make no sense, because it would be based on inconsistent data. That leads to four possible combinations of locks (Read-Read, Read-Write, Write-Read, and Write-Write), as shown in table 7.4.

If any (other) thread is holding a read lock, we can acquire another read lock, but need to wait for a write lock until all the read locks are released.

If any thread is holding a write lock, all other threads need to wait to acquire either a write or a read lock.

So, the advantage of actually distinguishing between read and write locks for our cache is that all calls to get can be executed in parallel, and only calls to set will block. Had we not made this distinction, access to the cache would have been entirely synchronous, with only one thread at a time able to access the cache. This would have introduced unnecessary latency in our application, reducing the advantage of using a cache and possibly even making it counterproductive.
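The following toy example of ours (again, not cache code) shows this behavior with a ReentrantReadWriteLock: the two reader threads enter their critical sections together, while the writer has to wait until both read locks have been released.

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockDemo {
    static final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    static void read(String name) {
        rwLock.readLock().lock();              // many readers can hold this at once
        try {
            System.out.println(name + " reading");
            Thread.sleep(500);                 // both readers are inside this section together
        } catch (InterruptedException ignored) {
        } finally {
            rwLock.readLock().unlock();
        }
    }

    static void write() {
        rwLock.writeLock().lock();             // blocks until no reader (or other writer) holds the lock
        try {
            System.out.println("writer writing");
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        new Thread(() -> read("reader-1")).start();
        new Thread(() -> read("reader-2")).start();
        new Thread(ReadWriteLockDemo::write).start();  // typically prints last, after both readers
    }
}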

6. Other approaches to concurrency

Using locking mechanisms is not the only way to deal with concurrency. Locking handles access to a segment of code that mutates a resource, so that we are sure the instructions applying the modification will be executed by one thread, and one thread only, at any instant in time.

On the other end of the spectrum, there is a completely symmetrical approach championed by functional programming: remove mutability from your resources altogether.

One of the principles of functional programming is, in fact, that all objects should be immutable. This has several advantages; for instance, it makes it easier to reason about code because we can analyze a function to be sure that no other method will influence what it does, and it removes race conditions altogether. It also has some disadvantages; for instance, it makes some operations harder, like keeping a running state, and makes it more expensive to update state in large objects.

When we talk about concurrency, having immutable objects means that you can read your resource without worrying that someone else will change it while you are reading it, so you don’t have to set a lock.

Writing remains a bit more complicated. In the functional world, a method like set would return not a Boolean, but a new instance of the cache itself, with its content updated.
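As a sketch of what that could look like (a deliberately simplified, hypothetical example of ours: no eviction policy and a plain map instead of a priority queue), set builds and returns a new cache instance and never touches the existing one, so readers never need a lock.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy immutable cache: every instance is frozen at construction time.
public final class ImmutableCache<K, V> {
    private final Map<K, V> entries;

    public ImmutableCache() {
        this(new HashMap<>());
    }

    private ImmutableCache(Map<K, V> entries) {
        this.entries = Collections.unmodifiableMap(entries);   // the field is never modified afterwards
    }

    // Instead of returning a boolean, set returns the updated cache.
    public ImmutableCache<K, V> set(K key, V value) {
        Map<K, V> updated = new HashMap<>(entries);
        updated.put(key, value);
        return new ImmutableCache<>(updated);                  // the original instance is untouched
    }

    public V get(K key) {
        return entries.get(key);                               // safe from any thread, no lock needed
    }
}

Note that each call to set copies the whole map; real functional collections avoid this cost through structural sharing (persistent data structures), which is one way to mitigate the expensive-updates drawback mentioned earlier.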

Source: Marcello La Rocca (2021), Advanced Algorithms and Data Structures, Manning Publications.
