Use case – First attempt: Remembering values

At this point, we know that we can't afford to compute each intermediate result over and over again every time we receive a call; that would be too expensive. So, the most natural thing to do is store these intermediate results once we compute them and look them up when we need them again.

We are obviously in luck, because the function that produces the summary takes only one argument, the name (or internal ID) of the company, and its domain[1] is relatively small.[2] Thus we can easily identify the intermediate results we need, and we can expect to reuse them several times. If our application runs for long enough without restarting, then after an initial warm-up period when we actually have to compute values (and hence latency will be high), we will have computed enough intermediate results to respond to calls quickly, without the extra latency due to the external HTTP calls.
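In code, this first attempt is just memoization keyed on the company name. The sketch below is a minimal illustration, assuming a hypothetical compute_summary function standing in for the expensive pipeline of external HTTP calls; it is not the application's actual implementation.

```python
def compute_summary(company: str) -> dict:
    # Hypothetical stand-in for the expensive pipeline
    # (data gatherer + external HTTP calls + data cruncher).
    return {"company": company, "sentiment": 0.0}

_summaries: dict[str, dict] = {}  # company name (or internal ID) -> summary

def get_summary(company: str) -> dict:
    if company not in _summaries:        # cache miss: pay the full cost once
        _summaries[company] = compute_summary(company)
    return _summaries[company]           # cache hit: no external calls needed
```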

Take a look at how these assumptions change our architecture in figure 7.2, and meet the cache, our knight in shining armor that will save us from failing our SLA and going out of business.

To get rid of all the nitty-gritty HTTP-related details that are not particularly important for our analysis, let's abstract them away into a magic function that can be called by our sentiment analyzer and simply returns all the data we need from all the social networks we draw our posts from. Figure 7.3 shows the simplified architecture for the sentiment analyzer.

Figure 7.2 With the introduction of a cache, before issuing a call to the app server, the web server checks whether the results have already been computed, and in that case skips steps 3-6 in this figure. Note that a cache can also be added for the individual loaders, in case we allow partial results (for instance, if one of the external services is down). In theory, it could also be added to the "data cruncher" instead of the web server, delegating the interface with the data gatherer to the data cruncher, so that the web server directly calls only the data cruncher. Having the cache on the web server (physically on the same machine), however, has several advantages.

It looks much cleaner now, right? Remember, we are hiding all the complexity in the "sentiment generator" because we want to focus on the caching mechanism, and we are not interested in the details of how the sentiment is computed. Nonetheless, this doesn't mean that you can only use caching in memory or for standalone applications; on the contrary, web applications are probably one of the best places to add some caching layers: the detailed example from which we started should have made this clear.

Moreover, it is important to remember that simplifying complex things by abstracting away low-level details is a paramount technique in algorithm analysis. For instance, we often assume that some containers will store integers instead of worrying about every possible type of entry that could be stored, and even the RAM model itself, which we introduced in appendix B, is a simplification that hides the details of the myriad of different configurations that real computers can have.

Figure 7.3 The architecture of our example application, after abstracting out all the details relating to the generation of the summary into an ad hoc component, the "sentiment generator", which we can imagine is hosted on the web server and is called synchronously from the web app.

1. Description and API

As always, let’s draft an API and a contract that our data structure should adhere to.
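As an illustration only (the interface below is an assumption, not a definitive contract), the core of such an API can be reduced to two operations: look a value up by key, and store a value under a key.

```python
from abc import ABC, abstractmethod
from typing import Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")

class Cache(ABC, Generic[K, V]):
    """Draft contract: only the operations our use case strictly needs."""

    @abstractmethod
    def get(self, key: K) -> Optional[V]:
        """Return the cached value for key, or None on a cache miss."""

    @abstractmethod
    def set(self, key: K, value: V) -> None:
        """Store value under key, overwriting any previous entry."""
```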

2. Fresh data, please

Now that we have a first solution for our specific example, we can ask whether these conclusions also apply to the general case. To that end, we need to discuss some concerns that we've swept under the proverbial rug so far. We need to question whether the computation is static, isolated, and time-independent. Think, for instance, of a matrix product or of numerically computing a complex integral. If that's the case, we are in luck, because we can compute the results once (for each input), and they will be valid forever.

It might happen, instead, that we have to deal with some computation that can vary depending on time (for instance, on the day of the week or month) or on some other external factor (for instance, an aggregate of daily orders that will change when new orders are placed and when the date changes). In this case, we would have to be careful when reusing values we have already computed, because they can go stale, meaning that under some conditions the value, once computed and stored in our cache, might no longer be the most up to date, or even relevant.

How we deal with stale cache entries depends heavily on the context. In some cases, stale values can still provide a good enough approximation, as long as a certain margin of error is acceptable. For instance, if we compute aggregates of daily sales and show them in a live chart, we might be OK with having data synced every minute or every 10 minutes. This would avoid recomputing all the aggregates (possibly grouped by product or store) every time the chart is displayed in our reporting tool, which could be used by tens of employees at the same time.

Probably the best example of why this "controlled staleness" can be acceptable is provided by web caches. HTTP standards allow the server (the provider of the content) to add headers to resources (web pages, but also images, files, and so on) to let the client (usually the browser) know when it is allowed to read a certain resource from cache, and also to let intermediate cache nodes know whether, and for how long, the response provided by the server to an HTTP (usually GET) call can be cached.
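For instance, the standard Cache-Control header carries exactly this kind of information. The helper below is only a sketch; the response object is assumed to behave like those of common Python web frameworks.

```python
def add_caching_headers(response, max_age_seconds: int = 600):
    # Cache-Control is a standard HTTP header: "public" allows intermediate
    # caches to store the response, and max-age bounds how long (in seconds)
    # it may be reused before the client has to revalidate it.
    response.headers["Cache-Control"] = f"public, max-age={max_age_seconds}"
    return response
```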

Vice versa, if we have a time-critical, safety-critical application, such as a monitoring application for a nuclear power plant (for some reason, this example works well in communicating a sense of criticality...), then we can't afford approximations and stale data (certainly not for more than a few seconds).

The other open questions are these: What makes data stale, and do we have a quick way to know when that happens? If data simply ages, we can store a timestamp when we write a value to the cache and check it when we read the value back. If it's too old to still be relevant, we can recompute it and update the cache. If there are other external conditions (for instance, a change in configuration, or some event such as a new order being placed or a change in the required precision of a calculation), we are still fine as long as we can check these conditions, and the check is relatively inexpensive compared to computing the value again from scratch.
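Here is a minimal sketch of the timestamp-on-write idea, assuming entries simply expire after a fixed time to live; the class and its parameters are illustrative, not the mechanism used in the rest of the chapter.

```python
import time

class TTLCache:
    """Store a timestamp with each value; entries older than ttl_seconds
    are treated as misses, forcing a recomputation."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (value, time of write)

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None                       # never computed
        value, written_at = entry
        if time.monotonic() - written_at > self._ttl:
            del self._entries[key]            # stale: caller must recompute
            return None
        return value
```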

In the rest of the chapter we simplify the problem by assuming that cached content doesn’t go stale. Handling stale content can be seen as an orthogonal task that can enhance the base caching mechanism we built, for the most part independently of the choices we made for this mechanism.

3. Handling asynchronous calls

Acknowledging the existence of stale data and discussing workarounds was a good starting point toward generalizing caching solutions beyond our carefully crafted example. The next step is generalizing the computation model. So far, we have assumed we are in a single-threaded environment, so calls are handled by a synchronous queue and are executed one after the other. This is not only unrealistic, it is also wasteful. Real-world applications can't afford the latency that would result from leaving clients waiting while previous calls are being processed. We can run many copies of this pipeline, as shown in figure 7.4, but each of them will have its own cache and will be able to handle only one call at a time. Caching would not be as effective as it could be, because if two threads receive the same request, neither will be able to read the result cached by the other. Even worse, assuming the sentiment generator works synchronously, handling one request at a time slows us down even more.

Consider any of the threads depicted in figure 7.4, and suppose the cache has already stored the intermediate results for Twitter and Facebook when the sentiment generator receives four more requests, for Google, Twitter, Facebook, and Google. For the first request, we'll have to compute everything from scratch, but for the next two, we already have all we need in the cache, and we could output the result immediately. In a synchronous architecture, however, we would be forced to wait for the first call to complete before the next two (and all the others that might pile up in the meantime) could be processed by the sentiment generator and returned. In an asynchronous configuration, instead, the web app can call the sentiment generator asynchronously for each sentiment analysis request, "join" on the responses containing the intermediate results, and return as soon as all the information for a call has been gathered and the final result can be computed. What happens, though, if in the figure's example sequence the second request for Google is processed while the first one is still computing the intermediate results?
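In Python, the "call asynchronously and join" step could look roughly like the sketch below; the loader coroutine and the crunch function are hypothetical stubs, not the application's real components.

```python
import asyncio

# Hypothetical loader: in the real system this would be an HTTP call
# to the data gatherer; here it is a stub so the sketch runs on its own.
async def fetch_posts(network: str, company: str) -> list[str]:
    await asyncio.sleep(0.1)            # simulate network latency
    return [f"{company} post on {network}"]

def crunch(*post_lists: list[str]) -> float:
    return 0.0                          # stand-in for the data cruncher

async def sentiment(company: str) -> float:
    # Fire the loader calls concurrently and "join" on all the responses:
    # execution resumes only once every intermediate result has arrived.
    twitter, facebook, google = await asyncio.gather(
        fetch_posts("Twitter", company),
        fetch_posts("Facebook", company),
        fetch_posts("Google", company),
    )
    return crunch(twitter, facebook, google)

# asyncio.run(sentiment("ACME"))       # example usage
```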

If the first call hasn't finished yet, no value has been added to the cache for Google. So, when the second call comes in, the web app sees a cache miss and calls the sentiment generator to retrieve all the data from the social networks and compute the sentiment again.

Then the result of whichever call finishes first will be stored in the cache. And when the other call finally produces a result, it will also try to store that result in the cache. Depending on the implementation of the cache, this can simply overwrite the old result (the two could differ for a lot of reasons), discard the new result, or, in the worst-case scenario, produce a duplicate.[5]

But regardless of how the collision is handled, the worst part is that we compute twice a result that we could have retrieved from the cache. This in turn means unnecessarily high latency and, in some cases, extra costs: in our example, for instance, the data providers do charge for the data from the social networks.

4. Marking cache values as “Loading”

Finding a perfect solution for race conditions is not easy; in some cases, it is impossible. Here, for example, if we consider the fully detailed architecture of figure 7.2, we have several HTTP calls that can have variable latency or fail altogether. So, we can't be sure which of the two calls to retrieve the Google sentiment will return first. Assuming the first call finishes first can be reasonable, but it can also lead to inefficiency if, for any reason, the first call to the data gatherer[6] takes longer than average. That's not the worst-case scenario, though: if we decide that the second call to our /sentiment endpoint will wait and reuse the result computed for the first one, and then the calls to the data gatherer fail for some reason, the net result will be that both web calls to our web server will fail.

That said, we can and should try to do better and avoid wasting resources. To handle the situation described in the last subsection, one thing we could certainly do is add a "loading" state for cache values, used to mark entries in the cache that are currently being computed. When the first call to /sentiment/Google checks the cache and finds a miss, it will create an entry for Google and mark it as "in progress."

When the second call checks the cache, it will be aware that the value is being computed. Then we can either do some time-based polling of the cache, checking every few hundred milliseconds until a value is stored for that entry, or implement a publisher-subscriber mechanism, with the web app subscribing to the cache to let it know it's interested in the entry "Google," and the cache notifying all subscribers for that entry once its value is finally computed.
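One way to sketch this in an asynchronous setting (all names here are illustrative) is to store a future in the cache as the "loading" marker: the first caller creates and later resolves it, while any concurrent caller for the same key simply awaits it, which is essentially the publisher-subscriber variant described above.

```python
import asyncio

class LoadingAwareCache:
    """Sketch: each entry is a future, so a value that is still being
    computed acts as a "loading" marker that concurrent callers can await."""

    def __init__(self):
        self._entries: dict[str, asyncio.Future] = {}

    async def get_or_compute(self, key, compute):
        existing = self._entries.get(key)
        if existing is not None:
            # Either a finished value or one still "loading": awaiting works
            # in both cases, and waiters are woken up when it resolves.
            return await existing

        future = asyncio.get_running_loop().create_future()
        self._entries[key] = future        # mark the entry as "loading"
        try:
            value = await compute(key)
        except Exception as exc:
            del self._entries[key]         # don't cache the failure...
            future.set_exception(exc)      # ...but every waiter fails with us
            raise
        future.set_result(value)           # notify all waiting subscribers
        return value
```

Note that in this sketch a failure of the first computation propagates to every waiter, which is exactly the trade-off discussed above for the /sentiment endpoint.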

Either way, we can't completely solve the race conditions mentioned here, but risking a little extra delay and a few more 500 responses is in many cases worth the savings we get by avoiding the recomputation of values that are already in progress.

Source: Marcello La Rocca, Advanced Algorithms and Data Structures, Manning Publications, 2021.
