Use case: Don’t compute things twice

In our daily work as software developers, we write applications, most of which perform really simple tasks. Adding two numbers, or even dividing them, or adding two vectors (with modern GPUs[1]) are trivial operations, fast enough that we don’t need to bother with optimizing them. (Of course, this wasn’t always the case, but if you happen to be young enough, you’ve never had to deal with early x86 processors.)

Yet, no matter how fast and optimized multicore processors or server clusters become, there will always be some kinds of computations, some complex operations, for which it will just be too expensive for us to ignore how wasteful it would be to perform them multiple times when we don’t need to.

Back to the previous vector sum. If our vectors have billions of elements (or too many to fit in a GPU’s memory at once), then even this operation becomes quite expensive. The same is true if we are going to repeat the same divisions billions of times when we could have gotten away with a few hundred of them: the impact on our applications’ running time will be noticeable.
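To see what “not computing things twice” looks like in its simplest form, here is a minimal memoization sketch in Python (the `expensive_division` function is a made-up stand-in for any costly computation):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_division(numerator: float, denominator: float) -> float:
    # Stand-in for any costly operation: the first call with a given
    # pair of arguments does the real work; any repeated call with the
    # same arguments is answered from the cache instead.
    return numerator / denominator

expensive_division(355.0, 113.0)  # computed
expensive_division(355.0, 113.0)  # cache hit: no recomputation
```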

Number crunching is not the only context in which optimizing your computation matters. If we move to web applications, for example, one of the costliest operations is certainly accessing a database, and even more so if it involves iterating through a cursor to count or compute something.

And it’s not just terribly expensive (in terms of resources) and slow. Database cursors might involve extensive (possibly even table- or DB-wide) locks if they are not read-only, and even writing single rows can, in some DBs, require locking a page or a whole table.

Every time we lock some data, all read operations on that data have to wait until the lock is released. If this is not handled carefully, a heavy load of write operations can set the DB on fire, slowing your application down and producing inconvenient lag for your users, or even grinding your DB to a halt, which in turn will cause all your HTTP calls to time out and thus an outage of your website/application.

Wow, that’s scary, isn’t it? Now, if only there were a way to avoid it!

Many companies, even tech giants, in their early days (at the dawn of the internet age) had to experience first-hand the troubles of running and growing a website smoothly. That is, until they found ways to adapt and cope.

One of the best ways to ease the load on a database is to avoid computing expensive results twice in a short time. There are many other strategies orthogonal to this, from sharding to loosening the consistency constraint (moving to eventual consistency), that are needed or at least helpful in order to achieve good, stable performance on a web application. This is obviously not the right place to present them, but the literature on scalability is rich, and if you are going to work in the field, we definitely recommend you look at some of the amazing books or web guides that have been published.

In this chapter we are going to focus, instead, on caching. To narrow our scope a bit, and hopefully make our example clearer, let’s consider a plausible situation where you could actually use some caching.

Imagine you have an aggregator service that gathers news about companies from social networks and provides some sort of insight about them; for example, whether people are talking about a company in a positive or a negative way (what’s called sentiment analysis in data science). Figure 7.1 shows a possible simplified architecture for such a service.

You will need to call external APIs to gather posts from the major social networks and then analyze them and assign a sentiment to each post. Once they are all labeled, each with a certain degree of confidence, you need to decide if overall the sentiment about the company is positive or negative. For instance, you can decide that a tweet spoke positively about company X, with a degree of confidence of 70%, while for another tweet you have a degree of confidence of 95% that the sentiment is negative. Then you’ll have to weigh the two of them, taking that confidence into consideration (among many other things).
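One plausible (and deliberately simplified) way to combine such labeled posts is a confidence-weighted average; the sketch below is an illustration under that assumption, not the exact weighting scheme the text commits to:

```python
from dataclasses import dataclass

@dataclass
class LabeledPost:
    sentiment: int    # +1 for positive, -1 for negative
    confidence: float # classifier's confidence, between 0 and 1

def overall_sentiment(posts: list[LabeledPost]) -> float:
    """Confidence-weighted average sentiment, in [-1, 1].

    A value above 0 means the overall sentiment leans positive;
    below 0 means it leans negative.
    """
    total_weight = sum(p.confidence for p in posts)
    if total_weight == 0:
        return 0.0
    return sum(p.sentiment * p.confidence for p in posts) / total_weight

# The two posts from the example: positive at 70%, negative at 95%.
posts = [LabeledPost(+1, 0.70), LabeledPost(-1, 0.95)]
print(overall_sentiment(posts))  # about -0.15: slightly negative overall
```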

You will likely have to subscribe to, pay for, and connect to different services for the different social networks and write adapters that cope with the different formats.
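As a sketch of what those adapters could look like (all field names here are hypothetical; every real API has its own schema), each adapter normalizes its network’s payload into a common `Post` type:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Post:
    author: str
    text: str

class SocialAdapter(Protocol):
    # Common interface that every per-network adapter implements.
    def to_post(self, raw: dict) -> Post: ...

class TwitterAdapter:
    def to_post(self, raw: dict) -> Post:
        # Hypothetical field names, for illustration only.
        return Post(author=raw["user"]["screen_name"], text=raw["full_text"])

class RedditAdapter:
    def to_post(self, raw: dict) -> Post:
        return Post(author=raw["author"], text=raw["selftext"])
```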

Each external service requires an HTTP call outside your intranet, and each call will have some latency that’s different for the various services. You can call each service in parallel, but even then, if you do need all the data to make a decision, your latency will be at least as high as the slowest of those services.
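A small sketch of that fan-out, using Python’s asyncio (the networks and delays are invented for illustration): even though the calls run concurrently, `gather` can only return once the slowest of them completes, so the total latency is the maximum of the individual latencies, not their sum.

```python
import asyncio

async def fetch(network: str, delay: float) -> str:
    # Stand-in for an HTTP call to an external API; `delay`
    # simulates that service's latency.
    await asyncio.sleep(delay)
    return f"posts from {network}"

async def gather_all() -> list[str]:
    # All three loaders run concurrently, but we still have to wait
    # for the slowest one: total latency ~= max(0.2, 0.5, 3.0) seconds.
    return await asyncio.gather(
        fetch("twitter", 0.2),
        fetch("facebook", 0.5),
        fetch("slow-network", 3.0),
    )

results = asyncio.run(gather_all())
```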

Figure 7.1 A possible architecture for the “Get social networks daily summary” application. (1) The client sends an HTTP call to the web server asking for the daily summary (for a specific company). (2) The web server contacts the app server to get the data from the major social networks (here the list is just meant as an example). The app server might be physically and logically on the same machine as the web server (in which case the call is a method call), or physically hosted on the same machine but in a different process, or even hosted on a different machine, such that the call ends up being an actual HTTP call. (3) The data gatherer starts one new thread for each loader/social network. All these calls are asynchronous. (4) Each loader sends an HTTP call to an external API over the internet. Once it’s done, it returns the loaded data to the gatherer. Once all the loaders are finished, the gatherer returns all the collected data to its caller (the web server, in this case). (5) The web server synchronously calls the sentiment analyzer, passing the raw data and expecting the summary in return. Alternatively, the web server could have called an orchestrator or the sentiment analyzer directly at point 2, and this step would be delegated. (6) Once the sentiment has been computed and passed back to the web server, the web server builds an HTTP response around it and sends it back to the client.

Latency can really be a problem; say you offered your customer a service-level agreement (SLA) where you committed to return 95% of your monthly calls within 750ms, but unfortunately, during peaks of traffic, a couple of those external services can take as long as 3s to answer.

To make things worse, your timeout for HTTP responses is set to 3.5s, meaning that you have to return an answer to the client within 3.5 seconds; otherwise your load balancer will kill the call. I can imagine you are thinking it would be enough to adjust the timeout, but suppose you can’t change it, because otherwise you couldn’t support the traffic load with the resources you have available. So, assuming you take around 250ms to process the data for a single source, if it takes 3 seconds to get that data you are already at 3.25s; add the time to handle the incoming call, do some post-processing, and send the response back, and you risk blowing through the 3.5s budget and serving a lot of 503 errors. And guess what? That also goes against your SLA.

At this point it is worth noting that if you are a paid service and you violate your SLA, you might have to give some money back to your customers. So, you definitely would like to avoid that.

Let’s say, in order to keep things simple for our example, that you’d like to provide this sentiment analysis for one company at a time (one company per HTTP call), per day, and always based only on the day before. Imagine that people will use the information gathered yesterday, before the stock market opens, to decide whether or not it’s a good idea to invest in a company. Also, let’s assume we only provide our prediction service for the Fortune 100 companies.
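These constraints make the result an ideal candidate for caching: the summary for a given company on a given day never changes once computed. Here is a minimal sketch of that idea (the function names are made up, and a real deployment would use a shared store such as Redis instead of an in-process dict):

```python
import datetime

# Maps (company, date) to the computed summary. In production this
# would live in a shared store (e.g., Redis) so every web server can
# reuse results computed by any other.
_summary_cache: dict[tuple[str, datetime.date], dict] = {}

def daily_summary(company: str) -> dict:
    # Summaries are always based on yesterday's data, so this key is
    # stable for a whole day: at most one expensive computation per
    # company per day, no matter how many clients ask for it.
    key = (company, datetime.date.today())
    if key not in _summary_cache:
        _summary_cache[key] = compute_summary(company)  # the slow path
    return _summary_cache[key]

def compute_summary(company: str) -> dict:
    # Placeholder for the real pipeline: gather yesterday's posts from
    # the social networks, run sentiment analysis, aggregate the results.
    ...
```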

Source: Marcello La Rocca, Advanced Algorithms and Data Structures, Manning Publications, 2021.
