Use case: Cache applications

This section would definitely be shorter if we listed examples of systems that do not use a cache! Caching is ubiquitous at every level, from processors to the highest-level applications you can imagine.

Low-level hardware caches work slightly differently, though, so we’d better stick to software caches.

As mentioned, many single- and multi-thread applications employ ad hoc in-memory caches to avoid repeating expensive computations.

These caches are usually objects (in OO languages), provided by libraries, and allocate dynamic memory in the heap of the same application that uses them. In the case of multi-thread applications, the cache might run in its own thread and be shared among several threads (as we saw in the previous section, this configuration requires special attention to prevent race conditions).
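As a concrete illustration of such a shared in-memory cache, here is a minimal sketch in Python (the class and method names are our own, not from any particular library): a dict guarded by a lock, so that several threads can read and write it without race conditions.

```python
import threading

class ThreadSafeCache:
    """A minimal in-memory cache guarded by a lock, so several
    threads can share it without racing on the underlying dict."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get_or_compute(self, key, compute):
        # Fast path: return a cached value if one is already stored.
        with self._lock:
            if key in self._data:
                return self._data[key]
        # Run the expensive computation outside the lock, so one
        # slow computation does not block every other thread.
        value = compute(key)
        with self._lock:
            # Another thread may have stored this entry meanwhile;
            # setdefault keeps the first result, so all callers agree.
            return self._data.setdefault(key, value)
```

Note that the computation itself runs outside the lock; `setdefault` then ensures that if two threads computed the same entry concurrently, every caller still sees a single, consistent value.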

The next step is bringing the cache out of the application, into its own process, and communicating with it through some protocol; this can be HTTP, Thrift, or another RPC framework.
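To make this concrete, here is a toy sketch of a cache running as its own server process and answering requests over a line-based TCP protocol. The `GET`/`SET` commands and every name here are invented for illustration; a real out-of-process cache would use HTTP, Thrift, or another RPC framework, plus eviction, timeouts, and error handling.

```python
import socket
import socketserver
import threading

# Toy wire protocol: "SET key value\n" and "GET key\n"; replies are
# "OK", the stored value, or "MISS". Entirely made up for this sketch.
STORE = {}

class CacheHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:  # one request per line
            parts = raw.decode().strip().split(" ", 2)
            if parts[0] == "SET" and len(parts) == 3:
                STORE[parts[1]] = parts[2]
                self.wfile.write(b"OK\n")
            elif parts[0] == "GET" and len(parts) == 2:
                value = STORE.get(parts[1])
                reply = value + "\n" if value is not None else "MISS\n"
                self.wfile.write(reply.encode())
            else:
                self.wfile.write(b"ERR\n")

def start_server():
    # Port 0 lets the OS pick a free port; the server runs in a
    # background thread, standing in for a separate cache process.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def request(port, line):
    # A client pays a network round trip per lookup, instead of a
    # local memory access.
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((line + "\n").encode())
        return conn.makefile().readline().strip()
```

A client in another process would then call something like `request(port, "SET user:42 Alice")` and later `request(port, "GET user:42")`.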

A typical case is the several layers of cache in a web application:

  • A CDN to cache and deliver static content (and some dynamic content)
  • A web cache that stores content from the web server (usually entire pages), which might be the CDN itself
  • An application cache storing the result of expensive server computations
  • A DB cache storing the most-used rows, tables, or pages from a database
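These layers behave like a chain of lookups: each request tries the caches closest to the client first and falls through to the origin only on a miss everywhere. Here is a minimal sketch of that read-through pattern, modeling each layer as a plain dict (our own simplification):

```python
def layered_get(key, layers, origin):
    """Try each cache layer in order (e.g. CDN, web cache, application
    cache, DB cache); on a miss everywhere, fall back to the origin
    (the database or web server)."""
    missed = []
    for layer in layers:
        if key in layer:
            value = layer[key]
            break
        missed.append(layer)
    else:
        value = origin(key)  # missed every layer: hit the origin
    # Populate every layer that missed, so the next request for this
    # key is served closer to the client.
    for layer in missed:
        layer[key] = value
    return value
```

Each hit in an outer layer spares all the layers behind it, which is why a warm CDN can shield the web servers, and a warm application cache can shield the database.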

Finally, for large applications with thousands of requests per second (or more), caches need to be scaled together with the rest of the web stack. This means that a single process might not be enough to hold all the entries that the application needs to cache in order to run smoothly. For instance, if we have a million accesses per second to a database, a cache in front of it that can only store a million elements is probably close to useless: turnover would be too high, entries would be evicted before they could be reused, and basically all requests would go through to the DB, grinding it to a halt in a few minutes.
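A quick back-of-the-envelope simulation (ours, not from the book) shows this effect: when an LRU cache can hold only a small fraction of the hot keys, its hit rate under uniformly random accesses collapses to roughly that same fraction, so nearly every request still reaches the database.

```python
from collections import OrderedDict
import random

def lru_hit_rate(capacity, key_space, accesses, seed=0):
    """Simulate an LRU cache under uniformly random key accesses and
    return the fraction of hits. With far more hot keys than cache
    slots, the hit rate collapses toward capacity / key_space."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(accesses):
        key = rng.randrange(key_space)
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / accesses
```

For example, `lru_hit_rate(100, 10_000, 50_000)` comes out around 0.01, while `lru_hit_rate(100, 100, 10_000)` is close to 1.0, because there the whole working set fits in the cache.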

For this reason, distributed caches have been introduced. These caches, such as Cassandra, use several processes called nodes, each one of them being a standalone cache, plus another process (sometimes a special node), called the orchestrator, that is in charge of routing requests to the right node. Distributed caches use a special hashing function, consistent hashing (see figure 7.14), to decide which node should store an entry.
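The idea behind consistent hashing is that nodes and keys are hashed onto the same circular space, and a key belongs to the first node found moving clockwise from its hash; this way, adding or removing a node only remaps the keys in that node's arc, instead of reshuffling everything. A minimal sketch (the class name and the choice of md5 are our own):

```python
import bisect
import hashlib

def _hash(value):
    # Any uniformly distributed hash works; md5 just keeps this
    # sketch deterministic across runs and machines.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps keys to nodes on a hash ring, so that adding or removing
    a node only remaps the keys falling in that node's arc."""

    def __init__(self, nodes, replicas=100):
        # Each physical node appears many times as "virtual nodes",
        # which evens out the size of the arcs between nodes.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the
        # key's hash, wrapping around the ring if necessary.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

With this scheme, removing one node leaves every key that mapped to the surviving nodes exactly where it was, since their virtual nodes keep the same positions on the ring.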

Web clients also make extensive use of caching: browsers have their own caches (although those work slightly differently, storing content on your disk), but they also have a DNS cache to store the IP addresses corresponding to the domains you browse. Lately, more caching has been added with local and session storage, and perhaps the most promising of all, service worker caching.

Source: Marcello La Rocca, Advanced Algorithms and Data Structures, Manning Publications, 2021
