How to use the cache is just as important

That’s right, the cache type is only part of the story. To make it work properly, you also need to use it in the best possible way for your application.

For instance, if you place a cache in front of a database, then on write operations you could decide to write values only to the cache, updating the DB later, for example when another client requests the same entry; or, on the other hand, you could decide to always update the DB as well.

You could even decide to write to the DB only and update the cache just when data is requested for reading, or to update the cache on every write.

The policies described above all have names, because they are widely used in software design and engineering.

Write-Behind (or Write-Back), our first example, is a storage policy where data is written to the cache on every change, while it's written to the corresponding location on the main storage (memory, DB, and so on) only at specified intervals of time or under certain conditions (for instance, on reads).

In this case, the data in the cache is always fresh, while data on the DB (or other support) might be stale. This policy helps keep latency low and also reduces the load on the DB, but it might lead to data loss; for instance, when we write back only on reads, an entry stored in the cache might never be read after it is written. In some applications this data loss is fine, and in those cases Write-Behind is the preferred policy.
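Here is a minimal sketch of the idea in Python; the dict-backed cache, the db object with get/put methods, and the choice to flush dirty entries on read are all assumptions made for illustration, not an implementation from the book:

    class WriteBehindCache:
        """Write-Behind: writes go to the cache only; the DB is updated lazily.

        In this sketch, dirty entries are flushed to the DB the first time
        they are read back (one possible condition; flushing periodically
        is another).
        """

        def __init__(self, db):
            self.db = db        # any object exposing get(key) / put(key, value)
            self.cache = {}     # key -> value
            self.dirty = set()  # keys written to the cache but not yet to the DB

        def write(self, key, value):
            self.cache[key] = value
            self.dirty.add(key)              # the DB write is deferred

        def read(self, key):
            if key in self.cache:
                if key in self.dirty:        # flush on read
                    self.db.put(key, self.cache[key])
                    self.dirty.discard(key)
                return self.cache[key]
            value = self.db.get(key)         # cache miss: fall back to the DB
            self.cache[key] = value
            return value

Entries that are written but never read (and therefore never flushed) are exactly the ones at risk of being lost if the cache goes down, which is the data-loss scenario mentioned above.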

Write-Through (or Write-Ahead) always writes entries both to the cache and to the main storage at the same time. This way the database will usually see only one write per entry and will (almost) never be hit by the application for reads. This is slower than Write-Behind, but it reduces the risk of data loss practically to zero, with the only exceptions being edge cases and malfunctions. The Write-Through strategy doesn't improve write performance at all (though caching still improves read performance); instead, it is particularly useful for read-intensive applications where data is written once (or seldom) and read many times (usually within a short time). Session data is a good example of a use case for this strategy.
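A minimal Python sketch of Write-Through follows; as before, the dict-backed cache and the db object with get/put methods are assumptions for illustration:

    class WriteThroughCache:
        """Write-Through: every write goes to both the cache and the DB,
        synchronously."""

        def __init__(self, db):
            self.db = db      # any object exposing get(key) / put(key, value)
            self.cache = {}

        def write(self, key, value):
            self.db.put(key, value)    # write to the main storage first...
            self.cache[key] = value    # ...then keep the cache in sync

        def read(self, key):
            if key in self.cache:      # reads are almost always served here
                return self.cache[key]
            value = self.db.get(key)   # only cold entries ever hit the DB
            self.cache[key] = value
            return value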

Write-Around refers to the policy of writing data only to the main storage, and not to the cache. This is good for write-and-forget applications, those that seldom or never reread recently written data. The cost of reading recently written data in this configuration is high, because such reads will result in a cache miss.
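The following sketch shows the idea under the same assumptions (dict-backed cache, hypothetical db object with get/put):

    class WriteAroundCache:
        """Write-Around: writes bypass the cache; only reads populate it."""

        def __init__(self, db):
            self.db = db
            self.cache = {}

        def write(self, key, value):
            # the cache is not touched; any previously cached copy becomes
            # stale (a common variant also invalidates that copy here)
            self.db.put(key, value)

        def read(self, key):
            if key in self.cache:
                return self.cache[key]
            value = self.db.get(key)   # recently written data always misses here
            self.cache[key] = value
            return value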

Read-Through refers to the overall strategy of writing entries to the cache only when they are read, after they have already been written to another, slower storage layer. The write policy used alongside it can be either Write-Back or Write-Around. The peculiarity of Read-Through is that the application only interfaces with the cache for reading, and the cache store is delegated the task of reading data from the main storage on a cache miss.
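A minimal Python sketch of the read path, assuming the same hypothetical db interface; the point is that the DB lookup lives inside the cache, not in the application:

    class ReadThroughCache:
        """Read-Through: the application only talks to the cache; on a miss,
        the cache itself loads the entry from the main storage."""

        def __init__(self, db):
            self.db = db
            self.cache = {}

        def read(self, key):
            if key not in self.cache:
                # the cache store, not the application, performs the DB read
                self.cache[key] = self.db.get(key)
            return self.cache[key]

Writes would then be handled by whichever write policy (Write-Back or Write-Around) is paired with it.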

Refresh-Ahead is a strategy used in caches where elements can go stale and are considered expired after a certain time. In this approach, frequently requested cache entries that are about to expire are proactively (and asynchronously) re-read from the main source and updated. This means that the application will not feel the pain of a slow DB read followed by a cache store.
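Below is a rough sketch of the mechanism; the TTL, the refresh window, the thread-per-refresh approach, and the db object are all illustrative assumptions (a production cache would also track request frequency and handle concurrency more carefully):

    import threading
    import time

    class RefreshAheadCache:
        """Refresh-Ahead: entries expire after `ttl` seconds; entries read
        while close to expiring are refreshed asynchronously from the DB."""

        def __init__(self, db, ttl=60.0, refresh_window=10.0):
            self.db = db
            self.ttl = ttl
            self.refresh_window = refresh_window
            self.cache = {}   # key -> (value, expiry timestamp)

        def _refresh(self, key):
            value = self.db.get(key)   # the slow read happens here...
            self.cache[key] = (value, time.time() + self.ttl)

        def read(self, key):
            now = time.time()
            entry = self.cache.get(key)
            if entry is None or entry[1] <= now:   # missing or already expired
                self._refresh(key)                 # synchronous, slow path
                return self.cache[key][0]
            value, expiry = entry
            if expiry - now < self.refresh_window:
                # ...but for soon-to-expire entries it runs in the background,
                # so the caller never waits for the DB
                threading.Thread(target=self._refresh, args=(key,)).start()
            return value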

Cache-Aside has the cache sitting on the side of the application, and talking only to the application. It's different from Read-Through because in this case the responsibility of checking the cache or the DB lies with the application, which will first check the cache and, in case of a miss, do some extra work to also check the DB and store the value it reads in the cache.
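A short Cache-Aside sketch in application code; the get_user function, the dict-like cache, and the db object with a get method are all assumed for illustration:

    def get_user(user_id, cache, db):
        """Cache-Aside read: the application, not the cache, coordinates
        the lookup."""
        value = cache.get(user_id)
        if value is not None:
            return value               # cache hit: no DB work at all
        value = db.get(user_id)        # cache miss: the application queries the DB...
        cache[user_id] = value         # ...and fills the cache itself
        return value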

Choosing the best strategy can be as important as getting the cache implementation right, because choosing unwisely can overload and crash your database (or whatever your main storage is).

Source: Marcello La Rocca (2021), Advanced Algorithms and Data Structures, Manning Publications
