Problem and Solutions of Multi-indexing

Your family runs a small grocery shop, and you’d like to help your parents keep up with the inventory. To impress your family and show everyone those computer science classes are worth the effort, you embark on the task of designing a digital inventory management tool, an archive for stock keeping, with two requirements:

  • Be able to (efficiently) search products by name so you can update the stock.
  • Get, at any time, the product with the lowest items in stock, so that you are able to plan your next order.

Of course, you could just buy an off-the-shelf spreadsheet, but where would the fun be in that? Moreover, would anybody be really impressed with that? So, here we go, design­ing an in-memory data structure that can be queried according to two different criteria.

Clearly, real-world scenarios are more complex than this. You can imagine that each product requires a different time to be shipped to you, that some products are ordered from the same vendors (and therefore you might want to group them in the same order to save on shipment costs), that a product’s price may and will vary with time (so you can choose the cheapest brand for, say, brakes or suspensions), and even that some products might be unavailable sometimes.

All this complexity, however, can be captured in a heuristic function, a score that is computed keeping in mind all the nuances of your business. Conceptually, you can handle that score in the same exact way as the simple inventory count, so you can keep things simple in our example and just use that.

One way to handle these requirements could be by using two different data struc­tures: one for efficient search by name, for instance a hash table, and a priority queue to get the item which most urgently needs to be resupplied.

We will see in chapter 7 that sometimes combining two data structures for a goal is the best choice. This is not the time yet; for now you need to keep in mind the issues in coordinating two such containers, and also that you will likely need more than twice the memory.

Both considerations are kind of worrying; wouldn’t it be nice if there was a single data structure that could handle both aspects natively and efficiently?

1. The gist of the solution

Let’s be clear about what we are seeking here: it’s not just a matter of optimizing all the operations on a container, like we discuss in appendix C. Here each bit of data, each entry in the container, is made of two separate parts, and both can be “mea­sured” in some way. There are the names of each product, which can be sorted alpha­betically, and the number of items we have in stock, that’s also associated with each product: quantities that can be compared to each other to determine which products are scarcer and most urgent to resupply.

Now, if we sort the list of entries according to one criterion—for example, the name—we need to scan the whole list to find a given value for the other criterion, in this case the quantity in stock.

And if we use a min-heap with the scarcer products at its top, then (as we learned in chapter 2) we will also need linear time to scan the whole heap looking for a prod­uct to update.

Long story short, none of the basic data structures we describe in appendix C, nor a priority queue, can single-handedly solve our problem.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *