Now that we have seen how to implement k-d trees (in chapter 9) and SS-trees (chapter 10), we can focus on the problems we are able to solve by using these structures. And we can go back to our initial example: writing an application to find the closest warehouse/shop/retailer to customers, in real time. Figure 11.1 shows yet another variation of the map first introduced in figure 8.1, where we show a few (real) cities and some imaginary shops in the area around them. In our example, we need to find the closest shop that is able to deliver a certain item to customers after they purchase that item on our website. For this reason, we only show the kind of goods sold for each shop, rather than the shop’s name or brand.
Given all the work done in the previous chapters, in this chapter we could actually get away without even writing any code. k-d tree, SS-tree, or another similar data structure can take care of all the operations we need; just create a container with the full list of warehouses, and then, for each customer’s request, query it to find the closest hub.
If, however, the retailer’s stock is dynamic, and items can go out of stock in one place or be restocked in another, then our current model won’t work, and we need to make a tiny change to our tree nodes and our search method. We will talk about these details later in this chapter. First, let’s see how we can write a solution to the basic version of our e-commerce problem.
1. Sketching a solution
Initially we will make some assumptions to simplify our scenario. Assume all the shops have all the products in stock at all times. Figure 11.2 shows a possible workflow for order processing under this scenario; of course, this is not what would happen in real life, but simplifying a problem usually helps us sketch a first solution upon which we can iterate, introducing one by one the constraints found in real situations and reasoning about how they influence our applications and how we can cope with them:
- Shops have dynamic inventory, so we need to check whether the closest shop to a customer actually sells an item and has it in stock.
- Delivery costs and other considerations might cause us to prefer a further shop to the closest one.
- If we need to make HTTP calls as part of our workflow, we need to take extra care to avoid network pitfalls, and we might have to carefully select our algorithm to cope with issues such as fault tolerance and timeouts.
The core of the problem, in this setting, is to keep an updated list of stores and retrieve the closest one to a customer each time an order is placed on the website.
Listing 11.2 shows these trivial operations; we also define an auxiliary type for shops in listing 11.1.
It encapsulates the information associated with a retailer. It will become even more useful later in the chapter as we develop our final solution. We store the location of the shop in the field point, for the sake of consistency with what we have seen in the previous chapters.
From the previous snippets, however, it’s apparent that we’ll need to make some adjustments to the node objects and to the API of the methods we have defined in the previous chapters.
For example, the Node class could hold a reference to the shop, as shown in listing 11.3, rather than directly to a point, and methods like insert, likewise, would take an instance of Shop as an argument and not just a point, so that the new signature would look something like
function insert(node, newShop, level=0)
Listing 11.3 Redefining Node
class Node #type Shop shop
#type Node left
#type Node right
#type integer level
function Node(shop, left, right, level)
The application code looks so simple, but do not let it fool you: the complexity is all hidden in the order function that performs all the operations involved in actually placing an order to a real shop. This likely means that you have to interact with each shop’s own web applications,[3] so first of all, you have to imagine that you need a common API by which all the shops’ services need to abide.[4]
2. Trouble in paradise
Now we’ve finally have defined a simple method that we can use to find the closest retailer to our customer and let them know about the purchase so that they can ship goods to users.
Notice how simple listing 11.2 is. Simplicity, in IT as in life, is often a good thing. Simple code usually means maintainable and flexible code. The problem is that sometimes things look great on paper, but when you deploy your application and open it to real traffic, you often find issues you hadn’t thought of or even imagined. For example, in the real world, you have to deal with dynamic inventory for each shop and, even worse, race conditions.
The first issue we have overlooked is that not all the retailers have all the goods in stock. So, it’s not enough to find the closest shop to a customer; we need to find the closest retailer that can actually sell and deliver the product bought by our user. If the user buys more than one item and we need to deliver them all in one go (to save on shipping or just to reduce user churn), we likely want to filter shops that have all the items at the same time, whenever possible. But then another question arises. If the user buys products A and B, and there is one shop that could ship both, but it’s 100 miles away from the customer, and two shops each have only one of the items, but they’re within 10 miles of the customer, which solution should we choose? What about choosing between a closer shop that has a longer delivery time or higher cost, and another one that is further away, but cheaper and ultimately better for the customer?
Even more issues come up if we lay down the architecture of the system (see figure 11.3). So far, we have treated shop datasets as if they were local, but that’s not necessarily the case; each retailer can have their own system with which we need to interact:
- They can sell items offline, for instance. If that happens, our information about their stock can become stale. Similarly, we can get out of sync if a shop’s items are restocked.
- In listing 11.2 and figure 11.3, we assume that a call to shop.order will succeed, but since it’s likely to be a remote call over HTTP, there are many reasons why it could fail independently of the item availability in the shop’s inventory: the call could time out, the shop’s application could crash and be unreachable, and so on. If we don’t check their response, we will never know if the order was successfully placed. And if we do check, but never get a response, what should we do?
These are extremely challenging issues, that we’ll try to solve in the next sections.
Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)