Applications of nearest neighbor search: Centralized application

Let’s set architectural questions aside for the moment, assume we handle in-house (within the same virtual machine running our web application) all orders for retailers, and focus on the first round of questions. At some point, we asked how we should choose which shop or combination of shops is best in order to serve a customer. Nei­ther k-d trees, nor SS-trees, nor any other data structure can (or should) answer this question. This is a business-related decision that changes from company to company, and perhaps over time or with other variables within the same company.

What we can do, though, is provide our containers with a way to filter points in a NN search by some conditions that we can pass into our code, thus allowing whoever uses the container to reason about the business rules, customize them, and pass them as arguments to the search method.

1. Filtering points

By creating hooks through which clients can customize business logic, we provide a template method that can be effectively customized depending on the business needs.

Listing 11.4 shows the code for k-d tree’s nearest neighbor search, modified to allow filtering of points.

As you can see, we pass an extra argument (with respect to the regular method in listing 9.9), a predicate that is evaluated on the current node before accepting this point as a nearest neighbor.

The only difference with the basic version, besides boilerplate to pass around this new argument, is at line #5, where the predicate is actually evaluated on the current node.

This version of the method allows us to filter single points and solve the first of the issues mentioned in the previous section, making sure that we choose a shop that actu­ally has the item(s) a customer ordered.

Listing 11.4 The filteredNearestNeighbor method

For instance, we could redefine nodes as shown in listing 11.3, and pass to fNN the predicate hasItemX() defined in listing 11.5. We can also define a more generic ver­sion, hasItem(), that takes two arguments, an item and a shop, and uses currying to create a unary predicate checking a single fixed item in the shop, and pass it as shown in listing 11.6.

To find the nearest shop that can ship Pinot noir, we can call our fNN method with something like

fNN(treeRoot, customerPosition, has!tem(“Pinot noir”))

Unfortunately, while this filtering mechanism allows us to filter out retailers that don’t have the items we need, it isn’t powerful enough to let us decide that it is best to choose a closer retailer with higher shipment costs over a cheaper one that is twice as far away, and in general it can’t deal with complex conditions comparing different solutions.

2. Complex decisions

If what we really need is not just filtering shops according to some criteria, but also choosing the best options among them, then we have to seek a more powerful mechanism.

We have two main choices:

  • Use n-nearest neighbor to retrieve a list of n shops that satisfy our criteria, and then process this list to decide which one, among the possible solutions, is the actual best choice.
  • Or we can replace the predicate passed to our nearest neighbor method, as shown in listing 11.7, which instead of a unary predicate is using a binary func­tion that takes as arguments the current node and the best solution found so far, and returns which one is the best.

And, of course, we can also use a combination of the two methods.

The first solution doesn’t ask for any change to our container, so we won’t need to develop it any further. You can use sorting, a priority queue, or any selector algorithm you like in order to decide which solution is the best, according to your business rules. This mechanism has an advantage: it also allows you to try solutions where you order different items at different shops and see if that works better than the best “all-at-one- place” solution.

When we pass a compare function to fNN, instead, this flexibility is not going to be pos­sible, and we will only examine solutions where all items are shipped by the same shop.

If that’s fine, because we have guarantees that we can always find such a shop (or we handle separately the case where we fail to find one), the “compare function” mechanism has the advantage of requiring less extra memory and being overall faster.

As mentioned, the idea is that we can encapsulate all the business logic in a binary function that will compare the node currently being traversed in nearest neighbor search with the best solution found at that point of the traversal. We also assume that this predicate will perform any necessary filtering, for example, making sure that the current node’s retailer has all items in stock. The changes needed to our search method are minimal, as shown in listing 11.7. We simply gather together two steps (fil­tering shops and comparing the current solution with the best one found so far), and instead ofjust checking distances during our NN search, we use a function passed as an argument to fNN, function cmpShops,7 to decide which entry is nearer.

So, now the heart of the business logic lies in this method, which in turn will decide which shops are filtered in and how we choose the our best option.

There are a couple of edge cases that we should always address in this function:

  • If the current node is filtered out, always choose the current NN.
  • If the best solution found so far is empty (that is, nn == null), then the current node should always be chosen, unless it is filtered out (at point 1).

Listing 11.8 provides a possible generic implementation of this comparison method, including filtering out shops that don’t have all items in stock, and checking distances by using an unspecified heuristic method to decide which shop is the best.

It’s worth clarifying once again that the choice of the heuristic is not connected to any of the data structures we have seen, and, vice versa, those algorithms don’t depend on this function. It’s just a heuristic method that encapsulates all the domain-specific logic, and so it changes from application to application. Depending on how you write it, and how you model the Shop class, you can customize the behavior of the search method to solve your instance of the problem.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *