As we have seen, BFS and Dijkstra’s algorithms are very similar to each other; it turns out they are both particular cases of the A* (pronounced “A-star”) algorithm.

This algorithm, shown in listing 14.7, is not just more generic; it improves the performance of Dijkstra’s algorithm in at least two different situations. Before delving into those scenarios, though, let’s focus on the differences between these algorithms and A*.

As we can see at line #1 of listing 14.7, this generic definition of A* takes two extra arguments: a distance function and a heuristic. They both contribute to the computation of the so-called f-score at line #18. This value combines the cost of reaching the current node u from the source with the estimated cost of reaching the goal from u.

By controlling these two arguments, we can obtain either BFS or Dijkstra’s algorithm (or neither). For both of them, the heuristic needs to be a function that is identically equal to 0, something we could write as lambda(v) → 0. Both of these algorithms, in fact, completely disregard any notion of, or information about, the distance from vertices to the goal.

For the distance metrics, the situation is different:

- Dijkstra’s algorithm uses the edge’s weight as a distance function, so we need to pass something like distance = lambda(e) → e.weight.
- BFS only takes into account the number of edges traversed, which is equivalent to considering all edges to have the same weight, identically equal to 1! And thus, we can pass distance = lambda(e) → 1.
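
Listing 14.7 isn’t reproduced here, but the two specializations above can be made concrete with a minimal, self-contained sketch of a generic A* in Python. Everything below is illustrative: the graph is an adjacency map from each vertex to (neighbor, weight) pairs, and the distance function receives an edge as a (u, v, weight) tuple.

```python
import heapq

def a_star(graph, source, goal, distance, heuristic):
    """Generic best-first search. `graph` maps each vertex to a list of
    (neighbor, weight) pairs; `distance` maps an edge to its cost, and
    `heuristic` estimates the cost from a vertex to the goal."""
    g = {source: 0}                            # best cost found so far
    parent = {source: None}
    frontier = [(heuristic(source), source)]   # priority = f-score = g + h
    while frontier:
        f, u = heapq.heappop(frontier)
        if u == goal:                          # goal popped: rebuild the path
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return g[goal], path[::-1]
        if f > g[u] + heuristic(u):            # stale queue entry, skip it
            continue
        for v, weight in graph[u]:
            new_g = g[u] + distance((u, v, weight))
            if v not in g or new_g < g[v]:
                g[v] = new_g
                parent[v] = u
                heapq.heappush(frontier, (new_g + heuristic(v), v))
    return float('inf'), []                    # goal unreachable

def dijkstra(graph, source, goal):             # real weights, zero heuristic
    return a_star(graph, source, goal,
                  distance=lambda e: e[2], heuristic=lambda v: 0)

def bfs(graph, source, goal):                  # every edge counts as 1
    return a_star(graph, source, goal,
                  distance=lambda e: 1, heuristic=lambda v: 0)
```

With a constant-0 heuristic the f-score collapses to the cost from the source, which is exactly the quantity Dijkstra’s algorithm minimizes; with unit distances on top of that, the priority queue expands vertices level by level, just like the plain queue of BFS.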

In practice, 99.9% of the time it’s better to implement Dijkstra’s algorithm or BFS directly, not as special cases of A*. This brings us to a golden rule about keeping things simple that I learned from a great engineer I worked with.

**NOTE** Do not make your code more generic than needed. You shouldn’t consider writing a generic version of something until you have at least three different variants that could be implemented as minor changes of the same generic code.

General-purpose code, such as the generic version of A* shown in listing 14.7, usually carries some overhead: for instance, calling methods or lambdas like distance instead of just retrieving an edge’s length or, for BFS, using a priority queue instead of a faster plain queue. It also becomes increasingly hard to maintain and reason about.

So, from these considerations, it should be clear that A* wasn’t developed just to provide a generic, parameterizable method. And yet, it can be extremely useful.

As we mentioned, in fact, there are at least two good reasons to implement A*, two contexts in which A* provides an advantage over Dijkstra’s.

Let’s be clear from the beginning: this is not always true; in the general case, Dijkstra’s algorithm is asymptotically as fast as A* (or the latter might not even be meaningfully applicable).

A* gains an advantage only in some contexts where we have extra information that we can somehow use.

The first case where we can use A* to drive search faster to the goal is when we have information about the distance from all or some vertices to the goal(s). Figure 14.20 explains this situation better than a thousand words! Notice that in this particular case, the key factor is that the vertices, modeling physical places in the real world, carry extra information with them (their position, which is fixed) that can help estimate their distance to the final goal. This isn’t always true and is usually not the case for generic graphs.

To put it differently, the extra information here doesn’t come from the graph, but from domain knowledge.

The good news is somewhat limited though, because there is no a priori guarantee that A* will perform better than Dijkstra’s algorithm. On the contrary, it’s easy to craft an example where A* will always do as badly as Dijkstra’s algorithm. Check out figure 14.21 to get an idea how we can tweak our previous example to fool A*! The key, here and always, is the quality of the extra information captured by the heuristic function: the more reliable and closer to real distance the estimate, the better A* performs.

### 1. How good is A* search?

If we were in a class and this was a live presentation, this should be your follow-up question: Can we craft an example where A* performs consistently worse than Dijkstra’s algorithm?

It turns out we can, easily: take the example of figure 14.21 and change just the weight of the edge from v_{1} to v_{G}, setting it to any value smaller than 2, while keeping the same estimates. Then A* is guaranteed to visit every other vertex before reaching the goal, while Dijkstra’s algorithm would never visit v_{6} and, depending on the order in which it processes the edges out of v_{S}, might also skip vertices v_{2} to v_{4}.

The key to this example is that the heuristic overestimates the distance to the goal for v_{1} (and a few more vertices). A*, it turns out, minimizes the estimated cost, which can be different from the actual cost, and whenever this estimate is pessimistic, we get into trouble.

The previous example, in fact, shows how using the wrong estimates can make the search unnecessarily slower, which is inconvenient but sometimes acceptable. Things could be much worse, however, because we could also find examples where A* returns a solution whose cost is not optimal. In figure 14.22, the estimate for vertex v_{3} is bloated, and consequently vertex v_{G} is reached through a different path before even visiting v_{3}. Remember that once the goal is reached, the search stops and so the wrong (non-optimal) path is returned.
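
To make this failure mode concrete, here is a tiny sketch on a made-up three-vertex graph (not the graph of figure 14.22): with a heuristic that overestimates the cost from one vertex, A* pops the goal through the worse path first and returns a non-optimal cost.

```python
import heapq

def a_star_cost(edges, source, goal, h):
    """Minimal A* that stops as soon as the goal is popped from the queue,
    like the search described above; returns the cost it settles on."""
    g = {source: 0}
    frontier = [(h(source), source)]
    while frontier:
        f, u = heapq.heappop(frontier)
        if u == goal:
            return g[u]
        for v, w in edges.get(u, []):
            if v not in g or g[u] + w < g[v]:
                g[v] = g[u] + w
                heapq.heappush(frontier, (g[v] + h(v), v))
    return float('inf')

# Optimal path is S -> A -> G, with cost 2.
edges = {'S': [('A', 1), ('G', 3)], 'A': [('G', 1)]}

pessimistic = {'S': 0, 'A': 5, 'G': 0}.get  # h(A)=5 overestimates: true cost is 1
admissible  = {'S': 2, 'A': 1, 'G': 0}.get  # never overestimates

print(a_star_cost(edges, 'S', 'G', pessimistic))  # 3: settles on the direct edge
print(a_star_cost(edges, 'S', 'G', admissible))   # 2: finds S -> A -> G
```

With the pessimistic heuristic, the intermediate vertex A gets f-score 1 + 5 = 6, so the direct edge to the goal (f-score 3) is expanded first and the search stops on the worse path.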

While sometimes this can be considered acceptable, in practice, we usually want to avoid it, and we luckily have a way to guarantee this.

It can be proved, in fact, that A* is *complete* and *optimal* when the heuristic function satisfies two conditions: it must be *admissible* and *consistent*.

- *Admissible (aka optimistic)*—Such a heuristic never overestimates the cost to reach the goal.
- *Consistent*—A heuristic is consistent if, given a vertex v and any of its successors u, the estimated cost for v is at most the cost of the edge from v to u, plus the estimated cost for u. In formula, heuristic(v) ≤ distance(v, u) + heuristic(u).

As one of our reviewers of this book suggested, there is an even clearer way to remember the difference between these two conditions: *admissible* means it’s not overestimating the cost of a path, and *consistent* means it’s not overestimating the cost of an edge.
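
On a small graph where the exact distances to the goal are known, both conditions can be checked mechanically. The sketch below is illustrative; it encodes consistency as heuristic(v) ≤ weight(v, u) + heuristic(u) for every edge from v to u:

```python
def check_heuristic(edges, h, true_dist):
    """`edges` maps each vertex to (neighbor, weight) pairs, `h[v]` is the
    heuristic's estimate for v, and `true_dist[v]` the exact cost to the goal."""
    # Admissible: never overestimate the cost of the whole remaining path.
    admissible = all(h[v] <= true_dist[v] for v in h)
    # Consistent: never overestimate the "cost" of a single edge, that is,
    # h(v) - h(u) <= weight(v, u) for every edge from v to u.
    consistent = all(h[v] <= w + h[u]
                     for v, out in edges.items() for u, w in out)
    return admissible, consistent
```

Of course, in practice we don’t know the true distances (that’s what A* computes); a check like this is useful on small test graphs, or as a statement of the two properties we must argue for analytically.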

When we plan to use A* search, the first thing we need to ensure is that we have a heuristic function that is both admissible and consistent. This condition is sufficient (though not strictly necessary) to ensure that the optimal solution will be found.

What does all of this mean for our problem—delivering goods to customers? Well, to speed up the search for the best route, we can use the “straight-line distance to the customer’s address” as the heuristic. This will guide search, favoring paths that get closer to the goal over those that are directed away from it; in turn, it will require us to visit fewer vertices before reaching the goal.
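
Such a heuristic is straightforward to implement once every vertex carries its coordinates. Here is a sketch (the vertex names echo the figures, but the coordinates below are made up for illustration):

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Hypothetical planar coordinates, in meters; in a real road network these
# would come from each vertex's geographic position.
position = {'vS': (0, 0), 'vD': (900, 300), 'vE': (820, 510), 'vG': (1000, 500)}

def straight_line(v, goal='vG'):
    """Admissible heuristic: a road between two points can never be
    shorter than the straight line connecting them."""
    return dist(position[v], position[goal])
```

Note that with these coordinates straight_line('vD') > straight_line('vE'): a consistent heuristic is allowed to grow along a path when the graph forces a detour.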

Figure 14.23 shows how A* would find the best route for our previous example, the one to which we applied Dijkstra’s algorithm in figure 14.19. While running Dijkstra’s algorithm, we would have had to visit all vertices whose distance from the source is smaller than the total distance of the shortest path from source to goal. A* can reach the goal faster and, although it still visits some vertices not on the shortest path, it doesn’t go through vertices v_{A}, v_{B}, v_{C}, v_{F}, v_{N}, v_{R}, which are instead visited by Dijkstra’s.

In this particular example, A* saves us from traversing 6 edges out of 23, which is quite good (roughly a 26% saving), especially considering that there are two paths with very close weights, differing by just 2 meters.

Considering road distance as edges’ weight and straight-line distance as the heuristic, we can see that straight-line distance is certainly optimistic, because the road distance can never be shorter; at best it can be the same.

Straight-line distance is also a consistent heuristic. In fact, if we denote by straight_line(v) the straight-line distance from a vertex v to the goal, the condition for consistency becomes

straight_line(v) ≤ road_distance(v, u) + straight_line(u)

which is certainly true: straight_line(v, u) ≤ road_distance(v, u), and straight-line distance, being a Euclidean distance, abides by the triangle inequality, so straight_line(v) ≤ straight_line(v, u) + straight_line(u).

While this condition always holds, the value of the heuristic for a vertex u can be larger than the value assigned to its parent v. For example, in figure 14.23, consider vertices v_{E} and v_{D}: the straight-line distance between vertex v_{D} and the goal is larger than that of its parent, v_{E}. This happens because not all vertices are connected by an edge (or rather, by two directed edges), and so, for example, from v_{E} (which is the closest vertex to the goal), you can reach v_{G} only with a detour, by first going through v_{D}.

It can be proved that if, instead, the graph were fully connected, we would also visit vertices in order of their straight-line distance from the goal. In other words, the value of the heuristic would monotonically decrease during the traversal.

Then again, most useful graphs are not fully connected, and luckily we don’t need this strict condition for A* to be able to find the optimal solution, but we can make do with a consistent and admissible heuristic.

These properties can guarantee that the best solution will be found, but should we also aim to get the most accurate estimate possible? There might be many admissible and consistent heuristics, but does the algorithm find the best route faster if we choose one with a more precise estimate?

Needless to say, when we have to compute thousands of routes per hour, using a more efficient search can save a lot of computation, allowing shipments to leave the factory faster and ultimately saving money.

Once again, we can get a theoretical guarantee about A*’s performance: for any given consistent heuristic, A* is not just optimal, but also optimally efficient. This means that no other optimal algorithm using the same heuristic is guaranteed to expand fewer nodes than A*.

That being said, the closer the estimate is to the actual distance of a vertex to the goal, the faster the algorithm will reach the goal. As an example, try to come up with a better heuristic for the graph in figure 14.23. (Hint: What could be more precise than straight-line distance?)

While we are happy to have a way to guarantee that A* will find the optimal solution, we also mentioned that sometimes it is acceptable to settle for a sub-optimal one. Especially when the cost of traversal is high, or there are constraints on the response time, we might want to choose a non-admissible heuristic that guarantees faster convergence.

### 2. Heuristics as a way to balance real-time data

That concludes our discussion about optimality of search. We also mentioned there is at least one other scenario where A* can prove itself particularly useful. Let’s briefly explore it.

The great power of having this heuristic function is that it can use a different metric than the distance, and convey more information about the domain. A heuristic can even combine several pieces of data, as long as the value it returns is scaled appropriately, so that it makes sense when compared to the edges’ weights. For instance, it would make little sense (and cause terrible performance) to use meters for the edges’ weights and seconds (or even millimeters) for the heuristic’s estimates. However, we can always scale millimeters to meters; or, if the heuristic conveys the average time needed to reach the goal from each vertex, we can multiply it by the roads’ average or minimum speed to obtain a quantity that can be added to an edge’s weight.
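
As a sketch of this kind of rescaling (all constants below are made-up values, not real data), suppose each vertex stores the average time, in seconds, needed to reach the goal, while edge weights are in meters:

```python
# Edge weights are in meters, but the domain data estimates, for each vertex,
# the average time in seconds needed to reach the goal from it. To be summed
# into an f-score, the time estimate must first be converted into meters.
MIN_SPEED = 8.0  # meters per second; a deliberately low, conservative value

avg_time_to_goal = {'v1': 120.0, 'v2': 45.0, 'vG': 0.0}  # hypothetical data

def scaled_heuristic(v):
    """Convert a time estimate into meters, so it is commensurable with the
    edge weights; a low speed keeps the converted estimate optimistic."""
    return avg_time_to_goal[v] * MIN_SPEED
```

Using a low (minimum) speed for the conversion is a hedge toward admissibility: the resulting estimate in meters tends to stay below the real remaining road distance.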

But we can also take this to the next level, decoupling the purposes of the distance and the heuristic even further. Imagine that we are computing the best route on the go, as a car navigator does, instead of a priori.

First, we switch our metric from distance to travelling time, and the time needed to traverse a road changes depending on traffic, weather, areas closed to transit during certain hours, and so on.

However, the average time needed when following a certain route is known in advance, and we can use that as a compass to balance our live decisions.

Suppose, for instance, that now you are also planning deliveries on a larger scale, and you need to ship some goods from Naples to Florence. When you get to Rome, the faster route would be through the motorway passing east of the city, and it usually takes slightly less than three hours. However, your navigator realizes that for the next 10 miles on that route there is heavy traffic, while going around Rome on the west side, the traffic is all clear. If the navigator just used Dijkstra’s algorithm, the next edge to expand would be the one with the shorter travel time over the next 10 miles, and it would lead you west.

Unfortunately, that would add at least one hour to your trip. A* can come to the rescue and balance the short-term advantage of a traffic-free motorway section with the long-term gain of a shorter and usually faster route.

Although this example is extremely simplified, you get the point. A* can better balance long-term costs with local choices, and that’s why it has been the state-of-the-art in AI, for example, for pathfinding in video games.

The next step to improve these navigation algorithms would be reconsidering how shortest-path requests are handled: today, each one is handled in a vacuum. Especially during rush hours, when many users request the shortest path between similar locations, suggesting the same route to all of them could lead to unnecessarily high traffic on some roads and very low traffic on others. Wouldn’t it be great if a more balanced routing system could take into consideration all requests involving the same segments, spread the traffic over several routes, and minimize the congestion caused by vehicles sharing the same road?

That goal is ambitious, out of scope for this book, and out of reach of classical computers. That’s why quantum developers are working on it, with encouraging results.

Source: La Rocca, Marcello (2021), *Advanced Algorithms and Data Structures*, Manning Publications.