Now we can be quite satisfied with our genetic algorithm for TSP, and we can move on to new, interesting, and of course NP-Hard problems on graphs.
There is a rich body of literature on this kind of problem; some writings are milestones and benchmarks in computer science and some appear in many practical applications throughout engineering.
Minimum vertex cover is important both theoretically and practically, so we couldn’t close this book without discussing it!
Given a graph G=(V,E), a vertex cover for G is any set of its vertices that covers all of its edges; an edge e=(u,v) ϵ E is covered by a subset of vertices S ⊆ V if at least one of the edge’s endpoints, either u or v, belongs to S.
Every graph has a trivial vertex cover: the set V of all vertices. What we are really interested in, though, is finding a minimum vertex cover, the smallest possible subset of vertices that covers all edges. Figure 18.19 shows a few examples of simple graphs and their vertex covers.
Notice that some graphs have a single solution (like the first one on the left), while others have multiple minima vertex covers. Can you find the alternative solution for the graph on the right?
Figure 18.20, instead, shows examples of what is not a minimum vertex cover.
The decision problem for the minimum vertex cover (“Given a subset of vertices, is it a minimum vertex cover for G?”) has been proved to be NP-Hard.
Since there are 2|V| possible subsets of V, a brute-force search seems out of the question.
There are polynomial-time approximation algorithms for minimum vertex cover, but they can only guarantee a result that is within an approximation factor in the order of magnitude of 2 (slightly lower than 2) from the best solution.[1] This problem, in fact, belongs to a computational subclass of NP called APX-complete problems, the set of NP optimization problems for which there exists a polynomial-time constant-factor approximation algorithm.
For some categories of graphs, like bipartite graphs and trees, there exist worst- case polynomial-time algorithms to solve minimum vertex cover exactly, but besides these happy cases, we either accept an approximated (non-optimal) solution, or we need an exponential-time algorithm.
It goes without saying that we are going for the former by using a genetic algorithm. But first, let’s talk about applications of this problem.
1. Applications of vertex cover
Back to our e-commerce company: as the business grows, so do warehouses. To protect the company from thefts and to guarantee the employees’ safety, we need to install cameras that cover all the alleys. Suppose that our warehouses are shaped like the graphs in figure 18.20, where edges are alleys, and vertices are placed at cross points and dead ends; also suppose that cameras have special wide lenses (or motors, or both) and can cover up to 180° or even 360°. Then, the minimum number of cameras we need is given by the minimum vertex cover for those graphs.
Although oversimplified, this is a real, recurring application of the problem, the easiest to adapt to our example scenario, but definitely not the only one.
Another field where efficient vertex cover algorithms make a difference is bioinformatics, in particular computational biochemistry. It often happens that DNA sequences in samples present conflicts (the exact definition of conflict depends on the context) that we need to resolve: graphs come to our aid. We can define a conflict graph where vertices are sequences and edges are added when there is a conflict between any pair of sequences. Notice that this graph is potentially disconnected.
Since the goal is to resolve all conflicts by removing as few sequences as possible, what we are looking for is the minimum vertex cover of the conflict graph.
Going back to computer science applications, vertex cover can be used to aid network security. It has been employed[2] to design optimal strategies, in terms of combinatorial topology of routers, against the diffusion of stealth worms in large computer networks.
Finally, among the many other fields where we can find this problem, we’d like to mention one that we have already discussed in this book: minimum vertex cover has been used[3] to develop an efficient nearest neighbor classifier for general metric spaces (not limited to Euclidean spaces or to Hilbert spaces).
2. Implementing a genetic algorithm
Now it’s time to roll up our sleeves and write a decent optimizer for this problem; the good news is that we can reuse most of the work we had done for the 0-1 knapsack!
The solutions to the minimum vertex cover problem, in fact, are subsets of the vertices, just like for the knapsack they were subsets of the items available. We can therefore implement chromosomes as bit strings and reuse the same crossover and mutation operators!
The only thing that changes, obviously, is the definition of the fitness function. Here the quality of the solution is given by the number of vertices in the chosen subset (the smaller, the better) under the constraint that all edges are covered. We could implement the constraint separately and discard all subsets that are not vertex covers or assign a huge value to their fitness. This way, however, a minimal solution that covers all edges but one would be penalized even with respect to the trivial solution containing all vertices (which is a valid vertex cover). It is important, instead, to keep the former solution in the population, as it could be turned into a valid vertex cover by adding a single vertex.
To work around this issue, I suggest we integrate the number of uncovered edges into the fitness function, with a certain multiplicator (by default, we can use 2).
Considering the first example in figure 18.20, let’s compare two possible solutions in figure 18.21. The graph on the left has one uncovered edge, so its fitness is 5, the same as the trivial solution (on the right) which is a vertex cover. This example also shows how important it is that the multiplicator for the uncovered edges is greater than 1. If we just added the number of uncovered edges to the number of vertices, the solution on the left would have fitness equal to 4, but any subset with 4 vertices would be a better, and valid, solution.
At the same time, using this fitness function, we don’t discard a promising solution that is only two mutations away from the minimum vertex cover (which uses only 3 vertices).
Listing 18.13 shows an implementation of the fitness function for this method. The caveat, with respect to the knapsack problem, is that we also need to pass an instance of the graph to this method, because it’s needed in order to check which edges are covered by the solution. In many languages, this can be obtained without changing the template for the genetic algorithm (shown in listings 18.10 and 18.4) by currying the fitness function and bounding it to an instance of the graph before passing it to the genetic algorithm main method.
JavaScript, for instance, is one of the languages that allows this: check out the implementation provided with JsGraphs on GitHub.[4]
And that was almost all the new code we need to write a solver for vertex cover. There is only a final word of caution.
To be certain to return a valid vertex cover, we will need to sort the final population by fitness and, starting from the one with the smallest fitness value, validate solutions: we can return the first one in the sorted list that is also covering all edges.
Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)