We described the travelling salesman problem (or TSP for short) in chapter 17, where we also provided a solution based on simulated annealing.
Figure 18.14 shows the same example we produced in the previous chapter, an instance of the problem with 10 cities for which the best solution has cost 1625 (miles), the total length of the following path:
["New Carthage", "Syracuse", "Buffalo", "Pittsburgh", "Harrisburg",
 "Opal City", "Metropolis", "Gotham City", "Civic City"]
Now we would like to tackle the same problem with a genetic algorithm and see if we can speed up convergence to a good solution (and possibly improve the average solution: remember that these optimization algorithms output near-optimal solutions, and so they provide no guarantee of always finding the best possible one).
1. Fitness, chromosomes, and initialization
In general, a good starting point for any problem is designing how solutions are encoded and how the cost function is defined. While with the 0-1 knapsack we wanted to maximize a quantity (the total value of the entries added to the knapsack), TSP is a minimization problem, so we'll assume the use of a minimization algorithm (as mentioned, otherwise we can just maximize the opposite or the inverse of the cost).
Therefore, we can use the same definition for the fitness function of the genetic algorithm as the cost function we saw in section 17.2, and in particular in listing 17.2: that is, just the sum of the distances between adjacent vertices, as shown in figure 18.14.
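To make this concrete, here's a minimal sketch of such a fitness function, under the assumption that the graph's edge weights are available through a distance matrix (the names tspFitness and distances are ours, for illustration, not the book's exact API):

```javascript
// Sketch of the TSP fitness (cost) function: the total length of the
// route, that is, the sum of the distances between adjacent vertices.
// distances[u][v] is assumed to hold the weight of the edge (u, v).
function tspFitness(chromosome, distances) {
  let cost = 0;
  for (let i = 0; i < chromosome.length - 1; i++) {
    cost += distances[chromosome[i]][chromosome[i + 1]];
  }
  // For a closed tour, also add the edge back to the starting vertex
  // (drop this line if routes are modeled as open paths).
  cost += distances[chromosome[chromosome.length - 1]][chromosome[0]];
  return cost;
}
```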
We can even reuse the same encoding for solutions: they are permutations of the graph's vertices. This translates easily into chromosomes, which won't be bit strings in this case: each gene holds an integer value (the index of one of the vertices), with the implicit constraint that no two genes can hold the same value.
We can initialize any solution as a random permutation of the indices between 0 and n-1, where n is the number of vertices in the graph, just like we did for simulated annealing. We can even pass the same method as the chromosomeInitializer argument to method initPopulation in listing 18.2.
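For illustration, such an initializer could be as simple as a Fisher-Yates shuffle of the vertex indices (the function name here is hypothetical):

```javascript
// Possible chromosome initializer for TSP: a uniformly random
// permutation of the vertex indices 0..n-1 (Fisher-Yates shuffle).
function randomPermutation(n) {
  const chromosome = [...Array(n).keys()]; // [0, 1, ..., n-1]
  for (let i = chromosome.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    [chromosome[i], chromosome[j]] = [chromosome[j], chromosome[i]];
  }
  return chromosome;
}
```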
2. Mutations
For mutations, too, our job can (at least for an initial assessment) be limited largely to reusing the local search methods developed in chapter 17; and if we'd like to compare against the performance of simulated annealing, we shouldn't add new kinds of mutations either.
There are, however, a few differences. First, while in listing 17.3 we implemented an ensemble method, where one of the transitions is applied with some probability, in genetic algorithms this ensemble mechanism is already implicit for mutations, and thus we have to provide each mutation (aka local search) method separately.
Another difference is that for the genetic algorithm it doesn’t really make sense to have a mutation that randomly recreates an individual from scratch. This goes against the principles of natural selection, and we have seen that genome diversity in the population is already supported by crossover and local mutations.
Therefore, we can extract two methods from listing 17.3, one for swapping adjacent vertices, and one for swapping random vertices, and in turn create two mutation operators from them. (Both are shown in listing 18.11. Warning: These methods, for the sake of simplicity, alter their inputs—while this is usually fine for GA mutations, you should be aware of that and state it clearly in the method’s documentation.) To see these methods in action, see figure 17.13 in the previous chapter.
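As a sketch of what the two operators might look like (the book's actual versions are in listing 18.11; the names here are illustrative), note that both, as warned above, alter their input in place:

```javascript
// Mutation 1: swap a randomly chosen pair of adjacent vertices.
// Small-range local search; alters its input in place.
function swapAdjacentVertices(chromosome) {
  const i = Math.floor(Math.random() * (chromosome.length - 1));
  [chromosome[i], chromosome[i + 1]] = [chromosome[i + 1], chromosome[i]];
  return chromosome;
}

// Mutation 2: swap two vertices chosen uniformly at random.
// Wider-range local search; also alters its input in place.
function swapRandomVertices(chromosome) {
  const i = Math.floor(Math.random() * chromosome.length);
  const j = Math.floor(Math.random() * chromosome.length);
  [chromosome[i], chromosome[j]] = [chromosome[j], chromosome[i]];
  return chromosome;
}
```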
A final, though important, consideration: the mutation chance in GAs will typically be much smaller than the ratios we used for the same methods in simulated annealing (because, obviously, for simulated annealing these “mutations” were the only way to explore the problem domain at any iteration).
As a first attempt, however, we used the same probabilities, to make the comparison fairer. A couple of sections later, we'll discuss how it went.
3. Crossover
In crossover lies the real novelty we are adding, with respect to simulated annealing: the latter had no way to combine two good solutions. Truth be told, there wasn’t even a way to keep multiple solutions in the first place!
There are many ways in which we can combine two permutations of the same sequence. OK, perhaps not that many, but there is definitely more than one. Since we don’t need to make things even more complicated, let’s go for the simplest way—or at least the simplest I can think of.
We are still going to choose a single cutoff point, like for 0-1 knapsack, but then things get interesting. Let’s see an example to help us understand. Figure 18.15 shows this crossover method in action for the example we have used since chapter 17, a graph with 10 vertices.
After randomly choosing a cut point, we copy all the genes in the first parent's chromosome to the child's genome, from the gene at index 0 to the one at the point of cut. Now we need to fill the rest of the child's
chromosome, but we can't just copy the sequence after the cut point (like we did for the knapsack's bit strings), because the last four genes in the second organism's chromosome contain vertices 0, 2, and 9, which have already been assigned as values to genes in the first half of the new chromosome.
What we can do, instead, is go through the whole second chromosome from start to finish, and each time we find one of the vertices that was not copied from chromosome 1 to the child (vertices 1, 4, 8, and 5), we add this vertex to the new chromosome. This way, the net result is that the first part of the new chromosome, from its beginning to the cut point, is identical to its first parent's, while the vertices in the second part, after the cut point, appear in the same order as they do in the second parent (where, however, they might not have been adjacent to each other).
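In code, this crossover could look like the following sketch (the names are illustrative; the cut point is drawn so that each parent contributes at least one gene):

```javascript
// Single-cut-point crossover for permutation chromosomes. The child's
// prefix, up to the cut point, is copied from parent1; the remaining
// vertices are appended in the order they appear in parent2, skipping
// any vertex already present in the child.
function tspCrossover(parent1, parent2) {
  const cut = 1 + Math.floor(Math.random() * (parent1.length - 1));
  const child = parent1.slice(0, cut);
  const used = new Set(child);
  for (const vertex of parent2) {
    if (!used.has(vertex)) {
      child.push(vertex);
    }
  }
  return child;
}
```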
4. Results and parameter tuning
Now that we have all the pieces of the genetic algorithm defined and implemented, there are a couple of questions that we would like to answer:
- Do we get any improvement over simulated annealing?
- How much do crossover and mutations influence the performance of the algorithm?
You might have guessed it: now it’s profiling time!
The former question is easier to answer. Simulated annealing, in our tests in section 17.2.4, based on JsGraphs’ implementation, required approximately 600 iterations to converge to the best solution, and we were able to obtain an average cost equal to 1668.248 (remember, Monte Carlo algorithms, like simulated annealing and the genetic algorithm, don’t always return the best result—there is no guarantee of that).
You can also take a look on JsGraphs' GitHub at the implementation[1] of a genetic algorithm to solve TSP. Running this algorithm with a population of 10 individuals, with the same mutation rates we used for simulated annealing, and a crossover chance of 0.7, the algorithm takes on average 10 generations to converge to the optimal solution (cost: 1625). For a fair comparison, we need to compare the iterations of simulated annealing (where a single solution is kept) to the product of the number of GA generations and the population size: here, 10 generations times 10 individuals, or 100 in total. Still, we go down from 600 to 100, a nice improvement.
But where we really see the advantage is in computing the average cost over the same number of “total” iterations, for instance 1,000 iterations of simulated annealing versus a population of 25 individuals evolved for 40 generations (or any combination whose product is 1,000). Without too much effort spent tuning the parameters, we were able to obtain an average cost equal to 1652.84, using high chances for both crossover and the random vertex swap mutation. With a population of 100 elements, the average goes down to 1636.8.
Another interesting factor we would like to understand is this: How do crossover and mutation chances influence the evolution of the population?
To answer that, let’s try to run the algorithm with only one of these operators enabled and draw a curve of the best organism’s fitness. In order to get a better understanding and clearer charts, we’ll use a more complicated instance of the problem: a complete graph with 50 vertices (and random weights for the edges).
Figure 18.16 shows a chart with the plots of the optimization trend for three runs of the algorithm on a population with 100 organisms for 100 generations. The first thing you are likely to notice is that the crossover operator plateaus at some (early) point, but this shouldn’t be a surprise, right? We have already discussed the role of crossover and its limitations. In particular, it can recombine the genomes of the initial population, but if none of the organisms shows a certain feature,[2] then crossover can only improve fitness so much.
We can also see that mutation 2 is more effective than mutation 1, which also plateaus at some point: here as well, it shouldn’t be surprising. If you read chapter 17 about simulated annealing, where we introduced both these mutations, you saw that swapping only adjacent vertices is a small-range local search, which makes it more difficult to get out of local minima.
So far, so good, but what would happen if we were to increase the population size? This would increase the genome diversity in the initial population, so it’s legitimate to expect crossover will be more effective, drawing from a larger pool.
Figure 18.17 confirms this idea. Crossover makes the algorithm evolve faster in the initial phases and manages to get as good a solution as the one obtained using only the most effective mutation.
This was a particularly happy case, though. Running the crossover-only version a few more times, as in figure 18.18, shows that the quality of the best solution found depends on the diversity and quality of the initial population. If we are lucky during initialization, the algorithm will get to an optimal solution; otherwise, as expected, using only crossover, the algorithm will plateau at a sub-optimal solution. When enabling only the mutations, the final result (not shown on the chart) has less variance across different runs, and the evolution plots of various attempts look similar.
Yet another consideration to make is that both charts in figures 18.16 and 18.17 also confirm that the “adjacent pairs swap” mutation is less relevant to the algorithm and can be put aside in favor of the “all pairs swap” one.
So, do we get any advantage in using crossover and mutations at the same time? To understand that, we need to abandon our charts (which provide an idea for the trend of the simulation, but are not statistically relevant, because they are based on just a few runs), and run a more thorough analysis, summarized in table 18.2.
As you can see, the best results have been found using a mix of crossover and mutations. Using the random swaps mutation is more important than using crossover, while the adjacent swaps mutation is not only less important, but can even become detrimental when its chance is too high, just as happens for crossover. This might feel puzzling: after all, the worst you'd have expected was, maybe, that a mutation proved useless. How can we explain the fact that a higher chance for crossover and mutation 1 translates into worse solutions, on average? Well, the point is that even if we use elitism, when the other organisms in the new generations are affected by many changes, there is a higher chance that their fitness will get worse, and this effect becomes stronger with each generation, as the average fitness improves. If a solution is close to the optimal one, it's much easier for a random change to be pejorative than ameliorative.
That’s because, unlike simulated annealing, where we reject pejorative changes in the final stages of the simulation, GAs accept them without questioning. Besides the small fraction of high-fitness elements potentially guaranteed by elitism, the rest of the population can drive the average fitness down when too many random changes happen at the same time.
One final word of caution: these results are still to be taken with a grain of salt, because the average is computed on “only” a thousand runs, and because we haven’t explored extensively the domain of parameter values.
NOTE To do parameter fine-tuning in a systematic way, there is an effective protocol often used in machine learning. We start with a set of coarse-grained, heterogeneously spaced values to find the order of magnitude for the best solution (for instance, for the mutation chance, we could choose [0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 1]), and then refine our search by repeating (one or more times) the process around the most promising values. (For instance, assuming the best results were obtained with 0.2 and 0.5, then we test [0.15, 0.2, 0.25, 0.3, …, 0.45, 0.5, 0.55, 0.6].) Since we have multiple parameters, and we need to measure the results for all the combinations of the three of them (as shown in table 18.2), we really want to keep these lists as short as possible and refine them in several steps.
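One round of this search could be sketched as follows; here evaluate is a hypothetical caller-supplied callback that runs the GA with the given operator chances and returns the average cost of the best solutions over several runs (averaging matters, since single runs are noisy):

```javascript
// One round of coarse-grained grid search over the three operator
// chances. To refine, call it again with a shorter, finer-grained list
// centered around the most promising values returned here.
function gridSearch(evaluate, chances = [0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 1]) {
  let best = { cost: Infinity, params: null };
  for (const crossoverChance of chances) {
    for (const mutation1Chance of chances) {
      for (const mutation2Chance of chances) {
        const cost = evaluate(crossoverChance, mutation1Chance, mutation2Chance);
        if (cost < best.cost) {
          best = { cost, params: { crossoverChance, mutation1Chance, mutation2Chance } };
        }
      }
    }
  }
  return best;
}
```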
5. Beyond TSP: Optimizing the routes of the whole fleet
As we mentioned in chapter 17, solving TSP is nice, but only allows us to optimize the route of a single truck. With the business thriving, it’s natural to have several trucks taking care of shipments from each warehouse (or possibly from more than one warehouse at the same time).
Each truck has a limited capacity, so we need to take parcel size into account (does that ring a bell?) and, moreover, each parcel's destination can significantly change the route needed to deliver all packages assigned to a truck.
This is an extremely complicated problem to optimize, because for each assignment of parcels to the available trucks, we need to optimize the trucks’ routes one by one.
To simplify the problem, what’s often done is splitting the area covered by a warehouse into zones and finding an a priori optimal assignment of these zones to trucks. This, however, forces us to settle for sub-optimal solutions and to include some redundancy in the trucks’ capacity to cope with the fluctuations in demand.
An alternative is running a TSP optimization for each truck after assigning the deliveries to all vehicles, as part of the fitness function computation, and then summing whatever metric is used (miles run, time spent, or gasoline consumed) across all of them.
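A sketch of such a fitness function might look like this, where optimizeTsp stands in for whatever single-truck TSP solver is used (for instance, the genetic algorithm developed in this section) and is assumed to return the cost of the best route found:

```javascript
// Fleet-level fitness: given an assignment of deliveries to trucks
// (assignment[t] lists the destination vertices of truck t), optimize
// each truck's route and sum the resulting costs.
function fleetFitness(assignment, optimizeTsp) {
  let totalCost = 0;
  for (const truckDeliveries of assignment) {
    totalCost += optimizeTsp(truckDeliveries); // cost of the best route
  }
  return totalCost;
}
```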
As you can imagine, optimizing the larger problem is not easy, because for any change in the assignments of parcels to vehicles, we need to rerun at least two TSP optimizations.
In terms of epistasis (as defined in section 18.1.9), the fitness function has a high degree of dependence between its variables, so a genetic algorithm could be effective, but a random search could also be a viable alternative.
Source: Marcello La Rocca (2021), Advanced Algorithms and Data Structures, Manning Publications.