When it comes to optimization algorithms, the gist is that we are trying to replace a brute-force search over the whole problem space with a local search that filters out as many points in the problem space as possible: finding the same result by searching a much smaller domain.
Optimization algorithms explore the neighborhood of current solutions, which are points in the search space, and do this either deterministically or randomly, trying to spot areas of interest, promising regions where we expect to find better solutions than what the search has previously found.
The way we perform this filtering, the range of the search and how “local” it actually is, whether we move randomly or deterministically: these are the key factors that characterize the different techniques we described in chapters 16 and 17.
In the previous chapter we discovered simulated annealing and discussed how it converges to near-optimal solutions by allowing transitions uphill, while gradient descent only moves downhill, toward better solutions, and as such it gets easily stuck in local minima. Simulated annealing can therefore work better than gradient descent when the cost function has many local (sub-)minima.
While powerful, simulated annealing can’t be perfect, and indeed we have also seen that there are two main drawbacks to consider:
- The algorithm is slow to converge. While gradient descent, which is deterministic, takes the fastest route downhill,[1] simulated annealing (which instead is stochastic) randomly wanders across the cost function’s landscape, and as such it might require many attempts before finding the right direction.
- With certain shapes for the cost functions, where local/global minima are in narrow valleys, the probability that simulated annealing randomly “walks” into those valleys can be low, so low that it can fail to reach a near-optimal solution altogether.
In section 17.1 we analyzed in some depth why simulated annealing works well, and how it creates this sort of dynamic filtering, indirectly restricting (based on their cost) the domain points that can be reached at a given stage. If you remember, initially the algorithm jumps from point to point, as likely to go uphill as downhill, exploring the landscape at large.
We have also seen how it can happen that the optimization finds a near-optimal solution in the early stages, and then moves away from it, because in the initial phase it’s likely to accept transitions uphill. The problem is that simulated annealing has no memory: it doesn’t keep track of past solutions, and—especially if we allow long-range transitions—there is a concrete risk that the algorithm will never get back to a solution that was as good.
An even more probable risk is that the algorithm does find the valley containing the global minimum at some early-ish stage, not necessarily getting any closer to the bottom of the valley (perhaps landing just somewhere near its entrance), and then moves away and never manages to enter it again.
Figure 18.1 illustrates these situations. In section 17.1 we also discussed how simulated annealing with restart can keep track of past solutions and—when stuck—randomly restart from one of these past positions, to check if a solution better than the current best is found.
This workaround can help somewhat with the former situation, where we get to a great solution early, but it’s unlikely to improve the latter (just finding a promising position at the entrance of a valley), because we only save at most a small number of the best previously found solutions. As such, even if the optimization had managed to find the beginning of a path to the global minimum, we would forget about it, unless we were lucky enough to land close to the bottom of the valley.
1. Inspired by nature
Simulated annealing was inspired by metallurgy, mimicking its cooling process to drive a system from chaotic to ordered behavior. Nature, as a matter of fact, has often been a great source of inspiration for mathematics and computer science. Just think about neural networks, probably the most pervasive of these examples, at the time of writing.
For biological processes, such as neural networks, it’s not hard to imagine why. They have been adapted and perfected over millions of years, and since their efficiency is tightly connected to organisms’ survival, we can find clever and efficient solutions everywhere in nature.
Genetic algorithms are yet another example of biologically inspired algorithms and, even more, they are based on this very principle of evolution in response to stimuli from the environment.
As shown in figure 18.2, a genetic algorithm is an optimization algorithm that maintains a pool of solutions at each iteration. Compared to simulated annealing, this allows maintaining a larger degree of diversity, probing different areas of the cost function’s landscape at the same time.
One thing that can’t be shown in this “cost function” kind of graphic is that genetic algorithms also evolve the solutions by recombining them: this allows us to take the strengths of different solutions and merge them into a better one. Take, for instance, the TSP problem, which we described in section 17.2 (we’ll also talk about it again in the next section). Imagine that we have two poor solutions, each having the best possible sequence for a different half of the vertices, but each with terrible solutions for the remaining half. By combining these two solutions in the right[2] way, we can take the good halves from each and get a great candidate solution, possibly (if you are lucky!) even the best possible.
This idea of combining solutions is something that we don’t find either in gradient descent (obviously!) or in simulated annealing, where a single solution is kept and “studied” at any time.
We’ll see that this new idea is embodied in a new operator, the crossover operator, that derives from the biological analogy used for this technique. But to avoid getting ahead of ourselves, we still need to reveal what inspired genetic algorithms; in doing so, we will also make clear where the name comes from.
A genetic algorithm is an optimization meta-heuristic based on the principles of genetics and natural selection. We mentioned that this optimization technique maintains a pool of solutions; what we haven’t said yet is that these solutions mimic a population that evolves through several generations—each organism in the population is defined by a chromosome (or, sometimes, a pair of chromosomes) that encodes a single solution to the problem being optimized.
So, in a nutshell, chromosomes encode information determining an organism’s behavior (genetics), and in turn how well an organism can adapt to its environment, survive, and generate offspring (natural selection).
Computer scientist John Holland, inspired by this mechanism, in the early 1970s devised[5] a way to use these two principles to evolve artificial systems. One of his students, David Goldberg (considered for a long time the expert of reference on this topic), popularized this approach through his research and his book at the end of the 1980s.[6]
The core idea, shown in figure 18.4 in the next section, is to encode a solution as a chromosome and assign each chromosome to an organism. Then, naturally, it follows that based on how good a solution is, we derive how fit the corresponding organism is for its environment. Fitness, in turn, is a proxy for the chances of an organism to reproduce and pass its genetic material to the next generation through its offspring. As in nature, the alpha-individuals of a population, the stronger (or faster, or smarter) ones, tend to have a higher chance of surviving longer and finding a mate.
But perhaps the best way to understand genetic algorithms is by seeing them in action. To that end, and to keep things concrete, while we describe all the building blocks of this meta-algorithm, we’ll develop an example along the way to help readers visualize how each component works. Do you remember the 0-1 knapsack problem? We are going all the way back to chapter 1, when we introduced this problem to model our “packing to Mars” situation. Back then, we mentioned there is a pseudo-polynomial dynamic programming algorithm that can provide the absolute best solution, but it can be too slow when the capacity of the knapsack is large. Branch-and-bound approximation algorithms exist that can compute near-optimal solutions (with some guarantees on the upper bound and in reasonable time), although the efficient ones are usually quite complex.8
To get a fast, clear, simple optimization algorithm, we will implement a genetic algorithm that can find approximated, near-optimal solutions to the knapsack problem.
Before we delve into the details of genetic algorithms, I suppose you could use a refresher. Let’s quickly review the problem definition, and the instance we were using.
NOTE In the generic 0-1 knapsack problem, we have a container with a limited capacity M, and a set of goods, each characterized by a weight (or any other measure of capacity) w, and a value v. If we add an item to our knapsack (or any generic container we decide to use!), we must add all of it; we can’t add fractions of items. The sum of the weights of all available items exceeds the capacity of the knapsack, so we need to choose a subset of items, and the goal, therefore, is to choose the particular subset that maximizes the value of the goods carried.
In section 1.4.1, we tackled a more concrete problem: given the list of goods shown (for your convenience) in table 18.1, we’d like to fill a crate that can carry at most 1 ton, not with as much food as possible, but with the largest total calorie count possible.
As such, we could easily check that wheat flour, rice, and tomatoes could sum up to the maximum carriable weight, but they wouldn’t give the maximum number of calories.
In chapter 1, we briefly described the key consideration that is used by the branch-and-bound heuristics to approximate this problem. It computes the value by weight (in this case, calories by weight) for each of the goods, and prefers food with higher ratios; although that’s used as a starting point by the Martello-Toth algorithm, just choosing products in descending order of per-kilo values won’t give us the best possible solution.
And, in fact, the dynamic programming algorithm that exactly solves the 0-1 knapsack problem won’t even compute this ratio; we will not use it here either, so it’s not even shown in table 18.1.
If you’d like to have a closer look at the problem, please feel free to skim through section 1.4; let’s now move on to discussing the main components of genetic algorithms (GA).
To completely define an optimization algorithm, we need to specify the following components:
- How to encode a solution—Chromosomes, for GA
- Transitional operators—Crossover and mutations
- A way to measure cost—Fitness function
- How the system evolves—Generations and natural selection
2. Chromosomes
As we mentioned, chromosomes encode a solution to the problem we need to solve. In the original work by Holland, chromosomes were only supposed to be strings of bits, sequences of zeros and ones. Not all problems can be encoded with a binary string (we’ll see an example in section 18.2 with TSP), and so a more generic version was derived later, allowing genes (each of the values in a chromosome) to be continuous, real values.
Conceptually, those versions can be considered equivalent, but whenever we don’t use binary strings for the chromosomes, we will likely require restrictions on the possible values they can store: the operators acting on organisms will have to be adjusted in order to check and maintain these constraints.
Luckily for us, the 0-1 knapsack is the perfect example to discuss the original genetic algorithm, because a solution to this problem can be encoded exactly as a binary string. For each item, a zero means we leave it on Earth, while a 1 means we add it to the crate going to Mars.
Figure 18.4 shows a bit-string representation for a chromosome that can be used in our example. Two different solutions (or phenotypes, in biological terms) can be represented by two chromosomes (or genotypes): their difference boils down to the difference between the two bit-strings.
It’s worth noting that for this example we have a 1:1 mapping between genotype and phenotype: a solution (the phenotype) is uniquely identified by its representation (the genotype) and so two organisms with the same genotype will translate to the same solution and in turn to the same fitness value (as we’ll discuss in a few sections).
In the context of our 0-1 knapsack example, if two organisms have the same genotype, both solutions will add the same set of goods to the crate to Mars.
The equivalence between genotype and phenotype is, however, not observed in nature, where the difference between them is clear, because the genome only encodes developmental rules, not precise outcomes, and an organism is also determined by its interaction with the environment.[7]
Likewise, in some simulated problems fitness is not always fully determined by genotype. One of the first examples of an application of genetic algorithms I studied was the work[8] of Floreano and Mondada at EPFL, where they evolved the neural networks of small two-wheel robots, simulating the evolution of gatherers, and later of predator and prey populations (in a sort of hide and seek game).
In this experiment, the genotype was the set of weights for a robot’s neural network, which, incidentally, also completely identifies the phenotype. Each organism’s fitness, though, was later determined by its interaction with the environment and with other robots.
If we were to extend their original work by allowing online learning for the robots (today, it wouldn’t be hard to imagine applying stochastic or mini-batch backpropagation to evolve the NN’s weights based on the input from the environment), then the phenotype, too, would no longer be completely determined by the genotype, at least not after the first update, because after that, the way weights change is heavily influenced by the interaction of the robot with the environment.
With respect to figure 18.4, we also need to make one more thing clear: the actual representation of chromosomes depends on the actual, specific problem we are solving. In this case, it depends on the specific instance of the 0-1 knapsack (although a generic representation for all 0-1 knapsacks can and should be designed).
If, instead, we want to design code for the genetic algorithm’s main method, we should be concerned with designing organisms, for example, modeling them with a class that is provided with the actual chromosome (and, as we’ll see, the methods that modify it), either at runtime through composition, as arguments (using the strategy design pattern), or at compile time through inheritance (the template design pattern).
NOTE The genetic algorithm is a meta-algorithm, a template that is completed by the code specific to each problem tackled. As such, there is part of the code that is generic—the structural code defining the optimization technique—and another part that is specific to the problem. At this point, while describing the components of genetic algorithms, we’ll provide the pseudocode for the template, and since we are using the knapsack problem as an example, we’ll also provide some pseudocode to implement examples of the specific methods for the 0-1 knapsack, with these snippets clearly marked.
Listing 18.1 shows a possible implementation of the Individual class, modeling each individual organism in the population evolved.
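Since the listing itself isn’t reproduced here, the following Python sketch conveys the idea; the names and the lazy fitness caching are illustrative assumptions, not the book’s actual code. An organism simply wraps a chromosome together with a problem-specific fitness function:

```python
class Individual:
    """Wraps a chromosome together with a problem-specific fitness function."""

    def __init__(self, chromosome, fitness_function):
        self.chromosome = chromosome
        self._fitness_function = fitness_function
        self._fitness = None  # cached lazily; here the genotype fully determines fitness

    def fitness(self):
        if self._fitness is None:
            self._fitness = self._fitness_function(self.chromosome)
        return self._fitness
```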
3. Population
The most apparent difference between simulated annealing and genetic algorithms is that instead of tuning a single solution, genetic algorithms will evolve a population of individuals.
Listing 18.2 shows a possible implementation for a method that initializes the population that will be used for the genetic algorithm. This is also a templated method, since it will have to be provided with the actual code for initializing the single chromosomes (which, again, will depend on the actual problem tackled).
There are a few different strategies that can be used for initialization. You can rely on random initialization to provide diversity in your initial population. This is the original and still most common approach, but in certain situations you might also add constraints on the chromosomes (to avoid invalid solutions) or even decide to provide an initial population (that can, for instance, be the output of a different algorithm or of a previous iteration).
For a generic instance of the 0-1 knapsack problem, since chromosomes are just bit strings, we are good with a random generator, as shown in listing 18.3.
There is a caveat: not all randomly generated strings are acceptable solutions for a given instance of the knapsack problem. For example, a string with all values set to 1 would certainly mean (for a non-trivial instance) that the weight constraint has been violated. As we’ll see next, we can assign a low fitness to solutions that violate this constraint; alternatively, when we generate chromosomes, we can add a check to avoid violating it in the first place.
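A minimal Python sketch along these lines (the function names are hypothetical, and it reuses the Individual class sketched earlier) could pair a problem-agnostic initializer with a knapsack-specific generator that repairs chromosomes violating the weight constraint:

```python
import random

def init_population(size, random_chromosome, fitness_function):
    # Template method: the problem-specific chromosome generator is injected.
    return [Individual(random_chromosome(), fitness_function) for _ in range(size)]

def random_knapsack_chromosome(weights, max_weight):
    # Start from a random bit string; if it exceeds the capacity, drop random
    # items (set random 1-bits to 0) until the constraint is satisfied.
    chromosome = [random.randint(0, 1) for _ in weights]
    while sum(w for w, bit in zip(weights, chromosome) if bit) > max_weight:
        ones = [i for i, bit in enumerate(chromosome) if bit]
        chromosome[random.choice(ones)] = 0
    return chromosome
```

For our example instance, the initializer could then be called as `init_population(50, lambda: random_knapsack_chromosome(weights, 1000), fitness_fn)`, with weights expressed in kilograms and the capacity set to 1 ton.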
4. Fitness
Connected to the definitions of chromosome and organism, there is the notion of fitness of an individual. As you can see in listing 18.1, the Individual class has a fitness method returning a value that measures how well an organism adapts to its environment.
The term fitness naturally implies a maximization problem, because higher fitness is usually associated with better performance in an environment. Conversely, we often want to express a problem’s goal as a cost function to be minimized, so we have two choices: either implement the genetic algorithm template in such a way that a lower value for fitness means better performance, or reformulate our cost functions according to the choice we made. For instance, if we need to find the point of highest elevation on a map, we can either implement a maximization algorithm or set the fitness function to f(P) = -elevation(P), for any given point on the map, and stick with our usual minimization strategy (as we saw in chapters 16 and 17 for gradient descent and simulated annealing).
Consider our example with the 0-1 knapsack problem: the goal is naturally expressed by a value function that we want to maximize, the sum of the nutritional values of the goods added to the crate.
The first chromosome in figure 18.4, for example, is 0100100, which means we sum the calories of rice and tomatoes, for a total of 1,359,000 calories. What happens when we exceed the weight that the crate can carry, for instance, with the chromosome 1111111? To spot these edge cases, we need to check the total weight while computing the fitness and assign a special value (for instance, in this case, 0) to those solutions that violate the constraint on the weight.
An alternative can be making sure that this situation never occurs by carefully checking it in the operators that create and modify individuals (initialization, crossover, and mutations, as we’ll see). This solution, however, may have consequences on the optimization process, artificially reducing the plurality of features in the population.
At this point, if the template method that we have implemented for our genetic algorithms tries to maximize the fitness of individuals, we are golden. What if, instead, our implementation strives for lower fitness? For this particular case, we might have an easy solution: we sum the values of the goods discarded, corresponding to zeroes in a chromosome. For any given solution, the lower this sum is, the higher will be the sum of the calories of the goods in the crate. Of course, if we go this way, we have to assign a very high value (even infinity) to the solutions where the weight threshold is exceeded.
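A Python sketch of the first option (assign fitness 0 to infeasible solutions) might look like the following; the parameter names are assumptions for illustration:

```python
def knapsack_fitness(chromosome, values, weights, max_weight):
    # Sum the calories of the selected items; infeasible solutions get fitness 0.
    total_weight = sum(w for w, bit in zip(weights, chromosome) if bit)
    if total_weight > max_weight:
        return 0
    return sum(v for v, bit in zip(values, chromosome) if bit)
```

Bound to a concrete instance (for example with functools.partial, fixing values, weights, and max_weight), this is the kind of fitness_function an Individual would receive.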
5. Natural selection
We now have the tools to create a single organism, generate a whole population, and check individual fitness. In turn, this allows us to mimic a basic natural process that guides the evolution of populations in the wild: natural selection.
In nature, we see it everywhere: the strongest, fastest individuals, the ones that are best at hunting or at hiding, tend to survive longer, and in turn (or in addition to that) they have greater chances of mating and transmitting their genes to the next generation.
The goal of genetic algorithms is to use a similar mechanism to have the population converge toward good solutions. In this analogy, the genes of individuals with high fitness[9] encode features that give them an advantage over the competitors.
In our knapsack example, a good feature could be including food with a high calories-per-weight ratio. Solutions including potatoes will be individuals with low fitness, and will therefore struggle to survive to the next generations. As an example, in the predator-prey simulation mentioned in section 18.1.2, the prey that developed a strategy to hide behind objects were more efficient in escaping predators.
Listing 18.4 shows, through some pseudocode, how natural selection (in genetic algorithms) works. This method regulates how a population transitions between the current generation and the next one, taking the old population as input, and returning a new population, randomly created from the original organisms through a set of operators.
This natural selection process is applied at each iteration of the algorithm. When the algorithm starts, it’s always with a fully formed population outputted by the initialization method (its size is determined at runtime through a method argument, as we have seen). Individuals in the input population are evaluated for their fitness and then go through a process of selection that determines which ones are going to either pass to the next generation unaltered or propagate their genes through mating.
In the genetic algorithm’s simplest form, the new population is just initialized as an empty list (line #2), a list which is then populated (pun intended) with the new individuals resulting from selection and mating (#3—#6).
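The per-generation step just described can be sketched in Python as follows (assuming fitness is maximized, with a touch of elitism added; the signatures are illustrative, not the book’s Listing 18.4):

```python
def natural_selection(population, select_parent, crossover, mutate,
                      fitness_function, elitism=1):
    # Sort once so the top `elitism` individuals can pass through unaltered.
    ranked = sorted(population, key=lambda ind: ind.fitness(), reverse=True)
    new_population = ranked[:elitism]
    while len(new_population) < len(population):
        parent_a = select_parent(population)
        parent_b = select_parent(population)
        child = mutate(crossover(parent_a.chromosome, parent_b.chromosome))
        new_population.append(Individual(child, fitness_function))
    return new_population
```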
For mating, there are, obviously, differences with respect to the biological analogy. First and foremost, in genetic algorithms reproduction generally[10] involves two asexual organisms; moreover, the recombination of genetic material between the two parents doesn’t follow any biologically meaningful rules, and the actual details of how crossover is performed are for the most part left to the specific problem that is solved.
Finally, we add mutations to the genetic material after crossover, modeling crossover and mutations as separate phases.
Figure 18.5 illustrates the general mechanism with an analogy to the biological process.
We start with an initial population where individuals have a common basis, but also some peculiar characteristics. The diversity and variance in the population largely depend on which stage of the simulated evolution we are in. As we’ll see, our aim is to have greater diversity at the beginning of the simulation and then have the population converge toward a few homogeneous, high-fitness groups.
In our example, our population of ducks shows a few unique features, from the shapes of their beak, to their colors/patterns, to the shape of wings and tails. If you look closely, you should be able to spot them and to figure out how these changes are encoded in their chromosomes.
This is because, as we have discussed, each feature in an individual’s phenotype is determined by a difference in its genotype: a 1 that becomes a 0, or vice versa (or maybe a few bits that need to flip together, or even, if we move past binary strings, a gene assigned a different real number).
In order to perform selection, we need to evaluate each duck’s fitness. This can be as easy as feeding each chromosome to a function or as complicated as running a simulation for hours (as in the predator-prey setting we described) and seeing which individuals survive longer or perform a real-world task better.
We’ll only talk about selection for mating in the next subsection, but one thing is generally true regardless of the details of the selection mechanism chosen: organisms with higher fitness will have greater chances of being chosen to pass their genes to the next generation. This is crucial, maybe the only really crucial part, because without this “meritocracy,” no optimization would be performed.
Once we have selected two individuals, we need to recombine their genetic material. We’ll also tackle crossover and mutations in their own section, but one thing that we have already mentioned is that these operators, acting on chromosomes, are largely determined by the specific problem that is being solved.
In our example in figure 18.5, our little duckling takes its feathering pattern and color from one parent and its tail shape from the other. Then a further mutation kicks in changing the feathering again from a checkerboard pattern to a polka dot pattern.
The new population, formed by repeating this process many times, should show a larger presence of those genes/features associated with high-fitness individuals in the initial population.
Now that we have seen how the broad natural selection process works, it’s time to dive into the specifics of its steps.
6. Selecting individuals for mating
Let’s start with selection: we’ll go back to our knapsack problem example to better explain it.
There are several techniques that can possibly be used for selecting, at each iteration, the individuals that will mate or progress; some of the most successful alternatives are
- Elitism
- Thresholding
- Tournament selection
- Roulette wheel selection
Many other techniques are possible as well. Considering our restricted list, the first two techniques are filters on the old population, while the remaining two are methods to select organisms for mating.
Elitism and thresholding, illustrated in figure 18.6, are used to decide what organisms can or can’t transmit their genomes to the next generation.
Elitism allows the best individuals to pass completely unaltered to the next generation. Whether or not to apply this principle is up to the algorithm designer; as always, it might work better with some problems and worse with others.
Without elitism, all organisms in the current iteration would be replaced in the next one, so that all organisms would “live” exactly one generation; with this workaround, we can instead simulate a longer lifespan for particularly well-fitting individuals—the better their fitness in comparison to the average population’s, the longer they will be kept.
In turn, this will also ensure that the genes of the alpha organisms will be present, untouched, in the next generation too (although it’s not certain they will still be the top-fitness individuals in the next generation).
One notable effect of using elitism is that the fitness of the best element in the population is monotonically improving over generations (that, however, might not be true for the average fitness).
Thresholding has the opposite goal: it will prevent the lowest-fitness organisms from transmitting their genes to the next generations, which in turn reduces the probability that the algorithm explores less-promising areas of the fitness function’s landscape.
The way it works is simple. We set a threshold for the fitness of the individuals that are allowed to be selected for mating and ignore the ones outside the requirements. We can discard a fixed number of organisms, such as the five individuals with the worst fitness value, or discard a variable number of individuals based on their fitness value.
In the latter scenario, the threshold on the fitness value is usually set dynamically (changing each generation) and based on the fitness value of the best individual for that generation. With a static, absolute value (which would require domain knowledge to be set), the risk is that it would be ineffective (resulting in the whole population being filtered in) or even worse, fatal for the simulation when it’s too stringent, resulting in all, or almost all, individuals being filtered out in the early stages.
In the example in figure 18.6 we suggested setting the threshold to 80% of the best individual’s fitness value.[11] Whether or not thresholding should be applied and what the threshold ratio should be completely depend on the actual problem to be solved.
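As a rough Python sketch of how these two filters could be combined (the helper name, the elite count, and the 80% ratio are illustrative assumptions):

```python
def filter_population(population, elite_count=2, threshold_ratio=0.8):
    # Assumes higher fitness is better. Returns the individuals that pass
    # unaltered (elitism) and the pool that remains eligible for mating.
    ranked = sorted(population, key=lambda ind: ind.fitness(), reverse=True)
    elites = ranked[:elite_count]
    cutoff = threshold_ratio * ranked[0].fitness()  # dynamic threshold, per generation
    mating_pool = [ind for ind in ranked if ind.fitness() >= cutoff]
    return elites, mating_pool
```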
After filtering the initial population, and possibly escorting the best individuals to the next generation, we are still tasked with recreating the remaining individuals in the new population. To do so, we implement crossover, which will be discussed in the next sub-section, as a way to generate a new organism from two parents, and mutations, to provide genetic diversity to the new generations. The necessary step before being able to apply crossover is, therefore, selecting those two parents. As you can see in listing 18.4, we are going to perform this selection several times at each iteration.
There are many possible ways to select individuals for mating—we are going to discuss two of the most common here.
Tournament selection is one of the simplest (and easiest-to-implement) selection techniques used in genetic algorithms; it’s illustrated through an example in figure 18.7. Conceptually, we randomly select a handful of organisms and then have them “compete” in a tournament where only the winner gets the right to mate. This is similar to what we see everywhere in wildlife, where (usually) male adults fight for the right to mate with females (ever watched a documentary on deer locking antlers during mating season?).
Now, in reality we don’t actually code a tournament, unless we are running a tournament simulation![12] As shown in listing 18.5, we just randomly select k individuals and take the one with the best fitness. The exact number, k, can vary depending on the problem and on the size of the population, but usually somewhere around 3 to 5 elements could be a good choice. Consider that the larger the pool of individuals participating in the tournaments, the lower the chance for low-fitness individuals to be chosen.
For instance, for the third-highest individual to be chosen, neither the best nor the second-best fitness organisms can be chosen (and, of course, the third-best must be), so the probability this happens with a pool of k individuals is[13]
1/n * [(n-2)/n]^(k-1)
assuming n is the size of the population.
For the lowest fitness individual (after, possibly, applying thresholding), it becomes
1/n * [1/n]^(k-1) = 1/n^k
The generic formula for the m-th best fitness becomes
1/n * [(n-m+1)/n]^(k-1)
As you can see, beyond the details and the actual exact probability, the chances of any individual (except the first) are decreasing exponentially with k (while polynomially with m).
It goes without saying that we need to apply tournament selection twice to get the pair of parents we need to generate a single element in the new population.
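In Python, a sketch of tournament selection along these lines (drawing contenders with replacement, consistent with the probability analysis above) takes only a few lines:

```python
import random

def tournament_selection(population, k=3):
    # Draw k organisms at random (with replacement) and keep the fittest.
    contenders = random.choices(population, k=k)
    return max(contenders, key=lambda ind: ind.fitness())
```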
Roulette wheel selection is definitely more complicated to implement than tournament selection, but the high-level idea is the same: higher-fitness individuals must have more chances to be selected.
As we have seen, in tournament selection the probability that an element with low fitness is chosen decreases polynomially with the rank of the element (its position in the list of organisms sorted by fitness); in particular, since the probability will be O([(n-m)/n]^k), the decrease will be super-linear, because k is certainly greater than 1.
If, instead, we would like for lower-fitness elements to get a real chance of being selected, we could resort to a fairer selection method. One option is assigning the same probability to all organisms, but then we wouldn’t really reward high-fitness individuals anymore.[14] A good balance would be, for example, making sure that the probability of selecting each individual is proportional to its fitness.
If we decide on the latter, we can use the roulette wheel selection. Each organism is assigned a section on a “roulette wheel” whose angle (and in turn the length of the arc subtended) is proportional to the fitness of the organism.
One way to obtain this is to compute the sum of the fitness of all individuals and then, for each one, see the percentage of the cumulative fitness for the whole population it accounts for.[15] For instance, figure 18.8 shows how to apply this to our example population for the knapsack problem. The angle of each organism’s sector is computed with the formula θ = 2π*f, where f is the normalized fitness of the given individual, aka the fraction of the individual’s fitness over the total sum for the whole population.
Then each time we need to select a new parent for mating, we spin the wheel and see where it stops: higher fitness will have a larger probability of being picked, but lower ones still have a shot.
NOTE Notice that if we choose to use rank instead of fitness, we’ll need to sort the population first (time required: O(n*log(n))). Sticking to fitness, we only need a linear number of operations to compute the total and the percentages, and since this is computed at each iteration, for large populations this can yield significant savings. Likewise, tournament selection also doesn’t require prior sorting of the population.
When it comes to implementing this technique, however, we are clearly not going to bother with building an actual wheel (or even a simulated one)!
The easiest way we can achieve the same result is by building an array like the one shown in figure 18.9, where the i-th element contains the sum of the normalized[16] fitness of the first i individuals in the current population (taken in the order they are stored in the population array).
For example, if we refer to the population shown in figure 18.8, the first element of the wheel-array holds the population[0].normalizedFitness = 0.16, the second element holds population[0].normalizedFitness + population[1].normalizedFitness = 0.16 + 0.15, and so on, as summarized in figure 18.9. Notice that the last element sums up exactly to 1. Ideally, we can imagine a hidden first element, not shown in the figure, whose value is 0 (we’ll see that this helps with writing cleaner, succinct code).
To select an element, we draw a random number r from a uniform distribution of real numbers between 0 and 1 (not included). Equivalently, in the wheel analogy, we would draw a random angle θ between 0° and 360° (not included) to find how much we should spin the wheel. The relation between the two numbers would be θ = r*360.
Listing 18.6 shows the method to generate a wheel-array like the one in figure 18.9.
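As a sketch of the idea (assuming non-negative fitness values; names are illustrative), building the wheel-array is a single pass over the population:

```python
def build_roulette_wheel(population):
    # wheel[i] holds the cumulative normalized fitness of individuals 0..i;
    # the last entry sums to 1.0 (up to floating-point rounding).
    total = sum(ind.fitness() for ind in population)  # assumes fitness >= 0
    wheel, cumulative = [], 0.0
    for ind in population:
        cumulative += ind.fitness() / total
        wheel.append(cumulative)
    return wheel
```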
To find where our wheel’s pin is pointing, we need to run a search on the array. We look for the smallest element that’s larger than r, the random number drawn by the selection routine. We can modify the binary search algorithm to perform this task and limit the runtime to O(log(n)) for each element selected; therefore, for each iteration, we will have to perform O(n*log(n)) operations to select parents and apply crossover.
Alternatively, a linear search is also possible. It’s a lot easier to write and less likely to be buggy, but that makes the running times grow to O(n) and O(n^2) respectively.
Choose wisely depending on the time constraints you have, as well as your domain knowledge. We leave the implementation of your preferred method of search as an exercise, and I strongly suggest you start with the simplest method, test it thoroughly, and then, if you really need the speedup, attempt to write the binary search method and compare their results.
7. Crossover
Once we have selected a pair of parents, we are ready for the next step: as we outlined in listing 18.4, it’s time for crossover.
We mentioned this already, but just to give you a quick recap: crossover simulates mating and sexual reproduction of animal[17] populations in nature, stimulating diversity in the offspring by allowing the recombination of subsets of features from both parents in each child.
In the analogous algorithmic process, crossover corresponds to wide-range search of the fitness function landscape, since we generally recombine large sections of the genome (and hence of the solutions) carried by each of the parents. This is equivalent to a long leap in the problem space (and cost function landscape).
For instance, take our example problem: the 0-1 knapsack and packing goods for a journey in space. Figure 18.10 shows a possible definition for the crossover method for this problem: we choose a single crossover point, an index at which we cut both chromosomes; then one part of each chromosome (at the opposite sides of the crossover point) will be used for recombination. In this case, we just glue the two parts together.
Randomness is and should be a huge part of crossover. In our example, we can see it in the choice of the crossover point. Some choices (as in figure 18.10 (A)) will lead to improvements in the child’s fitness, while other crossover points (as in figure 18.10 (B)) will lead to poorer or sometimes even disastrous results.
This single-crossover-point technique, if you noticed, actually produces two pairs of opposite sub-sequences (left-right and right-left of the crossover point). While in our example we only used one of the pairs and discarded the other, some implementations of the genetic algorithm use both pairs to produce two children with all the possible combinations[18] (either keeping them both or just keeping the one with better fitness).
But nobody says we need to stick with the single-point crossover; that’s just one of many alternatives. For instance, we could have a two-point crossover, selecting a segment of the first chromosome (and, implicitly, one or two remaining segments of the other), or we could even randomly select each value from one of the parents’ chromosomes. Both examples are shown in figure 18.11.
And many more alternatives are possible: as we said before, crossover can only be defined on the actual chromosomes, and depending on their structure and constraints (it ultimately depends on the structure of the candidate solution), different ways of recombining chromosomes are possible. We’ll see plenty of examples later in this chapter.
One last thing: it is customary to associate a crossover chance, aka crossover ratio, to our crossover operators. This represents the probability that crossover/mating actually happens between the selected individuals. For instance, with a crossover ratio of 0.7, there is a 70% chance that crossover does happen for any pair of organisms selected for mating.
As shown in listing 18.7, every time we apply crossover, we randomly decide what to do, based on the crossover chance. If the choice is not to apply crossover, then there are a few alternatives; most commonly, we randomly select one of the parents that move to the next step. Notice that this is different from elitism because the organism outputted by crossover will still undergo mutations, while any individual selected by elitism will be copied completely unaltered to the new population.
Remember that listing 18.7 shows a wrapper for the actual crossover operator, compatible with the generic natural-selection template we provided in listing 18.4. The actual method will be crafted and passed every time a specific problem instance is addressed.
Listing 18.8 shows the single-point crossover operator that we discussed for the 0-1 knapsack problem.
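Sketched in Python, the wrapper and the knapsack-specific operator could look like this (the 0.7 default and the pass-one-parent-through policy are illustrative choices, not the book’s listings):

```python
import random

def crossover_wrapper(crossover_method, crossover_chance=0.7):
    # Apply the wrapped operator with probability crossover_chance; otherwise
    # pass one parent through (it will still undergo mutation afterwards).
    def apply(chromosome_a, chromosome_b):
        if random.random() < crossover_chance:
            return crossover_method(chromosome_a, chromosome_b)
        return list(random.choice((chromosome_a, chromosome_b)))
    return apply

def single_point_crossover(chromosome_a, chromosome_b):
    # Cut both bit strings at a random point and glue the opposite halves together.
    point = random.randrange(1, len(chromosome_a))
    return chromosome_a[:point] + chromosome_b[point:]
```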
8. Mutations
After a new organism is created through crossover, the genetic algorithm applies mutations to its newly recombined genome. In nature, as we have seen, the recombination of parents’ genomes and mutations happens simultaneously[19] during sexual reproduction. Conceptually, this works analogously in genetic algorithms, although the two operators are implemented separately. If crossover is regarded as wide-range search, mutations are often used for small-range, local search.
We have seen that crossover promotes diversity in the new population, so why do we need mutations?
The answer is simple and best provided with an example (shown in figure 18.12). The population considered in this example shares the same values for three of the genes. This means that no matter how you recombine the organisms’ genes during crossover, the resulting children will all have beans and strawberries included in the solution and potatoes excluded from it. This is a real risk for problems where chromosomes are large (carrying a lot of information), especially if the length of the chromosome is comparable to or larger than the size of the population. The danger is that no matter how long we run our simulation, or how many iterations and crossovers we perform, we won’t be able to flip the values for some genes, which in turn means that the areas of the problem space that we can explore are limited by the choice of the initial population. And this, of course, would be a terrible limitation.
To prevent this issue and increase the diversity in the genomic pool of our population, we add a mechanism that can change each gene independently and for any organism.
In Holland’s original work, as we had mentioned, chromosomes were bit strings, and the mutation was also conceived as domain-agnostic. The mutation operator would be applied to each organism’s chromosome, and for each gene (that is, each bit) it would toss a coin[20] and decide if the bit should be flipped. This is shown in figure 18.13.
In modern genetic algorithms, however, chromosomes can take different shapes and have constraints, so mutations can become much more complex or be applied to the chromosome as a whole, rather than to single genes. As we’ll see for TSP, for instance, a mutation operator can be applied (with a certain probability) just once to the whole chromosome (instead of to each of its genes) and swap two vertices in the tour.
The wrapper class for the mutation operator is identical to the one for crossover shown in listing 18.7, apart from minor differences in handling arguments, especially the ones to the wrapped methods: we only have one organism to pass, but it’s a good idea to also forward the mutation chance to the wrapped method. This is necessary for bit-wise mutations, such as the one we discussed for the knapsack problem, implemented in listing 18.9.
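A bit-wise mutation for the knapsack chromosome, in this spirit (the default rate is just a placeholder), can be sketched as:

```python
import random

def knapsack_mutation(chromosome, mutation_chance=0.01):
    # Flip each bit independently with probability mutation_chance.
    return [1 - bit if random.random() < mutation_chance else bit
            for bit in chromosome]
```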
9. The genetic algorithm template
Listing 18.10 provides an implementation of the main method for the genetic algorithm template, and also concludes our discussion of the 0-1 knapsack problem. We have all we need now to actually run the algorithm on our instance and find out that the best possible solution is bringing wheat flour, rice, and beans. To see a solution in action, and possibly apply it to larger problem instances with hundreds of possible items to pack, you can use the implementation of the genetic algorithm provided by JsGraphs on GitHub[21] and implement the methods for crossover, mutations, and random initialization that we discussed in this section. It will be a great exercise to test your understanding of this technique and delve deeper into its details.
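Putting the sketches from the previous subsections together, a minimal main loop (stopping after a fixed number of generations, which is only one of the possible stopping criteria; all names below come from those sketches and are illustrative) might look like this:

```python
def genetic_algorithm(population_size, generations, random_chromosome,
                      fitness_function, select_parent, crossover, mutate):
    population = init_population(population_size, random_chromosome, fitness_function)
    for _ in range(generations):
        population = natural_selection(population, select_parent, crossover,
                                       mutate, fitness_function)
    return max(population, key=lambda ind: ind.fitness())

# Hypothetical wiring for the "packing to Mars" instance (weights in kg,
# values in calories, capacity 1000 kg):
# from functools import partial
# fitness_fn = partial(knapsack_fitness, values=values, weights=weights, max_weight=1000)
# best = genetic_algorithm(
#     population_size=50, generations=200,
#     random_chromosome=lambda: random_knapsack_chromosome(weights, 1000),
#     fitness_function=fitness_fn,
#     select_parent=tournament_selection,
#     crossover=crossover_wrapper(single_point_crossover),
#     mutate=knapsack_mutation)
```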
10. When does the genetic algorithm work best?
To recap what we have discussed so far: in chapters 16 and 17 we presented a few techniques that can be used in optimization problems to find near-optimal solutions without exploring the whole problem space. Each of them comes with some strengths and some pain points:
- Gradient descent (chapter 16) converges fast but tends to get stuck in local minima. It also requires the cost function to be differentiable (as such, it couldn’t be used in experimental settings or game theory, for instance in the predator-prey robotic evolution experiment described earlier in this section).
- Random sampling (chapter 16) overcomes the need for differentiability and the issue with local minima, but it’s (unbearably) slow in converging (when it manages to converge).
- Simulated annealing (chapter 17) has the same advantages as random sampling, but evolves in a more controlled and steady way toward minima. Nevertheless, convergence can still be slow, and it has a hard time finding minima when they lie in narrow valleys.
At the beginning of the chapter, we discussed how the genetic algorithm can overcome many of the issues of these heuristics; for instance, it can speed up convergence, compared to simulated annealing, by keeping a pool of solutions and evolving them together.
To conclude this section, I’d like to provide yet another criterion that can help you choose which optimization technique you should use. It involves another term, epistasis, that is borrowed from biology: it means gene interaction, and in our algorithmic analogy, it expresses the presence of dependent variables—in other words, variables whose values depend on other variables.
Let’s first go through an example to better explain this.
Each gene on a chromosome can be considered a separate variable that can assume values within a certain domain. For the 0-1 knapsack problem, each gene is an independent variable, because the value it assumes won’t directly influence other variables. (If we enforce the constraint on the weight of each solution, however, flipping a gene to 1 might force us to flip one or more other genes to 0. That’s a loose indirect dependency, anyway.)
For the TSP, as we’ll see, we assign a vertex to each gene with the constraint that there can’t be any duplicate. As such, assigning a variable will impose constraints on all other variables (that won’t be able to be assigned the same value), so we have a more direct, although still loose, dependency.
If our problem were optimizing the energy efficiency of a house that was being designed, and some variables were its area, the number of rooms, and the amount of wood needed for the floors, the latter would be dependent on the other two, because the area to cover with wood flooring would depend on the square footage and the rooms of the house. Changing the size of the house would immediately force us to change the amount of wood used if the floor’s thickness remains fixed (the possibility of changing its thickness is what makes it meaningful to have a separate variable for this).
For 0-1 knapsack, we say that the problem has low variable interaction, and so low epistasis; the house optimization has high epistasis, while for the TSP, its epistasis is somewhere in the middle.
Now, the interesting thing is that the degree of variable interaction contributes to the shape of the landscape of the objective function that we want to optimize: the higher the epistasis, the wavier the landscape will likely look.
But above all, when the degree of interaction is high, varying one variable also changes the value of other variables, and this means making a bigger leap in the cost function’s landscape and problem domain, making it harder to explore the surroundings of solutions and fine-tune the algorithm’s results.
Knowing the epistasis of a problem can guide our choice of the best algorithm to apply:
- With low epistasis, minimum-seeking algorithms like gradient descent work best.
- With high epistasis, it’s best to prefer randomized search, such as simulated annealing, or, for very high variable interaction, random sampling.
- What about genetic algorithms? It turns out that they work best in a wide range of medium to high epistasis.
Therefore, when we design the cost/fitness function for a problem (which, let me state this clearly once again, is one of the most crucial steps to a good solution), we need to be careful about the degree of interaction in order to choose the best technique that we can apply.
During the design phase, we can also try to reduce the dependent variables, whenever possible. This will allow us to use more powerful techniques and ultimately get better solutions.
Now that we have concluded our discussion of the components and theory of genetic algorithms, we are ready to delve into a couple of practical applications to see how powerful this technique is.
Source: Marcello La Rocca (2021), Advanced Algorithms and Data Structures, Manning Publications.