# Simulated annealing and graph embedding

To solve the TSP, we were dealing with a graph while ignoring its embedding,[1] the only thing that mattered was the distance between pair of vertices.

In the last couple of chapters, we focused on abstract graphs and finding meaning­ful ways to embed them in the plane.

If you recall, when we presented simulated annealing, we said it could work well with discrete cost functions, and even step-shaped cost functions. Those, as we have seen, are some of the situations where it’s advisable to prefer simulated annealing over gradient descent.

Therefore, how could we close this chapter without trying to use simulated anneal­ing to crack the minimum edge crossing problem?

### 1. Minimum edge crossing

As always, to apply our template for simulated annealing (shown in listing 17.1) to a concrete problem, we need to specify two functions: the cost function and the update step.

The former is often implied by the problem, and so it is here, for the basic version of the problem: we just count how many edges intersect.

The latter, the update step, leaves more leeway to the algorithm designer. Should we just move a single vertex randomly? By how much? Should we swap the positions of two vertices? Should edges be taken into consideration during the update?

The one advice I can give in these situations is to start small, get something simple working, and then try to add new ideas and validate them by measuring whether or not they help convergence. This is what we have done with the TSP by first developing the methods performing a single action, measuring their effectiveness, and then com­bining them in a random ensemble.

To make things more concrete, we’ll focus on finding an embedding close to the rectilinear crossing number (rcn) for complete graph K8. If you remember Guy’s con­jecture, which we provided in chapter 15, it gives us an exact formula for this graph’s crossing number, which is 18. In that same chapter, however, we also learned that the rectilinear crossing number for complete graphs can be and usually is larger than its crossing number; in this case, rcn(K8) is 19.

So, let’s start with a simple step that takes a vertex and slightly moves it within a cer­tain range. After fiddling with the range of updates for vertices, we chose to update both x and y separately, by choosing two random deltas within 10% of the drawing area’s width and height respectively. Smaller ranges made the algorithm too slow, while with larger values the algorithm behaved erratically. Then, running simulated annealing with k=0.1, α=0.97, T0=200 and maxSteps=500, the average rcn over 100 runs was 21.904. Not bad, but not particularly good.

Two considerations must be made. First, we kept the number of steps low (if you recall, we used up to 10,000 for TSP in the last section). The second consideration is that for the same reasons (which we’ll discuss in a second), we had to go down from 1,000 runs to just 100.

Both changes are due to the fact that computing the cost function and the transi­tions (by cloning a graph embedding) were particularly computationally heavy. Of course, this is partly due to using a general-purpose version of simulated annealing; it could be optimized by writing an ad hoc version that doesn’t need to clone the whole embedding, but just remember what was changed, and what the cost of the current solution was (and maybe also computing the delta based on the changes, not the whole solutions: for instance, just compute the number of intersections for the vertex moved before and after the change).

I would like to avoid focusing on these optimizations here. Don’t get me wrong; optimization is crucial, especially for production-ready code, but early optimization can also get in the way of improving your algorithm or learning process. Premature optimization is still the source of all evil (well, maybe not all of it, but a good chunk!).

In this case, I preferred providing simple, clean, non-optimized code rather than more obscure (though better-performing) routines.

The next step was adding a larger-range search: swapping two vertices’ positions in 10% of the iterations, so that 90% of the time we apply the short-range transition. How do you think it went? Well, just poorly; the average number of intersections grew to 22.89. This, however, wasn’t unexpected. If you think about it, a complete graph is completely symmetrical, so that swapping two vertices is totally useless! Even worse, it was detrimental because we were wasting 10% of the iterations, hence the poorer result.

Nevertheless, this transition can be useful for other types of graphs that aren’t sym­metrical, so we’ll leave it in. (While we are using complete graphs for our examples, the algorithm can and will be applied to any graph. We’ll see in the next section some examples where swapping vertices becomes crucial to getting a good result.)

Yet, we still need to do something different to improve our algorithm. What about choosing a single vertex and moving it randomly anywhere in the drawing area?

We applied this transition 10% of the time, and the average crossing number went down to 19.17, meaning that the algorithm was almost always finding the best solu­tion. Speaking of which, figure 17.14 compares two solutions for the embedding of K8. The one on the left was found by the random sampling algorithm provided in chapter 16, and the other one is the result of the simulated annealing-based method.

It goes without saying that there could be a lot more work to do to improve the algo­rithm by fine-tuning the parameters and perhaps coming up with better operators to tweak the solutions.

Last but not least, the algorithm should be tried and optimized on a diverse set of graphs to be sure to avoid overfitting it to complete graphs. (Alternatively, when faced with a specific problem, you can tune parameters on small instances of the graphs you expect to see, and once ready, apply the tuned version to your real instances.)

From what we could see, scaling up to K10 (figure 17.15), for instance, the configu­ration we used seems to work well with larger complete graphs.

### 2. Force-directed drawing

In section 16.5 we described a class of graph drawing algorithms called force-directed drawing that use a physics-based approach to compute aesthetically-pleasing embed­dings of graphs. Figure 17.16 reminds us why a nice embedding is important when graphs need to be visualized.

The spring embedder model for drawing undirected graphs was first introduced[3] by Peter Eades in the late 1980s, then refined[4] by Kamada and Kawai, who introduced the idea of optimal edge length and sequential vertex update (by moving only one vertex at each step).

The algorithm is evolved into a state of minimum energy by using gradient- descent. Often, though, by using a deterministic learning technique like gradient descent, we are bound to remain stuck in local minima, reaching an equilibrium for the system, but not the state with the minimum possible energy.

It’s no surprise that simulated annealing can help in this case as well. Davidson and Harel first used this technique[5] to converge to optimal embeddings while staying out of local pitfalls.

The problem with using standard simulated annealing is that random search makes convergence too slow. To work around this limitation, several authors suggested using hybrid solutions that leverage the strengths of both approaches. GEM[6] algorithm stands out among them for its innovative approach and impressive results.

GEM doesn’t use simulated annealing, but it borrows the concept of temperature from it. There is no cooling cycle; rather the temperature (which is still expressing the degree of “chaos” in the system) is used to control the range of movement for vertices on update, is computed for each vertex after an update, and is scaled down to smooth vertices’ oscillation.

Since GEM algorithm is not directly appliable as an instance of simulated anneal­ing, we’ll stick with the algorithm developed by Davidson and Harel, which produces results of comparable quality.

As we mentioned in the previous chapters, the first step with graph drawing algo­rithms is stating the criteria that will be used to judge the quality of an embedding; the crossing number is not the only key to drawing graphs nicely. Davidson and Harel’s approach uses five criteria:

• Distributing nodes evenly to spread the vertices uniformly in the canvas
• Keeping vertices away from borders
• Making edge-lengths uniform
• Minimizing edge-crossings
• Avoiding vertex-edges overlap by keeping vertices from coming too close to edges

We must also clarify that the algorithm assumes the edges will be drawn as straight-line segments. Now, let’s see how these five criteria translate to formulas by writing five components of the cost function.

For the first component, the algorithm uses a formula derived from electric poten­tial energy; given two vertices vi  and vj, we compute

where dij is the distance between the two vertices, and λ1 is a parameter we pass to the algorithm to control the weight of this component, a normalizing factor that defines the relative importance of this criterion compared to the others. This term behaves like a repulsive force, so higher values push the algorithm to prefer embeddings with larger distances between the vertices.

To keep vertices away from borders, we add another component. For each vertex vi we compute

Here the values ri, li, ti, and bi are the distances between vi  and the margins of the rectangular canvas where the graph is embedded; λ2 is another normalization factor, to weight this term. Higher values of λ2 will cause embeddings with vertices close to the borders to be penalized more.

Now let’s talk about edges. For each edge ek = u → v, we compute

where dk = distance(u,v) is the length of the edge and λ3 is the usual normalization parameter. This behaves like an attractive force, so larger values for λ3 favor smaller distances between adjacent vertices.

For edge intersections, we can just count them and multiply the numbers of inter­sections by a normalization factor λ4.

Finally, to keep vertices away from edges (if you remember, this was one of the key criteria we gave in chapter 16 to validate embeddings) we can add this term for each vertex-edge pair:

where gkl = d1stance(ek,vl) and λ5 is another normalization factor.

This term (another repulsive force) is quite expensive to compute (the edge-vertex distance is computationally heavy, as we saw in chapter 15), and even in the original paper is not used in the default settings of the algorithm. We’ll leave it out for now, while encouraging the reader to implement it as an exercise and then experiment with it on the examples we will show.

Listing 17.4 shows an implementation of the full cost function (with all five compo­nents), although the examples shown were run without the edge-vertex distance term.

As the next step, we would need the methods to compute transitions to new solu­tions; luckily, though, we can reuse the same methods we defined in the last section. After all, the problem space is the same; the only thing that we need to change is the cost function because we are changing our criteria to decide what a good embedding is.

The algorithm in the paper was only using the vertex local update heuristic, but not with a constant range, rather making the neighborhood in which a vertex can be moved smaller with the progressing of the algorithm.

You can also find the working code implemented for the JsGraphs library on GitHub.[7]

Speaking of good embeddings, in this case it makes little sense to check the quality of the results by looking at the average numbers over many repetitions: we want graphs to be drawn nicely, and there is no magic formula to measure “niceness.” The only way to judge the results is by presenting them to the human eye.

Figure 17.17 is, I think, the perfect summary to explain what we have been building in these last couple of sections. We are trying to come up with a nice embedding for the square grid graph with side 4, a graph with 16 vertices arranged like a square mesh.

Random sampling struggles to even find an embedding without intersections, a goal that is reached by the algorithm presented in section 17.3.1, which, however, doesn’t do a particularly good job of making the structure of the graph clear to us.

The drawing on the right, instead, looks almost perfectly symmetrical. Would you have been able to understand the shape of this graph from the other two embeddings?

For the record, this embedding was obtained by using the values summarized in table 17.4.

Figure 17.18 shows a couple more examples, with a larger grid and a different kind of graph, the triangular grid. They both look pretty nicely drawn, after some parameter tuning.

Before getting too excited and assuming this is the perfect algorithm for all graphs, we obviously need to try it on other kinds of graphs.

Figure 17.19 shows the results for K5 and K7. For both graphs, the embedding found has the minimum possible number of intersections, and vertices look well-spread but, as you can see, these embeddings are not perfect, because some vertices are too close to non-adjacent edges, and thus some edges overlap.

These situations can be corrected by adding the fifth component of the cost function, the one discouraging short distances between vertices and edges.

So, to close this chapter, here is a bit of homework for you: extend the cost func­tion and find better embeddings for these graphs.

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)