JavaScript and Performance – Part 2

1. Avoiding Work

The fastest way to do something is to avoid doing it—or at least part of it—at all. By thinking about what the code is doing, you can often spot unneces­sary redundancy or things that can be done in a faster way.

In the case of our example project, there is such an opportunity for doing less work. Every pair of nodes has the forces between them computed twice, once when moving the first node and once when moving the second. Since the force that node X exerts on node Y is exactly the opposite of the force Y exerts on X, we do not need to compute these forces twice.

The next version of the function changes the inner loop to go over only the nodes that come after the current one so that each pair of nodes is looked at exactly once. After computing a force between a pair of nodes, the function updates the position of both.

function forceDirected_noRepeat(graph) {

for (let i = 0; i < graph.length; i++) {

let node = graph[i];

for (let j = i + 1; j < graph.length; j++) {

let other = graph[j];

let apart = other.pos.minus(node.pos);

let distance = Math.max(1, apart.length);

let forceSize = -repulsionStrength / (distance * distance);

if (node.hasEdge(other)) {

forceSize += (distance – springLength) * springStrength;

}

let applied = apart.times(forceSize / distance);

node.pos = node.pos.plus(applied);

other.pos = other.pos.minus(applied);

}

}

}

Measuring this code shows a significant speed boost. It is twice as fast on Firefox 58, about 30 percent faster on Chrome 63, and 75 percent faster on Edge 6.

The big boost on Firefox and Edge is only partially a result of the actual optimization. Because we need the inner loop to go over only part of the array, the new function replaces the for/of loops with regular for loops. On Chrome, this has no measurable effect on the speed of the program, but on Firefox simply not using iterators makes the code 20 percent faster—and on Edge it makes a 50 percent difference.

So, different JavaScript engines work differently and may run programs at different speeds. A change that makes code run faster in one engine may not help (or may even hurt) in another engine—or even a different version of the same engine.

Interestingly, Chrome’s engine, which is called V8 and is also the one Node.js uses, is able to optimize for/of loops over arrays to code that’s no slower than looping over the index. Remember that the iterator interface involves a method call that returns an object for each element in the itera­tor. Somehow, V8 manages to optimize most of that away.

Taking another good look at what our program is doing, possibly by calling console.log to output forceSize, it becomes clear that the forces gener­ated between most pairs of nodes are so tiny that they aren’t really impacting the layout at all. Specifically, when nodes are not connected and are far away from each other, the forces between them don’t amount to much. Yet we still compute vectors for them and move the nodes a tiny bit. What if we just didn’t?

This next version defines a distance above which pairs of (unconnected) nodes will no longer compute and apply forces. With that distance set to 175, forces below 0.05 are ignored.

const skipDistance = 175;

function forceDirected_skip(graph) {

for (let i = 0; i < graph.length; i++) {

let node = graph[i];

for (let j = i + 1; j < graph.length; j++) {

let other = graph[j];

let apart = other.pos.minus(node.pos);

let distance = Math.max(1, apart.length);

let hasEdge = node.hasEdge(other);

if (!hasEdge && distance > skipDistance) continue;

let forceSize = -repulsionStrength / (distance * distance);

if (hasEdge) {

forceSize += (distance – springLength) * springStrength;

}

let applied = apart.times(forceSize / distance);

node.pos = node.pos.plus(applied);

other.pos = other.pos.minus(applied);

}

}

This yields another 50 percent improvement in speed, with no discern- able degradation of the layout. We cut a corner and got away with it.

2. Profiling

We were able to speed up the program quite a bitjust by reasoning about it. But when it comes micro-optimization—the process of doing things slightly differently to make them faster—it is usually hard to predict which changes will help and which won’t. There we can no longer rely on reasoning—we have to observe.

Our runLayout function measures the time the program currently takes. That’s a good start. To improve something, you must measure it. Without measuring, you have no way of knowing whether your changes are having the effect you intended.

The developer tools in modern browsers provide an even better way to measure the speed of your program. This tool is called a profiler. It will, while a program is running, gather information about the time used by the various parts of the program.

If your browser has a profiler, it will be available from the developer tool interface, probably on a tab called Performance. The profiler in Chrome spits out the following table when I have it record 4,000 iterations of our current program:

This lists the functions (or other tasks) that took a serious amount of time. For each function, it reports the time spent executing the func­tion, both in milliseconds and as a percentage of the total time taken. The first column shows only the time that control was actually in the function, whereas the second column includes time spent in functions called by this function.

As far as profiles go, that’s a very simple one since the program doesn’t have a lot of functions. For more complicated programs, the lists will be much, much longer. But because the functions that take the most time are shown at the top, it is still usually easy to find the interesting information.

From this table, we can tell that by far the most time is spent in the physics simulation function. That wasn’t unexpected. But on the second row, we see the includes array method, as used in GraphNode.hasEdge, taking up about 18 percent of the program’s time.

That is a bit more than I expected. We are calling it a lot—in an 85- node graph (which you get with treeGraph(4, 4)), there are 3,570 pairs of nodes. So with 4,000 iterations, that’s more than 14 million calls to hasEdge.

Let’s see whether we can do better. We add another variant of the hasEdge method to the GraphNode class and create a new variant of our simu­lation function that calls that instead of hasEdge.

GraphNode.prototype.hasEdgeFast = function(other) {

for (let i = 0; i < this.edges.length; i++) {

if (this.edges[i] === other) return true;

}

return false;

};

On Chrome, this shaves about 17 percent off the time it takes to com­pute the layout, which is most of the time taken by includes in the profile.

On Edge, it makes the program 40 percent faster. But on Firefox, it makes it slightly (about 3 percent) slower. So, in this case, Firefox’s engine (called SpiderMonkey) did a better job optimizing calls to includes.

The row labeled “Minor GC” in the profile gives us the time spent clean­ing up memory that is no longer being used. Given that our program creates a huge number of vector objects, the 3 percent time spent reclaiming mem­ory is strikingly low. JavaScript engines tend to have very effective garbage collectors.

3. Function Inlining

No vector methods (such as times) show up in the profile we saw, even though they are being used heavily. This is because the compiler inlined them. Rather than having the code in the inner function call an actual method to multiply vectors, the vector-multiplication code is put directly inside the function, and no actual method calls happen in the com­piled code.

There are various ways in which inlining helps make code fast. Func­tions and methods are, at the machine level, called using a protocol that requires putting the arguments and the return address (the place where exe­cution must continue when the function returns) somewhere the function can find them. The way a function call gives control to another part of the program also often requires saving some of the processor’s state so that the called function can use the processor without interfering with data that the caller still needs. All of this becomes unnecessary when a function is inlined.

Furthermore, a good compiler will do its best to find ways to simplify the code it generates. If functions are treated as black boxes that might do any­thing, the compiler does not have a lot to work with. On the other hand, if it can see and include the function body in its analysis, it might find additional opportunities to optimize the code.

For example, aJavaScript engine could avoid creating some of the vec­tor objects in our code altogether. In an expression like the following one, if we can see through the methods, it is clear that the resulting vector’s coordinates are the result of adding force’s coordinates to the product of normalized’s coordinates and the forceSize binding. Thus, there is no need to create the intermediate object produced by the times method.

pos.plus(normalized.times(forceSize))

But JavaScript allows us to replace methods at any time. How does the compiler figure out which function this times method actually is? And what if someone changes the value stored in Vec.prototype.times later? The next time code that has inlined that function runs, it might continue to use the old definition, violating the programmer’s assumptions about the way their program behaves.

This is where the interleaving of execution and compilation starts to pay off. When a hot function is compiled, it has already run a number of times. If, during those runs, it always called the same function, it is reasonable to try inlining that function. The code is optimistically compiled with the assumption that, in the future, the same function is going to be called here.

To handle the pessimistic case, where another function ends up being called, the compiler inserts a test that compares the called function to the one that was inlined. If the two do not match, the optimistically compiled code is wrong, and the JavaScript engine must deoptimize, meaning it falls back to a less optimized version of the code.

4. Creating Less Garbage

Though some of the vector objects that we are creating might be optimized away entirely by some engines, there is likely still a cost to creating all those objects. To estimate the size of this cost, let’s write a version of the code that does the vector computations “by hand,” using local bindings for both dimensions.

function forceDirected_noVector(graph) {

for (let i = 0; i < graph.length; i++) {

let node = graph[i];

for (let j = i + 1; j < graph.length; j++) {

let other = graph[j];

let apartX = other.pos.x – node.pos.x;

let apartY = other.pos.y – node.pos.y;

let distance = Math.max(1, Math.sqrt(apartX * apartX +

apartY * apartY));

let hasEdge = node.hasEdgeFast(other);

if (!hasEdge && distance > skipDistance) continue;

let forceSize = -repulsionStrength / (distance * distance);

if (hasEdge) {

let forceX = apartX * forceSize / distance;

let forceY = apartY * forceSize / distance;

node.pos.x += forceX; node.pos.y += forceY;

other.pos.x -= forceX; other.pos.y -= forceY;

}

}

}

The new code is wordier and more repetitive, but if I measure it, the improvement is large enough to consider doing this kind of manual object flattening in performance-sensitive code. On both Firefox and Chrome the new version is about 30 percent faster than the previous one. On Edge it’s about 60 percent faster.

Taking all these steps together, we’ve made the program about 5 times faster than the initial version on Chrome and Firefox and more than 20 times faster on Edge. That’s quite an improvement. But remember that doing this work is useful only for code that actually takes a lot of time. Try­ing to optimize everything right away will only slow you down and leave you with a lot of needlessly overcomplicated code.

5. Garbage Collection

So why is the code that avoids creating objects faster? There are several reasons. The engine has to find a place to store the objects, it has to figure out when they are no longer used and reclaim them, and when you access their properties, it has to figure out where in memory those are stored. JavaScript engines are good at all these things but usually not so good that they become free.

Imagine memory, again, as a long, long row of bits. When the program starts, it might receive an empty piece of memory and just start putting the objects it creates in there, one after the other. But at some point, the space is full, and some of the objects in it are no longer used. The JavaScript engine has to figure out which objects are used, and which are not, so that it can reuse the unused pieces of memory.

Now the program’s memory space is a bit of a mess, containing living objects interspersed with free space. Creating a new object involves finding a piece of free space large enough for the object, which might require some searching. Alternatively, the engine could move all live objects to the start of the memory space, which makes creating new objects cheaper (they can just be put one after the other again) but requires more work when moving the old objects.

In principle, figuring out which objects are still used requires tracing through all reachable objects, starting from the global scope and the cur­rently active local scope. Any object referenced from those scopes, directly or indirectly, is still alive. If your program has a lot of data in memory, this is quite a lot of work.

A technique called generational garbage collection can help reduce these costs. This approach exploits the fact that most objects have short lives. It splits the memory available to the JavaScript program into two or more gen­erations. New objects are created in the space reserved for the young genera­tion. When this space is full, the engine figures out which of the objects in it are still alive and moves those to the next generation. If only a small fraction of the objects in the young generation are still alive when this occurs, only a small amount of work has to be done to move them.

Of course, figuring out which objects are alive does require knowing about all references to objects in the live generation. The garbage collector wants to avoid looking through all the objects in the older generations every time the young generation is collected. So when a reference is created from an old object to a new object, this reference must be recorded so that it can be taken into account during the next collection. This makes writing to old objects slightly more expensive, but that cost is more than compensated for by the time saved during garbage collection.

6. Dynamic Types

JavaScript expressions like node.pos, which fetch a property from an object, are far from trivial to compile. In many languages, bindings have a type, and thus, when you perform an operation on the value they hold, the compiler already knows what kind of operation you need. In JavaScript, only values have types, and a binding can end up holding values of different types.

This means that initially the compiler knows little about the property the code might be trying to access and has to produce code that handles all possible types. If node holds an undefined value, the code must throw an error. If it holds a string, it must look up pos in String.prototype. If it holds an object, the way the pos property is extracted from it depends on the shape of object. And so on.

Fortunately, though JavaScript does not require it, bindings in most programs do have a single type. And if the compiler knows this type, it can use that information to produce more efficient code. If node has always been an object with pos and edges properties so far, the optimizing compiler code can create code that fetches the property from its known position in such an object, which is simple and fast.

But events observed in the past do not give any guarantees about events that will occur in the future. Some piece of code that hasn’t run yet might still pass another type of value to our function—a different kind of node object, for example, that also has an id property.

So the compiled code still has to check whether its assumptions hold and take an appropriate action if they don’t. An engine could deoptimize entirely, falling back to the unoptimized version of the function. Or it could compile a new version of the function that also handles the newly observed type.

You can observe the slowdown caused by the failure to predict object types by intentionally messing up the uniformity of the input objects for our graph layout function, as in this example:

let mangledGraph = treeGraph(4, 4);

for (let node of mangledGraph) {

node[‘p${Math.floor(Math.random() * 999)}’] = true;

}

runLayout(forceDirected_noVector, mangledGraph);

Every node gets an extra property with a random name. If we run our fast simulation code on the resulting graph, it becomes 5 times as slow on Chrome 63 and 10 (!) times as slow on Firefox 58. Now that object types vary, the code has to look up the properties without prior knowledge about the shape of the object, which is a lot more expensive to do.

Interestingly, after running this code, forceDirected_noVector has become slow even when run on a regular, non-mangled graph. The messy types have “poisoned” the compiled code, at least for a while—at some point, browsers tend to throw away compiled code and recompile it from scratch, removing this effect.

A similar technique is used for things other than property access. The + operator, for example, means different things depending on what kind of values it is applied to. Instead of always running the full code that handles all these meanings, a smart JavaScript compiler will use previous observa­tions to build up some expectation of the type that the operator is probably being applied to. If it is applied only to numbers, a much simpler piece of machine code can be generated to handle it. But again, such assumptions must be checked every time the function runs.

The moral of this story is that if a piece of code needs to be fast, you can help by feeding it consistent types. JavaScript engines can handle cases where a few different types occur relatively well—they will generate code that handles all these types and deoptimizes only when a new type is seen. But even there, the resulting code is slower than what you would get for a single type.

Source: Haverbeke Marijn (2018), Eloquent JavaScript: A Modern Introduction to Programming, No Starch Press; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *