Recognizing Text in JavaScript

We have a characterScript function and a way to correctly loop over characters. The next step is to count the characters that belong to each script. The following counting abstraction will be useful there:

function countBy(items, groupName) {
  let counts = [];
  for (let item of items) {
    let name = groupName(item);
    let known = counts.findIndex(c => c.name == name);
    if (known == -1) {
      counts.push({name, count: 1});
    } else {
      counts[known].count++;
    }
  }
  return counts;
}

console.log(countBy([1, 2, 3, 4, 5], n => n > 2));
// → [{name: false, count: 2}, {name: true, count: 3}]

The countBy function expects a collection (anything that we can loop over with for/of) and a function that computes a group name for a given element. It returns an array of objects, each of which names a group and tells you the number of elements that were found in that group.
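Because countBy only relies on for/of, any iterable works as its first argument, not just arrays. The sketch below repeats countBy so it runs on its own and feeds it a string (iterated one character at a time) and a Set; the group-name functions and sample data are made up for illustration.

```javascript
// countBy as defined above, repeated so this snippet is self-contained.
function countBy(items, groupName) {
  let counts = [];
  for (let item of items) {
    let name = groupName(item);
    let known = counts.findIndex(c => c.name == name);
    if (known == -1) {
      counts.push({name, count: 1});
    } else {
      counts[known].count++;
    }
  }
  return counts;
}

// A string is iterable: for/of visits "b", "a", "n", "a", "n", "a".
let vowelCounts = countBy("banana", ch => "aeiou".includes(ch) ? "vowel" : "consonant");
console.log(vowelCounts);
// → [{name: "consonant", count: 3}, {name: "vowel", count: 3}]

// A Set works too; here the group name is the word length.
let lengthCounts = countBy(new Set(["ox", "cat", "dog", "emu", "yak"]), w => w.length);
console.log(lengthCounts);
// → [{name: 2, count: 1}, {name: 3, count: 4}]
```

Groups appear in the order in which their first member was encountered, since new entries are always pushed onto the end of the array.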

It uses another array method, findIndex. This method is somewhat like indexOf, but instead of looking for a specific value, it returns the index of the first element for which the given function returns true. Like indexOf, it returns -1 when no such element is found.
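A small illustration of why countBy needs findIndex rather than indexOf, using a made-up array of counting entries: indexOf compares elements with strict equality (===), so it cannot locate an object by its contents, while findIndex runs a test function on each element.

```javascript
// Hypothetical counting entries, shaped like the ones countBy builds.
let counts = [{name: "Latin", count: 4}, {name: "Han", count: 11}];

// indexOf uses ===, so a freshly created object never matches,
// even when its fields equal those of an element in the array.
console.log(counts.indexOf({name: "Han", count: 11}));
// → -1

// findIndex returns the index of the first element the test accepts.
console.log(counts.findIndex(c => c.name == "Han"));
// → 1

// Like indexOf, it returns -1 when no element passes the test.
console.log(counts.findIndex(c => c.name == "Cyrillic"));
// → -1
```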

Using countBy, we can write the function that tells us which scripts are used in a piece of text.

function textScripts(text) {
  let scripts = countBy(text, char => {
    let script = characterScript(char.codePointAt(0));
    return script ? script.name : "none";
  }).filter(({name}) => name != "none");

  let total = scripts.reduce((n, {count}) => n + count, 0);
  if (total == 0) return "No scripts found";

  return scripts.map(({name, count}) => {
    return `${Math.round(count * 100 / total)}% ${name}`;
  }).join(", ");
}

console.log(textScripts('英国的狗说"woof", 俄罗斯的狗说"тяв"'));

// → 61% Han, 22% Latin, 17% Cyrillic

The function first counts the characters by name, using characterScript to assign them a name and falling back to the string “none” for characters that aren’t part of any script. The filter call drops the entry for “none” from the resulting array since we aren’t interested in those characters.
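To make the count-then-filter step visible, here is a sketch that prints the intermediate array before and after filter. It assumes a hypothetical, heavily trimmed MINI_SCRIPTS table in place of the chapter's full SCRIPTS data, plus a stand-in characterScript that searches it the same way (half-open [from, to) ranges); the sample text is made up.

```javascript
// Hypothetical stand-in for the SCRIPTS data: only two scripts,
// each with half-open [from, to) code point ranges.
const MINI_SCRIPTS = [
  {name: "Latin", ranges: [[65, 91], [97, 123]]},  // A-Z and a-z only
  {name: "Cyrillic", ranges: [[1024, 1280]]}       // U+0400 to U+04FF
];

// Stand-in characterScript: returns the script whose range contains
// the code point, or null when no range does.
function characterScript(code) {
  for (let script of MINI_SCRIPTS) {
    if (script.ranges.some(([from, to]) => code >= from && code < to)) {
      return script;
    }
  }
  return null;
}

// countBy as in the text, repeated so this snippet runs on its own.
function countBy(items, groupName) {
  let counts = [];
  for (let item of items) {
    let name = groupName(item);
    let known = counts.findIndex(c => c.name == name);
    if (known == -1) {
      counts.push({name, count: 1});
    } else {
      counts[known].count++;
    }
  }
  return counts;
}

// Step 1: count every character; the comma, space and "!" fall back to "none".
let raw = countBy("hi, мир!", char => {
  let script = characterScript(char.codePointAt(0));
  return script ? script.name : "none";
});
console.log(raw);
// → [{name: "Latin", count: 2}, {name: "none", count: 3}, {name: "Cyrillic", count: 3}]

// Step 2: drop the "none" entry, as textScripts does with filter.
let scripts = raw.filter(({name}) => name != "none");
console.log(scripts);
// → [{name: "Latin", count: 2}, {name: "Cyrillic", count: 3}]
```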

To be able to compute percentages, we first need the total number of characters that belong to a script, which we can compute with reduce. If no such characters are found, the function returns the string “No scripts found”. Otherwise, it transforms the counting entries into readable strings with map and then combines them with join.
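The final reduce/map/join stage can be tried in isolation. The sketch below pulls it into a hypothetical helper, formatScripts, that takes already-filtered counting entries; the counts 11, 4 and 3 are assumed values chosen so that the arithmetic (11/18, 4/18 and 3/18 of the total, rounded) reproduces the percentages shown above.

```javascript
// Hypothetical helper: the last part of textScripts, taking
// counting entries that have already had "none" filtered out.
function formatScripts(scripts) {
  // reduce folds all counts into a single total, starting from 0.
  let total = scripts.reduce((n, {count}) => n + count, 0);
  if (total == 0) return "No scripts found";
  // map turns each entry into a "NN% Name" string; join glues them together.
  return scripts.map(({name, count}) => {
    return `${Math.round(count * 100 / total)}% ${name}`;
  }).join(", ");
}

console.log(formatScripts([
  {name: "Han", count: 11},
  {name: "Latin", count: 4},
  {name: "Cyrillic", count: 3}
]));
// → 61% Han, 22% Latin, 17% Cyrillic

console.log(formatScripts([]));
// → No scripts found
```

Because each share is rounded on its own, the percentages need not add up to exactly 100: three equal counts, for instance, give 33% each.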

Source: Haverbeke, Marijn (2018). Eloquent JavaScript: A Modern Introduction to Programming, 3rd edition. No Starch Press.
