One area where higher-order functions shine is data processing. To process data, we’ll need some actual data. This chapter will use a data set about scripts—writing systems such as Latin, Cyrillic, or Arabic.
Remember Unicode from Chapter 1, the system that assigns a number to each character in written language? Most of these characters are associated with a specific script. The standard contains 140 different scripts—81 are still in use today, and 59 are historic.
Though I can fluently read only Latin characters, I appreciate the fact that people are writing texts in at least 80 other writing systems, many of which I wouldn’t even recognize. For example, here’s a sample of Tamil handwriting:
The example data set contains some pieces of information about the 140 scripts defined in Unicode. It is available in the coding sandbox for this chapter (https://eloquentjavascript.net/code#5) as the SCRIPTS binding. The binding contains an array of objects, each of which describes a script.
{
  name: "Coptic",
  ranges: [[994, 1008], [11392, 11508], [11513, 11520]],
  direction: "ltr",
  year: -200,
  living: false,
  link: "https://en.wikipedia.org/wiki/Coptic_alphabet"
}
Such an object tells us the name of the script, the Unicode ranges assigned to it, the direction in which it is written, the (approximate) origin time, whether it is still in use, and a link to more information. The direction may be "ltr" for left to right, "rtl" for right to left (the way Arabic and Hebrew text are written), or "ttb" for top to bottom (as with Mongolian writing).
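As a rough sketch of how these fields might be read, the snippet below builds a one-element stand-in for SCRIPTS, holding only the Coptic entry shown above, and turns it into a readable sentence. The names SAMPLE_SCRIPTS, DIRECTION_LABELS, and describeScript are hypothetical, introduced here for illustration only; they are not part of the data set or the sandbox.

```javascript
// Stand-in for the sandbox's SCRIPTS binding: only the Coptic entry from the text.
const SAMPLE_SCRIPTS = [
  {
    name: "Coptic",
    ranges: [[994, 1008], [11392, 11508], [11513, 11520]],
    direction: "ltr",
    year: -200,
    living: false,
    link: "https://en.wikipedia.org/wiki/Coptic_alphabet"
  }
];

// Hypothetical lookup table mapping the three direction codes to labels.
const DIRECTION_LABELS = {
  ltr: "left to right",
  rtl: "right to left",
  ttb: "top to bottom"
};

// Hypothetical helper: summarize one script object as a sentence.
function describeScript(script) {
  const direction = DIRECTION_LABELS[script.direction] || "unknown direction";
  const status = script.living ? "still in use" : "historic";
  return `${script.name}: written ${direction}, ${status}, origin around year ${script.year}`;
}

console.log(SAMPLE_SCRIPTS.map(describeScript));
// → ["Coptic: written left to right, historic, origin around year -200"]
```

In the sandbox, the same helper could presumably be applied to the real SCRIPTS array in place of the stand-in.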
The ranges property contains an array of Unicode character ranges, each of which is a two-element array containing a lower bound and an upper bound. Any character codes within these ranges are assigned to the script. The lower bound is inclusive (code 994 is a Coptic character), and the upper bound is non-inclusive (code 1008 isn’t).
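The half-open convention can be made concrete with a small membership test. This is a minimal sketch that assumes a hypothetical helper named scriptContains and uses only the Coptic ranges quoted above. A code belongs to the script when it is greater than or equal to a range's lower bound and strictly less than its upper bound.

```javascript
// The Coptic entry from the text, reduced to the fields this sketch needs.
const coptic = {
  name: "Coptic",
  ranges: [[994, 1008], [11392, 11508], [11513, 11520]]
};

// Hypothetical helper: true if `code` falls in any [from, to) range of `script`.
// The lower bound is inclusive (>=) and the upper bound is exclusive (<).
function scriptContains(script, code) {
  return script.ranges.some(([from, to]) => code >= from && code < to);
}

console.log(scriptContains(coptic, 994));  // → true  (lower bound is included)
console.log(scriptContains(coptic, 1007)); // → true  (last code in the first range)
console.log(scriptContains(coptic, 1008)); // → false (upper bound is excluded)
console.log(scriptContains(coptic, 65));   // → false (outside every range)
```

Using a strict comparison for the upper bound means adjacent ranges such as [a, b] and [b, c] never both claim code b.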
Source: Haverbeke, Marijn (2018). Eloquent JavaScript: A Modern Introduction to Programming, 3rd edition. No Starch Press.