Languages are written using scripts, or groups of symbols or characters used to express a language. English, Spanish, and Norwegian use Latin script. Farsi uses a variant of Arabic script. Hindi and Rajasthani use Devanagari.
Scripts are composed of characters. In computing, each character in a script is represented by a numeric value, conventionally written in hexadecimal and also known as a character code. Mapping codes to characters is called character encoding.
There are multiple systems of character encoding available in computing. On the Web, however, you should use Unicode. Unicode is a system that maps characters from multiple scripts to unique hexadecimal numeric values. The Latin letter A, for example, is represented by the number 0041, while the Armenian character Ֆ is represented by the number 0556. Depending on the context, these numbers may be prefixed by U+, as in the CSS unicode-range descriptor, or written as an escape sequence: a backslash in CSS (\0041) or \u in JavaScript (\u0041).
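To make the mapping concrete, here is a minimal Python sketch. Python is used purely for illustration, and the helper name code_point_label is hypothetical. It relies on the built-in ord() and chr() functions, which convert between a character and its Unicode code point.

```python
def code_point_label(char):
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(char):04X}"


# Latin capital letter A and Armenian capital letter Feh.
print(code_point_label("A"))       # U+0041
print(code_point_label("\u0556"))  # U+0556

# chr() goes the other way, from a code point back to a character.
print(chr(0x0041))                 # A
```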
Stick with me here—I promise there’s a point to all of this background information. Fonts map character codes to “glyphs”. A glyph is the actual shape that represents a character. A lowercase letter “a”, for example, can be represented by glyphs from several different fonts, as shown below. From left to right are glyphs representing the letter “a” from the Bodoni 72 Bold, Juju Outline, Junction Bold, and Futura Bold fonts.
Now, font files contain the entire character set or glyph set available for that font. That includes obscure punctuation, characters from other scripts, and symbols such as © and ™. There’s a very good chance you won’t use all of those characters on your site. But if your web font contains them, you’re still sending those bytes to your users.
The good news is that we can manage this using the unicode-range descriptor and a process known as “subsetting”. Subsetting is the process of breaking a font into multiple files, each containing a smaller collection—a subset—of glyphs.
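One way to decide what a subset should contain is to look at the characters a site actually uses. The sketch below is an illustration only, assuming the page text is already available as a string; the function name unicode_range_for is hypothetical. It collapses the code points found in a piece of text into the comma-separated notation that unicode-range accepts.

```python
def unicode_range_for(text):
    """Collapse the code points used in text into a unicode-range style list."""
    points = sorted({ord(ch) for ch in text})
    if not points:
        return ""

    ranges = []
    start = prev = points[0]
    for cp in points[1:]:
        if cp == prev + 1:
            # Still inside a run of consecutive code points.
            prev = cp
            continue
        ranges.append((start, prev))
        start = prev = cp
    ranges.append((start, prev))

    return ", ".join(
        f"U+{lo:X}" if lo == hi else f"U+{lo:X}-{hi:X}"
        for lo, hi in ranges
    )


print(unicode_range_for("abc"))
print(unicode_range_for("Hi!"))
```

A real site would feed in the text of every page that uses the font, and would usually round the result out to whole blocks so that future content does not fall outside the subset.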
Browsers that fully support unicode-range (and this includes most versions released since 2016) only download a font face when characters in the document fall within its corresponding Unicode range.
Most web font services automatically manage subsetting and unicode ranges. For self-hosted fonts, there’s FontTools.
1. Subsetting Self-hosted Fonts with FontTools
Consider a multi-script font such as Gaegu (available under an SIL Open Font License), which includes characters from Latin and Hangul scripts. We might split this font into two files: gaegu-latin.woff2 and gaegu-hangul.woff2. We can then use the unicode-range descriptor to assign each file to a different Unicode range:
@font-face {
  font-family: 'Gaegu';
  src: url('https://example.com/fonts/gaegu-latin.woff2') format('woff2');
  unicode-range: U+000-5FF; /* Latin glyph range */
}
@font-face {
  font-family: 'Gaegu';
  src: url('https://example.com/fonts/gaegu-hangul.woff2') format('woff2');
  unicode-range: U+1100-11FF; /* Hangul glyph range (partial) */
}
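The download decision described above can be modeled in a few lines of Python. This is a simplified sketch, not how any browser is actually implemented: it handles only single U+XXXX-YYYY values (no wildcards or comma-separated lists), and the names FACES, parse_range, and faces_needed are hypothetical.

```python
# File names and ranges taken from the two Gaegu font faces above.
FACES = {
    "gaegu-latin.woff2": "U+000-5FF",
    "gaegu-hangul.woff2": "U+1100-11FF",
}


def parse_range(value):
    """Parse 'U+XXXX' or 'U+XXXX-YYYY' into (low, high); wildcards are not handled."""
    body = value.strip().upper()
    if not body.startswith("U+"):
        raise ValueError(f"not a unicode-range value: {value!r}")
    low, _, high = body[2:].partition("-")
    return int(low, 16), int(high or low, 16)


def faces_needed(text, faces=FACES):
    """Return the font files whose range contains at least one character of text."""
    needed = []
    for filename, value in faces.items():
        low, high = parse_range(value)
        if any(low <= ord(ch) <= high for ch in text):
            needed.append(filename)
    return needed


print(faces_needed("Hello"))    # only Latin characters
print(faces_needed("\uAC00"))   # a Hangul syllable outside the partial range
```

The second call returns an empty list, which is why the Hangul range above is marked as partial: precomposed Hangul syllables live in a different block (starting at U+AC00) that this rule does not cover.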
For self-hosted fonts, we’ll need to create the subset version of the font ourselves using FontTools. FontTools is a Python library for manipulating fonts. While this does require us to have Python installed, we don’t need to know how to program with Python.
To install FontTools, we’ll need to use pip, the Python package manager. In a terminal window or at the Windows command-line prompt, type the following:
pip install fonttools[woff]
This installs fonttools and two additional libraries that we’ll need for creating WOFF and WOFF2 files: zopfli and brotli.
It also installs a few command-line tools, including ones for font format conversion (ttx) and merging fonts (pyftmerge). We’re interested in pyftsubset, which can create subsets from OpenType, TrueType, and WOFF font files.
Let’s use pyftsubset to create a Latin-only version of the Gaegu font:
pyftsubset ~/Library/fonts/Gaegu-Regular.ttf --unicodes=U+000-5FF
At a minimum, pyftsubset needs an input file and one or more glyph identifiers or a Unicode range as arguments. In the example above, we’ve used the --unicodes flag to specify the range of characters to include. Both the input file and a glyph selection are required.
To create a WOFF2 web font, we need to pass an additional --flavor flag:
pyftsubset Gaegu-Regular.ttf --unicodes=U+000-5FF --flavor="woff2"
For OFL-licensed fonts, we should also rename our font file and remove name information from the font tables. To do that, we need to pass two more flags: --output-file and --name-IDs:
pyftsubset ~/Library/fonts/Gaegu-Regular.ttf --unicodes=U+000-5FF --flavor="woff2" --output-file='myproject/subsetfont-latin.woff2' --name-IDs=''
Passing an empty string as the argument for --name-IDs strips all existing name information from the font file. Now we can use our subset OFL-licensed font in our project.
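When a font needs several subsets, it can be convenient to generate the pyftsubset commands from a script. The following is a minimal sketch under stated assumptions: the SUBSETS table, the helper name pyftsubset_args, and the output file names are hypothetical, and only the flags covered in this section are used. The script builds and prints the commands without running them.

```python
import shlex

# Hypothetical subset names, reusing the Latin and partial Hangul ranges
# from the @font-face example earlier in this section.
SUBSETS = {
    "latin": "U+000-5FF",
    "hangul": "U+1100-11FF",
}


def pyftsubset_args(font_path, unicodes, output_file, flavor="woff2"):
    """Build the argument list for one pyftsubset run (nothing is executed)."""
    return [
        "pyftsubset",
        font_path,
        f"--unicodes={unicodes}",
        f"--flavor={flavor}",
        f"--output-file={output_file}",
        "--name-IDs=",  # empty value, the same as --name-IDs='' in a shell
    ]


commands = [
    pyftsubset_args(
        "Gaegu-Regular.ttf",
        unicodes,
        f"myproject/subsetfont-{name}.woff2",
    )
    for name, unicodes in SUBSETS.items()
]

for args in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(args))
```

To actually create the files, each argument list could be passed to subprocess.run(args, check=True) on a machine where FontTools is installed and Gaegu-Regular.ttf is in the working directory.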
pyftsubset is more feature-rich than we’ve discussed here. We can, for example, exclude ligatures and vertical typesetting data. To see a full list of options and how they work, use pyftsubset --help.
Source: Brown Tiffany B (2021), CSS , SitePoint; 3rd edition.