Language Information and Text Directionality in HTML

Specifying the language in which a document is written and the direction of its text (e.g., left to right or right to left) is essential to ensuring that the browser renders the document correctly and as intended. This is one aspect of the internationalization of documents, which aims to make documents accessible to as many international users as possible. The attributes primarily used to accomplish these tasks are the lang and dir attributes.

1. lang

The lang attribute specifies the base language of the attribute values and content of an element. It is intended to make browsers render a Web page meaningfully, based on the accepted usage for a specified language. It is inherited, and the value it takes is a language code that represents a natural language. A language code comprises a primary code and an optional sub-code. The primary code is usually a two-letter abbreviation of a language, and the sub-code typically represents a country or region. In the language code “en-US,” for example, “en” is the primary code and represents English, and “US” is the sub-code and represents the USA. The entire language code means the US version of English. Most languages require only the primary code. Common primary codes include fr (French), it (Italian), de (German), nl (Dutch), es (Spanish), el (Greek), pt (Portuguese), ja (Japanese), zh (Chinese), ru (Russian), and he (Hebrew). Relevant language codes can be found in the IANA Language Subtag Registry at iana.org. Figure 3.42 shows examples of how the attribute is used.

Notice in the example that the lang attribute is used on the <html> element and then on the child elements, as necessary. You should always use the attribute on the <html> element, not on the <body> element, to ensure that the text inside the <head> element is covered.
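As a minimal sketch of this usage (not the book's Figure 3.42; the page title and sample sentences are made up for illustration), the lang attribute might be set on the <html> element and then overridden on individual child elements:

```html
<!DOCTYPE html>
<!-- Base language for the whole document, including the <head> content -->
<html lang="en-US">
<head>
  <meta charset="utf-8">
  <title>Language examples</title>
</head>
<body>
  <!-- Inherits en-US from the <html> element -->
  <p>This paragraph is in US English.</p>

  <!-- Overrides the inherited language for this element and its content -->
  <p lang="fr">Ce paragraphe est en français.</p>

  <!-- An inline element can mark a short foreign phrase -->
  <p>She said <span lang="de">Guten Morgen</span> and left.</p>
</body>
</html>
```

Because the attribute is inherited, only the elements whose language differs from the document's base language need their own lang value.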

1.1. Benefits of Using the lang Attribute

Providing language information in your document has several benefits. It allows you to provide language-dependent styling and behavior. Situations in which providing language information in your content is helpful include:

  • Helping search engines identify words, based on users’ language preferences.
  • Helping speech synthesizers, such as screen readers, pronounce words properly.
  • Assisting the browser in making decisions on language-dependent matters, such as where to place hyphens; where to place line breaks; how to justify; when to convert the case of letters; and which font variants, quotation marks, ligatures, and spacing to use.
  • Helping in the checking of spelling and grammar, for example, of the user’s input.
  • Helping you to set different styles for different languages in a multilingual document. You can see an example of this in Chapter 14.
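The language-dependent styling mentioned in the last item can be sketched with the CSS :lang() pseudo-class, which matches elements by their declared or inherited language. The style rules and sample text below are illustrative assumptions, not taken from the book:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Styling by language</title>
  <style>
    /* Applies to any element whose language is French */
    :lang(fr) { font-style: italic; }

    /* Use French-style quotation marks for French <q> elements */
    q:lang(fr) { quotes: "« " " »"; }

    /* Automatic hyphenation relies on the declared language */
    p { hyphens: auto; }
  </style>
</head>
<body>
  <p>This English paragraph is styled normally.</p>
  <p lang="fr">Il a dit <q>bonjour</q> à tout le monde.</p>
</body>
</html>
```

Selecting on the language rather than on a class keeps the styling tied to the same lang values that search engines, screen readers, and spell checkers already use.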

2. dir

The order in which browsers display text depends on the base direction set for, or inherited by, the element that contains it. The attribute used to set base direction is the dir attribute, and the values it takes are ltr (left to right), rtl (right to left), and auto (which leaves it to the user agent to decide). The default is ltr. The base direction is inherited and can be overridden. Note that the attribute does not affect the order in which the characters within a word are displayed; it affects only the order of the words. In combination with other processes, it helps determine how the browser handles the display of text. In some cases, it only visually aligns text left or right.
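The following is a small sketch of the three values, along with inheritance and overriding. The sample text is made up for illustration, and the comment about how auto is resolved reflects common browser behavior rather than anything stated in this section:

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
  <meta charset="utf-8">
  <title>Base direction examples</title>
</head>
<body>
  <!-- Inherits ltr from the <html> element -->
  <p>This paragraph uses the inherited left-to-right base direction.</p>

  <div dir="rtl">
    <!-- Inherits rtl from the enclosing <div> -->
    <p lang="he">שלום עולם</p>

    <!-- Overrides the inherited rtl for this paragraph only -->
    <p dir="ltr">This paragraph switches back to left to right.</p>
  </div>

  <!-- auto lets the browser choose, typically from the first
       strongly directional character in the content -->
  <p dir="auto">שלום עולם</p>
</body>
</html>
```

The auto value is mainly useful for content whose direction is not known in advance, such as text entered by users.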

Text direction is determined in browsers as follows. Each character in Unicode (introduced in Chapter 2) has a directionality property associated with it. Some characters are designated as ltr (left to right) and others as rtl (right to left). In addition, Unicode provides the Unicode bidirectional (bidi) algorithm, which is used to display these characters, using their directionality properties. By default, browsers use the bidi algorithm to determine the direction in which to display a sequence of characters (e.g., a word), and they do this automatically, independently of the current base direction. For example, a sequence of Latin characters is displayed one after the other from left to right, and a sequence of Arabic or Hebrew characters is displayed one after the other from right to left. Therefore, the word “forward” in English, for example, is displayed from left to right, while the same word in Arabic is displayed from right to left.

This means that the base direction set with the dir attribute is used only to determine the direction in which the words are displayed. It makes the word that is displayed first in the left-to-right direction display last in the right-to-left direction, and vice versa. Incidentally, the bidi algorithm can be overridden, using the <bdo> element (or bidirectional override element), which overrides the current directionality properties of characters. Sometimes, it is necessary to do this when the algorithm does not produce the desired result. This usually happens when different languages are mixed in the same text. The use of the dir attribute is mandatory for the <bdo> element. Another element that can be used to resolve problems from mixing languages is the <bdi> element (or bidirectional isolation element), which can be used to isolate text that needs to be formatted differently from the surrounding text. However, this element is not supported by some older browsers, notably Internet Explorer. An alternative way of resolving the same issue is to use an inline element (such as the <span> element) to isolate the relevant text and then use the dir attribute. Figures 3.43 and 3.44 show some examples of how these attributes and elements are used and the effects.

In the example, the content of the first <p> element is displayed using the default ltr base direction. In the second, specifying ltr does not make a difference. In the third, rtl starts the text from the right. In the fourth, the Hebrew text is not displayed from the right, as it should be, because the default ltr base direction is used. Note that the bidi algorithm still ensures that the characters of each word are displayed correctly. In the fifth <p> element, specifying rtl makes the text display correctly from the right. In the sixth, two different languages are displayed using the default base direction. This makes the Hebrew text display improperly from right to left. In the seventh, the <bdo> element overrides the bidi algorithm for the Hebrew text and sets the base direction to rtl. The <bdi> element is also used to isolate W3C to ensure that it is displayed from left to right, since it is English. In the eighth <p> element, the <bdo> element is used to override the bidi algorithm and set the direction to rtl, even though the correct direction for displaying the text is from left to right.
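The eight paragraphs described above might look roughly like the following. This is a sketch reconstructed from the description, not the book's original figure; the English sentences and the Hebrew sample text (שלום עולם, "hello world") are assumptions chosen for illustration:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>dir, bdo, and bdi examples</title>
</head>
<body>
  <!-- 1: default ltr base direction -->
  <p>This text uses the default base direction.</p>

  <!-- 2: explicit ltr, same as the default -->
  <p dir="ltr">Specifying ltr makes no difference here.</p>

  <!-- 3: rtl base direction applied to English text -->
  <p dir="rtl">Specifying rtl starts this text from the right.</p>

  <!-- 4: Hebrew text with the default ltr base direction -->
  <p lang="he">שלום עולם</p>

  <!-- 5: Hebrew text with the rtl base direction it needs -->
  <p lang="he" dir="rtl">שלום עולם</p>

  <!-- 6: two languages mixed under the default base direction -->
  <p>The greeting is <span lang="he">שלום עולם</span>, W3C.</p>

  <!-- 7: bdo overrides the algorithm for the Hebrew run;
       bdi isolates the English abbreviation inside it -->
  <p>The greeting is <bdo dir="rtl" lang="he">שלום עולם, <bdi lang="en">W3C</bdi></bdo>.</p>

  <!-- 8: bdo forces rtl on text that should read left to right -->
  <p><bdo dir="rtl">This English text is forced right to left.</bdo></p>
</body>
</html>
```

Note that every <bdo> element carries a dir attribute, since the attribute is mandatory for that element.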

Source: Sklar David (2016), HTML: A Gentle Introduction to the Web’s Most Popular Language, O’Reilly Media; 1st edition.
