Locales in Java

When you look at an application that is adapted to an international market, the most obvious difference you notice is the language. This observation is too narrow for true internationalization. Countries can share a common language, yet you might still need to do some work to make computer users in each of them happy. As Oscar Wilde famously said: “We have really everything in common with America nowadays, except, of course, language.”

1. Why Locales?

When you provide international versions of a program, all program messages need to be translated to the local language. However, translating the user interface text is not enough. There are many subtler differences. For example, numbers are formatted quite differently in English and in German. The number

123,456.78

should be displayed as

123.456,78

for a German user. In other words, the roles of the period and the comma are reversed: German uses the comma as the decimal separator and the period as the thousands separator. There are similar variations in the display of dates. In the United States, dates are (somewhat irrationally) displayed as month/day/year. Germany uses the more sensible order of day/month/year, whereas in China, the usage is year/month/day. Thus, the date

3/22/61

should be presented as

22.03.1961

to a German user. Of course, if the month names are written out explicitly, the difference in languages becomes apparent. The English

March 22, 1961

should be presented as

22. März 1961

in German, or

1961年3月22日

in Chinese.

A locale captures local preferences such as these. Whenever you present numbers, dates, currency values, and other items whose formatting varies by language or location, you need to use locale-aware APIs.
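To make this concrete, here is a minimal sketch that formats the same number and date for a US, a German, and a Chinese locale. It uses the standard NumberFormat and DateTimeFormatter classes; the class and method names (LocaleFormattingDemo, formatNumber, formatLongDate) are made up for illustration, and the results assume a JDK with its default locale data installed.

```java
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical demo class: the same values formatted for three locales
class LocaleFormattingDemo {
    // Applies the locale's decimal and grouping separators
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Uses the locale's long date style, which spells out the month name
    static String formatLongDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        double number = 123456.78;
        LocalDate date = LocalDate.of(1961, 3, 22);
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.CHINA };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + formatNumber(number, locale) + " | "
                    + formatLongDate(date, locale));
        }
    }
}
```

The program itself never mentions separators or month names; for example, the German line should read 123.456,78 and 22. März 1961, and the US line 123,456.78 and March 22, 1961.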

2. Specifying Locales

A locale is made up of up to five components:

  1. A language, specified by two or three lowercase letters, such as en (English), de (German), or zh (Chinese). Table 7.1 shows common codes.
  2. Optionally, a script, specified by four letters with an initial uppercase letter, such as Latn (Latin), Cyrl (Cyrillic), or Hant (traditional Chinese characters). This can be useful because some languages, such as Serbian, are written in Latin or Cyrillic, and some Chinese readers prefer the traditional over the simplified characters.
  3. Optionally, a country or region, specified by two uppercase letters or three digits, such as US (United States) or CH (Switzerland). Table 7.2 shows common codes.
  4. Optionally, a variant, specifying miscellaneous features such as dialects or spelling rules. Variants are rarely used nowadays. There used to be a “Nynorsk” variant of Norwegian, but it is now expressed with a different language code, nn. What used to be variants for the Japanese imperial calendar and Thai numerals are now expressed as extensions (see the next item).
  5. Optionally, an extension. Extensions describe local preferences for calendars (such as the Japanese calendar), numbers (Thai instead of Western digits), and so on. The Unicode standard specifies some of these extensions. Extensions start with u- and a two-letter code specifying whether the extension deals with the calendar (ca), numbers (nu), and so on. For example, the extension u-nu-thai denotes the use of Thai numerals. Other extensions are entirely arbitrary and start with x-, such as x-java.
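The optional components can be assembled with the standard Locale.Builder class or parsed from a tag, and then queried individually. The sketch below is illustrative only; the class name LocaleComponentsDemo and the particular tags are assumptions, chosen to mirror the Serbian, Thai, and x-java examples in the list above.

```java
import java.util.Locale;

// Hypothetical demo class showing the optional locale components
class LocaleComponentsDemo {
    // Language + script + region: Serbian in Latin script, as used in Serbia
    static Locale serbianLatin() {
        return new Locale.Builder()
                .setLanguage("sr")
                .setScript("Latn")
                .setRegion("RS")
                .build();
    }

    public static void main(String[] args) {
        Locale sr = serbianLatin();
        System.out.println(sr.toLanguageTag()); // sr-Latn-RS
        System.out.println(sr.getScript());     // Latn

        // Unicode extension: Thai in Thailand, asking for Thai numerals
        Locale thai = Locale.forLanguageTag("th-TH-u-nu-thai");
        System.out.println(thai.getUnicodeLocaleType("nu")); // thai

        // Private-use extension, retrieved with the 'x' singleton
        Locale custom = Locale.forLanguageTag("en-US-x-java");
        System.out.println(custom.getExtension(Locale.PRIVATE_USE_EXTENSION)); // java
    }
}
```

The builder validates each component as it is set, which makes it a safer choice than concatenating tag strings by hand.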

Rules for locales are formulated in the “Best Current Practices” memo BCP 47 of the Internet Engineering Task Force (http://tools.ietf.org/html/bcp47). You can find a more accessible summary at www.w3.org/International/articles/language-tags.

The codes for languages and countries seem a bit random because some of them are derived from local languages. German in German is Deutsch, Chinese in Chinese is zhongwen: hence de and zh. And Switzerland is CH, deriving from the Latin term Confoederatio Helvetica for the Swiss confederation.

Locales are described by tags—hyphenated strings of locale elements such as en-US.

In Germany, you would use a locale de-DE. Switzerland has four official languages (German, French, Italian, and Rhaeto-Romance). A German speaker in Switzerland would want to use a locale de-CH. This locale uses the rules for the German language, but currency values are expressed in Swiss francs, not euros.

If you only specify the language, say, de, then the locale cannot be used for country-specific issues such as currencies.
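The difference shows up directly in java.util.Currency, whose getInstance(Locale) method looks at the country part of the locale. In this sketch, the class LocaleCurrencyDemo and its helper currencyCode are hypothetical names; the helper returns null instead of propagating the exception that getInstance throws for a locale without a country.

```java
import java.util.Currency;
import java.util.Locale;

// Hypothetical demo class: currencies depend on the country, not the language
class LocaleCurrencyDemo {
    // Returns the ISO 4217 code for the locale's country, or null when the
    // locale has no country (getInstance throws) or the country has no currency
    static String currencyCode(Locale locale) {
        try {
            Currency currency = Currency.getInstance(locale);
            return currency == null ? null : currency.getCurrencyCode();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] { "de-DE", "de-CH", "de" }) {
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag + " -> " + currencyCode(locale));
        }
    }
}
```

Both German-language locales share the language rules, but de-DE yields EUR and de-CH yields CHF, while the language-only de has no currency at all.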

You can construct a Locale object from a tag string like this:

Locale usEnglish = Locale.forLanguageTag("en-US");

The toLanguageTag method yields the language tag for a given locale. For example, Locale.US.toLanguageTag() is the string “en-US”.
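A short round trip illustrates how forLanguageTag and toLanguageTag relate. This is a sketch with a made-up class name (LanguageTagDemo); it also contrasts toLanguageTag with toString, which uses underscores and is therefore not a valid BCP 47 tag.

```java
import java.util.Locale;

// Hypothetical demo class: converting between tags and Locale objects
class LanguageTagDemo {
    public static void main(String[] args) {
        Locale usEnglish = Locale.forLanguageTag("en-US");
        System.out.println(usEnglish.getLanguage());     // en
        System.out.println(usEnglish.getCountry());      // US
        System.out.println(usEnglish.equals(Locale.US)); // true

        // toLanguageTag produces BCP 47 hyphens; toString uses underscores
        System.out.println(Locale.US.toLanguageTag());   // en-US
        System.out.println(Locale.US.toString());        // en_US
    }
}
```

When exchanging locale identifiers with other systems (HTTP headers, configuration files), the hyphenated form from toLanguageTag is the one to use.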

For your convenience, there are predefined locale objects for various countries:

Locale.CANADA

Locale.CANADA_FRENCH

Locale.CHINA

Locale.FRANCE

Locale.GERMANY

Locale.ITALY

Locale.JAPAN

Locale.KOREA

Locale.PRC

Locale.TAIWAN

Locale.UK

Locale.US

A number of predefined locales specify just a language without a location:

Locale.CHINESE

Locale.ENGLISH

Locale.FRENCH

Locale.GERMAN

Locale.ITALIAN

Locale.JAPANESE

Locale.KOREAN

Locale.SIMPLIFIED_CHINESE

Locale.TRADITIONAL_CHINESE
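Printing the tags of a few of these constants shows what they actually contain. In this sketch (the class name PredefinedLocalesDemo is made up), a language-only constant such as GERMAN has an empty country. Note that, going by the JDK's definitions, SIMPLIFIED_CHINESE and TRADITIONAL_CHINESE do carry the regions CN and TW even though they are grouped with the language-only constants, and CHINA and PRC denote the same locale as SIMPLIFIED_CHINESE.

```java
import java.util.Locale;

// Hypothetical demo class: what some predefined constants contain
class PredefinedLocalesDemo {
    public static void main(String[] args) {
        Locale[] locales = {
            Locale.GERMANY, Locale.GERMAN,
            Locale.CHINA, Locale.PRC,
            Locale.SIMPLIFIED_CHINESE, Locale.TRADITIONAL_CHINESE
        };
        for (Locale locale : locales) {
            // A language-only locale has an empty country, so its tag has no region
            System.out.printf("%-6s country=\"%s\"%n",
                    locale.toLanguageTag(), locale.getCountry());
        }
    }
}
```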

Finally, the static getAvailableLocales method returns an array of all locales known to the virtual machine.
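One possible use of getAvailableLocales is to offer the user a list of supported regional variants of a language. The sketch below, with the hypothetical class AvailableLocalesDemo and helper tagsForLanguage, filters the array by language code. The exact contents of the array depend on the JDK build and on which locale data modules are installed, so the output will vary.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class: listing installed locales for one language
class AvailableLocalesDemo {
    // Collects the sorted tags of all installed locales with the given language code
    static String[] tagsForLanguage(String language) {
        return Arrays.stream(Locale.getAvailableLocales())
                .filter(locale -> locale.getLanguage().equals(language))
                .map(Locale::toLanguageTag)
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println("Installed locales: " + Locale.getAvailableLocales().length);
        System.out.println("German locales: " + String.join(", ", tagsForLanguage("de")));
    }
}
```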

3. The Default Locale

The static getDefault method of the Locale class initially gets the default locale as stored by the local operating system. You can change the default Java locale by calling the setDefault method with a different locale.
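Changing the default affects every locale-sensitive factory method that is called without an explicit Locale argument. In this sketch (DefaultLocaleDemo and formatUnderDefault are made-up names), the default is switched temporarily and then restored, since setDefault changes global state for the whole virtual machine, though not the operating system settings.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class: the default locale drives argument-less factories
class DefaultLocaleDemo {
    static String formatUnderDefault(Locale temporaryDefault, double value) {
        Locale original = Locale.getDefault();
        try {
            // Affects this JVM only; also resets all category defaults
            Locale.setDefault(temporaryDefault);
            // No Locale argument, so the current default is used
            return NumberFormat.getInstance().format(value);
        } finally {
            // Restore the previous default (and, with it, all categories)
            Locale.setDefault(original);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default locale: " + Locale.getDefault().toLanguageTag());
        System.out.println(formatUnderDefault(Locale.GERMANY, 1234.5)); // 1.234,5
        System.out.println(formatUnderDefault(Locale.US, 1234.5));      // 1,234.5
    }
}
```

Passing a Locale explicitly, as in the earlier examples, avoids depending on this global setting altogether.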

Some operating systems allow the user to specify different locales for displayed messages and for formatting. For example, a French speaker living in the United States can have French menus but currency values in dollars.

To obtain these preferences, call

Locale displayLocale = Locale.getDefault(Locale.Category.DISPLAY);

Locale formatLocale = Locale.getDefault(Locale.Category.FORMAT);
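The categories can also be set separately within the program with Locale.setDefault(Locale.Category, Locale). The sketch below simulates the French-speaker scenario described above; the class CategoryLocaleDemo and its helper are hypothetical names. Methods such as getDisplayName() consult the DISPLAY category, while NumberFormat.getInstance() consults the FORMAT category.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class: separate defaults for messages and for formatting
class CategoryLocaleDemo {
    // Simulates a French user interface combined with US number formats
    static void configureFrenchDisplayUsFormat() {
        Locale.setDefault(Locale.Category.DISPLAY, Locale.FRANCE);
        Locale.setDefault(Locale.Category.FORMAT, Locale.US);
    }

    public static void main(String[] args) {
        configureFrenchDisplayUsFormat();
        Locale displayLocale = Locale.getDefault(Locale.Category.DISPLAY);
        Locale formatLocale = Locale.getDefault(Locale.Category.FORMAT);
        System.out.println(displayLocale.toLanguageTag() + " / " + formatLocale.toLanguageTag());

        // getDisplayName() without an argument uses the DISPLAY default (French here)
        System.out.println(Locale.GERMANY.getDisplayName());
        // NumberFormat.getInstance() without an argument uses the FORMAT default
        System.out.println(NumberFormat.getInstance().format(1234.5)); // 1,234.5
    }
}
```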

4. Display Names

Once you have a locale, what can you do with it? Initially, not much, as it turns out. The only useful methods in the Locale class are those for identifying the language and country codes. The most important one is getDisplayName. It returns a string describing the locale. This string does not contain the cryptic two-letter codes, but is in a form that can be presented to a user, such as

German (Switzerland)

Actually, there is a problem here. The display name is issued in the default locale. That might not be appropriate. If your user already selected German as the preferred language, you probably want to present the string in German. You can do just that by giving the German locale as a parameter. The code

var loc = new Locale("de", "CH");

System.out.println(loc.getDisplayName(Locale.GERMAN));

prints

Deutsch (Schweiz)

This example shows why you need Locale objects. You feed them to locale-aware methods that produce text that is presented to users in different locations. You will see many examples of this in the following sections.
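Besides getDisplayName, the Locale class offers getDisplayLanguage and getDisplayCountry, each with an optional Locale argument that selects the language of the result. This sketch (the class name DisplayNameDemo is made up) builds the locale with forLanguageTag rather than the constructor; that is a stylistic choice here, though JDK 19 and later deprecate the Locale constructors. The final loop names each locale in its own language, which is a common choice for a language selection menu.

```java
import java.util.Locale;

// Hypothetical demo class: locale names suitable for presentation to users
class DisplayNameDemo {
    public static void main(String[] args) {
        Locale swissGerman = Locale.forLanguageTag("de-CH");

        // Explicit display locale, so the result does not depend on the default
        System.out.println(swissGerman.getDisplayName(Locale.ENGLISH)); // German (Switzerland)
        System.out.println(swissGerman.getDisplayName(Locale.GERMAN));  // Deutsch (Schweiz)

        // The language and country parts are also available separately
        System.out.println(swissGerman.getDisplayLanguage(Locale.GERMAN)); // Deutsch
        System.out.println(swissGerman.getDisplayCountry(Locale.GERMAN));  // Schweiz

        // For a language menu, name each locale in its own language
        for (Locale locale : new Locale[] { Locale.FRANCE, Locale.ITALY, Locale.JAPAN }) {
            System.out.println(locale.getDisplayName(locale));
        }
    }
}
```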

Source: Horstmann Cay S. (2019), Core Java. Volume II – Advanced Features, Pearson; 11th edition.
