Java and BCP 47 Language Tags

Since Java 7, the Locale class has been updated to take on the role of a language tag as defined in RFC 5646, part of BCP 47. This language tag support lets developers identify language resources more precisely. Used as a language tag, a Locale identifies a language in this general form:

language-script-region-variant-extension-privateuse

Of course, you can continue to think of a locale as a lang_region_variant identifier, but Java now follows RFC 5646 to extend the Locale class with support for language, script, broader regions, and even extensions if needed. And if the rules for composing this string of subtags seem intimidating, you can use the Locale.Builder class to build up the tag without worrying about malforming it.
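
Here’s a minimal sketch of Locale.Builder in action. It builds the Serbian-in-Latin-script tag one subtag at a time and shows that the builder rejects a malformed script subtag (the 5-letter "Latin" instead of the 4-letter code "Latn") by throwing an IllformedLocaleException rather than quietly producing a bad tag. The class and method names are my own, chosen for illustration.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public class LanguageTagBuilderDemo {

    // Builds sr-Latn-RS (Serbian, Latin script, Serbia) one subtag at a time
    static Locale serbianLatin() {
        return new Locale.Builder()
                .setLanguage("sr")
                .setScript("Latn")
                .setRegion("RS")
                .build();
    }

    // Returns true if Locale.Builder rejects the given script subtag as ill-formed
    static boolean rejectsScript(String script) {
        try {
            new Locale.Builder().setLanguage("sr").setScript(script);
            return false;
        } catch (IllformedLocaleException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Locale locale = serbianLatin();
        System.out.println(locale.toLanguageTag());  // sr-Latn-RS
        System.out.println(locale.equals(Locale.forLanguageTag("sr-Latn-RS")));  // true
        System.out.println(rejectsScript("Latin"));  // true: script subtags are 4 letters
    }
}
```

Locale.forLanguageTag parses the same string into an equal Locale, so the builder earns its keep mostly when you assemble a tag from separate pieces of data.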

The primary language identifier is almost the same item you’ve always known: an ISO 639 2-letter or 3-letter code. The spec recommends using the shortest code available.
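
A quick way to see both kinds of language code is to ask a Locale for them. This sketch uses Japanese (ja/jpn) and Hawaiian, which has a 3-letter ISO 639-2 code (haw) but no 2-letter code; the class and method names are illustrative.

```java
import java.util.Locale;

public class LanguageSubtagDemo {

    // Language subtag as stored in the Locale
    static String language(String tag) {
        return Locale.forLanguageTag(tag).getLanguage();
    }

    // Three-letter ISO 639-2 code for the tag's language
    static String iso3Language(String tag) {
        return Locale.forLanguageTag(tag).getISO3Language();
    }

    public static void main(String[] args) {
        System.out.println(language("ja-JP"));      // ja  (the shorter ISO 639-1 code)
        System.out.println(iso3Language("ja-JP"));  // jpn (ISO 639-2)
        // Hawaiian has no 2-letter code, so its 3-letter code is the shortest one
        System.out.println(language("haw"));        // haw
    }
}
```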

The script subtag is new. You can now add a proper script identifier, a 4-letter ISO 15924 code such as Latn or Cyrl, that specifies the writing system used for the language. Many languages can be written in more than one writing system. For example, Japanese writers can use 3 or more different scripts: kanji, hiragana, katakana, and even “romaji” or Latin script. Serbian is another language often written in either Latin or Cyrillic characters.
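
The script subtag is what tells the two Serbian writing systems apart. In this sketch (class and method names are illustrative), two locales share the same language and region and differ only in script, and Locale treats them as different locales.

```java
import java.util.Locale;

public class ScriptSubtagDemo {

    // Script subtag (ISO 15924 code) carried by a language tag
    static String script(String tag) {
        return Locale.forLanguageTag(tag).getScript();
    }

    // True if both tags parse to equal Locale objects
    static boolean sameLocale(String a, String b) {
        return Locale.forLanguageTag(a).equals(Locale.forLanguageTag(b));
    }

    public static void main(String[] args) {
        System.out.println(script("sr-Latn-RS"));  // Latn
        System.out.println(script("sr-Cyrl-RS"));  // Cyrl
        // Same language and region, different writing system
        System.out.println(sameLocale("sr-Latn-RS", "sr-Cyrl-RS"));  // false
    }
}
```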

The region identifier was once limited to 2-letter ISO 3166 codes, but now you can also use the United Nations 3-digit macro geographical region codes (UN M.49) in the region portion of a language tag. A macro geographical region covers more than one country. For example, the UN currently defines Eastern Europe as macro region 151, which includes 10 countries.
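
UN region codes go in the same slot as the 2-letter country codes. This minimal sketch pairs Russian with region 151; the language choice and the class and method names are just for illustration. Note that getCountry returns the 3-digit region code unchanged.

```java
import java.util.Locale;

public class RegionSubtagDemo {

    // Russian paired with UN M.49 macro region 151 (Eastern Europe)
    static Locale russianEasternEurope() {
        return new Locale.Builder()
                .setLanguage("ru")
                .setRegion("151")  // 3-digit UN code instead of a 2-letter country code
                .build();
    }

    public static void main(String[] args) {
        Locale locale = russianEasternEurope();
        System.out.println(locale.toLanguageTag());  // ru-151
        System.out.println(locale.getCountry());     // 151
        System.out.println(locale.equals(Locale.forLanguageTag("ru-151")));  // true
    }
}
```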

Finally, you can use variant, extension, and private-use subtags to provide even more context for a language tag. See RFC 5646 for more details on these. If you need this level of detail, I suggest using the Locale.Builder class here as well.
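
As a sketch of these extra subtags, the example below (class and method names are illustrative) attaches a Unicode locale extension (the "u" singleton with the "co" collation keyword set to "phonebk") and a private-use value to a German locale, then builds a variant subtag for polytonic Greek. The private-use value "myapp" is hypothetical; private-use subtags mean whatever your application agrees they mean.

```java
import java.util.Locale;

public class ExtensionSubtagDemo {

    // German (Germany) with a "co" (collation) Unicode keyword and a
    // hypothetical private-use value "myapp"
    static Locale germanPhonebook() {
        return new Locale.Builder()
                .setLanguage("de")
                .setRegion("DE")
                .setUnicodeLocaleKeyword("co", "phonebk")
                .setExtension(Locale.PRIVATE_USE_EXTENSION, "myapp")
                .build();
    }

    // Greek with the "polyton" variant subtag (polytonic orthography)
    static Locale greekPolytonic() {
        return new Locale.Builder().setLanguage("el").setVariant("polyton").build();
    }

    public static void main(String[] args) {
        Locale german = germanPhonebook();
        System.out.println(german.toLanguageTag());             // de-DE-u-co-phonebk-x-myapp
        System.out.println(german.getUnicodeLocaleType("co"));  // phonebk
        System.out.println(german.getExtension(Locale.PRIVATE_USE_EXTENSION));  // myapp
        System.out.println(greekPolytonic().toLanguageTag());   // el-polyton
    }
}
```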

Take a look at the Locale documentation for all the details on using these new features. They give you much more control over how you identify and use language resources in your internationalized applications.

User error using Java 7 Locale

Yesterday I posted a blog entry about my experience with the Locale class in Java 7. As I experimented with the class, I discovered a new enumeration, Locale.Category, with two values:

  • Locale.Category.DISPLAY
  • Locale.Category.FORMAT

Since then I’ve learned more about the Locale.setDefault methods, and it turns out I made an error. I stated that ResourceBundle is tied to the DISPLAY default locale. I made this assumption because the Format subclasses do use the FORMAT category’s default locale, so I inferred that ResourceBundle would do something similar and use the DISPLAY default locale. That just isn’t true.

In a sense, Java 7 has introduced 3 separate default locales: one for each of the two categories above, plus a system-wide default. It turns out that Locale.setDefault(aLocale) sets the system-wide default and also resets both the DISPLAY and FORMAT defaults to that same locale. However, a call to Locale.setDefault(Category.DISPLAY, anotherLocale) does not affect the system-wide locale.
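
Here’s a small sketch that checks this behavior. Because it changes the JVM’s defaults, it records the original default and restores it in a finally block; that restore also resets both categories to the original locale. The class and method names are illustrative.

```java
import java.util.Locale;
import java.util.Locale.Category;

public class CategoryDefaultsDemo {

    // System-wide, DISPLAY, and FORMAT defaults, separated by spaces
    static String snapshot() {
        return Locale.getDefault() + " "
                + Locale.getDefault(Category.DISPLAY) + " "
                + Locale.getDefault(Category.FORMAT);
    }

    // Returns a snapshot after each step, then restores the original default
    static String[] run() {
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(Locale.US);
            // Changing only DISPLAY leaves the system-wide and FORMAT defaults alone
            Locale.setDefault(Category.DISPLAY, Locale.FRANCE);
            String afterDisplay = snapshot();
            // The one-argument form resets both categories as well
            Locale.setDefault(Locale.GERMANY);
            String afterAll = snapshot();
            return new String[] {afterDisplay, afterAll};
        } finally {
            Locale.setDefault(original);  // also resets both categories to the original
        }
    }

    public static void main(String[] args) {
        String[] snapshots = run();
        System.out.println(snapshots[0]);  // en_US fr_FR en_US
        System.out.println(snapshots[1]);  // de_DE de_DE de_DE
    }
}
```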

Note that the various Format subclasses use the FORMAT default locale if you don’t pass an explicit locale to their getInstance methods. Surprisingly (to me, at least), the ResourceBundle.getBundle method does NOT use the DISPLAY locale when you don’t provide an explicit locale parameter; it uses the system-wide default locale, either the one set at startup or the one you’ve set with Locale.setDefault(aLocale).
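
The following sketch makes the difference concrete. To keep it self-contained, the bundles are hypothetical ListResourceBundle subclasses nested inside the demo class instead of .properties files, and the bundle base name is the nested class’s binary name. With the system-wide default set to en-US and the DISPLAY default set to es-MX, the no-argument getBundle lookup finds only the base (English) bundle, while passing Locale.getDefault(Locale.Category.DISPLAY) explicitly picks up the Spanish one.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleDefaultDemo {

    // Hypothetical base (English) bundle; must be public with a public constructor
    public static class Greetings extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"GOOD_MORNING", "Good morning!"}};
        }
    }

    // Hypothetical Spanish bundle, found for any "es" locale such as es-MX
    public static class Greetings_es extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"GOOD_MORNING", "\u00a1Buenos d\u00edas!"}};
        }
    }

    // Binary name of the nested base class, used as the bundle base name
    private static final String BASE = BundleDefaultDemo.class.getName() + "$Greetings";

    // Returns {greeting without a locale argument, greeting with the DISPLAY locale}
    static String[] greetings() {
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(Locale.US);
            Locale.setDefault(Locale.Category.DISPLAY, Locale.forLanguageTag("es-MX"));
            // No locale argument: the lookup is based on Locale.getDefault() (en-US)
            String implicit = ResourceBundle.getBundle(BASE).getString("GOOD_MORNING");
            // Explicitly passing the DISPLAY default selects the Spanish bundle
            String explicit = ResourceBundle
                    .getBundle(BASE, Locale.getDefault(Locale.Category.DISPLAY))
                    .getString("GOOD_MORNING");
            return new String[] {implicit, explicit};
        } finally {
            Locale.setDefault(original);
        }
    }

    public static void main(String[] args) {
        String[] g = greetings();
        System.out.println(g[0]);  // Good morning!
        System.out.println(g[1]);  // ¡Buenos días!
    }
}
```

If you want DISPLAY-category behavior from ResourceBundle, passing the category default explicitly, as in the second lookup, is the straightforward workaround.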

So there you have it. My mistake. Hope this helps clarify.

Now that I’ve described the current behavior, I do have an opinion on it. For what it’s worth, here it is: if ResourceBundle.getBundle uses the system default locale, then other locale-sensitive classes should also use the system-wide default when you don’t explicitly provide a locale to their creation methods. The inconsistency in how locale-sensitive classes choose a default locale is confusing in Java 7. All methods that use a default locale should probably use the system-wide default.

Oh, and one other thing… If you’re going to have Locale “Categories”, you might as well introduce another: Locale.Category.ALL or Locale.Category.SYSTEM. When you call Locale.setDefault(ALL, aLocale), I would hope that the call would set the default for all other categories that exist now and in the future. Yes, I do realize that a call to Locale.setDefault(aLocale) already resets all the other category defaults, but for consistency’s sake, we need a Category.ALL. My argument is simple: it’s consistent, and as a user I just expected to see it. When I didn’t, I became confused. I wrote some demo code, and it didn’t work as expected. Yes, it’s user error, but I suspect some of you will make the same error if you haven’t already read about the differences here.


Bugs in Java 7 Locale or ResourceBundle?

While working on a chapter in an upcoming Apress book, I was experimenting with Java 7’s Locale and ResourceBundle classes. Java 7 introduces two new Locale categories: DISPLAY and FORMAT. You can set the default locale for localizable user interface resources independently from the default locale for data Format subclasses. For example, you can supposedly set up a DISPLAY locale that displays Spanish (es) resources for the user interface text but use American English (en-US) for date formats.

This does seem to work correctly the first time you load a ResourceBundle, but subsequent calls to getBundle seem to get stuck with previous DISPLAY locale settings. For example, take a look at the following code:

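import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleDemo { // wrapper class name is illustrative
    private static final Date NOW = new Date(); // timestamp to format (not shown in the original)

    public void setCategoryDefaultLocale(Locale displayLocale, Locale formatLocale) {
        Locale.setDefault(Locale.Category.DISPLAY, displayLocale);
        Locale.setDefault(Locale.Category.FORMAT, formatLocale);
    }

    public void demoDefaultLocaleSettings() {
        // Neither call below passes an explicit locale
        DateFormat df = DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT);
        ResourceBundle resource = ResourceBundle.getBundle("mypackage.resource.SimpleResources");
        String greeting = resource.getString("GOOD_MORNING");
        String date = df.format(NOW);
        System.out.printf("DISPLAY LOCALE: %s\n", Locale.getDefault(Locale.Category.DISPLAY));
        System.out.printf("FORMAT LOCALE:  %s\n", Locale.getDefault(Locale.Category.FORMAT));
        System.out.printf("%s, %s\n\n", greeting, date);
    }
}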

Now I call the two methods above in succession:

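setCategoryDefaultLocale(Locale.forLanguageTag("es-MX"), Locale.US);
demoDefaultLocaleSettings();
ResourceBundle.clearCache();  // flush cached bundles before the second run
setCategoryDefaultLocale(Locale.US, Locale.forLanguageTag("es-MX"));
demoDefaultLocaleSettings();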

In the following output, notice that the first call to demoDefaultLocaleSettings does grab the es-MX resources for the DISPLAY locale, and it formats the NOW date using the en-US FORMAT locale. That’s exactly what I’d expect. However, the second call, even after flushing the ResourceBundle cache, still loads the es-MX resources despite the explicit request for en-US resources. I expected ResourceBundle to load and retrieve “Good morning!” after the second call to setCategoryDefaultLocale.

¡Buenos días!, 9/18/11 1:59 AM

¡Buenos días!, 18/09/11 01:59 AM

I suspect that the real problem isn’t in the Locale.setDefault methods. Instead, I’m going to blame ResourceBundle for this until I can prove otherwise.

If you’ve seen this before, let me know. So far I’m coming up empty trying to figure out why I don’t get U.S. English in the second call. Maybe it’s a bug? Maybe user error?

Understanding Locale in the Java Platform

Language and geographic environment are two important influences on our culture. They create the system in which we interpret other people and events in our life. They also affect, even define, proper form for presenting ourselves and our thoughts to others. To communicate effectively with another person, we must consider and use that person’s culture, language, and environment.

Similarly, a software system should respect its users’ language and geographic region to be effective. Language and region form a locale, which represents the target setting and context for localized software. The Java platform uses java.util.Locale objects to represent locales. This article describes the Locale object and its implications for programs written for the Java platform.

Have a look. It’s an older article, but still perfectly valid and useful: Understanding Locale in the Java Platform.
