Java and BCP 47 Language Tags

Since Java 7, Java’s Locale object has been updated to take on the role of a language tag as identified in RFC 5646 and BCP 47 specs. The newer language tag support gives developers the ability to be more precise in identifying language resources. The new Locale, used as a language tag, can be used to identify languages in this general form:


Of course, you can continue to think of locale as a lang_region_variant identifier, but Java now uses the RFC 5646 spec to enhance the Locale class to support language, script, broader regions, and even newer extensions if needed. And if the rules for generating this string of identifiers seems intimidating, you can use the Locale.Builder class to build up the tag without worries of misforming it.

The primary language identifier is the almost the same item you’ve always known; it’s an ISO 639 2-letter code or 3-letter code. The spec recommends using the shortest id possible.

The script is new. You can now add a proper script identifier that specifies the writing system used for the language. People can use multiple writing systems to write languages. For example, Japanese speakers/writers can use 3 or more different scripts for Japanese: kanji, hiragana, katakana, and even “romaji” or Latin script. Serbian is another language often written in either Latin or Cyrillic characters.

The region identifier was once limited to 2-letter ISO 3166 codes, but now you can also use the United Nations 3-digit macro geographical region codes in the region portion of a language tag. A macro geographical region identifies a larger region that comprises more than one country. For example, the UN currently defines Eastern Europe to be macro region 151 and includes 10 countries within it.

Eastern Europe 151

Finally, you can use variant, extension, and privateuse sub-tags to provide even more context for a language tag. See RFC 5646 for more details on these. I suggest that you also use the Locale.Builder class to assist if you need to use this level of detail.

Take a look at the Locale documentation for all the details on using these new features. They definitely give you much more control of how you identify and use language resources in your internationalized applications.

LocalDate in Java 8

Halloween 2014 calThe java.time.LocalDate class is new in Java 8. Inspired by the Joda Time library, LocalDate represents a date as it might be used from a wall calendar. It is not a singular instant in time like java.util.Date. You might use a LocalDate to represent a birthday, the start of a school year, or an anniversary. LocalDate text representations are familiar. They look like “2014-10-09” or “October 9, 2014” or other similar user-friendly text.

How do I create a LocalDate?

Create a LocalDate with any of its several static creation methods. I won’t cover all the ways here but will show you three methods, one of which requires close attention.

Give it to me now!

Want a LocalDate now? Here’s how you do it (assume I’m in Los Angeles CA):

LocalDate todayInLA =;

LocalDate will provide a text string formatted in ISO YYYY-MM-dd format if you call its toString method:

String formattedDate = todayInLA.toString();

This will create text like this:


You should know this fact about now(). It uses your computer’s default time zone to retrieve now’s date. So, a computer in Los Angeles (USA) and a computer in Dublin (Ireland) could execute this same method at the same time and produce two different calendar dates. It’s only reasonable they should. After all, they would be in different time zones.

If you want to be specific about the time zone used when creating the local date, use the over-loaded method with a time zone:

ZoneId zoneDublin = ZoneId.of("Europe/Dublin");
LocalDate todayInDublin =;

Executed at exactly the same time as the previous now method, this time zone-aware method could return the following:


I want a specific date

You can be more specific about your date too. You can ask for a date for a specific year, month, and day. This creates a LocalDate that is the same regardless of time zone. For example, let’s create a local date for Halloween (celebrated on October 31 for those areas that celebrate the holiday):

LocalDate halloween = LocalDate.of(2014, 10, 31);

The toString method produces this:


Can I format the LocaleDate differently?

Of course you can! You’ll need the format method for that. The format method uses a DateTimeFormatter that understands locale-sensitive preferences for printed dates.

You can learn about the format method and more from the LocalDate Javadocs.

Using Jersey (JAX-RS) with Spring

Jersey spring
Spring does a great job for what it does — dependency injection, transaction management, database query simplification, and plenty of other things. Jersey, the reference JAX-RS implementation, is also excellent for what it does — it simplifies the creation of RESTful web services by providing a framework for defining resources, accessing them, and creating their service representations. Wanting to use them both in the same project is only natural! This article shows you how to use both Spring and Jersey. It uses Spring to manage bean creation and scope, and it uses Jersey for resource creation and path mapping.

First, set up your dependencies to include both Spring and Jersey. Make sure you have the following important dependencies:
1. The spring framework
2. The Jersey-Spring integration project that includes a custom servlet that tells Jersey to let Spring handle all the bean creation and injection
3. The Jersey JAX-RS framework itself

Assuming you use Maven, your dependencies look something like this:





You’ll notice that I excluded a lot of Spring items from the Jersey integration dependency. That’s because our project explicitly asks to use a specific Spring version, and I don’t want the versions that Jersey imports.

Next, you have to update your web.xml file to use the Jersey servlet for Spring support. In this web.xml file, I don’t even use a DispatcherServlet at all. Instead, any url mapping will be dispatched and handled by the Jersey annotations. Here’s the web.xml:

    <display-name>Spring - Jersey Integration</display-name>


        <servlet-name>Spring Container</servlet-name>

        <servlet-name>Spring Container</servlet-name>

Finally, create your Spring beans in an applicationContext.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns=""

    <bean id="helloworld" class="" scope="session"/>

The SpringJerseyBean is this:



public class SpringJerseyBean {
    private int count = 0;

    public String sayHello() {
        return "Hello world: " + count++;

Now you have everything you need to create a Jersey service using any of Spring’s additional features. In this configuration, Spring instantiates a helloWorld bean in session scope. Of course, you can change that to any of Spring’s other bean scopes too.

Comparison of the Instant and Date Classes


Java 8 has a new java.time package, and one of its new classes is Instant. The best counterpart to this in past platforms is the java.util.Date class.

There are a couple notable differences between Date and Instant:

  • Date has very few useful methods, and Instant provides many.
  • Instant provides finer time granularity and a longer timeline.

Most of Date’s methods have been deprecated. Date manipulation and formatting have been delegated to the Calendar and DateFormat classes. In comparison, the Instant class allows you to perform some very basic functionality directly. You can add seconds and milliseconds for example. You can parse and generate ISO 8601 date strings with Instant as well. ISO 8601 dates have a consistent form across all locales and look like this: 2014-08-12T14:51:53:00Z. Most of the Instant methods are purely for convenience. You can do the similar things with Date using the Calendar and DateFormat classes.

Both Date and Instant have the same epoch (1970-01-01T00:00:00Z), but Instant can represent a much longer timeline. Date’s internal structure uses a long to represent milliseconds from the epoch. Instant, however, uses a long to represent seconds from epoch AND an int to represent nanoseconds of that second. That certainly means you don’t have to worry about date rollover problems in the near future.

The differences between Date and Instant are relatively minor, but these classes really are the starting point of of a more thorough discussion of the java.time package. Expect more details in the near future.

Standard Charsets in Java 7

Once in a while I poke my nose through the release notes of new Java releases. It’s not a particularly rewarding activity, but this time I did find something interesting. Oddly enough, it was interesting for what it did NOT say. I was surprised, so I thought you might want to know about a new class that is now available and quietly overlooked in any release notes.

Character sets have their own class representation in Java: Charset. You can use the Charset class to identify a character set for encoding or decoding. To create a Charset object, you use a factory method: Charset.forName(String charset). The uncomfortable trick to using this method is that you must be prepared to catch an exception if the JRE doesn’t actually supply the requested character set. Bummer.

I’ve always wondered why the JDK allows a random string as the parameter. I suppose it was for convenience…to allow the JDK to be updated over time with new charset support without having to change any API or enumeration. That’s understandable. But not really knowing what minimal set of character sets is supported in a particular JDK is somewhat…unnerving…especially to an engineer just trying to get his/her work finished.

The JDK documentation was always clear on what character sets you could absolutely depend on to be present. That was helpful and much needed. At least an observant developer could depend on that. However, the JDK now provides a more robust and useful way to identify which charsets are minimally supported. Java 7 provides a new class: java.nio.charset.StandardCharsets.

StandardCharsets does one thing. It lets you know what set of character sets is minimally supported in your JDK. The set is probably unchanged from Java 6 or Java 5 or even earlier. However, now you don’t have to read the documentation as carefully; the standard set is given to you. The Standardcharsets class explicitly enumerates the normal set for you.

Rocket science? No. But this welcome addition to the JDK was a long time in coming, and I’m glad to have found it.

Encoding Unicode Characters When UTF-8 Is Not An Option

The other day I suggested that you use UTF-8 to encode your Java source code files. I still think that’s a best practice. If you can do that, you owe it to yourself to follow that advice.

But what if you can’t store text as UTF-8? Perhaps your repository won’t allow it. Or maybe you simply can’t standardize on UTF-8 across the groups. What then? In that case, you should use ASCII to encode your text files. It’s not an optimal solution. However, I can help you get more interesting Unicode characters into your Java source files despite the actual file encoding limitation.

The trick is to use the native2ascii tool to convert your non-ASCII unicode characters to a \uXXXX encoding. After editing and creating a file contain UTF-8 text like this:

String interestingText = "家族";

You would instead run the native2ascii tool on the file to produce an ASCII file that encodes the non-ASCII characters in  \u-encoded notation like this:

String interestingText="\u5BB6\u65CF";

In your compiled code, the result is the same. Given the correct font, the characters will display properly. U+5BB6 and U+65CF are the code points for “家族”. Using this type of \u-encoding, we’ve solved the problem of getting the non-ASCII characters into your text file and repository. Simply save the converted, \u-encoded file instead of the original, non-ASCII file.

The native2ascii tool is part of your Java Development Kit (JDK). You will use it like this:

native2ascii -encoding UTF-8 <inputfile> <outputfile>

There you have it…an option for getting Unicode characters into your Java files without actually using UTF-8 encoding.


Best practice: Use UTF-8 as your source code encoding


Software engineering teams have become more distributed in the last few years. It’s not uncommon to have programmers in multiple countries, maybe a team in Belarus and others in Japan and in the U.S. Each of these teams most likely speaks different languages, and most likely their host systems use different character encodings by default. That means that everyone’s source code editor creates files in different encodings too. You can imagine how mixed up and munged a shared source code repository might become when teams save, edit and re-save source files in multiple charset encodings. It happens, and it has happened to me when working with remote teams.

Here’s an example, you create a test file containing ASCII text. Overnight, your Japanese colleagues edit and save the file with a new test and add the following line in it:

String example = "Fight 文字化け!";

They save the file and submit it to the repository using Shift-JIS or some other common legacy encoding. You pick up the file the next day, add a couple lines to it, save it, and BAM! Data loss. Your editor creates garbage characters because it attempts to save the file in the ISO-8859-1 encoding. Instead of the correct Japanese text from above, your file now contains the text “Fight ?????” Not cool, not fun. And you’ve most likely broken the test as well.

How can you avoid these charset mismatches? The answer is to use a common charset across all your teams. The answer is to use Unicode, and more specifically, to use UTF-8. The reason is simple. I won’t try hard to defend this. It’s just seems obvious. Unicode is a superset of all other commonly used character sets. UTF-8, a specific encoding of Unicode, is backward compatible with ASCII character encoding, and all programming language source keywords and syntax (that I know) is composed of ASCII text. UTF-8 as a common charset encoding will allow all of your teams to share files, use characters that make sense for their tests or other source files, and never lose data again because of charset encoding mismatches.

If your editor has a setting for file encodings, use it and choose UTF-8. Train your team to use it too. Set up your ANT scripts and other build tools to use the UTF-8 encoding for compilations. You might have to explicitly tell your java compiler that source files are in UTF-8, but this is worth making the change. By the way, the javac command line argument you need is simply “-encoding UTF-8”.

Recently I was using NetBeans 7 in a project and discovered that it defaults to UTF-8. Nice! I was pleasantly surprised.

Regardless of your editor, look into this. Find out how to set your file encodings to UTF-8. You’ll definitely benefit from this in a distributed team environment in which people use different encodings. Standardize on this and make it part of your team’s best practices.

“Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.”

User error using Java 7 Locale

Yesterday I pushed a blog entry about my experience with the Locale class in Java 7. As I experimented with the class, I discovered a new category enumeration:

  • Locale.Category.DISPLAY
  • Locale.Category.FORMAT

I learned some more about the Locale.setDefault methods that indicate a user error on my part. I stated that ResourceBundle is tied to the DISPLAY default locale. I made this assumption because all the Format subclasses do use the FORMAT category’s default locale setting. I inferred that ResourceBundle would do something similar — that it would use the DISPLAY default locale. It just isn’t true.

In some ways Java 7 has introduced the concept of 3 separate Locale categories: the two above and a system-wide category. It turns out that Locale.setDefault(aLocale) will reset both the DISPLAY and FORMAT defaults to the system-wide locale. However, a call to Locale.setDefault(Category.DISPLAY, anotherLocale) does not affect the system-wide locale.

It is interesting to note that various Format subclasses will use the FORMAT default locale if you don’t explicitly use a locale param in the their getInstance methods. Surprisingly (to me however), the ResourceBundle.getBundle method does NOT use the DISPLAY locale when you don’t provide an explicit locale parameter; it uses the system-wide default locale that was set at startup or the locale you’ve set with Locale.setDefault(aLocale).

So there you have it. My mistake. Hope this helps clarify.

Now that I’ve described the current behavior, I do have an opinion on this. For what it’s worth, my opinion is this: if the system default locale is used by ResourceBundle.getBundle, then other locale-sensitive classes should also use the system-wide default when you don’t explicitly provide a locale in their creation methods. The difference in how locale-sensitive classes use a default locale is confusing in its current Java 7 state. All methods that use a default locale instance should probably use the system-wide locale default.

Oh, and one other thing… If you’re going to have Locale “Categories”, you might as well introduce another: Locale.Category.ALL or Locale.Category.SYSTEM. When you call Locale.setDefault(ALL, aLocale), I would hope that the call would set the default for all other categories that exist now and in the future. Yes, I do realize that a call to Locale.setDefault(aLocale) already resets all the other category defaults, but for consistency’s sake, we need a Category.ALL. My argument for this is simple: it’s consistent and as a user I just expected to see it. When I didn’t, I became confused. I wrote some demo code, and it didn’t work as expected. Yes, it’s user error, but I just know some of you will make the same error if you haven’t already read about the differences here.


Bugs in Java 7 Locale or ResourceBundle?

While working on a chapter in an upcoming APress book, I was experimenting with Java 7’s Locale and ResourceBundle classes. Java 7 introduces two new Locale categories: DISPLAY and FORMAT. You can set the default locale for localizable user interface resources independently from the default locale for data Format subclasses. For example, you supposedly can set up a DISPLAY locale to display Spanish (es) resources for the user interface text but use American English (en-US) for date formats.

This does seem to work correctly the first time you load a ResourceBundle, but subsequent calls to getBundle seem to get stuck with previous DISPLAY locale settings. For example, take a look at the following code:

public void setCategoryDefaultLocale(Locale displayLocale, Locale formatLocale) {
    Locale.setDefault(Locale.Category.DISPLAY, displayLocale);
    Locale.setDefault(Locale.Category.FORMAT, formatLocale);   

public void demoDefaultLocaleSettings() {
    DateFormat df = DateFormat.getDateTimeInstance(DateFormat.SHORT,
    ResourceBundle resource = ResourceBundle.getBundle("mypackage.resource.SimpleResources");
    String greeting = resource.getString("GOOD_MORNING");
    String date = df.format(NOW);
    System.out.printf("DISPLAY LOCALE: %s\n",
    System.out.printf("FORMAT LOCALE:  %s\n",
    System.out.printf("%s, %s\n\n", greeting, date );

Now I call the two methods above in succession:

setCategoryDefaultLocale(Locale.forLanguageTag("es-MX"), Locale.US);
setCategoryDefaultLocale(Locale.US, Locale.forLanguageTag("es-MX"));

In the following output, notice that the first call to demoDefaultLocaleSettings does actually grab the es-MX resources for the DISPLAY locale, and it formats the NOW date using the en-US FORMAT locale. That’s exactly what I’d expect. However, the subsequent call (even after flushing the ResourceBundle cache), still loads the es-MX resources despite the explicit request for en-US resources. I expected that ResourceBundle would have loaded and retrieved “Good morning!” after the second call to setCategoryDefaultLocale.

¡Buenos días!, 9/18/11 1:59 AM

¡Buenos días!, 18/09/11 01:59 AM

I suspect that the real problem isn’t in the Locale.setDefault methods. Instead, I’m going to blame ResourceBundle for this until I can prove otherwise.

If you’ve seen this in the past, let me know. I’m coming up empty for now trying to resolve why I don’t get U.S. English in the second call. Let me know if you know anything about this! Maybe a bug? Maybe user error?