Posts Tagged ‘Java’

Standard Charsets in Java 7

February 1st, 2013 joconner No comments

Once in a while I poke my nose through the release notes of new Java releases. It’s not a particularly rewarding activity, but this time I did find something interesting. Oddly enough, it was interesting for what it did NOT say. I was surprised, so I thought you might want to know about a new class that is now available and quietly overlooked in any release notes.

Character sets have their own class representation in Java: Charset. You can use the Charset class to identify a character set for encoding or decoding. To create a Charset object, you use a factory method: Charset.forName(String charset). The uncomfortable trick to using this method is that you must be prepared to catch an exception if the JRE doesn’t actually supply the requested character set. Bummer.

I’ve always wondered why the JDK allows a random string as the parameter. I suppose it was for convenience…to allow the JDK to be updated over time with new charset support without having to change any API or enumeration. That’s understandable. But not really knowing what minimal set of character sets is supported in a particular JDK is somewhat…unnerving…especially to an engineer just trying to get his/her work finished.

The JDK documentation was always clear on what character sets you could absolutely depend on to be present. That was helpful and much needed. At least an observant developer could depend on that. However, the JDK now provides a more robust and useful way to identify which charsets are minimally supported. Java 7 provides a new class: java.nio.charset.StandardCharsets.

StandardCharsets does one thing. It lets you know what set of character sets is minimally supported in your JDK. The set is probably unchanged from Java 6 or Java 5 or even earlier. However, now you don’t have to read the documentation as carefully; the standard set is given to you. The Standardcharsets class explicitly enumerates the normal set for you.

Rocket science? No. But this welcome addition to the JDK was a long time in coming, and I’m glad to have found it.

VN:F [1.9.22_1171]
Rating: 5.0/5 (1 vote cast)
VN:F [1.9.22_1171]
Rating: +1 (from 1 vote)
Categories: Java Tags: , , ,

On Sale Already?

December 14th, 2011 joconner No comments

What, already on sale at Amazon? And it’s not even published yet! Java 7 Recipes will release sometime late December 2011.


VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java Tags: ,

Encoding Unicode Characters When UTF-8 Is Not An Option

September 27th, 2011 joconner No comments

The other day I suggested that you use UTF-8 to encode your Java source code files. I still think that’s a best practice. If you can do that, you owe it to yourself to follow that advice.

But what if you can’t store text as UTF-8? Perhaps your repository won’t allow it. Or maybe you simply can’t standardize on UTF-8 across the groups. What then? In that case, you should use ASCII to encode your text files. It’s not an optimal solution. However, I can help you get more interesting Unicode characters into your Java source files despite the actual file encoding limitation.

The trick is to use the native2ascii tool to convert your non-ASCII unicode characters to a \uXXXX encoding. After editing and creating a file contain UTF-8 text like this:

String interestingText = "家族";

You would instead run the native2ascii tool on the file to produce an ASCII file that encodes the non-ASCII characters in  \u-encoded notation like this:

String interestingText="\u5BB6\u65CF";

In your compiled code, the result is the same. Given the correct font, the characters will display properly. U+5BB6 and U+65CF are the code points for “家族”. Using this type of \u-encoding, we’ve solved the problem of getting the non-ASCII characters into your text file and repository. Simply save the converted, \u-encoded file instead of the original, non-ASCII file.

The native2ascii tool is part of your Java Development Kit (JDK). You will use it like this:

native2ascii -encoding UTF-8 <inputfile> <outputfile>

There you have it…an option for getting Unicode characters into your Java files without actually using UTF-8 encoding.


VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: +1 (from 1 vote)
Categories: Java, Unicode Tags: ,

Best practice: Use UTF-8 as your source code encoding

September 22nd, 2011 joconner No comments


Software engineering teams have become more distributed in the last few years. It’s not uncommon to have programmers in multiple countries, maybe a team in Belarus and others in Japan and in the U.S. Each of these teams most likely speaks different languages, and most likely their host systems use different character encodings by default. That means that everyone’s source code editor creates files in different encodings too. You can imagine how mixed up and munged a shared source code repository might become when teams save, edit and re-save source files in multiple charset encodings. It happens, and it has happened to me when working with remote teams.

Here’s an example, you create a test file containing ASCII text. Overnight, your Japanese colleagues edit and save the file with a new test and add the following line in it:

String example = "Fight 文字化け!";

They save the file and submit it to the repository using Shift-JIS or some other common legacy encoding. You pick up the file the next day, add a couple lines to it, save it, and BAM! Data loss. Your editor creates garbage characters because it attempts to save the file in the ISO-8859-1 encoding. Instead of the correct Japanese text from above, your file now contains the text “Fight ?????” Not cool, not fun. And you’ve most likely broken the test as well.

How can you avoid these charset mismatches? The answer is to use a common charset across all your teams. The answer is to use Unicode, and more specifically, to use UTF-8. The reason is simple. I won’t try hard to defend this. It’s just seems obvious. Unicode is a superset of all other commonly used character sets. UTF-8, a specific encoding of Unicode, is backward compatible with ASCII character encoding, and all programming language source keywords and syntax (that I know) is composed of ASCII text. UTF-8 as a common charset encoding will allow all of your teams to share files, use characters that make sense for their tests or other source files, and never lose data again because of charset encoding mismatches.

If your editor has a setting for file encodings, use it and choose UTF-8. Train your team to use it too. Set up your ANT scripts and other build tools to use the UTF-8 encoding for compilations. You might have to explicitly tell your java compiler that source files are in UTF-8, but this is worth making the change. By the way, the javac command line argument you need is simply “-encoding UTF-8″.

Recently I was using NetBeans 7 in a project and discovered that it defaults to UTF-8. Nice! I was pleasantly surprised.

Regardless of your editor, look into this. Find out how to set your file encodings to UTF-8. You’ll definitely benefit from this in a distributed team environment in which people use different encodings. Standardize on this and make it part of your team’s best practices.

“Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.”

VN:F [1.9.22_1171]
Rating: 2.3/5 (3 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java, NetBeans, Unicode Tags: , , ,

User error using Java 7 Locale

September 19th, 2011 joconner No comments

Yesterday I pushed a blog entry about my experience with the Locale class in Java 7. As I experimented with the class, I discovered a new category enumeration:

  • Locale.Category.DISPLAY
  • Locale.Category.FORMAT

I learned some more about the Locale.setDefault methods that indicate a user error on my part. I stated that ResourceBundle is tied to the DISPLAY default locale. I made this assumption because all the Format subclasses do use the FORMAT category’s default locale setting. I inferred that ResourceBundle would do something similar — that it would use the DISPLAY default locale. It just isn’t true.

In some ways Java 7 has introduced the concept of 3 separate Locale categories: the two above and a system-wide category. It turns out that Locale.setDefault(aLocale) will reset both the DISPLAY and FORMAT defaults to the system-wide locale. However, a call to Locale.setDefault(Category.DISPLAY, anotherLocale) does not affect the system-wide locale.

It is interesting to note that various Format subclasses will use the FORMAT default locale if you don’t explicitly use a locale param in the their getInstance methods. Surprisingly (to me however), the ResourceBundle.getBundle method does NOT use the DISPLAY locale when you don’t provide an explicit locale parameter; it uses the system-wide default locale that was set at startup or the locale you’ve set with Locale.setDefault(aLocale).

So there you have it. My mistake. Hope this helps clarify.

Now that I’ve described the current behavior, I do have an opinion on this. For what it’s worth, my opinion is this: if the system default locale is used by ResourceBundle.getBundle, then other locale-sensitive classes should also use the system-wide default when you don’t explicitly provide a locale in their creation methods. The difference in how locale-sensitive classes use a default locale is confusing in its current Java 7 state. All methods that use a default locale instance should probably use the system-wide locale default.

Oh, and one other thing… If you’re going to have Locale “Categories”, you might as well introduce another: Locale.Category.ALL or Locale.Category.SYSTEM. When you call Locale.setDefault(ALL, aLocale), I would hope that the call would set the default for all other categories that exist now and in the future. Yes, I do realize that a call to Locale.setDefault(aLocale) already resets all the other category defaults, but for consistency’s sake, we need a Category.ALL. My argument for this is simple: it’s consistent and as a user I just expected to see it. When I didn’t, I became confused. I wrote some demo code, and it didn’t work as expected. Yes, it’s user error, but I just know some of you will make the same error if you haven’t already read about the differences here.


VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java Tags: ,

Bugs in Java 7 Locale or ResourceBundle?

September 18th, 2011 joconner No comments

While working on a chapter in an upcoming APress book, I was experimenting with Java 7′s Locale and ResourceBundle classes. Java 7 introduces two new Locale categories: DISPLAY and FORMAT. You can set the default locale for localizable user interface resources independently from the default locale for data Format subclasses. For example, you supposedly can set up a DISPLAY locale to display Spanish (es) resources for the user interface text but use American English (en-US) for date formats.

This does seem to work correctly the first time you load a ResourceBundle, but subsequent calls to getBundle seem to get stuck with previous DISPLAY locale settings. For example, take a look at the following code:

public void setCategoryDefaultLocale(Locale displayLocale, Locale formatLocale) {
    Locale.setDefault(Locale.Category.DISPLAY, displayLocale);
    Locale.setDefault(Locale.Category.FORMAT, formatLocale);   

public void demoDefaultLocaleSettings() {
    DateFormat df = DateFormat.getDateTimeInstance(DateFormat.SHORT,
    ResourceBundle resource = ResourceBundle.getBundle("mypackage.resource.SimpleResources");
    String greeting = resource.getString("GOOD_MORNING");
    String date = df.format(NOW);
    System.out.printf("DISPLAY LOCALE: %s\n",
    System.out.printf("FORMAT LOCALE:  %s\n",
    System.out.printf("%s, %s\n\n", greeting, date );

Now I call the two methods above in succession:

setCategoryDefaultLocale(Locale.forLanguageTag("es-MX"), Locale.US);
setCategoryDefaultLocale(Locale.US, Locale.forLanguageTag("es-MX"));

In the following output, notice that the first call to demoDefaultLocaleSettings does actually grab the es-MX resources for the DISPLAY locale, and it formats the NOW date using the en-US FORMAT locale. That’s exactly what I’d expect. However, the subsequent call (even after flushing the ResourceBundle cache), still loads the es-MX resources despite the explicit request for en-US resources. I expected that ResourceBundle would have loaded and retrieved “Good morning!” after the second call to setCategoryDefaultLocale.

¡Buenos días!, 9/18/11 1:59 AM

¡Buenos días!, 18/09/11 01:59 AM

I suspect that the real problem isn’t in the Locale.setDefault methods. Instead, I’m going to blame ResourceBundle for this until I can prove otherwise.

If you’ve seen this in the past, let me know. I’m coming up empty for now trying to resolve why I don’t get U.S. English in the second call. Let me know if you know anything about this! Maybe a bug? Maybe user error?

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java Tags: ,

Java 7 Locale Changes

September 13th, 2011 joconner No comments

I admit that I haven’t been particularly active in the i18n or Java standards bodies in the last few years. Pardon me, but I had a wild rumpus in a startup for almost 3 years, then joined Yahoo for a couple years, and BAM! … during that time, the Unicode Consortium added emoji, and the Oracle Java folks overhauled the once relatively simple java.util.Locale class among other things. That’ll teach me to turn my back on those folks.

You can almost forgive the Unicode people for their addition of emoji… I just know that Ken Lunde had something to do with it but haven’t been able to pin the whole issue on him yet. :) [Just kidding people, just kidding Ken!]

BUT more interesting to me right now…have you seen java.util.Locale lately. What the heck happened there? Java 7 introduces Locale.Builder, Locale.Category, and a Locale.forLanguageTag method. With Java 7′s support of BCP 47 (the best practices for language coding), the Locale got a beefy new Builder inner class to help with…well, to help with building BCP 47 compliant locales. More on this in a different post. With the BCP 47 support, I suppose you might need the forLanguageTag method too. Now here’s what stumps me though — that Category class!

As far as I can tell, this category is to help you differentiate a locale instance for two different purposes: user interface language and locale-sensitive data formats. The new categories are the following:


Do we really need a separate Category class to make that explicit?

Locale has another new method: setDefault(Locale.Category, Locale newLocale). I can’t wait to play with this a bit more. I suspect that this will set the default resource bundles (DISPLAY) for UI language and the default FORMAT for things like DateFormat, etc. However, I can’t understand why these two categories are actually needed in the platform libraries. Convenience maybe? You could always use two different Locale instances for this in the past, but I suppose this makes that easier? I’ll have to play around with these to find out more.

Come back again in a couple days after I look into this new Category and how it affects internationalization. I’m having fun discovering some of the new functionality and am eager to let you know what I find.

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java Tags: ,

Recently installed Java 7 on Mac OS X

August 20th, 2011 joconner No comments

I recently installed Java 7 on my Mac OS X system. Although the installation went smoothly, I did run into one problem that might not be a big issue for you.

I didn’t go through the hassle of compiling it for myself. Instead, I opted to grab a precompiled JDK from here:

Specifically, I chose the OpenJDK 1.7 universal (32/64 bits) from Mac OS X branch download. Then I ran the /Applications/Utilities/Java to 1) pull both 1.7 versions to the top of the list, and 2) select those 1.7 versions, and 3) unselect the 1.6 JDK I also have installed.

NetBeans seems to run just fine. I’m able to create a new target JDK for the IDE, etc.

Now the problem: I tried running jrunscript from the command line to do some simple tests of new functionality in Java 7. I have found the simple jrunscript JavaScript interpreter to be an excellent way to quickly access Java classes for simple tests, confirmation of APIs, etc. Oddly, the tool didn’t run correctly despite being available in the JDK’s bin directory. Here’s the output of my attempt:

$ which java
$ which jrunscript
$ jrunscript
script engine for language js can not be found

Anyone ever see this before? Find a solution? Thanks in advance.

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java, Mac OS X Tags: ,

IUC 34 submission was declined

September 19th, 2010 joconner No comments

Well, the IUC committee made its decision. Although I was initially a little disappointed, the committee declined my session proposal. That’s probably the best; the article was a rehash of an older subject that I already presented long ago. I refreshed the original article and thought I could recycle it. Ha…the IUC wasn’t fooled. The said “NO!”

That’ll teach me. Next time I’ll provide all original content, and I already know what I’ll propose. Timezone selection for web apps. OK, that’s easy you say. Easy peasy. Well, it’s really not. It turns out that it’s pretty easy to pick a locale and time format for a user on the network. But what time zone should you select when displaying time?

That question isn’t always easy to answer. Lots of factors come into play including the locale of the user, the location of the event, the primary location of the site. Which to choose?

I’ll get this written up if you think it’s an interesting subject. Let me know.

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java, Unicode Tags: , ,

Understanding Locale in the Java Platform

November 14th, 2009 joconner No comments

Language and geographic environment are two important influences on our culture. They create the system in which we interpret other people and events in our life. They also affect, even define, proper form for presenting ourselves and our thoughts to others. To communicate
effectively with another person, we must consider and use that person’s culture, language, and environment.

Similarly, a software system should respect its users’ language and geographic region to be effective. Language and region form a locale, which represents the target setting and context for localized software. The Java platform uses java.util.Locale objects to represent locales. This article describes the Locale object and its implications for programs written for the Java platform.

Have a look. It’s an older article, but still perfectly valid and useful: Understanding Locale in the Java Platform.

VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Categories: Java Tags: , ,