International Domain Names


The Java SE 6 release provides an interesting new class: java.net.IDN. It's small, simple…very focused on a single task. That task has two parts:

  1. to convert domain names containing practically any Unicode character to an ASCII Compatible Encoding, or ACE
  2. to convert ACE names back into their full Unicode (UTF-16) form

To support these two operations, not surprisingly, the class has two static methods:

  1. toASCII
  2. toUnicode

The toASCII method converts the non-ASCII Unicode characters in a name to an ACE form using an algorithm called Punycode. Yeah, I snickered at the name too. The results can look surprising, but don't worry…the algorithm is well defined, so the same input always produces the same output. So, for example, if you want to use the domain name 日本語.jp, the toASCII method would produce the ACE equivalent xn--wgv71a119e.jp. The toUnicode method converts the ACE name back to its original form.
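Here is a minimal sketch of that round trip. The class name IdnRoundTrip and the helpers toAce and toDisplay are made up for illustration; only IDN.toASCII and IDN.toUnicode come from the JDK. The Japanese characters are written as Unicode escapes so the example doesn't depend on the source file's encoding.

```java
import java.net.IDN;

class IdnRoundTrip {
    // Hypothetical helper: wraps the JDK's IDN.toASCII.
    static String toAce(String unicodeName) {
        return IDN.toASCII(unicodeName);
    }

    // Hypothetical helper: wraps the JDK's IDN.toUnicode.
    static String toDisplay(String aceName) {
        return IDN.toUnicode(aceName);
    }

    public static void main(String[] args) {
        // \u65e5\u672c\u8a9e is 日本語
        String original = "\u65e5\u672c\u8a9e.jp";

        String ace = toAce(original);
        String back = toDisplay(ace);

        System.out.println(ace);                   // prints xn--wgv71a119e.jp
        System.out.println(back.equals(original)); // prints true
    }
}
```

Notice that only the label with non-ASCII characters changes. The plain .jp label passes through untouched.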

So why do you need this? It turns out that the internet's core infrastructure, including domain name servers and name resolvers, just doesn't handle non-ASCII characters very well. At the very least, it's safe to say that it doesn't purposefully support non-ASCII characters. However, people want the bigger Unicode character range for their domain names. So, RFC 3490 allows for internationalized, Unicode names…but with a hitch. We have to pass ACE names to the DNS infrastructure and name resolvers. Your apps can display 日本語.jp, but those same apps have to convert to ACE when they pass the name off to DNS, etc. So that's it. That's why the IDN class is useful.
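To make that split concrete, here's a rough sketch of an app that keeps the Unicode name for display and converts to ACE only when it builds something for the network layer. The class name WireName and the method toHttpUri are assumptions for this example, not part of any API; the JDK pieces are IDN.toASCII and the four-argument java.net.URI constructor.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

class WireName {
    // Hypothetical helper: builds an http URI whose host is the ACE form
    // of the name the user sees on screen.
    static URI toHttpUri(String displayName) throws URISyntaxException {
        String ace = IDN.toASCII(displayName); // names that are already ASCII come back unchanged
        return new URI("http", ace, "/", null);
    }

    public static void main(String[] args) throws URISyntaxException {
        String shownToUser = "\u65e5\u672c\u8a9e.jp"; // 日本語.jp
        URI uri = toHttpUri(shownToUser);

        // The host handed to the network layer is pure ASCII.
        System.out.println(uri.getHost()); // prints xn--wgv71a119e.jp
    }
}
```

Whether you convert this early or right before the lookup is a design choice. The only real requirement is that the ACE form, not the display form, is what reaches DNS.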

Java SE 6 has several new internationalization features. IDN support is just one. To read more about this and other new i18n features, take a look at the article International Enhancements in Java SE 6.
