Software internationalization
Sunday Mar 30, 2008
NetBeans 6.1 uses UTF-8 encoding for source files
I'm always happy when a company or product adopts Unicode as its charset. I think it makes perfect sense to do so. There are lots of good reasons why standardizing on Unicode is the right thing:
I was pleased to see that NetBeans 6.0 and the 6.1 beta use UTF-8 (a Unicode encoding) as the default encoding for project configuration and source files. The following figure shows the default setting in the project's property sheet:
This makes it much easier to edit non-ASCII, non-English source and property files. You can type text in any supported Unicode script right into Java source code. A legitimate use would be comments or even localizable text. Such a file can be compiled with:
javac -encoding UTF-8 YourSource.java
Despite the potential benefits, NetBeans 6.1 still doesn't handle this correctly, in my humble opinion. Why not? The biggest reason is simple: file corruption and permanent data loss. Ouch!
Let's take a simple "Hello, world!" example in Japanese. Typing it is easy in NetBeans because of the UTF-8 encoding. The NetBeans editor even displays it correctly, as shown here:
Unfortunately, my joy was short-lived once I discovered how easy it is to corrupt this data. Feel like experimenting with the charset encoding? Surely someone will. I suspected what would happen, so I didn't try this on any substantial code base... but someone will. I sure hope they use version control software. Reopen that project property sheet, select another encoding, say
Some of you, the super careful, nit-picky ones, will now argue with me: "But John, you haven't really lost anything yet. 8859-1 and CP 1252 don't have those characters, but the original byte values are still intact. You can get them back in this example." OK, I concede the point. But now I'll show you some serious data loss, no messing around this time. Instead of
Now that's just not good. Did NetBeans save the file correctly? Sure. However, NetBeans can do better than this. I would argue that if NetBeans knows that the target encoding cannot represent every character in the file, it should at least warn the user that the resulting file will contain garbage characters and that parts of the file will be lost, permanently in many cases.
So, just in case anyone over there in the NetBeans developer group can hear me... you have to fix this. Yes, I know it's a silly mistake for someone to make, but NetBeans can help them avoid the problem. Just provide a warning dialog: "Saving this file in the target encoding will cause data loss because the target encoding does not support all characters in this file or project." Keep the encoding feature, just perfect it by helping users avoid this costly mistake. The fact is that most software developers still don't understand character sets and encodings, and this is an accident waiting to happen.
On a personal note: I really love NetBeans. And I hope this blog qualifies me for the NetBeans 6.1 blogging contest! I could probably file this under the "suggestions on how to enhance NetBeans 6.1" category.
Posted at 02:43AM Mar 30, 2008 by John O'Conner in NetBeans | Comments[2]
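For readers who want to see the two failure modes from the post in code, here is a minimal Java sketch. It is not how NetBeans works internally; the class name EncodingCheck, its method names, and the sample greeting are all made up for illustration. It uses the standard CharsetEncoder.canEncode call as one way an editor could decide whether to show the proposed warning, and it contrasts the recoverable case (UTF-8 bytes merely misread as ISO-8859-1) with the unrecoverable one (text actually saved in an encoding that lacks the characters).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of NetBeans.
final class EncodingCheck {

    // True if every character of the text can be represented in the target
    // charset. An editor could run this before saving and warn when it is false.
    static boolean canSaveLosslessly(String text, Charset target) {
        // Create a fresh encoder per call; encoders are not thread-safe.
        return target.newEncoder().canEncode(text);
    }

    // Simulates saving the text in the target charset and reopening the file.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement bytes, which is '?' for ISO-8859-1.
    static String saveAndReopen(String text, Charset target) {
        byte[] saved = text.getBytes(target);
        return new String(saved, target);
    }

    // Simulates the recoverable case: UTF-8 bytes are only displayed as
    // ISO-8859-1. Every byte maps to exactly one ISO-8859-1 character, so
    // the original bytes, and therefore the original text, can be restored.
    static String misreadAndRecover(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        byte[] restored = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(restored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample greeting in hiragana, written with Unicode escapes
        // so this file compiles regardless of the -encoding setting.
        String greeting = "\u3053\u3093\u306b\u3061\u306f";

        System.out.println(canSaveLosslessly(greeting, StandardCharsets.UTF_8));      // true
        System.out.println(canSaveLosslessly(greeting, StandardCharsets.ISO_8859_1)); // false
        System.out.println(saveAndReopen(greeting, StandardCharsets.ISO_8859_1));     // ?????
        System.out.println(misreadAndRecover(greeting).equals(greeting));             // true
    }
}
```

The check is cheap enough to run on every save, and it only needs the text and the target charset, so a warning dialog like the one suggested above would not require any knowledge of the file's previous encoding.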
Hi John!
IMHO, this is not such a big deal - I will give you some reasons (more "reasoning" than "reasons") for that.
If you make a "real" project in a single language, you do not switch to an encoding that does not support all the characters you need (unless you do not know what you are doing). You simply switch to some known encoding for the desired locale, or you leave it as UTF-8. And you do it at the beginning, as part of the project planning.
If your project is intended to be international, you would probably use UTF-8 (unless you have strange reasons, like really loving to work a lot more than necessary and so wanting to set the encoding for every language mutation ;)).
However, I have changed your enhancement to DEFECT (P4), as I think that data loss without informing the user is a defect. I have also reassigned it to the i18n module.
Petr Dvorak (Joshis)
[Prague]
Posted by joshis on April 23, 2008 at 02:18 AM PDT #
Posted by joshis on April 23, 2008 at 02:19 AM PDT #