I reported that NetBeans 6.1’s project charset encoding feature would allow an unsuspecting user to destroy file data. That’s still true, though through no fault of NetBeans, really. It’s just a matter of fact: if you start out with UTF-8 and convert your project files to ASCII or ISO-8859-1 or any other charset that covers only a subset of Unicode, you will lose every character that is not also in the target charset.
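To make that loss concrete, here is a minimal Python sketch. It has nothing to do with how NetBeans does its conversion; the `lost_characters` helper and the sample string are my own inventions for illustration. It simply asks, character by character, whether a target charset can encode the text.

```python
def lost_characters(text: str, target_charset: str) -> list[str]:
    """Return the distinct characters in text that target_charset cannot encode.

    Hypothetical helper for illustration only.
    """
    lost = []
    for ch in text:
        try:
            ch.encode(target_charset)
        except UnicodeEncodeError:
            # The target charset has no representation for this character.
            if ch not in lost:
                lost.append(ch)
    return lost


# Sample text mixing ASCII, a Latin-1 letter, the euro sign, and Japanese.
sample = "café €5 日本語"

print(lost_characters(sample, "ascii"))       # everything outside ASCII
print(lost_characters(sample, "iso-8859-1"))  # é survives, the rest does not
print(lost_characters(sample, "utf-8"))       # nothing is lost

# A forced conversion substitutes '?' for each unencodable character,
# and the original characters cannot be recovered from the result.
damaged = sample.encode("ascii", errors="replace").decode("ascii")
print(damaged)
```

Running a check like this before converting a project would at least tell you which characters are about to disappear.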
You’d think this sort of problem would be resolved by now, but it’s not. It’s still almost impossible to quickly and easily migrate an application from the all-too-common default Latin-1 to the UTF-8 character encoding. The problem isn’t that UTF-8 can’t handle the conversion. No, that’s definitely not it. UTF-8 can represent any Latin-1 character and much, much more. The problem is that Latin-1 is so deeply ingrained as the default in so many software interfaces that you end up with a long chain of potentially faulty conversion points. A conversion point is a handoff between one software component and another, a place where bytes are turned into characters or characters into bytes, and where one side assuming the wrong encoding is way too common.
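Here is a rough sketch of what a faulty conversion point looks like, again in Python. The `handoff` function is a made-up stand-in for any two components exchanging text (browser and server, server and database, and so on), not a real API.

```python
def handoff(text: str, sender_charset: str, receiver_charset: str) -> str:
    """Simulate one conversion point: the sender encodes the text to bytes,
    and the receiver decodes those bytes with whatever charset it assumes.

    Hypothetical function for illustration only.
    """
    wire_bytes = text.encode(sender_charset)
    return wire_bytes.decode(receiver_charset)


original = "café"

# Both sides agree on UTF-8: the text survives the handoff.
agreed = handoff(original, "utf-8", "utf-8")

# The sender uses UTF-8 but the receiver defaults to Latin-1: no error is
# raised, yet the two bytes of 'é' are read as two separate characters.
garbled = handoff(original, "utf-8", "iso-8859-1")
print(garbled)

# The reverse mismatch fails loudly, because a lone 0xE9 byte is not
# valid UTF-8.
try:
    handoff(original, "iso-8859-1", "utf-8")
    reverse_failed = False
except UnicodeDecodeError:
    reverse_failed = True
```

The silent case is the nasty one. Nothing complains, and the garbled text just moves on to the next component, which may convert it yet again.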
Last week a friend asked an interesting web-based localization question. He surprised me with it. I wish that I had considered it before, but I had not. In my less-than-complete analysis of his problem, I found a solution, but I don’t know if he’ll like it. Hell, I barely like it myself, but it’s what I came up with quickly.
Here’s the question:
So here are a couple options: