
Posts Tagged ‘internationalization’

Unicode Characters and Alternative Glyphs

February 13th, 2013 joconner No comments


Unicode defines thousands of characters. Some “characters” are surprising, and others are obvious. When I look at the Unicode standard and consider the lengthy debates over whether a character should be included, I can imagine the discussion and rationalization that occurs. Deciding whether to include a character can be difficult.

One of the more difficult concepts for me to appreciate is the difference between light and dark (or black and white) characters. A real example will help me explain this. Consider the “smiley face” characters U+263A and U+263B:  ☺ and ☻. These characters are named WHITE SMILING FACE and BLACK SMILING FACE respectively.

These are not the only characters that come in white and black versions. Dozens of others exist. There are even BLACK TELEPHONE (U+260E) and WHITE TELEPHONE (U+260F) characters.
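If you want to see these names for yourself, the following is a minimal sketch that asks the JDK for the official Unicode name of each code point mentioned above. Character.getName is available in Java 7 and later; the class name GlyphNames and the helper describe are made up for this illustration.

```java
final class GlyphNames {
    /** Returns "U+XXXX NAME" for a code point, using the JDK's own Unicode data. */
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // The white/black pairs discussed in this post.
        int[] codePoints = {0x263A, 0x263B, 0x260E, 0x260F};
        for (int cp : codePoints) {
            // Print the character itself, then its code point and name.
            System.out.println(new String(Character.toChars(cp)) + "  " + describe(cp));
        }
    }
}
```

Whether the glyphs actually look different on your screen depends on the font your console uses, which is rather the point of the question raised in this post.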

Of course, once these characters go into the standard, they should stay. One shouldn’t remove existing characters. However, a serious question does arise when considering WHITE and BLACK options for a character.

The question I have is this: Why? Why isn’t the white and black color variation simply a font variation for the same character? The Unicode standard clearly states that it avoids encoding glyph variations of the same character. That makes a lot of sense. However, in practice, the standard at least appears to do exactly the opposite for many characters. I can only guess that someone on the standards committee made a very good, logical and well-supported argument for the character differentiation.

My hope for future versions of the standard is that these kinds of color variations will be avoided. Not being on the committee when these characters were added, I cannot really complain. And I hope that my comments here don’t come across that way. However, in the future, I’d like the standard to include annotations for these characters that describe why they deserve separate code points. It certainly isn’t clear from the existing characters’ notes, and I’m sure that others would be curious about the reasons as well.

Categories: Unicode

Standard Charsets in Java 7

February 1st, 2013 joconner No comments

Once in a while I poke my nose through the release notes of new Java releases. It’s not a particularly rewarding activity, but this time I did find something interesting. Oddly enough, it was interesting for what it did NOT say. I was surprised, so I thought you might want to know about a new class that is now available and quietly overlooked in any release notes.

Character sets have their own class representation in Java: Charset. You can use the Charset class to identify a character set for encoding or decoding. To create a Charset object, you use a factory method: Charset.forName(String charset). The uncomfortable trick to using this method is that you must be prepared to catch an exception if the JRE doesn’t actually supply the requested character set. Bummer.
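Here is a rough sketch of what that defensive lookup can look like. The exceptions that Charset.forName throws for a bad or unavailable name (IllegalCharsetNameException and UnsupportedCharsetException) are unchecked, so the compiler won't remind you to handle them. The class CharsetLookup and its fallback behavior are my own illustration, not anything the JDK provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

final class CharsetLookup {
    /** Looks up a charset by name; returns the fallback if the name is bad or unsupported. */
    static Charset lookup(String name, Charset fallback) {
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Both are unchecked, so nothing forces callers to handle them.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8"); // always present in a compliant JRE
        System.out.println(lookup("UTF-8", utf8));
        System.out.println(lookup("no-such-charset-x", utf8)); // falls back
        System.out.println(lookup("bad name!", utf8));          // illegal name, falls back
    }
}
```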

I’ve always wondered why the JDK allows a random string as the parameter. I suppose it was for convenience…to allow the JDK to be updated over time with new charset support without having to change any API or enumeration. That’s understandable. But not really knowing what minimal set of character sets is supported in a particular JDK is somewhat…unnerving…especially to an engineer just trying to get his/her work finished.

The JDK documentation was always clear on what character sets you could absolutely depend on to be present. That was helpful and much needed. At least an observant developer could depend on that. However, the JDK now provides a more robust and useful way to identify which charsets are minimally supported. Java 7 provides a new class: java.nio.charset.StandardCharsets.

StandardCharsets does one thing. It lets you know what set of character sets is minimally supported in your JDK. The set is probably unchanged from Java 6 or Java 5 or even earlier. However, now you don’t have to read the documentation as carefully; the standard set is given to you. The StandardCharsets class explicitly enumerates the standard set for you as constants: US_ASCII, ISO_8859_1, UTF_8, UTF_16BE, UTF_16LE, and UTF_16.
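As a small sketch of how this reads in practice: the StandardCharsets constants are ordinary Charset objects, so you can hand them directly to methods such as String.getBytes(Charset) and new String(byte[], Charset) with no lookup and no exception handling. The class and method names below are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class StandardCharsetsDemo {
    /** Encodes text as UTF-8 without any charset lookup by name. */
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a String. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "O\u2019Conner"; // U+2019 is the typographic apostrophe
        byte[] bytes = encodeUtf8(name);
        System.out.println(bytes.length + " bytes, round trip ok: "
                + name.equals(decodeUtf8(bytes)));
    }
}
```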

Rocket science? No. But this welcome addition to the JDK was a long time in coming, and I’m glad to have found it.

Categories: Java

Still Can’t Use Apostrophes? Really?

December 10th, 2012 joconner No comments

Answer this for me. Why in the world are we still banning very common characters from name fields in online forms, in bank account applications, in insurance forms…tax returns? Why?

In 2012, many companies have adopted Unicode in their backend databases. But what’s wrong with their development teams that prevents them from allowing customers to spell their names correctly in their application’s user interface? I live in California. We have LOTS of hyphenated names, names with accents, names with apostrophes. There really is no excuse for preventing users from spelling their names online in the same way that they spell them on paper.

At this point I’m just irritated. At one point I thought I could just tell people how to fix these things. Then I thought I could occasionally blog about it, thinking the word would get out slowly. Well, I suppose if it is working at all, the message is getting out slower than anticipated. I never had delusions that an i18n blog would be generally popular with the masses. This isn’t a soap opera or Hollywood expositor after all. However, you might think that common sense would just spread, that it would simply be absorbed across the web. It ain’t so.

Look, if you are a software developer and have ANY influence on how your company provides its input or signup forms online, can you do me a favor? Can you remember that some people have names that actually have an apostrophe or hyphen or n-with-a-tilde? You can easily parse these fields; you can guard against SQL injection attacks and the like that use interesting characters to turn databases into mush. We have the technology, people. Let’s consider what might happen if we use it.
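To make that concrete, here is one possible sketch of a name check that accepts letters from any script along with apostrophes, hyphens, periods, and spaces. The class name NameValidator and the exact set of allowed punctuation are assumptions of mine; adjust them to your own requirements. Note that this check is about accepting real names, not about security. For SQL injection, the usual defense is a parameterized query (java.sql.PreparedStatement in Java), which makes an apostrophe in a name harmless.

```java
import java.util.regex.Pattern;

final class NameValidator {
    // A name starts with a letter or combining mark from any script, followed by
    // letters, marks, spaces, periods, the ASCII apostrophe, the typographic
    // apostrophe U+2019, or hyphens. The allowed punctuation is an assumption.
    private static final Pattern NAME =
            Pattern.compile("[\\p{L}\\p{M}][\\p{L}\\p{M} .'\\u2019-]*");

    /** Returns true if the input looks like a personal name under the rule above. */
    static boolean isPlausibleName(String input) {
        return input != null && NAME.matcher(input.trim()).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"O'Conner", "Smith-Jones", "N\u00fa\u00f1ez", "1234"};
        for (String s : samples) {
            System.out.println(s + " -> " + isPlausibleName(s));
        }
    }
}
```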

All the best,

John O’Conner (note the apostrophe)


Providing a Language Selection List

January 14th, 2012 joconner No comments

These questions pop up often enough in internationalization circles that I’ll address them here:

1. how should I localize a language selection list?
2. how should I sort it?

Localized Language Lists

Customers use a language selection list to change languages. You must assume that the currently selected language is inappropriate for some reason. One possible reason is that the user cannot read the page content. This means that a localized language list that displays all target language options in the language of my current page probably won’t be understood. For example, let’s pretend that I speak English but am on a Japanese web site. Presenting me with language options that include “英語” will not help me if I don’t read Japanese. I can’t be expected to know that those characters mean “English” to a Japanese person. This option is unhelpful. What is helpful?

The right way to represent any language selection list is to display languages in their own language and script. For example, English should be English, Spanish should be Español, Japanese is 日本語, etc. You don’t need to localize this list into every language. One list using the target language’s own language and script for each language choice is both sufficient and optimal. This guarantees that I’ll be able to read and select my target language regardless of the current page’s language setting. This is the most universal option you have, and I consider this a best practice for creating language selection lists.
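In Java, one way to sketch this is to ask each Locale for its display name in its own language, using Locale.getDisplayLanguage(Locale). The class LanguageList and the particular locales are just an illustration. The exact strings, including capitalization, come from the locale data shipped with your JDK, so review them before putting them in front of customers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LanguageList {
    /** Returns each language's name written in that language and script. */
    static List<String> autonyms(List<Locale> locales) {
        List<String> names = new ArrayList<>();
        for (Locale locale : locales) {
            // Passing the locale to itself asks for the name in its own language.
            names.add(locale.getDisplayLanguage(locale));
        }
        return names;
    }

    public static void main(String[] args) {
        List<Locale> supported = Arrays.asList(
                Locale.ENGLISH, Locale.forLanguageTag("es"), Locale.JAPANESE);
        for (String name : autonyms(supported)) {
            System.out.println(name);
        }
    }
}
```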

Sort Order

I don’t know how better to prepare you for my answer…so here it is. The actual sort order is less important than consistency. Two points make this obvious to me:

1. if your customer wants to choose a different language, they probably don’t speak the current one, and the current language’s sort rules won’t be particularly useful anyway.
2. you can’t accurately guess what language rules you should use because you don’t know which language the user will select.

With these points in mind, I don’t think the sort order matters. Correct linguistic sorts for this list are not critical, and anything you choose will be inconvenient to someone. For this reason, I think you simply have to choose a sort order and be consistent every time you show the language list. My suggestion is that you simply order the list in U.S. English order if you consider U.S. English to be the base language of your product. If you consider your base language to be something else, use that. My point is that it doesn’t matter. Sure, you’ll be tempted to provide this list in the sorted order of the language of the current page or host OS setting. Sure that’s an option, but it’ll be incorrect more often than correct when it’s needed. Save your sanity. Choose an order. Be consistent. Don’t worry about localizing this order.
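A minimal sketch of “choose an order and be consistent”: sort the list once with a collator for a fixed base locale, and never consult the page language or the user’s settings. Here I assume U.S. English is the base language; LanguageListOrder and BASE_LOCALE are names I made up for the example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class LanguageListOrder {
    // Assumption: U.S. English is the product's base language.
    private static final Locale BASE_LOCALE = Locale.US;

    /** Returns the names sorted by the base locale's rules, whatever the current UI language is. */
    static List<String> inFixedOrder(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted, Collator.getInstance(BASE_LOCALE));
        return sorted;
    }

    public static void main(String[] args) {
        // "Espa\u00f1ol" is Español, written with an escape to avoid source-encoding problems.
        List<String> names = Arrays.asList("Espa\u00f1ol", "English", "Deutsch");
        System.out.println(inFixedOrder(names));
    }
}
```

Because the collator never depends on the current page or the host OS, every visitor sees the same order on every page.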

So this sort issue is bothering you still? You just can’t accept it? Ok, that’s fine, but consider this. The solution I’ve described is already used by some pretty big players. Since I just finished evaluating Facebook, let’s use it as an example. Regardless of which localized site I visit, regardless of my browser language preferences, Facebook shows the same list of languages in the same sort order. They don’t even use a US English sort. Their choice is something different, something almost like a US English sort, but maybe using the Romanized version of the target language? Here’s an example: Japanese is sorted with other languages that start with an “N” sound. The Japanese pronunciation is romanized as “nihongo”. So, “nihongo” starts with an n and sorts with other languages that start with n? I can’t quite figure the sort rules out BUT that’s my point…it doesn’t matter. It’s consistent every time I go to the page, and it works. Here’s a shot of that Facebook page:

[Image: Facebook language selection list]

Conclusion

Providing a language selection list for your multilingual product is a great idea. It lets customers conveniently change the UI language of the product. Don’t over-think this problem. You can provide this feature without spending countless hours of debate. Follow my suggestions:

  1. Provide a single language selection list in which each entry is translated into the target language and script.
  2. Choose a sort order, any order. Be consistent in displaying this order.

Have a suggestion or comment? I’d enjoy hearing from you.

 

 

 


Job Post: Localization Engineer in San Jose, CA

January 4th, 2012 joconner No comments

Please comment on this post or email me if you are interested in this localization engineer position. I’ll put you in contact with the recruiter.

Direct from the recruiter:

The requirement is for a Localization Engineer in San Jose working for one of the leading Web Commerce companies.  They are looking for more than the typical Localization Engineer as they need someone who has done some programming, not just scripting.   I have attached a description and also listed below some things the manager said they have been missing in the candidates so far.

What I’ve been missing so far in candidates is

- Experience with web localization (more a focus on i18n engineering, but then without real coding skills), and enterprise localization tools (e.g. WorldServer)
- Significant modern coding skills (going beyond a simple VBA macro or Perl script), e.g. ability to write some Java application, understand/fix/enhance existing code, or a simple plug-in against a documented SDK (more than just passive knowledge of Java, etc)
- Ability to clearly articulate concepts or thoughts, describe processes

 

If this sounds like a good fit, I can email you more details. Good luck!

//John

 


Job Post: Sr International Product Manager

January 4th, 2012 joconner No comments

Recently a recruiter let me know about this position. If you are interested, please email me (john at joconner dot com) or leave a comment. I’ll send you all the contact information and details. Or of course, you could just go straight to the web site yourself…either way. But, in my opinion, it always helps if you have an internal lead!

Here’s the job description, straight from the internal recruiter:

We are currently looking for a Sr International Product Manager for our Cobalt (www.cobalt.com) office in the Seattle area.

The selected candidate will lead the efforts around globalizing our core automotive platform and requires

- Strong internationalization, globalization, and localization experience
- International rollout and deployment experience
- Experience in a B2C environment
- Experience working across countries and cultures

This is a FULL TIME position with benefits that start day one.

Good luck!

//John

 

Categories: jobs

Unicode support doesn’t mean your application is internationalized

October 6th, 2008 joconner No comments

Over the years, I’ve helped many organizations internationalize their software products. One of the most common misunderstandings is how Unicode will help their product. Customers sometimes mistakenly believe that Unicode support will be sufficient to internationalize their products. Sometimes they believe that Unicode “support” is a single, yes-no, on-off ability, when instead Unicode support is typically implemented in various stages and levels.

Unicode is a character encoding standard. It’s a big standard, with lots of nuances. Your products can implement “Unicode support” in many ways. The result is that those products will be able to manipulate, process, store, and perhaps even display the world’s scripts in a variety of ways BUT not usually in all ways. Your product’s ability to support Unicode is not a binary ability; instead, you should understand that products can have “Unicode support” at a variety of levels. In the simplest case, your product might only store and retrieve Unicode characters correctly. At a more sophisticated level, your product may be able to sort, search, or display Unicode characters. Again, Unicode “support” in a product cannot be evaluated by a single check-box or yes-no answer. Typically, products support Unicode in some ways but not in others.
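To illustrate the gap between those levels, here is a small sketch that sorts the same words two ways. Both sorts “support Unicode” in the sense that no character is lost, but only one of them produces an order a German reader would expect. The class UnicodeSortLevels and the sample words are made up for the illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class UnicodeSortLevels {
    /** Basic level: sorts by raw UTF-16 code unit values (String's natural order). */
    static List<String> byCodeUnit(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);
        return sorted;
    }

    /** More sophisticated level: sorts using the language rules of the given locale. */
    static List<String> byLocale(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted, Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C4pfel" is "Äpfel", written with an escape to avoid source-encoding problems.
        List<String> words = Arrays.asList("zebra", "\u00C4pfel", "apple");
        System.out.println(byCodeUnit(words));            // Ä sorts after z
        System.out.println(byLocale(words, Locale.GERMAN)); // Ä sorts with a
    }
}
```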

Implementing even the most sophisticated levels of Unicode support doesn’t mean your product is internationalized. Internationalization is the process of preparing a software code base to be easily localized. Internationalization creates a product that has no particular bias towards a single culture or language. That product can be localized for a specific culture. Unicode support can be a key component of an internationalization effort, but it is only one component. Like Unicode support, your internationalization support will have different levels of sophistication and ability.
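One example of work that Unicode alone doesn’t do for you is locale-sensitive formatting. The sketch below formats the same number for two locales using java.text.NumberFormat. The characters involved are all plain ASCII, yet the correct output still differs by culture. The class name LocaleFormatting is made up for this example.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class LocaleFormatting {
    /** Formats a number using the grouping and decimal conventions of the locale. */
    static String format(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(format(value, Locale.US));      // 1,234,567.891
        System.out.println(format(value, Locale.GERMANY)); // 1.234.567,891
    }
}
```

An internationalized product keeps the locale as a parameter like this instead of hard-coding one culture’s conventions.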

To summarize, products can support Unicode in a variety of ways. Supporting Unicode does not usually mean that your product has the ability to perform every possible function on Unicode characters. Instead, “support” usually means that you can do some things with Unicode but probably not others. Additionally, supporting Unicode isn’t the only step to internationalize your products. Unicode is only one step, an important step. Internationalization is the process of creating a product that is easier to localize, one that has cultural biases removed so that a specific culture or locale can be supported more easily after localization. You might use Unicode as a step in your internationalization efforts, but Unicode itself doesn’t create an internationalized product.

Contact me or leave a comment if you have questions about how Unicode can help your product. If I can help, I will. If I can’t, I probably know someone who can.

Categories: Unicode