Unicode Characters and Alternative Glyphs

February 13th, 2013 joconner No comments

Smiley face

Unicode defines thousands of characters. Some “characters” are surprising, and others are obvious. When I look at the Unicode standard and consider the lengthy debates that occur when deciding upon whether a character should be included, I can imagine the discussion and rationalization that occurs. Deciding on including a character can be difficult.

One of the more difficult concepts for me to appreciate is the difference between light and dark (or black and white) characters. A real example will help me explain this. Consider the “smiley face” characters U+263A and U+263B:  ☺ and ☻. These characters are named WHITE SMILING FACE and BLACK SMILING FACE respectively.

These are not the only characters that have white and black options. Dozens of others exist. There are even separate BLACK TELEPHONE and WHITE TELEPHONE characters.
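If you want to see these names for yourself, here is a minimal sketch in Java. It assumes Java 7 or later, where Character.getName(int) is available; the class and method names (GlyphNames, describe) are my own, purely for illustration.

```java
class GlyphNames {
    // Formats a code point as "U+XXXX NAME" using the JDK's Unicode name data.
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // The two smiley faces and the two telephones mentioned above.
        int[] codePoints = {0x263A, 0x263B, 0x260E, 0x260F};
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```

Each pair gets two distinct code points and two distinct names, which is exactly the distinction I'm questioning.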

Of course, once these characters go into the standard, they should stay. One shouldn’t remove existing characters. However, a serious question does arise when considering WHITE and BLACK options for a character.

The question I have is this: Why? Why isn’t the white and black color variation simply a font variation for the same character? The Unicode standard clearly states that it avoids encoding glyph variations of the same character. That makes a lot of sense. However, in practice, the standard at least appears to do exactly the opposite for many characters. I can only guess that someone on the standards committee made a very good, logical and well-supported argument for the character differentiation.

My hope for future versions of the standard is that these kinds of color variations will be avoided. Not being on the committee when these characters were added, I cannot really complain. And I hope that my comments here don’t come across that way. However, in the future, I’d like the standard to include annotations for these characters that describe why they deserve separate code points. It certainly isn’t clear from the existing characters’ notes, and I’m sure that others would be curious about the reasons as well.

Categories: Unicode

Standard Charsets in Java 7

February 1st, 2013 joconner No comments

Once in a while I poke my nose through the release notes of new Java releases. It’s not a particularly rewarding activity, but this time I did find something interesting. Oddly enough, it was interesting for what it did NOT say. I was surprised, so I thought you might want to know about a new class that is now available and quietly overlooked in any release notes.

Character sets have their own class representation in Java: Charset. You can use the Charset class to identify a character set for encoding or decoding. To create a Charset object, you use a factory method: Charset.forName(String charset). The uncomfortable trick to using this method is that you must be prepared to catch an exception if the JRE doesn’t actually supply the requested character set. Bummer.

I’ve always wondered why the JDK allows a random string as the parameter. I suppose it was for convenience…to allow the JDK to be updated over time with new charset support without having to change any API or enumeration. That’s understandable. But not really knowing what minimal set of character sets is supported in a particular JDK is somewhat…unnerving…especially to an engineer just trying to get his/her work finished.

The JDK documentation was always clear on what character sets you could absolutely depend on to be present. That was helpful and much needed. At least an observant developer could depend on that. However, the JDK now provides a more robust and useful way to identify which charsets are minimally supported. Java 7 provides a new class: java.nio.charset.StandardCharsets.

StandardCharsets does one thing. It lets you know what set of character sets is minimally supported in your JDK. The set is probably unchanged from Java 6 or Java 5 or even earlier. However, now you don’t have to read the documentation as carefully; the standard set is given to you. The StandardCharsets class explicitly enumerates the normal set for you.
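Here is a small sketch of the difference, assuming Java 7 or later. The class name StandardCharsetsExample and the helper encodeUtf8 are mine, just for illustration; the constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StandardCharsetsExample {
    // No lookup by name and no exception handling: the constant is always present.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The older approach: a string lookup that can fail at runtime
        // with UnsupportedCharsetException if the name is not supported.
        Charset byName = Charset.forName("UTF-8");

        // The six charsets that StandardCharsets enumerates.
        Charset[] guaranteed = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        for (Charset cs : guaranteed) {
            System.out.println(cs.name());
        }
        System.out.println(byName.equals(StandardCharsets.UTF_8));
        System.out.println(encodeUtf8("\u263A").length + " bytes");
    }
}
```

The constants also help with methods such as String.getBytes: the overload that takes a Charset does not declare the checked UnsupportedEncodingException that the String-name overload does.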

Rocket science? No. But this welcome addition to the JDK was a long time in coming, and I’m glad to have found it.

Categories: Java

iOS vs Android

January 14th, 2013 joconner 1 comment

Yesterday someone told me that Google’s Android devices shipped more units than iOS devices in Q3 and Q4 of 2012. I will check and recheck my source on this. That’s a big claim, but it seems plausible considering that Android ships on a lot more than your basic tablets. Android is in a lot of things, including smart televisions and many manufacturers’ smartphones.

It leaves me wondering…has Android finally got the momentum to dominate the small device market, smartphones, etc.? More importantly to me, does it have developer interest?

I’m convinced that a successful computing platform for tablets and phones must serve two consumers. First, those customer-consumers that buy the devices and use them day to day must be happy with the usability and overall fitness of the OS. Second, the developer-consumer must be convinced that the platform is easy to develop for. The OS and platform tool chain must be robust and complete. Otherwise, developer interest fades quickly. Without developers, you simply don’t have those random, goofy, hacked apps that seed a market. Without those apps, customer-consumers don’t have any motivation to discover a newer platform.

I’ve finally made my own choice though…my choice about which platform I’m going to develop for. There’s no doubt that I’m fascinated by Android. So fascinated, in fact, that I suspect that many future posts will be devoted to Android. However, I’m going to use a couple tool sets. Of course, I’m going to write native applications in Java, but I’m also going to try something relatively new for me. I’m going to look at… PhoneGap. I only know the idea behind PhoneGap, and I haven’t tried to develop with it yet. However, that’s going to change too.

You know what’s the best thing about PhoneGap? Ready? It’s that you write your application once using PhoneGap APIs, and that application should now run on multiple platforms. I’ve always been fascinated by and pulled into this promise of write once and deploy to many devices. Well, PhoneGap promises that. Interestingly, it’s taking on the same job that Java did a long time ago. Oddly, though, I recall that Java never quite made it onto the desktop. And instead of being the common language of all these devices, the common language seems to be descriptive HTML, CSS, and JavaScript. Of course, my curiosity can’t be satisfied easily. I’m going to give both a try: a native Android application and a PhoneGap one. Which will be easier to use?

I suspect that the native application will be the snappiest, most desired application. However, I want to be pleasantly surprised by PhoneGap. In fact, I’d like to be so delighted with PhoneGap that I give up my other toolsets. In addition to having a single source code base that works for both iOS and Android, I’d love to have that single tool chain that can help me develop apps across platforms too!

Categories: Uncategorized

Still Can’t Use Apostrophes? Really?

December 10th, 2012 joconner No comments

Answer this for me. Why in the world are we still preventing very common characters from name fields in online forms, in bank account applications, in insurance forms…tax returns? Why? 

In 2012, many companies have adopted Unicode in their backend databases. But what’s wrong with their development teams that prevents them from allowing customers to spell their names correctly in their application’s user interface? I live in California. We have LOTS of hyphenated names, names with accents, names with apostrophes. There really is no excuse for preventing users from spelling their names online in the same way that they spell them on paper.

At this point I’m just irritated. At one point I thought I could just tell people how to fix these things. Then I thought I could occasionally blog about it, thinking the word would get out slowly. Well, I suppose if it is working at all, the message is getting out slower than anticipated. I never had delusions that an i18n blog would be generally popular with the masses. This isn’t a soap opera or Hollywood expositor after all. However, you might think that common sense would just spread, that it would simply be absorbed across the web. It ain’t so.

Look, if you are a software developer and have ANY influence on how your company provides its input or signup forms online, can you do me a favor? Can you remember that some people have names that actually have an apostrophe or hyphen or n-with-a-tilde? You can easily parse these fields; you can guard against SQL injection attacks and the like, which use interesting characters to turn databases into mush. We have the technology, people. Let’s consider what might happen if we use it.
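To show that this isn't hard, here is a rough sketch in Java of one way to do it. Everything named here is my own assumption for illustration: the class NameFieldCheck, the pattern, and the customers table. The pattern accepts letters and combining marks from any script plus apostrophes, hyphens, periods, and spaces; your own rules may need to be looser. The database safety comes from the parameterized query, not from banning characters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.regex.Pattern;

class NameFieldCheck {
    // Starts with a letter or mark; then letters, marks, ASCII apostrophe,
    // typographic apostrophe (U+2019), space, period, or hyphen.
    private static final Pattern NAME =
        Pattern.compile("[\\p{L}\\p{M}][\\p{L}\\p{M}'\u2019 .-]*");

    static boolean isPlausibleName(String input) {
        return input != null && NAME.matcher(input).matches();
    }

    // Hypothetical table and column. The "?" placeholder means the driver
    // treats the name strictly as data, apostrophe and all.
    static void saveName(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO customers (name) VALUES (?)")) {
            ps.setString(1, name);
            ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleName("John O\u2019Conner"));
        System.out.println(isPlausibleName("N\u00fa\u00f1ez-Garc\u00eda"));
    }
}
```

With a parameterized statement, an apostrophe in a name is no more dangerous than any other letter, so there is no technical reason to reject it at the form.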

All the best,

John O’Conner (note the apostrophe)


In Memory of Bill Hall

July 27th, 2012 joconner No comments

I met Bill Hall sometime in 1993 or 1994 when we both worked at Novell. He was already a well-known software engineer, consultant, and internationalization guru. As a recently minted college graduate, I adopted Bill as my mentor. Like many mentors, Bill probably never knew this. And yet, he mentored me for more than 16 years in this globalization industry. He is the one, the only one, that I thought knew all there is to know.

Bill Hall could laugh until tears came to his eyes, and he could look at you and freeze a moment with you as if you were the only person in the world that mattered. Later in his life, his eyes would water for no good reason, except that maybe he was just getting older, and allergies or maybe just life itself had squeezed most of Bill out.

When Bill wasn’t talking about internationalization or piloting, he always spoke of his wife and children. I met his wife Ewa and one of his children, Kasia. They and my own wife toured around Tokyo one year long ago while Bill and I spoke at a conference or just happened to be in Tokyo together. I suppose the event doesn’t really matter; it was a long time ago. Kasia must be a junior or senior in college now….wow, time flies.

Just today, I received an email from Kasia, a personal email telling me that her father had died. I’ve since discovered that many others in the internationalization and globalization industry have received a similar but different email or notification from Kasia. At least a dozen other people that I know received those personal emails that said something that only Bill and you would know, something that Bill shared with you, and I got one of those from Kasia. After the initial shock of learning of his death, I couldn’t help but smile. Kasia had sent out an email to me, just to me, and it was personal, and I realized that his dear daughter had inherited Bill’s way of reaching out to people one on one, making them feel as if you were the only person in the world that mattered.

Thanks, Bill, for your friendship, for your knowledge, and for what you’ve given our industry. We already miss you.




Categories: Uncategorized

Reactions as Another Aspect of Social Media

June 16th, 2012 joconner No comments

One of the new trends in making web content more social is the recording of reader impressions or reactions. For example, I just read an article about Father’s Day and the article included a poll that allowed me to quickly provide my response or impression of the content. The poll wasn’t a questionnaire that I’d never take the time to fill out. Instead, it was just a few buttons or image maps that require a single click:

Article impression

What’s interesting about this is that not only do I get to enjoy the article content, but I also get an indication of how others perceive or respond to the content — obviously making the content more social. What a great idea! 

Another interesting part of this to me is the choice to keep the response anonymous and aggregated. The above image, for example, shows the categories of reader response but doesn’t tell me exactly who responded in any of the categories. Certainly it would be possible, especially if this were tied to Facebook or Google Plus, to see what my friends or colleagues think about the content too.

I wonder whether the anonymity preference is specific to US English readers. As I think about it, I’m happy to participate in the poll, but I might not want to make my specific opinion public knowledge. I wonder if other cultures would feel differently in general? What groups of people would feel more open to expressing opinions publicly and associating their real or online identities to their response?

Oh, my response to this particular article was “THINK”…but not about the article content. Instead, the article and the poll made me think about changes in social media. Every time I think we’ve tapped our creative juices out, somebody thinks of something new and impressive to make the online world more social. 

Categories: Web

Just another suggestion for language selection lists

June 1st, 2012 joconner No comments

Recently I was asked to fill in an online questionnaire. As I began the form, an entire window was shown to me, and it had a single UI item on it, a language selection list. Of course, I had to click on the list, wondering what wonderful choices I might have for a user interface. Surprisingly, the list revealed a single entry: English.


If you sometimes read my blog, you’ll know that I’m particularly interested in language lists at the moment. However, if you are interested in providing a list, I do have one additional suggestion…if you don’t have any choices, you probably shouldn’t use a selection list. A list with a single entry is a bummer for the user and just doesn’t make much sense. Maybe disable it until your UI does provide additional languages.

That’s it for today. 

Categories: Globalization, Web

Internationalization & Unicode Conference 36 Call for Papers

April 1st, 2012 joconner No comments

The IUC 36 call for papers went out last week – http://www.unicodeconference.org/e/IUC36-CfP-03-29-12.htm

This conference event brings together the best minds, ideas, and practices in the worlds of internationalization and localization. There are content sessions to please everyone, including technical engineers, project managers, and product managers.

The Program Committee is requesting proposals for presentations. Check out the website for details, but some of the general areas are the following:

  • Application Areas
    • Social Nets
    • SEO
    • Websites and web services
    • Libraries and education
    • IDN
    • Mobile and Tablets
    • Security
    • Machine Translation
  • General Techniques
    • i18n libraries
    • bidirectionality and scripts
    • html5
    • Data formats: JSON, XML
    • project mgmt
    • font dev
  • Culture and Tech
    • Endangered languages
    • Unencoded languages
    • Case studies
    • ISO language tag issues
  • Regional Considerations
    • Africa, Asia, Middle East
    • Locales and CLDR
    • Emoji Support … sigh…

If you think you might want to present something new and exciting that you have been working on, consider presenting at the conference. Read the above link to find out more.

One last thing. Check out that Gold Sponsor!







Categories: Uncategorized

Language Signals on the Web

February 8th, 2012 joconner No comments


Presenting a user interface in the customer’s language should be a high priority for your product management team. If not, they’re not doing their job in my opinion. Assuming you have the feature in your product roadmap, how do you choose the UI language of your customer on the web? After all, web applications have multiple, sometimes conflicting language signals.

A language signal is an indicator that gives your application a hint of your customer’s preferred language. In a web application, these signals are numerous. To help you in choosing from all these signals, I believe you should honor the preferences in the following priority. That is, check each signal for its existence in this order, and use the first signal that is available:

  1. query parameters, for example http://example.com?lang=fr
  2. domain name or path parameters, e.g. http://fr.example.com or http://example.com/fr
  3. persistent application preferences
    • cookies
    • customer profile or settings
  4. browser accept-language headers
  5. geolocation hints
  6. default application language
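
The priority order in this list can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the class LanguageChooser and its method are hypothetical, each signal has already been reduced to a simple language code (or null when the signal is absent), and I skip any signal that names a language the application doesn't support.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LanguageChooser {
    // Signals must be passed in priority order: query parameter, domain/path,
    // persistent preference, Accept-Language, geolocation hint.
    // A null entry means that signal was not present in the request.
    static String chooseLanguage(List<String> signalsInPriorityOrder,
                                 Set<String> supported,
                                 String defaultLanguage) {
        for (String signal : signalsInPriorityOrder) {
            if (signal == null) {
                continue; // signal absent, try the next one
            }
            String code = signal.toLowerCase(Locale.ROOT);
            if (supported.contains(code)) {
                return code; // first usable signal wins
            }
        }
        return defaultLanguage; // last resort: the application default
    }

    public static void main(String[] args) {
        Set<String> supported = new HashSet<>(Arrays.asList("en", "fr", "de"));
        // No query parameter, URL says French, cookie says German.
        List<String> signals = Arrays.asList(null, "fr", "de", "en", null);
        System.out.println(chooseLanguage(signals, supported, "en"));
    }
}
```

In this run the URL signal outranks the cookie, so the French UI is chosen even though a German preference was stored.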

Query Parameters

Query parameters are often used to override every other language or application signal. If parameters are used, your customers (QE engineers or even end users) are intentionally trying to coerce the application into ignoring all other language signals. Query parameters beat out any other language signal when they are provided in the same request.

Domain name or Path Parameters

Sometimes you will partition your localized sites by domain name or by language tag paths. A domain name partition means that you select different or even localized domain names for specific markets. For example, your French site could be http://fr.example.com. You can also distinguish language preference on the path like this: http://example.com/fr or http://example.com/en-gb. When query parameters don’t exist, this is the next choice in our prioritization.

Persistent Settings

Of course, if your application has allowed the user to select a language preference, the application should honor that preference. The preference may be stored in a cookie or even in a user profile attribute on the server.

Accept-Language Header

Most browsers provide a list of user language preferences in each request. These languages are provided in the Accept-Language request header. The header can carry one or more language tags, optionally weighted with q-values (for example, fr-CH, fr;q=0.9, en;q=0.8), and they indicate the user’s order of preference when requesting content. In the absence of other signals, your application should respond to the Accept-Language header.
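
If you happen to be on Java 8 or later, the JDK can do the header parsing and matching for you. The sketch below is one possible approach, not the only one; the class AcceptLanguageExample, the method bestMatch, and the sample header are my own for illustration. Locale.LanguageRange.parse and Locale.lookup are standard JDK methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AcceptLanguageExample {
    // Returns the supported locale that best matches the header,
    // or the fallback when the header is missing, malformed, or unmatched.
    static Locale bestMatch(String header, List<Locale> supported, Locale fallback) {
        if (header == null || header.trim().isEmpty()) {
            return fallback;
        }
        try {
            // Parses tags and q-values, sorted from most to least preferred.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
            // Lookup matching: "fr-CH" falls back to "fr" if only "fr" is supported.
            Locale match = Locale.lookup(ranges, supported);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException e) {
            return fallback; // ill-formed header value
        }
    }

    public static void main(String[] args) {
        List<Locale> supported = Arrays.asList(Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN);
        Locale chosen = bestMatch("fr-CH, fr;q=0.9, en;q=0.8", supported, Locale.ENGLISH);
        System.out.println(chosen.toLanguageTag());
    }
}
```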

Geolocation Hints

The last signal that actually provides information about the user is the geographic location from which the user is accessing your content. Although imperfect and imprecise, geography can provide a hint to your customer’s language preference. It’s definitely not the best indicator because multiple languages can be spoken in any geographic location. In a pinch, though, you may be able to provide a language selection tool that provides a list of the most prominent languages spoken in a specific area of the world.

Default Application Language

Finally, when all else fails and there have been no other indicators, you can provide the UI in the default language of the application. If your company is in Germany, maybe the default is German. If it’s the U.S., your default language is most likely English…or maybe even Spanish. You have to display the application in some language, and the default at this point is your last option.

In Summary

To summarize, a web application can serve a global audience. In doing so, it may accommodate customers in a variety of languages. Your application’s user interface may be selected from numerous possibilities, numerous signals from the user. Those signals are important data points to consider when making the language choice to present to the user. Using the signals described in this article, you’ll be able to consider some of the more important language preference indicators. Follow the prioritization I’ve outlined here, and you’ll make the right language choice most of the time…until you don’t. And there will be times when you don’t make the right choice from all these signals. When that happens, and it will happen, you have to give your users some way to indicate that problem. Take a look at my previous blog entry about language selection widgets for help with that.




Unicode 6.1 Released

February 1st, 2012 joconner No comments


The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries and Africa. This version of the standard introduces 732 new characters.

In addition, the standard also added “labels” for character properties that will supposedly help implementers create better regular expressions that are both easier to read and easier to validate. I admit little knowledge about these labels at the moment, but will research and report on them in the future if time allows.

One of the oddities of the new version is the inclusion of 200 emoji variants. This is perhaps the only issue of the standard that I just don’t understand. Back in the day when I was more involved in Unicode development, we had a huge effort to unify variants of Chinese characters. We preached that Unicode characters were abstract entities with glyph renderings that were determined by font, style preferences of developers and apps. Now it appears that the Unicode Consortium has changed its position on this. Or maybe partially? The addition of 200 emoji “variants” just seems unnecessary, but that’s just my opinion and I admit that I may not know all the issues that formed the consortium’s decision.

We have some examples, straight from the announcement, that show only 4 of the 200 new emoji variants:

Emoji tents

As the image shows, the “TENT” emoji has two variants — a text style and a more colorful, graphical emoji style. The standard defends these variants by saying that it allows implementations to distinguish preferred display styles. I think that is what fonts are for. Personally, I just don’t think variants are needed. And, I think that the variants make things more difficult for applications.
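
For the curious, here is how I understand the mechanism, sketched in Java. A variant is not a new code point; it is the base character followed by a variation selector, U+FE0E for text style or U+FE0F for emoji style. The class and method names (EmojiVariants, withSelector) are mine, and whether you actually see a difference depends entirely on your fonts and rendering engine.

```java
class EmojiVariants {
    static final int TENT = 0x26FA;           // U+26FA TENT
    static final int TEXT_SELECTOR = 0xFE0E;  // VARIATION SELECTOR-15
    static final int EMOJI_SELECTOR = 0xFE0F; // VARIATION SELECTOR-16

    // Builds a two-code-point variation sequence: base, then selector.
    static String withSelector(int base, int selector) {
        return new StringBuilder()
            .appendCodePoint(base)
            .appendCodePoint(selector)
            .toString();
    }

    public static void main(String[] args) {
        String textStyle = withSelector(TENT, TEXT_SELECTOR);
        String emojiStyle = withSelector(TENT, EMOJI_SELECTOR);
        // Same base character, but the two strings are not equal.
        System.out.println(textStyle.equals(emojiStyle));
        System.out.println(textStyle.codePointCount(0, textStyle.length()));
    }
}
```

This is part of why I think variants make life harder for applications: string comparison, searching, and length checks all have to account for a selector that only affects display.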

What do you think about variants in general? And what about emoji variants specifically?


Categories: Fonts, Standards, Unicode