Monthly Archives: February 2012

Language Signals on the Web

Languages

Presenting a user interface in the customer’s language should be a high priority from your product management team. If not, they’re not doing their job in my opinion. Assuming you have the feature in your product roadmap, how do you choose the UI language of your customer on the web. After all, web applications have multiple, sometimes conflicting language signals.

A language signal is an indicator that gives your application a hint of your customer’s preferred language. In a web application, these signals are numerous. To help you in choosing from all these signals, I believe you should honor the preferences in the following priority. That is, check each signal for its existence in this order, and use the first signal that is available:

  1. query parameters, for example http://example.com?lang=fr
  2. domain name or path parameters, i.e. http://fr.example.com or http://example.com/fr
  3. persistent application preferences
    • cookies
    • customer profile or settings
  4. browser accept-language headers
  5. geolocation hints
  6. default application language

Query Parameters

Query parameters are often used to override every other language or application signal. If parameters are used, your customer (QE engineers or even end users) are intentionally trying to coerce the application into ignoring all other language signals. Query parameters beat out any other language signal when they are provided in the same request.

Domain name or Path Parameters

Sometimes you will partition your localized sites by domain name or by language tag paths. A domain name partition means that you select different or even localized domain names for specific markets. For example, your French site could be http://fr.example.com. You can also distinguish language preference on the path like this: http://example.com/fr or http://example.com/en-gb. When query parameters don’t exist, this is the next choice in our prioritization.

Persistent Settings

Of course, if your application has allowed the user to select a language preference, the application should honor that preference. The preference may be stored in a cookie or even in a user profile attribute on the server.

Accept-Language Header

Most browsers provide a list of user language preferences in each request. These languages are provided in request headers as values of the accept-language attribute. This attribute can have 1 or more language codes, and they indicate the priority of the user when requesting content. In the absence of other signals, your application should respond to the accept-language header.

Geolocation Hints

The last signal that actually provides information about the user is the geographic location from which the user is accessing your content. Although imperfect and imprecise, geography can provide a hint to your customer’s language preference. It’s definitely not the best indicator because multiple languages can be spoken in any geographic location. In a pinch, though, you may be able to provide a language selection tool that provides a list of the most prominent languages spoken in a specific area of the world.

Default Application Language

Finally, when all else fails and there have been no other indicators, you can provide the UI in the default language of the application. If your company is in Germany, maybe the default is German. If it’s the U.S., your default language is most likely English…or maybe even Spanish. You have to display the application in some language, and the default at this point is your last option.

In Summary

To summarize, a web application can serve a global audience. In doing so, it may accommodate customers in a variety of languages. Your application’s user interface may be selected from numerous possibilities, numerous signals from the user. Those signals are important data points to consider when making the language choice to present to the user. Using the signals described in this article, you’ll be able to consider some of the more important language preference indicators. Follow the prioritization I’ve outlined here, and you’ll make the right language choice most of the time…until you don’t. And there will be times when you don’t make the right choice from all these signals. When that happens, and it will happen, you have to give your users some way to indicate that problem. Take a look at my previous blog entry about language selection widgets for help with that.

 

 

Unicode 6.1 Released

Unicodelogo

The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries and Africa. This version of the standard introduces 732 new characters.

In addition, the standard also added “labels” for character properties that will supposedly help implementers create better regular expressions that are both easier to read and easier to validate. I admit little knowledge about these labels at the moment, but will research and report on them in the future if time allows.

One of the oddities of the new version is the inclusion of 200 emoji variants. This is perhaps the only issue of the standard that I just don’t understand. Back in the day when I was more involved in Unicode development, we had a huge effort to unify variants of Chinese characters. We preached that Unicode characters were abstract entities with glyph renderings that were determined by font, style preferences of developers and apps. Now it appears that the Unicode consortium has changed its position on this.  Or maybe partially?. The addition of 200 emoji “variants” just seems unnecessary, but that’s just my opinion and I admit that I may not know all the issues that formed the consortium’s decision.

We have some examples, straight from the announcement, that show only 4 of the 200 new emoji variants:

Emoji tents

As the image shows, the “TENT” emoji has two variants — a text style and a more colorful, graphical emoji style. The standard defends these variants by saying that it allows implementations to distinguish preferred display styles. I think that is what fonts are for. Personally, I just don’t think variants are needed. And, I think that the variants make things more difficult for applications.

What do you think about variants in general? And what about emoji variants specifically?