The absolute minimum you should know about internationalization

By | 2014-09-13

Internationalization is a design and engineering task that prepares your software product to be localized. It doesn’t create a localized product; instead, it puts your product in a state that allows localization. The goal of internationalization should be a single code base that can be used as-is to create multiple localized versions of your product.

This article provides a high-level description of some issues you must resolve during internationalization. The list is not comprehensive:

  • Character sets
  • Resource externalization
  • User interface design
  • Data formats
  • Sorting

Character Sets

Your application will most likely manage, manipulate, store, and display information. Much of that information will be user-readable text. One property of text is its character set.

If you want a global-ready application, the choice of character set is simple: Unicode. Unicode allows you to manage text in practically any script without losing data due to character conversion problems. Regardless of the default character set of the underlying host OS, your application should convert text to Unicode for internal manipulation. Additionally, your application should transmit and store text as Unicode. Doing anything else adds complexity that no modern operating environment requires.
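As a minimal sketch of that advice in Python, the snippet below decodes bytes from a legacy encoding at the input boundary, works on Unicode text internally, and encodes to UTF-8 for storage. The helper names `to_text` and `to_storage` and the ISO-8859-1 sample input are assumptions made for illustration.

```python
def to_text(raw: bytes, source_encoding: str) -> str:
    """Decode incoming bytes to Unicode text at the input boundary."""
    return raw.decode(source_encoding)


def to_storage(text: str) -> bytes:
    """Encode Unicode text as UTF-8 at the output boundary."""
    return text.encode("utf-8")


# "café" as it might arrive from a system that uses ISO-8859-1.
legacy_bytes = b"caf\xe9"

text = to_text(legacy_bytes, "iso-8859-1")  # internal form: Unicode str
stored = to_storage(text.upper())           # manipulate, then store as UTF-8

print(text)
print(stored)
```

The point of the design is that only the two boundary functions know about encodings; everything between them sees plain Unicode strings.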

Unicode has several possible encodings, including UTF-16 and UTF-8. My experience is that developers rarely get to use just one of these. However, they are BOTH Unicode. Their only significant difference is how a specific code point is encoded in code units. Unless you have a well-understood reason for doing otherwise, I suggest you store and transmit text in UTF-8. Your specific programming language may require you to use UTF-16 for text operations. When displaying text to your user, you might use UTF-16 in a desktop application. When rendering HTML views, you can typically use UTF-16 or UTF-8. I suggest you use UTF-8 everywhere possible.
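To make the code point versus code unit distinction concrete, here is a small Python sketch that counts how many code units each encoding needs for the same character. The function name `code_unit_counts` and the three sample characters are chosen only for illustration.

```python
def code_unit_counts(ch: str) -> tuple[int, int]:
    """Return (UTF-8 code units, UTF-16 code units) for one character."""
    utf8_units = len(ch.encode("utf-8"))             # 8-bit code units
    utf16_units = len(ch.encode("utf-16-le")) // 2   # 16-bit code units
    return utf8_units, utf16_units


samples = {
    "LATIN CAPITAL LETTER A": "A",       # U+0041
    "EURO SIGN": "\u20ac",               # U+20AC
    "GRINNING FACE": "\U0001F600",       # U+1F600, outside the BMP
}

for name, ch in samples.items():
    print(name, code_unit_counts(ch))
```

Each character is a single code point in both encodings; only the number and size of the code units differ, and both forms decode back to the same text.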

Resource Externalization

A resource is any text label, message, graphic image, video, audio, or other application file that you intend to present to the user. Instead of hard-coding these resources into your application code, you should extract them into external files that can be used at run-time. By extracting user-facing resources into resource files, you make translation and localization easier. Practically every programming environment provides a mechanism for creating external resource files. 
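The sketch below shows one possible shape for externalized text resources in Python. The `messages_<locale>.json` file naming, the `load_resources` helper, and the fallback to English are all assumptions for illustration; your environment probably has its own resource mechanism (Python, for example, ships a `gettext` module), and you should prefer that in real code.

```python
import json
import tempfile
from pathlib import Path


def load_resources(directory, locale, fallback="en"):
    """Load the message file for a locale, falling back to a default locale."""
    path = Path(directory) / f"messages_{locale}.json"
    if not path.exists():
        path = Path(directory) / f"messages_{fallback}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical resource files; normally these live next to the application
# and are handed to translators, not generated by code.
catalogs = {
    "en": {"greeting": "Hello", "save": "Save"},
    "fr": {"greeting": "Bonjour", "save": "Enregistrer"},
}

with tempfile.TemporaryDirectory() as tmp:
    for locale, messages in catalogs.items():
        target = Path(tmp) / f"messages_{locale}.json"
        target.write_text(json.dumps(messages, ensure_ascii=False), encoding="utf-8")

    french = load_resources(tmp, "fr")
    missing = load_resources(tmp, "pt")  # no Portuguese file, so English is used

print(french["greeting"])
print(missing["greeting"])
```

The application code refers only to keys such as `greeting`, so adding a language means adding a file, not changing code.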

User Interface Design

User interface layout is often affected by the length of text labels, fields, and other visible text. When designing layouts, remember that field and label sizes will increase for some language translations. Design your user interface with the largest label and field lengths in mind. Additionally, follow the typical rules for avoiding culturally sensitive images, hand gestures, and body parts. Also, avoid concatenating shorter pieces of text to build up larger sentences. When translated, the concatenated text rarely has correct syntax or meaning.
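The concatenation problem is easier to see in code. In this Python sketch, the message catalog, the `upload_message` helper, and the sample translations are hypothetical. The idea is to give translators one whole sentence with named placeholders so they can reorder the parts as their grammar requires.

```python
# Fragile: the word order is fixed in code, so a translator can only
# translate the fragments, not rearrange them.
def upload_message_concatenated(user, count):
    return user + " uploaded " + str(count) + " files."


# Better: one complete sentence per language, with named placeholders.
# In a real application these strings would live in external resource files.
UPLOAD_MESSAGES = {
    "en": "{user} uploaded {count} files.",
    "de": "{user} hat {count} Dateien hochgeladen.",
}


def upload_message(language, user, count):
    """Fill the placeholders of the whole-sentence template for a language."""
    return UPLOAD_MESSAGES[language].format(user=user, count=count)


print(upload_message("en", "Ana", 3))
print(upload_message("de", "Ana", 3))
```

Notice that the German verb wraps around the count and the noun, which the concatenated version cannot express. This sketch ignores plural forms, which need their own handling.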

Some languages are written from right to left. If you target those languages, remember that the entire layout of page components is often arranged from right to left as well. You may need to create a “reversible” layout that can accommodate those languages and cultures.

Data Formats

Numbers and dates have different formats around the world. Digit separators, currency symbols, and date field orders are all part of the many differences that you’ll need to consider. Fortunately, you don’t have to discover the correct formats and standards for every culture. Many programming environments already provide libraries to format numbers, currencies, and dates using the Common Locale Data Repository (CLDR) formats. 

The main point I want to share about formats is this: separate concerns for data formats by storing and manipulating data in a canonical, non-localized form and apply localized formats only in the “view” layer of your application.
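Here is a minimal Python sketch of that separation. The data is stored as an ISO 8601 date string and a plain decimal string, and locale-specific formatting happens only in two "view" helpers. The pattern and separator tables are hand-written assumptions covering two locales for illustration; they are not CLDR data, and a real application should use a CLDR-backed formatting library instead.

```python
from datetime import date
from decimal import Decimal

# Illustrative, hand-written tables (NOT CLDR data).
DATE_PATTERNS = {"en-US": "%m/%d/%Y", "de-DE": "%d.%m.%Y"}
NUMBER_SEPARATORS = {"en-US": (",", "."), "de-DE": (".", ",")}  # (group, decimal)


def format_date(value: date, locale_tag: str) -> str:
    """View layer: render a date using the locale's numeric field order."""
    return value.strftime(DATE_PATTERNS[locale_tag])


def format_amount(value: Decimal, locale_tag: str) -> str:
    """View layer: render a number with the locale's separators."""
    group, decimal_point = NUMBER_SEPARATORS[locale_tag]
    neutral = f"{value:,.2f}"  # for example "1,234.50"
    # translate() maps each character once, so the swap cannot collide.
    return neutral.translate(str.maketrans({",": group, ".": decimal_point}))


# Canonical, non-localized forms as they might be stored or transmitted.
stored_date = date.fromisoformat("2014-09-13")
stored_amount = Decimal("1234.5")

for tag in ("en-US", "de-DE"):
    print(tag, format_date(stored_date, tag), format_amount(stored_amount, tag))
```

Because the stored values never contain locale-specific separators or field orders, the same record can be shown correctly to users in any locale.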

Sorting

Languages have sorting rules. Those rules help you find names or products in long lists. Dictionaries, phone books, and product catalogs use linguistic sorting to help people find information quickly. When presenting long lists to your users, your application should use those sorting rules as well. Learn and use the sorting or collation libraries in your programming language or technology environment.
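The Python sketch below shows why plain string comparison is not enough. Sorting by code point puts uppercase before lowercase and pushes accented letters to the end. The `crude_key` function is only a rough stand-in that ignores case and accents; it is not real collation, and the word list is made up for illustration. For actual products, use a collation library that implements language-specific rules.

```python
import unicodedata


def crude_key(text: str) -> str:
    """Rough approximation of a sort key: drop accents, ignore case.

    This is NOT linguistic collation; it only shows why a key is needed.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


# "\u00c4pfel" is "Äpfel" written with an escape for the precomposed letter.
words = ["zebra", "\u00c4pfel", "apple", "Zoo"]

by_code_point = sorted(words)
by_crude_key = sorted(words, key=crude_key)

print(by_code_point)
print(by_crude_key)
```

Real collation rules differ by language, so the same list can legitimately sort in different orders for different users. That is why a proper collation library takes a locale as input.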

Conclusion

Internationalization is an effort to create products that can be translated and localized for many languages and cultures. Creating an internationalized product requires that you consider and plan for a variety of common technical issues. A few of those issues are character set choice, resource externalization, user interface design, data formats, and sorting. You rarely have to solve those issues yourself; you can often find and use existing libraries for this purpose.
