JavaScript Internationalization Libraries

Creating a browser-based web application requires a combination of HTML, CSS, JavaScript, and backend services. Creating an internationalized application requires additional techniques and libraries to provide and manage localized data formats and resources. This document focuses just on front-end JavaScript needs.

Depending on your application, your internationalization needs will be different. However, common concerns include these issues:

  • date, time, number, and currency formatting
  • text localization
  • pluralization and complex messages
  • sorting and collation

More demanding use-cases may even require these:

  • phone number formatting
  • alternative calendars

Finally, considering that ECMAScript now includes an internationalization API that is already supported in Chrome and Firefox, I think the following is important:

In my opinion, using a single library that supports all of the above is preferable to using multiple libraries. However, one solution doesn’t always work for every situation and a single solution doesn’t even seem to exist. You may have to use two or more different libraries to get all the functionality you need. For example, libraries that provide general number formatting support typically don’t provide phone number support.

With all the features above in mind, you might consider these JavaScript internationalization libraries:

ECMAScript Intl Library

ECMAScript, the language specification for JavaScript, has defined a new Internationalization API. This API is already implemented in the Chrome and Firefox browsers, and may be partially implemented in other browsers as well. The Internationalization API defines a new namespace, Intl, that provides objects for the following:

  • collation
  • number formatting
  • date/time formatting
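
Here is a minimal sketch of those three objects in action; the locales and options are just illustrative:

    // Number and currency formatting
    var euros = new Intl.NumberFormat("de-DE", { style: "currency", currency: "EUR" });
    euros.format(1234.56);                   // "1.234,56 €"

    // Date/time formatting
    var longDate = new Intl.DateTimeFormat("ja-JP", { year: "numeric", month: "long", day: "numeric" });
    longDate.format(new Date());             // e.g. "2014年3月15日"

    // Collation (locale-sensitive sorting)
    var collator = new Intl.Collator("sv");
    ["ä", "a", "z"].sort(collator.compare);  // Swedish order: ["a", "z", "ä"]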

The new Intl library is extremely important because it means that you get some internationalization support without loading an external library. Additionally, implementations use CLDR data, the best-practice source for format patterns.

Unfortunately, this API is not available in all browser-supplied JavaScript implementations, Safari and Internet Explorer being notable holdouts. Additionally, many mobile browsers do not yet provide the APIs.

If you need to use a separate library, you should favor those that provide a polyfill for this API when possible. This will make future transitions smoother.


Intl.js

The Intl.js library is a polyfill for the ECMAScript Intl library. It provides the number and date/time formatting APIs, but it does not provide the collation API. Because it uses CLDR data, you can feel comfortable that the formats this library produces will be the same as, or very similar to, those of the native ECMAScript Intl library. Because collation isn't available, however, you obviously have to look elsewhere if you need it. If you don't need collation, this may be a good option for basic data formatting.
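
A minimal sketch of using it as a polyfill is to load the script only when the native object is missing; the file path below is hypothetical, so check the Intl.js project for its actual file layout and locale data files:

    if (!window.Intl) {
        // Native Intl is missing, so pull in the Intl.js polyfill.
        var script = document.createElement("script");
        script.src = "js/intl.min.js";   // hypothetical path on your own server
        document.head.appendChild(script);
    }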

jQuery Globalize

The jQuery Globalize library now provides CLDR support. It also provides number, date, and time formatting. However, it gives you a couple of additional features beyond Intl.js that make it worth considering:

  • message translation API
  • pluralization

Message translation is a localization API for common message strings. The library defines a file format for translatable message strings and gives you an API for retrieving those strings after translation.

Pluralization is a feature that lets you accommodate the differences in word choice that arise when a word's form depends on a count. For example, the pluralization API lets you conveniently choose between "mouse" and "mice" for 0, 1, or n items. Polish and Russian, for instance, have several plural forms that depend on whether there are 2, 3, 4, 5, or more instances of a particular noun.
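
As a rough sketch of how these two features might look (assuming the CLDR data that Globalize requires has already been loaded), message strings are registered per locale and then formatted on demand:

    // Register translatable message strings for the "en" locale.
    Globalize.loadMessages({
        en: {
            greeting: "Hello, {name}!",
            mice: "{0, plural, one {You have one mouse} other {You have # mice}}"
        }
    });

    var en = Globalize("en");
    en.formatMessage("greeting", { name: "Maria" });   // "Hello, Maria!"
    en.formatMessage("mice", [ 1 ]);                   // "You have one mouse"
    en.formatMessage("mice", [ 5 ]);                   // "You have 5 mice"
    en.plural(5);                                      // "other" (the CLDR plural category)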


Format.js

The Format.js library was recently released by Yahoo. It builds upon the Intl.js library and adds support for the following:

  • support for template libraries like Handlebars, React, and Dust
  • cache support for Intl format objects
  • ICU message syntax for pluralization, gender, and other types of message variability
  • relative time (5 min ago, 2 hours ago, etc)

The set of APIs is modular. If you only need the relative time support, you can load just that piece and skip the rest.
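
For example, the intl-messageformat piece handles ICU plural syntax like this (a sketch; see the Format.js documentation for how the modules are packaged for your template library of choice):

    var msg = new IntlMessageFormat(
        "{numPhotos, plural, =0 {no photos} one {one photo} other {# photos}}",
        "en-US"
    );

    msg.format({ numPhotos: 0 });      // "no photos"
    msg.format({ numPhotos: 1 });      // "one photo"
    msg.format({ numPhotos: 1000 });   // "1,000 photos"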

Dojo I18n

If you are already using Dojo, Dojo’s own internationalization libraries are an obvious choice. With support for string resource bundles, date and time formatting, and number and currency formatting, Dojo’s library has a lot to offer. This library’s additional benefit is its support for the CLDR patterns.
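
A sketch of what that looks like in Dojo 1.x; the myapp/nls/messages bundle path is hypothetical:

    require(["dojo/number", "dojo/date/locale", "dojo/i18n!myapp/nls/messages"],
        function (number, dateLocale, messages) {
            // Number formatting driven by CLDR patterns for the current locale
            number.format(1234567.89, { places: 2 });

            // Locale-sensitive date formatting
            dateLocale.format(new Date(), { selector: "date", formatLength: "long" });

            // A translated string from the resource bundle
            console.log(messages.greeting);
        });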

Phone Number Library

The libphonenumber library solves a very specific need: phone number parsing and formatting. If your application needs this, this well-supported library from Google should handle your common use cases. In addition to JavaScript, the library exists for Java, C++, and other languages. It helps you parse, format, and validate phone numbers for many countries of the world. The library also helps you determine the type of a phone number, for example, fixed-line, mobile, or toll-free.
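
A rough sketch of the JavaScript API, which uses Google Closure-style namespaces:

    var phoneUtil = i18n.phonenumbers.PhoneNumberUtil.getInstance();

    // Parse a nationally formatted number using a default region
    var number = phoneUtil.parse("(202) 456-1414", "US");

    phoneUtil.isValidNumber(number);                                      // true
    phoneUtil.format(number, i18n.phonenumbers.PhoneNumberFormat.E164);   // "+12024561414"

    // Determine the number type (fixed-line, mobile, toll-free, ...)
    phoneUtil.getNumberType(number);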


No single JavaScript library exists for all internationalization needs. However, you should be able to use one of Intl.js, Globalize.js, Format.js, or Dojo for basic data formatting. For phone numbers, the libphonenumber library seems to be the only real choice for now, but it is well supported and adequate for many use cases. Unfortunately, I haven't found a widespread collation solution yet. However, given the typical size and download-time constraints for JavaScript applications, especially in mobile settings, sorting on the server side before transmitting sortable data may be a better solution for now.

Good luck in your own internationalization work in JavaScript. If you haven’t already adopted a set of support libraries, consider some of the ones mentioned above.

JavaScript file encodings

All text files have a character encoding regardless of whether you explicitly declare it. JavaScript files are no exception. This article describes both how and why you should declare an encoding when importing script files into an HTML document.

JavaScript’s Character Model

A JavaScript engine's internal character set is Unicode. The ECMAScript 5.1 standard says that all strings are encoded as sequences of 16-bit code units as described by UTF-16. Once inside the JavaScript interpreter, all characters and strings are stored and accessed as UTF-16 code units. However, before being processed by the JavaScript engine, a JavaScript file's charset can be anything, not necessarily a Unicode encoding.
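
You can see the 16-bit code-unit model directly in string operations; a character outside the Basic Multilingual Plane occupies two code units (a surrogate pair):

    var s = "\uD842\uDFB7";          // U+20BB7, a single CJK ideograph
    s.length;                        // 2 -- length counts UTF-16 code units
    s.charCodeAt(0).toString(16);    // "d842" (high surrogate)
    s.charCodeAt(1).toString(16);    // "dfb7" (low surrogate)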

Character Encoding Conversion

When you import a JavaScript file into an HTML document, by default the browser uses the document's charset to convert the JavaScript file into the interpreter's encoding (UTF-16). You can also specify an explicit charset when importing a file. When the HTML file's charset and the JavaScript file's charset are different, you will most likely see conversion mistakes: the result is mangled, incorrect characters.

Conversion Problems

I created a simple demonstration of the potential problem. The demo has 5 files:

  • jsencoding.html — base HTML file, UTF-8 charset
  • stringmgr.js — a basic string resource mgr, UTF-8 charset
  • resource.js — an English JavaScript resource file containing the word "family", UTF-8 charset
  • resource_es.js — a Spanish file containing the word for "girl", ISO-8859-1 charset
  • resource_ja.js — a Japanese file containing the word for "baseball", SHIFT-JIS charset

In the base HTML file, I’ve imported 3 JavaScript resource files using the following import statements:

    <script src="resource.js"></script>
    <script src="resource_es.js"></script>
    <script src="resource_ja.js"></script>


When the page loads, the text resources are converted incorrectly. The browser imported the Spanish JavaScript file using the HTML file's UTF-8 encoding even though the file is stored using ISO-8859-1. The Japanese resource script is stored as SHIFT-JIS and doesn't convert correctly either.

After updating the import statements, we see a better result:

    <script src="resource.js" charset="UTF-8"></script>
    <script src="resource_es.js" charset="ISO-8859-1"></script>
    <script src="resource_ja.js" charset="SHIFT-JIS"></script>

Correct conversions


To avoid charset conversion problems when importing JavaScript files and JavaScript resources, you should include the file's charset on the script tag. An even better practice is to use UTF-8 as the charset for all files, which minimizes these conversion problems significantly.

You can check out the code for this article on my GitHub account here:
I18n Examples

Google’s Dart is JavaScript++?

Several years ago, Google was unhappy with the pace of change in the Java language and community. Their solution was to create the Dalvik VM and the Android platform. They were careful not to call it a Java platform, a Java implementation, a JVM, or a JRE; instead, they said it was the Java language running on the Dalvik VM.

Now, is Google doing something similar with JavaScript? It is true that the JavaScript language is evolving slowly. Google certainly has shown that it can innovate without a standards body before (see Java above). Is Google trying the same thing again, but with JavaScript? Is Google attempting to ignore the ECMAScript/JavaScript standards community and move ahead without it?

Google has a new language called Dart. This language is like JavaScript but has many new language features. It will be interesting to see whether Google gets as much benefit and praise for this as it did for Android and the Dalvik VM.

More on Dart:

What I find so interesting about the Dart announcement is that Google already has a great tool for developing web applications without coding directly in JavaScript: GWT, the Google Web Toolkit. Basically, it's their very popular toolset for writing applications in Java and compiling them down to browser-neutral JavaScript. If you're unfamiliar with GWT, yes, you read that correctly…a compiler from Java to JavaScript. As a user of GWT, I can say that it works great! And it has great appeal in the developer community already. So why Dart? Why yet another language that they compile to JavaScript?

Encoding URLs for non-ASCII query params

Are you a web service API developer? The web truly is a world-wide web. Unfortunately, a great number of globally unaware developers are on the global web. This creates an odd situation in which web services are globally accessible but only locally or regionally aware.

There are a few important things to remember when creating a global web service. Let’s just cover ONE today: non-ASCII query parameters are valid, useful, and often necessary for a decent, global web service.

It seems so obvious to me, and it probably does to you. Sometimes a service needs to exchange or process non-ASCII data. The world is a big place, and although English is an important part of the global web, more people speak a language other than English. English accounts for a big share of web content, but lots of people use Chinese or an Indic language too. Let's make sure your web service can process all those non-ASCII characters, in English or any other language!

Let’s look at some examples of non-ASCII query params:
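
    name=田中&city=東京
    名前=田中&市=東京

The first example uses ASCII keys with Japanese values; the second uses Japanese text for both the keys and the values.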


In these examples, you must perform two steps to get the query params (both keys and values) into the correct form:

  1. Convert the keys and their values to UTF-8 if they are not already.
  2. Perform the "percent encoding" on each UTF-8 code unit (each byte).

To do #1, you'll need to use whatever character-conversion utility your development platform provides, for example, Java's charset encoding converters.

Step #2 is the important one for this blog post. You must "percent encode" each code unit (byte) of the UTF-8 query portion, writing it as a percent sign followed by two hex digits. Let's look at the first example query params:
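
The four Japanese characters in that example encode to the following UTF-8 byte sequences, and each byte becomes a percent-escaped %XX pair:

    田  ->  E7 94 B0  ->  %E7%94%B0
    中  ->  E4 B8 AD  ->  %E4%B8%AD
    東  ->  E6 9D B1  ->  %E6%9D%B1
    京  ->  E4 BA AC  ->  %E4%BA%AC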


The JavaScript function encodeURI actually does a good job of doing this for us:

encodeURI("name=田中&city=東京") produces the string:
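
    name=%E7%94%B0%E4%B8%AD&city=%E6%9D%B1%E4%BA%AC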


Notice that you should also apply this encoding to the keys in the param list. In the next example, I've used Japanese text for both the keys and the values.

encodeURI("名前=田中&市=東京") produces this string:
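
    %E5%90%8D%E5%89%8D=%E7%94%B0%E4%B8%AD&%E5%B8%82=%E6%9D%B1%E4%BA%AC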


Note that both the keys and values have been "percent encoded".

On the server side, your server will decode these values back into the correct UTF-8 strings if you have configured it correctly. Correct configuration usually involves a charset-conversion filter for a servlet container, and sometimes just a config setting for Apache.

More on this at a later time.


Encoding URIs and their components

As you pass data from the browser to the application server to the database, opportunities for data loss lurk. I highlighted some of those conversion points earlier, but I neglected a browser issue. The JavaScript layer has its own lossy points of interest. One of those points is the escape function.

The escape function "encodes" a string by replacing non-ASCII characters and some punctuation symbols with escape sequences. Unicode characters from \u0080 through \u00FF are converted to the form %XX, where each X is a hex digit. Unicode characters in higher ranges take the form %uXXXX. So, as an example, the name José takes the form Jos%E9.

The problem is that this escape mechanism is broken if you want to use UTF-8 as your document encoding. If you dynamically compose URL strings with parameters, those parameters will not be escaped correctly. Instead of Jos%E9, that URI component should really be Jos%C3%A9.

Fortunately, JavaScript has resolved the problem, but the solution means you'll have to use another function. The escape function is deprecated in ECMAScript 3. Instead, you should use encodeURI or encodeURIComponent. These functions convert their argument to UTF-8 and then %XX-encode all the non-ASCII characters. Two functions exist so that you have greater control over whether characters like "?" and "&" are encoded; check the documentation for details.
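
A quick comparison shows the difference:

    escape("José");              // "Jos%E9"        -- legacy, not UTF-8 based
    encodeURIComponent("José");  // "Jos%C3%A9"     -- UTF-8 percent-encoding
    encodeURIComponent("a&b");   // "a%26b"         -- reserved characters encoded too
    encodeURI("a&b=José");       // "a&b=Jos%C3%A9" -- "&" and "=" left alone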

What’s this mean for you? Maybe nothing if you’re hopelessly attached to ISO-8859-1. However, if you’re trying to reach a global market with your product, chances are very good that you’ve decided to use UTF-8 for your character set encoding. That’s an excellent choice, but you’ll have to manage the conversion points. In a nutshell, that simply means that you’ll need to use UTF-8 from front to back consistently.

Part of managing those conversion points is consistently providing well-formed URIs to your application server. If you use JavaScript to manipulate data or to create dynamic URIs in your application, make sure you toss aside that deprecated escape function. Take a look at encodeURI and encodeURIComponent instead.

Localizing JavaScript with JSPs

Last week a friend asked an interesting web-based localization question. He surprised me with it. I wish that I had considered it before, but I had not. In my less than complete analysis of his problem, I found a solution, but I don’t know if he’ll like it. Hell, I barely like it myself, but it’s what I came up with quickly.

Here’s the question:

I want to localize a bit of JavaScript, nothing fancy now, just localized text strings. Once the JavaScript is pulled down into my browser, I don't want to make any more trips to the server to get text. You have any idea how I can get localized strings in my JavaScript?

So here are a couple options:
