Tag Archives: Unicode

Best practices for character sets

You may not understand every language, but that doesn’t mean your applications can’t. Regardless of your customer’s language choice, your application should be able to process, transfer, and store their data. Even if you don’t provide a localized user interface, your application should allow your customer to enter text in their own language and in…
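Here is a minimal Java sketch of that advice, assuming Java 7 or later. It encodes and decodes text with an explicit charset instead of relying on the platform default. The class and method names are hypothetical. `String.getBytes(Charset)`, the `String(byte[], Charset)` constructor, and `java.nio.charset.StandardCharsets` are standard APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode with an explicit charset rather than the platform default,
    // so the stored bytes do not depend on the machine's locale settings.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used to encode.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Round-trips text through an arbitrary charset. Characters the charset
    // cannot represent are replaced (with '?' for US-ASCII), so data is lost.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // German and Japanese text, written with escapes so the encoding
        // of this source file itself does not matter.
        String text = "Gr\u00fc\u00dfe, \u6771\u4eac";
        System.out.println(fromUtf8(toUtf8(text)).equals(text));                      // true
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII).equals(text)); // false
    }
}
```

The US-ASCII round trip shows what happens when an application quietly assumes a limited charset: the customer’s text comes back as question marks.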

Unicode Characters and Alternative Glyphs

Unicode defines thousands of characters. Some “characters” are surprising, and others are obvious. When I look at the Unicode standard and consider the lengthy debates over whether a character should be included, I can imagine the discussion and rationalization involved. Deciding whether to include a character can be difficult. One of…

Standard Charsets in Java 7

Once in a while I poke my nose through the release notes of new Java releases. It’s not a particularly rewarding activity, but this time I did find something interesting. Oddly enough, it was interesting for what it did NOT say. I was surprised, so I thought you might want to know about a new…
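The excerpt stops before it names the new feature. Assuming it refers to `java.nio.charset.StandardCharsets`, which was added in Java 7, this sketch compares looking up a charset by name with using one of that class's constants. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StandardCharsetsDemo {
    // Name-based lookup: the String overload of getBytes declares a checked
    // exception, even for a charset every JVM must support.
    static byte[] encodeByName(String text) throws UnsupportedEncodingException {
        return text.getBytes("UTF-8");
    }

    // Constant-based: StandardCharsets fields always exist, and
    // getBytes(Charset) declares no checked exception.
    static byte[] encodeWithConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "caf\u00e9";
        System.out.println(Arrays.equals(encodeByName(text), encodeWithConstant(text))); // true
        // Charsets are equal when their canonical names match.
        System.out.println(StandardCharsets.UTF_8.equals(Charset.forName("UTF-8")));     // true

        // The six charsets StandardCharsets defines; prints
        // US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16 (one per line).
        Charset[] all = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE, StandardCharsets.UTF_16
        };
        for (Charset cs : all) {
            System.out.println(cs.name());
        }
    }
}
```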

Internationalization & Unicode Conference 36 Call for Papers

The IUC 36 call for papers went out last week: http://www.unicodeconference.org/e/IUC36-CfP-03-29-12.htm
This conference brings together the best minds, ideas, and practices in the worlds of internationalization and localization. There are content sessions to please everyone, including technical engineers, project managers, and product managers.

Unicode 6.1 Released

The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries, and Africa. This version of the standard introduces 732 new characters. In addition, the standard added “labels” for character properties that will supposedly help implementers create better regular expressions that are…
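The excerpt does not say which property labels Unicode 6.1 added. So this is only a general illustration of why property-based regular expressions matter, written with Java's `java.util.regex.Pattern`. It is not specific to Unicode 6.1. `\p{L}` (General_Category letter) and the `Is` script prefix are documented Java syntax. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyRegexDemo {
    // Any letter in any script, by General_Category.
    static final Pattern WORD = Pattern.compile("\\p{L}+");
    // Only characters whose Script property is Han.
    static final Pattern HAN = Pattern.compile("\\p{IsHan}+");

    // Collects every match of the pattern in the text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Tokyo \u6771\u4eac caf\u00e9";
        System.out.println(findAll(WORD, text).size()); // 3
        System.out.println(findAll(HAN, text).size());  // 1
        // An ASCII-only class truncates "cafe-acute" and misses the Han word.
        System.out.println(findAll(Pattern.compile("[A-Za-z]+"), text)); // [Tokyo, caf]
    }
}
```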

Terminology: Unicode Character Encoding

In a recent blog post, I described the terms character set, charset, and coded character set. In this post, we’ll take a small step forward to define a few more terms: encoding form, code unit, and encoding scheme. Before going too much further, you can get all the information in this blog from a much more authoritative…
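To make these terms concrete, here is a short Java sketch. The class and helper names are hypothetical. It shows the code units of U+20AC EURO SIGN in the UTF-8 and UTF-16 encoding forms. It also shows the bytes that the UTF-16BE, UTF-16LE, and UTF-16 encoding schemes produce. UTF-32 is left out because Java does not guarantee that charset on every platform.

```java
import java.nio.charset.StandardCharsets;

public class EncodingTermsDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "E2 82 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, code point U+20AC

        // UTF-8 encoding form: 8-bit code units, so each byte is one code unit.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));    // E2 82 AC

        // UTF-16 encoding form: Java's char is one 16-bit code unit.
        System.out.println(euro.length());                                 // 1

        // Two encoding schemes for the same code unit, differing in byte order.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16BE))); // 20 AC
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16LE))); // AC 20

        // Java's "UTF-16" scheme writes a byte order mark, then big-endian units.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16)));   // FE FF 20 AC
    }
}
```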

Unicode Terminology

I am sometimes asked whether Unicode is a 16-bit character set. The answer is not a simple no, but it is no. The question always reminds me of how important terminology is, and terminology is the focus of this particular post. At one point long ago, when Unicode was a relative newcomer to the character set…
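One quick way to see why the answer is no: a code point above U+FFFF, such as U+1D11E MUSICAL SYMBOL G CLEF, needs two 16-bit code units (a surrogate pair) in UTF-16. This Java sketch uses only standard `Character` and `String` methods. The class name is hypothetical.

```java
public class BeyondSixteenBits {
    // U+1D11E MUSICAL SYMBOL G CLEF: a code point above U+FFFF,
    // so it cannot fit in a single 16-bit code unit.
    static final int G_CLEF = 0x1D11E;

    // Builds a String from one code point; for a supplementary code point,
    // Character.toChars returns a surrogate pair (two chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(G_CLEF);
        // The Unicode code space runs to U+10FFFF, well beyond 16 bits.
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT)); // 10ffff
        System.out.println(s.length());                                    // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));               // 1 (code point)
        // The high and low surrogates that encode U+1D11E in UTF-16.
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D834 DD1E
    }
}
```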

Using Combining Sequences for Numbers

Today I just happened to be looking through some of the precomposed Unicode circled numbers, numbers like ①, ②, ③, and so on. Just in case your system doesn’t support the fonts for these characters, here’s an image that shows what I mean: [image: the precomposed circled numbers] I wasn’t all that surprised to see these CIRCLED DIGIT ZERO, CIRCLED…
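The excerpt is cut off, so this assumes the post compares precomposed circled numbers with combining sequences built from a digit plus U+20DD COMBINING ENCLOSING CIRCLE. This Java sketch shows how the two spellings differ in code points and under normalization. The class and constant names are hypothetical. `java.text.Normalizer` is the standard API.

```java
import java.text.Normalizer;

public class CircledNumbers {
    // Precomposed: U+2460 CIRCLED DIGIT ONE is a single code point.
    static final String PRECOMPOSED_ONE = "\u2460";
    // Combining sequence: DIGIT ONE followed by U+20DD COMBINING ENCLOSING CIRCLE.
    static final String COMBINING_ONE = "1\u20DD";

    public static void main(String[] args) {
        System.out.println(PRECOMPOSED_ONE.length()); // 1
        System.out.println(COMBINING_ONE.length());   // 2

        // Not canonically equivalent: NFC does not compose the sequence
        // into the precomposed character.
        System.out.println(Normalizer.normalize(COMBINING_ONE, Normalizer.Form.NFC)
                .equals(PRECOMPOSED_ONE));            // false

        // NFKC folds the precomposed character to a plain digit, dropping the circle.
        System.out.println(Normalizer.normalize(PRECOMPOSED_ONE, Normalizer.Form.NFKC)); // 1
    }
}
```

A combining mark attaches only to the single base character before it. So the sequence 1, 0, U+20DD encloses just the 0, not the whole number 10.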