Category Archives: Unicode

Unicode Characters and Alternative Glyphs

Unicode defines thousands of characters. Some “characters” are surprising, and others are obvious. When I look at the Unicode standard and consider the lengthy debates that occur when deciding whether a character should be included, I can imagine the discussion and rationalization that occur. Deciding on including a character can be difficult. One of… Read More »

Unicode 6.1 Released

The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries, and Africa. This version of the standard introduces 732 new characters. The standard also adds “labels” for character properties that will supposedly help implementers create better regular expressions that are… Read More »

Terminology: Unicode Character Encoding

In a recent blog, I described the terms character set, charset, and coded character set. In this blog, we’ll take a small step forward to define a few more terms: encoding form, code unit, and encoding scheme. Before going too much further, you can get all the information in this blog from a much more authoritative… Read More »
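As a rough illustration of the three terms named in this excerpt, here is a minimal Python sketch. The sample character (U+1F600) and the variable names are assumptions chosen for illustration; they do not come from the original post. It shows the same code point expressed as 8-bit, 16-bit, and 32-bit code units (the encoding forms), and then as bytes in one particular byte order (an encoding scheme).

```python
# Illustration only: one code point seen through three encoding forms.
text = "\U0001F600"  # U+1F600, a character outside the Basic Multilingual Plane


def code_units(data: bytes, width: int) -> list[int]:
    """Split big-endian encoded bytes into code units of `width` bytes each."""
    return [int.from_bytes(data[i:i + width], "big")
            for i in range(0, len(data), width)]


# Encoding forms: the code point expressed as code units of a fixed size.
utf8_units = code_units(text.encode("utf-8"), 1)       # 8-bit code units
utf16_units = code_units(text.encode("utf-16-be"), 2)  # 16-bit code units
utf32_units = code_units(text.encode("utf-32-be"), 4)  # 32-bit code units

# Encoding scheme: an encoding form plus a byte order for serialization.
utf16_le = text.encode("utf-16-le")

print([f"{u:02X}" for u in utf8_units])
print([f"{u:04X}" for u in utf16_units])
print([f"{u:08X}" for u in utf32_units])
print(utf16_le.hex(" "))
```

The character needs four code units in UTF-8, two in UTF-16 (a surrogate pair), and one in UTF-32, which is one way to see why the code unit size and the character repertoire are separate questions.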

Not forgotten

No, I haven’t forgotten my promise to cover a few more Unicode terms. However, please excuse me while I recover from my recent vacation. In this case, my vacation has rendered me useless for a couple of days after my return. Hundreds of emails have gathered in my inbox, and I’m still processing them. I… Read More »

Unicode Terminology

I am sometimes asked whether Unicode is a 16-bit character set. The answer is not a simple no, but it is no. The question always reminds me how important terminology is. Terminology is the focus of this particular post. At one point long ago, when Unicode was a relative newcomer to the character set… Read More »

Using Combining Sequences for Numbers

Today I just happened to be looking through some of the precomposed Unicode circled numbers, numbers like ①, ②, ③, and so on. Just in case your system doesn’t support the fonts for these characters, here’s an image that shows what I mean: I wasn’t all that surprised to see these CIRCLED DIGIT ZERO, CIRCLED… Read More »
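A small Python sketch of the contrast the title suggests: a precomposed circled digit versus a digit followed by a combining mark. The choice of U+20DD COMBINING ENCLOSING CIRCLE is an assumption, since the excerpt is cut off before it names any combining character.

```python
import unicodedata

# Precomposed character: a single code point.
precomposed = "\u2460"  # CIRCLED DIGIT ONE

# Combining sequence (assumed for illustration): base digit + enclosing mark.
sequence = "1" + "\u20DD"  # DIGIT ONE + COMBINING ENCLOSING CIRCLE

print(unicodedata.name(precomposed))
print([unicodedata.name(ch) for ch in sequence])

# The two are not canonically equivalent: NFC does not compose the
# sequence into the precomposed character.
same_after_nfc = unicodedata.normalize("NFC", sequence) == precomposed

# The precomposed form has only a compatibility decomposition,
# so NFKC reduces it to the plain digit.
plain_digit = unicodedata.normalize("NFKC", precomposed)
```

Whether the sequence renders as a neatly circled digit depends on font and renderer support, so this only demonstrates the code point level behavior.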

Encoding URLs for non-ASCII query params

Are you a web service API developer? The web truly is a world-wide web. Unfortunately, a great number of globally unaware developers are on the global web. This creates an odd situation in which web services are globally accessible but only locally or regionally aware. There are a few important things to remember when creating… Read More »
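For readers who want something concrete, here is a minimal Python sketch of percent-encoding a non-ASCII query parameter. It assumes UTF-8 as the character encoding, which the excerpt does not state, and the parameter name `q` and its value are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical query parameter with a non-ASCII value.
params = {"q": "東京"}

# urlencode() percent-encodes the UTF-8 bytes of each value by default.
query = urlencode(params, encoding="utf-8")
print(query)

# A receiving service that also assumes UTF-8 recovers the original text.
decoded = parse_qs(query, encoding="utf-8")
print(decoded)
```

The round trip only works when both sides agree on the encoding, which is the kind of assumption a globally aware API should state explicitly.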

Attending IUC 34 and career longevity

After a few years being away from the internationalization crowd, I’m attending the Internationalization and Unicode Conference again this year. How great to see old friends and to make new ones. Some things are new — some new people. However, many things are old or definitely older. What’s old? Well, for one, the problems. It’s… Read More »