Category Archives: Unicode

Best practices for character sets

You may not understand every language, but that doesn’t mean your applications can’t. Regardless of your customer’s language choice, your application should be able to process, transfer, and store their data. Even if you don’t provide a localized user interface, your application should allow your customer to enter text in their own language and in… Read More »

Unicode Characters and Alternative Glyphs

Unicode defines thousands of characters. Some “characters” are surprising, and others are obvious. When I look at the Unicode standard and consider the lengthy debates that occur when deciding whether a character should be included, I can imagine the discussion and rationalization that occurs. Deciding on including a character can be difficult. One of… Read More »

Unicode 6.1 Released

The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries, and Africa. This version of the standard introduces 732 new characters. The standard also adds “labels” for character properties that will supposedly help implementers create better regular expressions that are… Read More »

Terminology: Unicode Character Encoding

In a recent blog, I described the terms character set, charset, and coded character set. In this blog, we’ll take a small step forward to define a few more terms: encoding form, code unit, and encoding scheme. Before going too much further, you can get all the information in this blog from a much more authoritative… Read More »
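
As a rough illustration of the three terms named in this teaser, the Python sketch below looks at one character through several encodings. The sample character and the variable names are assumptions chosen for illustration; they do not come from the original post.

```python
# Sketch: one code point seen through different encoding forms and schemes.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

# Encoding form: how a code point maps to code units of a fixed width.
utf8_units = list(ch.encode("utf-8"))            # the 8-bit code units
utf16_units = len(ch.encode("utf-16-be")) // 2   # number of 16-bit code units
utf32_units = len(ch.encode("utf-32-be")) // 4   # number of 32-bit code units

# Encoding scheme: how code units are serialized to bytes (byte order matters).
utf16_be = ch.encode("utf-16-be")
utf16_le = ch.encode("utf-16-le")

print([hex(u) for u in utf8_units])
print(utf16_units, utf32_units)
print(utf16_be.hex(), utf16_le.hex())
```

The same code point needs two code units in UTF-8 but only one in UTF-16 or UTF-32, and the two UTF-16 schemes differ only in byte order.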

Not forgotten

No, I haven’t forgotten my promise to cover a few more Unicode terms. However, please excuse me while I recover from my recent vacation. In this case, my vacation has rendered me useless for a couple days after my return. Hundreds of emails have gathered in my email INBOX, and I’m still processing them. I… Read More »

Unicode Terminology

I am sometimes asked whether Unicode is a 16-bit character set. The answer is not a simple no, but it is no. The question always reminds me how important terminology is. Terminology is the focus of this particular post. At one point long ago, when Unicode was a relative newcomer to the character set… Read More »
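
One quick way to see why the answer is no: some code points do not fit in 16 bits. The minimal Python sketch below uses U+1F600 as an example; the choice of character is an assumption for illustration only.

```python
# Sketch: a code point above U+FFFF cannot fit in a single 16-bit unit.
ch = "\U0001F600"  # GRINNING FACE, code point U+1F600

code_point = ord(ch)
utf16_bytes = ch.encode("utf-16-be")
utf16_code_units = len(utf16_bytes) // 2  # each UTF-16 code unit is 2 bytes

print(hex(code_point))        # the code point value
print(code_point > 0xFFFF)    # whether it exceeds the 16-bit range
print(utf16_code_units)       # how many 16-bit code units UTF-16 needs
print(utf16_bytes.hex())      # the surrogate pair as bytes
```

UTF-16 represents this character as a surrogate pair, which is two 16-bit code units for a single code point.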

Using Combining Sequences for Numbers

Today I just happened to be looking through some of the precomposed Unicode circled numbers, numbers like ①, ②, ③, and so on. Just in case your system doesn’t support the fonts for these characters, here’s an image that shows what I mean: I wasn’t all that surprised to see these CIRCLED DIGIT ZERO, CIRCLED… Read More »
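
To make the contrast in the title concrete, here is a small Python sketch comparing a precomposed circled digit with a combining sequence. It assumes U+20DD COMBINING ENCLOSING CIRCLE as the combining mark; the teaser does not say which character the full post uses.

```python
import unicodedata

# Precomposed character: a single code point.
precomposed = "\u2460"  # CIRCLED DIGIT ONE

# Combining sequence: a base digit followed by an enclosing mark.
# U+20DD is an assumption made for this sketch.
combining = "1\u20dd"   # DIGIT ONE + COMBINING ENCLOSING CIRCLE

print(unicodedata.name(precomposed))
print([unicodedata.name(c) for c in combining])

# The two are different strings, and NFC does not turn one into the other.
same_after_nfc = unicodedata.normalize("NFC", combining) == precomposed

# The compatibility decomposition of the precomposed form is the plain digit.
plain = unicodedata.normalize("NFKD", precomposed)
print(same_after_nfc, plain)
```

Whether the combining sequence renders as a neatly circled digit depends on the font and the rendering engine.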

Encoding URLs for non-ASCII query params

Are you a web service API developer? The web truly is a world-wide web. Unfortunately, a great number of globally unaware developers are on the global web. This creates an odd situation in which web services are globally accessible but only locally or regionally aware. There are a few important things to remember when creating… Read More »
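
As a hedged sketch of the topic in the title, the Python snippet below percent-encodes a non-ASCII query parameter as UTF-8 and decodes it again. The parameter name `q` and the sample value are hypothetical, and treating UTF-8 as the encoding is an assumption; the teaser does not state the approach the full post recommends.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical query parameter containing a non-ASCII character.
params = {"q": "café"}

# urlencode percent-encodes the UTF-8 bytes of each value by default.
query = urlencode(params)

# parse_qs reverses the process, also assuming UTF-8 by default.
decoded = parse_qs(query)

print(query)
print(decoded)
```

Both sides of a web service need to agree on the encoding. If the server decodes the percent-escaped bytes with a different charset, the value will be corrupted.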