Unicode 6.1 Released


The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries and Africa. This version of the standard introduces 732 new characters.

In addition, the standard also added “labels” for character properties that will supposedly help implementers create better regular expressions that are both easier to read and easier to validate. I admit little knowledge about these labels at the moment, but will research and report on them in the future if time allows.

One of the oddities of the new version is the inclusion of 200 emoji variants. This is perhaps the only issue of the standard that I just don’t understand. Back in the day when I was more involved in Unicode development, we had a huge effort to unify variants of Chinese characters. We preached that Unicode characters were abstract entities with glyph renderings that were determined by font, style preferences of developers and apps. Now it appears that the Unicode consortium has changed its position on this.  Or maybe partially?. The addition of 200 emoji “variants” just seems unnecessary, but that’s just my opinion and I admit that I may not know all the issues that formed the consortium’s decision.

We have some examples, straight from the announcement, that show only 4 of the 200 new emoji variants:

Emoji tents

As the image shows, the “TENT” emoji has two variants — a text style and a more colorful, graphical emoji style. The standard defends these variants by saying that it allows implementations to distinguish preferred display styles. I think that is what fonts are for. Personally, I just don’t think variants are needed. And, I think that the variants make things more difficult for applications.

What do you think about variants in general? And what about emoji variants specifically?


Using Combining Sequences for Numbers

Circled number fifty

Today I just happened to be looking through some of the precomposed Unicode circled numbers, numbers like ①, ②, ③, and so on. Just in case your system, doesn’t support the fonts for these characters, here’s an image that shows what I mean:

Precomp circled numbers

I wasn’t all that surprised to see these CIRCLED DIGIT ZERO, CIRCLED DIGIT ONE, CIRCLED DIGIT TWO, through CIRCLED DIGIT NINE characters. However, I was surprised to see precomposed characters for other numbers, numbers all the way up to 50:

Circled number fifty CIRCLED NUMBER FIFTY

Why stop at 50? Well, obviously Unicode can’t encode every number. Although Unicode doesn’t define a CIRCLED NUMBER FIFTY ONE, how can I create this using combining sequences? For example, for the above single digits, I have a couple options for displaying these:

  1. a precomposed character like U+2460 — ①
  2. a combining sequence like U+0031 U+20DD —   1⃝  the digit 1 followed by the COMBINING ENCLOSING CIRCLE character     ⃝

Again, if you can’t see that character sequence, here’s the image of U+0031 U+20DD:

Combining one


Alright, so there we have a great example of using two Unicode code points together to form a single visual glyph on-screen. But how do I get the COMBINING ENCLOSING CIRCLE character to combine over two previous digits? What if there were not a precomposed CIRCLED NUMBER FIFTY ONE? There’s isn’t one, by the way. And yet I want to enclose two or more arbitrary digits with the COMBINING ENCLOSING CIRCLE character. Hmmm….

Sigh…. I have to admit that I don’t actually know how to do this. I suspect that I can use some of Unicode’s control characters like START OF GUARDED AREA and END OF GUARDED AREA or …. I don’t really know.

When I find out, I’ll repost. If you know, please share!