Archive for the ‘Standards’ Category

Unicode 6.1 Released

February 1st, 2012 joconner No comments

The Unicode Consortium announced the release of Unicode 6.1.0 yesterday. The new version adds characters for additional languages in China, other Asian countries and Africa. This version of the standard introduces 732 new characters.

The standard also adds “labels” for character properties that are supposed to help implementers write regular expressions that are both easier to read and easier to validate. I admit I know little about these labels at the moment, but I will research and report on them in the future if time allows.

One of the oddities of the new version is the inclusion of 200 emoji variants. This is perhaps the only part of the standard that I just don’t understand. Back in the day when I was more involved in Unicode development, we had a huge effort to unify variants of Chinese characters. We preached that Unicode characters were abstract entities whose glyph renderings were determined by fonts and by the style preferences of developers and apps. Now it appears that the Unicode Consortium has changed its position on this. Or maybe only partially? The addition of 200 emoji “variants” just seems unnecessary, but that’s just my opinion, and I admit that I may not know all the issues that informed the consortium’s decision.

Here are some examples, straight from the announcement, showing just 4 of the 200 new emoji variants:

[Image: Emoji tents]

As the image shows, the “TENT” emoji has two variants: a text style and a more colorful, graphical emoji style. The standard defends these variants by saying that they allow implementations to distinguish preferred display styles. I think that is what fonts are for. Personally, I just don’t think variants are needed, and I think they make things more difficult for applications.
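
To make the application-side cost a bit more concrete, here is a minimal Python sketch. It assumes the variants are encoded as variation sequences, meaning the base character followed by VARIATION SELECTOR-15 (U+FE0E, text style) or VARIATION SELECTOR-16 (U+FE0F, emoji style). The helper name strip_variation_selectors is made up for illustration and is not part of any standard API.

```python
import unicodedata

TENT = "\u26FA"   # U+26FA TENT, the base character
VS15 = "\uFE0E"   # VARIATION SELECTOR-15: request text-style display
VS16 = "\uFE0F"   # VARIATION SELECTOR-16: request emoji-style display

# Each variant is the same base character plus one selector.
text_tent = TENT + VS15
emoji_tent = TENT + VS16


def strip_variation_selectors(s: str) -> str:
    """Hypothetical helper: drop VS15/VS16 so variants compare equal."""
    return "".join(ch for ch in s if ch not in (VS15, VS16))


if __name__ == "__main__":
    print(unicodedata.name(TENT))
    # Each variant is two code points long, not one.
    print(len(text_tent), len(emoji_tent))
    # A naive comparison treats the two variants as different strings.
    print(text_tent == emoji_tent)
    # Comparing after stripping the selectors treats them as the same character.
    print(strip_variation_selectors(text_tent) == strip_variation_selectors(emoji_tent))
```

The point of the sketch is only that an application that searches, compares, or counts characters now has to decide whether the two tents are "the same", which is the kind of extra work I had in mind above.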

What do you think about variants in general? And what about emoji variants specifically?

 

Categories: Fonts, Standards, Unicode

Deconstructing BCP 47

November 29th, 2011 joconner No comments

BCP 47 stands for Best Current Practice 47, and even without the acronym, the name alone means almost nothing. So, what is BCP 47?

BCP 47 is the current best practice for creating language codes. A language code is a text identifier for a specific human language, and the code provides the means to define the language in terms of a basic language, a script used to write that language, and even a particular region in which the language is used. BCP 47 prescribes the code and its parts with enough precision to uniquely identify a natural, human language and distinguish it from other languages.
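
As a rough illustration of the language, script, and region parts described above, here is a small Python sketch. It is deliberately simplified: it only understands tags of the form language[-script][-region] and ignores everything else BCP 47 allows (variants, extensions, private-use subtags, and so on). The names LanguageTag and parse_simple_tag are invented for this example, and it does not check subtags against any registry.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str            # e.g. "zh"
    script: Optional[str]    # e.g. "Hant", or None if absent
    region: Optional[str]    # e.g. "TW" or "419", or None if absent


# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits). Not the full BCP 47 grammar.
_SIMPLE_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_simple_tag(tag: str) -> LanguageTag:
    """Split a simple tag into its parts, using conventional casing."""
    m = _SIMPLE_TAG.match(tag)
    if m is None:
        raise ValueError(f"not a simple language[-script][-region] tag: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    return LanguageTag(
        language=m.group("language").lower(),        # lowercase by convention
        script=script.title() if script else None,   # Titlecase by convention
        region=region.upper() if region else None,   # UPPERCASE by convention
    )


if __name__ == "__main__":
    for tag in ("en", "en-us", "zh-Hant-TW", "sr-latn-rs", "es-419"):
        print(tag, "->", parse_simple_tag(tag))
```

For real work, a library or platform API that implements the full standard is a better choice than a regular expression like this one; the sketch is only meant to show how the parts fit together.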

BCP 47 is a standard that uses other standards, and it prescribes how to combine them to create a language code. BCP 47 is a combination of at least the following existing standards:

Why is this important to you in the internationalization or localization business? It is important because our industry requires common standards and agreement for how to communicate, transfer, and exchange language data. A BCP 47 tag is necessary to accurately identify language text across different applications and tools.

Lots of existing applications, tools, and platforms already use BCP 47:

This is not an exhaustive list, but hopefully it gives you a sense of the importance of this standard. When you need to tag data with a language identifier, you should seriously consider BCP 47 instead of any home-grown convention.

Having provided plenty of links in this post, I hope you’ll take some time to familiarize yourself with this important language tagging standard. Happy reading!

 

 

Categories: Language, Standards, Web