John O'Conner

Software internationalization

Tuesday Apr 08, 2008

Basic Definitions for a Unicode Discussion

If you want to communicate, defining confusing terms right up front is always a good first step. So I'll try to define some Unicode terms:

Character: The smallest unit of meaning in a written language. This unit typically has a common shape and meaning, although specific shapes can vary quite dramatically. Specific shapes are more commonly called glyphs.
Character Set: An unordered collection of characters.
Coded Character Set: An ordered character set in which each character has an assigned integer value.
Code Point: The integer value of a character within a coded character set.
Character Encoding: A mapping of code points to a series of bytes.
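Code Unit: The smallest fixed-size unit an encoding uses to represent a character. A code unit is a single octet (byte) in UTF-8, but 16 bits in UTF-16 and 32 bits in UTF-32. One encoded character can take one or more code units.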
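Charset: Often used as a synonym for Coded Character Set, although in practice (MIME headers and Java's Charset class, for example) it usually names a Character Encoding.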
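
To make the difference between a code point, a character encoding, and a code unit a little more concrete, here is a minimal Python sketch. It is an illustration only, not part of any Unicode API: the describe function and the sample characters are my own hypothetical choices, and UTF-8 and UTF-16 are just two of the encodings you could pick.

```python
def describe(ch: str) -> dict:
    """Show the code point and two encoded forms of a single character.

    Hypothetical helper for illustration; not part of any standard API.
    """
    code_point = ord(ch)            # integer value in the coded character set
    utf8 = ch.encode("utf-8")       # UTF-8: each code unit is 8 bits
    utf16 = ch.encode("utf-16-be")  # UTF-16 (big-endian): each code unit is 16 bits
    return {
        "code_point": f"U+{code_point:04X}",
        "utf8_bytes": " ".join(f"{b:02x}" for b in utf8),
        "utf8_code_units": len(utf8),
        "utf16_bytes": " ".join(f"{b:02x}" for b in utf16),
        "utf16_code_units": len(utf16) // 2,
    }


# One character each that needs 1, 2, 3, and 4 bytes in UTF-8.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(ch, describe(ch))
```

The same code point produces a different byte sequence, and a different number of code units, depending on the encoding you choose. That is why the code point and the encoded bytes need separate names.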

You can always see more terms by visiting the Unicode Glossary.
