Round-trip conversion: From kanji to romaji and back again

By | 2011-12-16

Henkan

Recently a friend asked me this question:

Can you do roundtrip conversions from Japanese characters to Latin text (romaji) and then back to Japanese characters?

The frustrating answer is that it depends…

Converting Japanese text in any script (kanji, hiragana, or katakana) to romaji (Latin text) is actually easy. Every kanji has a reading that can be written in hiragana or katakana, and a dictionary lookup makes that mapping relatively trivial, although some kanji have several readings and the right one depends on the word. We also have well-known, simple conversion maps, such as Hepburn romanization, from hiragana and katakana to Latin characters (romaji).
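To make the "simple conversion map" idea concrete, here is a minimal Python sketch under stated assumptions. The table `KANA_TO_ROMAJI` is a hypothetical, deliberately tiny subset of a Hepburn-like table (it spells long vowels as written, e.g. "toukyou" rather than "Tōkyō"), and the function names are my own, not from any library. Katakana is first shifted to hiragana using the fixed 0x60 offset between the two Unicode blocks, and two-kana combinations such as きょ are matched before single kana.

```python
# Minimal sketch of kana -> romaji conversion by table lookup.
# ASSUMPTION: KANA_TO_ROMAJI is a tiny, hypothetical subset of a
# Hepburn-like table; a real table covers every kana.
KANA_TO_ROMAJI = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "ん": "n",
    # Two-kana combinations (e.g. きょ) must win over their first kana alone.
    "きょ": "kyo", "しゃ": "sha", "ちゃ": "cha",
}


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down 0x60 to the matching hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def kana_to_romaji(text: str) -> str:
    """Greedy longest-match lookup; characters not in the table pass through."""
    text = katakana_to_hiragana(text)
    out, i = [], 0
    while i < len(text):
        for size in (2, 1):  # try two-kana combinations first
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in KANA_TO_ROMAJI:
                out.append(KANA_TO_ROMAJI[chunk])
                i += size
                break
        else:  # no table entry: keep the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


print(kana_to_romaji("とうきょう"))  # toukyou
print(kana_to_romaji("カタカナ"))    # katakana
```

Each kana maps to exactly one romaji string, which is why this direction needs no human help.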

Converting from romaji back to hiragana or katakana is also trivial.
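The reverse direction can be sketched the same way: at each position, match the longest romaji syllable in the table. As before, `ROMAJI_TO_HIRAGANA` is a hypothetical, tiny table and the function names are illustrative only. Katakana output is just the hiragana result shifted up by the same 0x60 offset.

```python
# Minimal sketch of romaji -> kana conversion by greedy longest match.
# ASSUMPTION: ROMAJI_TO_HIRAGANA is a tiny, hypothetical subset of a real table.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん", "kyo": "きょ", "sha": "しゃ", "cha": "ちゃ",
}
LONGEST_KEY = max(len(key) for key in ROMAJI_TO_HIRAGANA)


def romaji_to_hiragana(text: str) -> str:
    """Match the longest romaji syllable at each position; raise on anything unknown."""
    text = text.lower()
    out, i = [], 0
    while i < len(text):
        for size in range(LONGEST_KEY, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += size
                break
        else:
            raise ValueError(f"no kana for {text[i:]!r}")
    return "".join(out)


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana (U+3041..U+3096) up 0x60 to the matching katakana."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


print(romaji_to_hiragana("toukyou"))                         # とうきょう
print(hiragana_to_katakana(romaji_to_hiragana("katakana")))  # カタカナ
```

A real converter also needs rules this toy table skips, such as doubled consonants for a small っ and an apostrophe to separate ん from a following vowel (kan'i).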

The difficult conversion, the one that almost always requires human intervention, is going from romaji, hiragana, or katakana to kanji. That's what input method editors (IME) do, though usually with human interaction. Human interaction is needed because the conversion from kana or romaji to kanji is a one-to-many (1:M) relationship: many kanji share the same hiragana representation. For example, こうえん (kōen) can be written 公園 ("park"), 講演 ("lecture"), or 公演 ("performance"). You could say that many kanji are "homophones." I am not a linguist, so I can't say for certain that "homophone" is the correct term. However, I can most definitely say that many kanji have the same hiragana representation and are pronounced the same.
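To make the one-to-many relationship concrete, here is a minimal sketch that starts from a hypothetical, four-entry reading dictionary (kanji word to hiragana reading) and inverts it. Looking up a word's reading gives one answer; looking up a reading gives a list of candidates. Real IMEs use far larger dictionaries plus context and usage statistics, none of which is modeled here.

```python
from collections import defaultdict

# ASSUMPTION: a tiny, hypothetical reading dictionary (kanji word -> reading).
READINGS = {
    "公園": "こうえん",    # park
    "講演": "こうえん",    # lecture
    "公演": "こうえん",    # public performance
    "東京": "とうきょう",  # Tokyo
}


def build_candidates(readings: dict[str, str]) -> dict[str, list[str]]:
    """Invert word -> reading into reading -> [words]; this is where 1:1 becomes 1:M."""
    candidates: defaultdict[str, list[str]] = defaultdict(list)
    for word, reading in readings.items():
        candidates[reading].append(word)
    return dict(candidates)


CANDIDATES = build_candidates(READINGS)

print(READINGS["公園"])        # こうえん (one reading)
print(CANDIDATES["こうえん"])  # ['公園', '講演', '公演'] (three candidates)
```

Nothing in the string こうえん tells the program which of the three words was meant, so an IME shows the list and lets a person choose.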

Converting from kanji to romaji loses meaning and retains only the sound of the original word. Because there are so many homophones, converting from romaji or even hiragana back to kanji requires some help from a person; otherwise your algorithm will eventually pick the wrong kanji from a list of homophones.
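The whole round trip can be sketched end to end with miniature, hypothetical tables that are just large enough for one word. The steps are kanji to kana (dictionary lookup), kana to romaji (table lookup), romaji back to kana (greedy longest match), and kana back to kanji (reverse dictionary lookup). The function names are illustrative only, not from any real library.

```python
# ASSUMPTION: hypothetical miniature tables, just large enough for this example.
READINGS = {"公園": "こうえん", "講演": "こうえん", "公演": "こうえん"}
KANA_ROMAJI = {"こ": "ko", "う": "u", "え": "e", "ん": "n"}
ROMAJI_KANA = {romaji: kana for kana, romaji in KANA_ROMAJI.items()}


def to_romaji(word: str) -> str:
    reading = READINGS[word]                               # kanji -> kana: one answer
    return "".join(KANA_ROMAJI[ch] for ch in reading)      # kana -> romaji: one answer


def to_kana(romaji: str) -> str:
    """Greedy longest match of romaji syllables back to kana."""
    longest = max(len(key) for key in ROMAJI_KANA)
    out, i = [], 0
    while i < len(romaji):
        for size in range(longest, 0, -1):
            chunk = romaji[i:i + size]
            if len(chunk) == size and chunk in ROMAJI_KANA:
                out.append(ROMAJI_KANA[chunk])
                i += size
                break
        else:
            raise ValueError(f"no kana for {romaji[i:]!r}")
    return "".join(out)


def to_kanji_candidates(kana: str) -> list[str]:
    """kana -> kanji: every word with this reading is a candidate."""
    return [word for word, reading in READINGS.items() if reading == kana]


romaji = to_romaji("講演")
kana = to_kana(romaji)
candidates = to_kanji_candidates(kana)
print(romaji, kana, candidates)  # kouen こうえん ['公園', '講演', '公演']
```

The original word 講演 survives only as one of three candidates; a program that simply took the first one would return 公園, the wrong word.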

That said, there are many open-source libraries for doing this conversion, usually as part of an IME. One such library is WNN.

(Thanks to Ken Lunde, Twitter @ken_lunde, for help determining the correct kanji to use for my lead-in image.)
