Roundtrip conversion: From Kanji to romaji and back again


Recently a friend asked me this question:

Can you do roundtrip conversions from Japanese characters to Latin text (romaji) and then back to Japanese characters?

The frustrating answer is that it depends…

Converting Japanese text in any script (kanji, hiragana, or katakana) to romaji (Latin text) is actually easy. With a good dictionary, a word written in kanji maps fairly directly to its hiragana or katakana reading. And there are well-known, simple conversion tables from hiragana and katakana to Latin characters (romaji).

Converting from romaji back to hiragana or katakana is also trivial.
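To make the easy half concrete, here is a minimal Python sketch of converting hiragana to romaji and back. It assumes a small, hypothetical subset of a Hepburn-like table (long vowels spelled out kana by kana, no macrons) and ignores katakana, small っ, and the other details a real converter must handle. Both directions use the same greedy longest-match lookup, so two-kana combinations like きょ win over a lone き.

```python
# A minimal sketch: a small, hypothetical subset of a Hepburn-like table.
# Real converters also handle katakana, small っ (doubled consonants), etc.
KANA_TO_ROMAJI = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "わ": "wa", "を": "wo", "ん": "n",
    # Two-kana combinations; longest match makes these win over single kana.
    "きゃ": "kya", "きゅ": "kyu", "きょ": "kyo",
    "しゃ": "sha", "しゅ": "shu", "しょ": "sho",
    "ちゃ": "cha", "ちゅ": "chu", "ちょ": "cho",
}

# Every romaji value above is unique, so the reverse table is well defined.
ROMAJI_TO_KANA = {romaji: kana for kana, romaji in KANA_TO_ROMAJI.items()}


def _convert(text, table):
    """Greedy longest-match conversion; unknown characters pass through."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry matched: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def kana_to_romaji(text):
    return _convert(text, KANA_TO_ROMAJI)


def romaji_to_kana(text):
    return _convert(text, ROMAJI_TO_KANA)


if __name__ == "__main__":
    for word in ["さくら", "とうきょう", "きんえん"]:
        romaji = kana_to_romaji(word)
        print(word, "->", romaji, "->", romaji_to_kana(romaji))
```

The first two words survive the round trip (さくら -> sakura -> さくら, とうきょう -> toukyou -> とうきょう), but きんえん comes back as きねん: with ん written as a bare "n", "kinen" is ambiguous. That is why Hepburn romanization writes ん as "n'" before a vowel (kin'en).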

The difficult conversion, the one that almost always requires human intervention, is going from romaji, hiragana, or katakana to kanji. That’s what input method editors (IMEs) do, usually with the user’s help. A person is needed because the conversion from kana or romaji to kanji is a one-to-many (1:M) relationship: a single hiragana reading can correspond to many different kanji. You could say that many kanji are “homophones”. I am not a linguist, so I can’t say for certain that “homophone” is the correct term. However, I can definitely say that many kanji share the same hiragana representation and are pronounced the same.

Converting from kanji to romaji loses meaning and retains only the sound of the original word. Because there are so many homophones, converting from romaji or even hiragana back to kanji requires some help from a person…otherwise your algorithm will sooner or later pick the wrong kanji from a long list of homophones.
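Here is a tiny Python sketch of why the kanji step is the hard one. The reading dictionary is a hypothetical, hand-picked sample of real homophones (an actual IME uses a far larger dictionary plus context to rank candidates). Going from kanji to kana is a simple lookup, but going back forces a choice among several words.

```python
# Hypothetical, tiny reading dictionary: one kana reading -> many kanji words.
READING_TO_KANJI = {
    "こうえん": ["公園", "講演", "公演", "後援"],  # park, lecture, performance, support
    "きしゃ": ["記者", "汽車", "貴社", "帰社"],    # reporter, train, your company, returning to the office
}

# Kanji -> kana is many-to-one: every word maps to exactly one reading here.
KANJI_TO_READING = {
    kanji: reading
    for reading, words in READING_TO_KANJI.items()
    for kanji in words
}


def candidates(reading):
    """Return every kanji spelling for a kana reading.

    A person (or an IME's context model) still has to choose one.
    """
    return READING_TO_KANJI.get(reading, [])


def naive_to_kanji(reading):
    """Pick the first candidate, as an algorithm with no context might.

    Readings not in the dictionary are returned unchanged.
    """
    options = candidates(reading)
    return options[0] if options else reading


if __name__ == "__main__":
    original = "講演"  # "lecture"
    reading = KANJI_TO_READING[original]
    print(original, "->", reading, "->", naive_to_kanji(reading))
```

Here 講演 (lecture) becomes こうえん, and naive_to_kanji("こうえん") returns 公園 (park): the sound survived the round trip, but the meaning did not.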

That said, there are many open-source libraries for doing this conversion, and these are usually part of input method editors (IME). One such library is WNN:

(Thanks to Ken Lunde, Twitter @ken_lunde, for help determining the correct Kanji to use for my lead-in image)


Additional Characters in the Joyo Kanji List


Japanese language students must learn 196 additional kanji to consider themselves literate. These characters will be added to the already daunting 1945 characters of the “Joyo Kanji” list, and since five characters are also being dropped from the list, the total comes to 2136 (1945 + 196 − 5). Joyo kanji are the fundamental characters of the language…the minimal set that an adult or high school graduate should know.

I remember learning Japanese in college. I thought I was doing pretty well to have learned the Joyo kanji over my four years of study. Today, four years would barely be enough for me to choke down the additional requirements.

I think the motivation for adding the characters is interesting. As you might imagine, it is much easier to recognize a kanji than to write it from memory. In this digital age, we have lots of help writing kanji. Input methods make it…dare I say…almost easy. And since we read far more than we write, and since input methods simplify text entry, we don’t really have to recall every stroke of every kanji; the software handles that for us nicely. So Japan’s Agency for Cultural Affairs is not particularly concerned that students be able to write the new characters accurately. Its primary hope is that students will at least be able to read and recognize them.

You can learn a bit more about the Joyo Kanji: