November 18, 2009

Thinking more about hyphenation

Microsoft Reader is the only ebook reader that automatically hyphenates, a feature that really improves the "book feel" of the display in full-justification mode. It probably leverages the hyphenation module from Microsoft Word, but I wonder (linguistically and typographically) if purely algorithm-based hyphenation would be "good enough" for devices like the Kindle.

Such an algorithm would look for common prefixes, suffixes, double consonants and the like, then use a few arithmetic formulae to calculate the maximum kerning and word-spacing boundaries, and finally apply a simple decision matrix to decide whether and where to throw in a hyphen.
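To make that concrete, here is a rough Python sketch of the idea, not a description of how Microsoft Reader or any shipping device actually does it. Everything in it is an assumption made for illustration: the tiny prefix and suffix lists, the function names `break_points` and `choose_break`, the `max_space` threshold, and above all the simplification that every character and every normal space is one unit wide. A real renderer would use actual glyph widths and could adjust kerning as well, which this sketch ignores.

```python
# Hypothetical, deliberately tiny affix lists; a real table would be far larger.
PREFIXES = ("inter", "under", "trans", "over", "anti", "non", "pre", "dis", "mis", "un", "re")
SUFFIXES = ("ation", "ment", "ness", "tion", "able", "less", "ing", "ful", "ly")
VOWELS = set("aeiouy")
MIN_LEFT, MIN_RIGHT = 2, 3  # fewest letters allowed before / after a hyphen


def break_points(word):
    """Return sorted indices where a hyphen might go (heuristic guesses only)."""
    w = word.lower()
    points = set()
    for p in PREFIXES:              # break after the first matching prefix
        if w.startswith(p):
            points.add(len(p))
            break
    for s in SUFFIXES:              # break before the first matching suffix
        if w.endswith(s):
            points.add(len(w) - len(s))
            break
    for i in range(1, len(w)):      # break between doubled consonants
        if w[i] == w[i - 1] and w[i].isalpha() and w[i] not in VOWELS:
            points.add(i)
    return sorted(i for i in points if MIN_LEFT <= i <= len(w) - MIN_RIGHT)


def choose_break(line_words, next_word, line_width, max_space=2.0):
    """Decide if and where to hyphenate next_word at the end of a justified line.

    Widths are in abstract units: one per character, one per normal space.
    Returns (head_with_hyphen, tail), or None to wrap the whole word.
    """
    used = sum(len(w) for w in line_words)
    gaps = len(line_words) - 1
    # If justifying without the next word keeps spaces acceptably narrow,
    # leave the word alone.
    if gaps > 0 and (line_width - used) / gaps <= max_space:
        return None
    # Otherwise try the longest head that still fits with normal spaces.
    new_gaps = len(line_words)
    for i in reversed(break_points(next_word)):
        head = next_word[:i] + "-"
        if used + len(head) + new_gaps <= line_width:
            return head, next_word[i:]
    return None


print(break_points("unbelievable"))
print(choose_break(["the", "quick", "brown"], "unbelievable", 26))
```

The "decision matrix" here collapses to two questions: would the spaces stretch past `max_space` if the word wrapped whole, and does any candidate fragment fit? Rules this crude will sometimes put the hyphen in the wrong place, which is exactly the case the question is about.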

Japanese character fonts are non-proportional. "Lined paper" for writing kanji is slightly modified grid paper. When typesetting Japanese, full justification (to the bottom margin) is the default. "Words" can wrap anywhere. You just keep reading characters until you hit a punctuation mark. It's not as confusing as you might imagine.

(This is due to the orthographically cataclysmic decision a thousand-plus years ago to import Chinese characters into what was the world's most elegant syllabary. The clear kana/kanji "word" boundaries and the use of explicit case markers make parsing the written language fairly straightforward.)

My own feeling is that, apart from the most glaring violations of the basic syllable structure of the language, most readers would scan right through a "misplaced" hyphen and not notice it unless they stopped and thought about it.

