06 May 2009

The Diving Bell and the Linear Search

I watched The Diving Bell and the Butterfly recently. (Note: This isn’t much of a spoiler, but if you don’t like even little spoilers then stop reading.)

In the move, the editor of Elle magazine, Jean-Dominique Bauby, has a stroke and wakes up unable to speak or move any part of his body except for one eye. A speech therapist helps him to communicate by reading the letters of the French alphabet one-by-one. If he blinks his eye during a letter, she writes it down and starts the process over again. Together they slowly spell out words. Using this technique he is eventually able to dictate a book (which inspired the movie).
From wikipedia (unsourced, of course):
"The book took about 200,000 blinks to write and an average word took approximately two minutes"
Two minutes per word? Algorithms could have changed this man’s life!

The approach they used (often called linear search) is inefficient because each operation (the interpreter reading a letter, followed by Jean-Dominique either blinking or not blinking) only rules out at most 1 letter. Using a simple linear search, without sorting for frequency, you’d need 13.5 operations per letter (sometimes just 1, sometimes 26). The speech therapist, however, sorts the list by frequency (putting ‘e’ first) which helps a lot: reducing it to 7.5 operations per letter on average. Check my math here.

However, we can do better by ruling out more than 1 letter with each operation. The best approach is to rule out half of the remaining letters each time. Such a technique might look like this: The interpreter asks: “Blink if the letter is between A and M (inclusive).” Whichever option is chosen, 13 letters are eliminated. Say (for example) it’s a blink for ‘yes,’ we repeat the process on the remaining letters: “Blink if the letter is between A and G,” say there is no blink (meaning ‘no’). Now: “Blink if the letter is between H and J” (yes), “H and I?” (yes), “H?” (no) results in producing an “I” in 5 operations. In fact every letter is reachable in (at most) 5 operations. This is a 33% savings. If you trust Wikipedia, we’ve cut the 2 minute average per word down to 1 minute and 20 seconds. You might notice that as long as you use the same splits every time, the series of blinks and not-blinks will be the same for ever letter every time: A will always be YYYYY, B: YYYYN, etc. Substituting ‘1’s for Y and ‘0’s for N results in a technique that is similar to ASCII codes. ASCII is a very common format for storing text in a computer because it allows letters to be (fairly) efficiently represented in a simple 1 and 0 format.

If all letters were equally common, this would be the best we could do. However, as with linear search we can use letter frequency to our advantage. For example: since ‘e’ is the most common letter, we would like to make it reachable in less than 5 operations, even if that means ‘z’ will take more than 5 operations. In 1954, David Huffman developed a technique which provably produces the most efficient codes (in terms of operations per letter) for a given distribution of letters (the technique is known as Huffman coding). The technique is actually quite simple [1], the proof of optimality is a little more complicated (it’s section 16.3 of CLRS). I wrote a little bit of JavaScript to generate Huffman Codes for English here. The codes I found come out to an expected length of 4.2 operations per letter. Note these codes use Morse Code format (- for a long blink, . for a short blink), since asking questions would be a lot more complicated with these codes. Instead you’d memorize the codes (which seems difficult, but many telegraph operators get quite fast with it). Morse Code, of course, would be another good alternative: It’s slightly less optimal than Huffman codes, but it’s close and people might already be familiar with it). 4.2 operations is a 44% speedup over the baseline, which would get us down to 1 minute 8 seconds per word. Capitalizing on unequal distributions to save space using Huffman codes forms the basis of the zip file format.

It’s possible that the difficulty of dictation was an essential aspect of the book. (Neal Stehpenson writes his books (including the 900+ page Anathem) with a fountain pen to force himself to think about every sentence before he writes it down.) However, it seems unquestionable that this would have aided day-to-day communications.

[1] To create Huffman codes for a distribution:
- Begin by listing all the possible letters, with their probabilities
- Repeatedly merge the two smallest remaining probabilities, into a new hybrid node. The node will have a left child and a right child (the old nodes) and a probability equal to their sum. Repeat this process until there is only one node (which will have all other nodes as children).
- To intepret the resulting tree: Assign ‘0′ to left children and ‘1′ to right children. Any letter’s code is the path to get to it (in terms of 0’s and 1’s) starting from the root node.