In the move, the editor of Elle magazine, Jean-Dominique Bauby, has a stroke and wakes up unable to speak or move any part of his body except for one eye. A speech therapist helps him to communicate by reading the letters of the French alphabet one-by-one. If he blinks his eye during a letter, she writes it down and starts the process over again. Together they slowly spell out words. Using this technique he is eventually able to dictate a book (which inspired the movie).
From wikipedia (unsourced, of course):
"The book took about 200,000 blinks to write and an average word took approximately two minutes"Two minutes per word? Algorithms could have changed this man’s life!
The approach they used (often called linear search) is inefficient because each operation (the interpreter reading a letter, followed by Jean-Dominique either blinking or not blinking) only rules out at most 1 letter. Using a simple linear search, without sorting for frequency, you’d need 13.5 operations per letter (sometimes just 1, sometimes 26). The speech therapist, however, sorts the list by frequency (putting ‘e’ first) which helps a lot: reducing it to 7.5 operations per letter on average. Check my math here.
However, we can do better by ruling out more than 1 letter with each operation. The best approach is to rule out half of the remaining letters each time. Such a technique might look like this: The interpreter asks: “Blink if the letter is between A and M (inclusive).” Whichever option is chosen, 13 letters are eliminated. Say (for example) it’s a blink for ‘yes,’ we repeat the process on the remaining letters: “Blink if the letter is between A and G,” say there is no blink (meaning ‘no’). Now: “Blink if the letter is between H and J” (yes), “H and I?” (yes), “H?” (no) results in producing an “I” in 5 operations. In fact every letter is reachable in (at most) 5 operations. This is a 33% savings. If you trust Wikipedia, we’ve cut the 2 minute average per word down to 1 minute and 20 seconds. You might notice that as long as you use the same splits every time, the series of blinks and not-blinks will be the same for ever letter every time: A will always be YYYYY, B: YYYYN, etc. Substituting ‘1’s for Y and ‘0’s for N results in a technique that is similar to ASCII codes. ASCII is a very common format for storing text in a computer because it allows letters to be (fairly) efficiently represented in a simple 1 and 0 format.
It’s possible that the difficulty of dictation was an essential aspect of the book. (Neal Stehpenson writes his books (including the 900+ page Anathem) with a fountain pen to force himself to think about every sentence before he writes it down.) However, it seems unquestionable that this would have aided day-to-day communications.
 To create Huffman codes for a distribution:
- Begin by listing all the possible letters, with their probabilities
- Repeatedly merge the two smallest remaining probabilities, into a new hybrid node. The node will have a left child and a right child (the old nodes) and a probability equal to their sum. Repeat this process until there is only one node (which will have all other nodes as children).
- To intepret the resulting tree: Assign ‘0′ to left children and ‘1′ to right children. Any letter’s code is the path to get to it (in terms of 0’s and 1’s) starting from the root node.