Welcome to ...

The place where the world comes together in honesty and mirth.
Windmills Tilted, Scared Cows Butchered, Lies Skewered on the Lance of Reality ... or something to that effect.


Tuesday, April 5, 2011

Using CAPTCHA to Decipher Old Text

If you think that CAPTCHAs, the squiggly letters you have to decipher in order to log in or post comments on many websites, are only there to keep out spammers, think again!
There is actually another use for the annoying feature: correcting mistakes made when scanning old text. Guy Gugliotta of the New York Times explains:
For vintage 19th-century texts in English, O.C.R. programs mess up or miss 10 percent to 30 percent of the words. Only humans can fix the errors. The standard method, called key and verify, uses two transcribers to type the text independently and compares the results. This is time-consuming and extremely expensive.
But in 2006, Dr. von Ahn’s team figured out a way around this obstacle. The ubiquitous Captchas, familiar to even the most casual Web user, were the perfect tools. Captchas, short for “completely automated public Turing test to tell computers and humans apart,” are impossible for machines to decipher, but easy for humans. (The test is named for the British computer pioneer Alan Turing.)
Dr. von Ahn’s group estimated that humans around the world decode at least 200 million Captchas per day, at 10 seconds per Captcha. This works out to about 500,000 hours per day — a lot of applied brainpower being spent on what Dr. von Ahn regards as a fundamentally mindless exercise.
“So we asked, ‘Can we do something useful with this time?’ ” Dr. von Ahn recalled in a telephone interview. Instead of making Captchas out of random words printed in a woozy way, why not ask Web users to translate problem words from archival texts?
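
For the curious, here is a rough sketch in Python of two ideas from the excerpt: the arithmetic behind the hours-per-day figure, and the "key and verify" comparison of two independent transcriptions. The function names and the sample words are my own invention for illustration, and the word-by-word comparison is a simplifying assumption; this is not how Dr. von Ahn's team or any real transcription service actually implements it.

```python
def captcha_hours_per_day(captchas_per_day: int, seconds_each: float) -> float:
    """Total human time spent on Captchas per day, in hours."""
    return captchas_per_day * seconds_each / 3600


def key_and_verify(first: list[str], second: list[str]) -> list[tuple[int, str, str]]:
    """Compare two independent transcriptions word by word.

    Returns (position, first_word, second_word) for every disagreement.
    Assumes both transcribers produced the same number of words, which is
    a simplification made for this sketch.
    """
    if len(first) != len(second):
        raise ValueError("transcripts must have the same number of words")
    return [(i, a, b) for i, (a, b) in enumerate(zip(first, second)) if a != b]


if __name__ == "__main__":
    # Figures quoted in the article: 200 million Captchas, 10 seconds each.
    hours = captcha_hours_per_day(200_000_000, 10)
    print(f"{hours:,.0f} hours per day")

    # Made-up sample: the second transcriber misreads one word.
    typist_a = "the quick brown fox".split()
    typist_b = "the quiok brown fox".split()
    print(key_and_verify(typist_a, typist_b))
```

With the article's numbers the first function gives a little over 555,000 hours, which fits the quoted "about 500,000 hours per day." The second function flags only the words where the two typists disagree, which is the part a human would then have to resolve.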
