Google’s purchase of reCAPTCHA is being hailed as a win-win for users and the search giant. Not so much for critics of the company’s controversial book scanning project, which the purchase bolsters.
In the deal, Google gets crowdsourced help for its massive book scanning projects. Users get access to the scanned printed material, though under what terms remains the subject of considerable controversy.
You’ve probably done this many times: You want to sign up for something online and at the end are expected to retype jumbled or distorted characters that only real humans are supposed to be able to read. That’s a CAPTCHA.
The idea is that a robot won’t be able to read and reenter the characters and the site will be protected from bogus spam entries created by computers. Users don’t directly benefit from this and some find it can take several tries to enter the characters correctly.
CAPTCHAs are widely used, and reCAPTCHA is only one implementation of the technology, which has been changed as hackers have learned how to parse the distorted text used by Yahoo and others.
The underlying technology was developed at Carnegie Mellon University, which sold reCAPTCHA to Google in a deal announced Wednesday on Google’s corporate blog. Terms were not disclosed.
reCAPTCHA uses images from books as its human-readable text. They include words that a book-scanning engine has been unable to recognize. By entering words into the reCAPTCHA box, users are helping the scanner do its job, Google said.
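For readers curious how typed answers could feed back into a scanner, here is a minimal Python sketch. It is not Google's or Carnegie Mellon's code. It assumes each challenge pairs a word the system already knows (a control word) with a word the scanner could not read, and that an unknown word is accepted once enough people who passed the control agree on it. The class name, function name and vote threshold are all hypothetical.

```python
from collections import Counter


class WordChallenge:
    """One scanned word the OCR engine could not read, plus human guesses so far.

    Hypothetical structure for illustration; the vote threshold is an assumption.
    """

    def __init__(self, image_id, votes_needed=3):
        self.image_id = image_id
        self.votes_needed = votes_needed
        self.guesses = Counter()

    def record(self, typed):
        # Normalize so "Upon" and "upon " count as the same guess.
        self.guesses[typed.strip().lower()] += 1

    def resolved(self):
        # Return the agreed-upon word, or None if there is not enough agreement yet.
        if not self.guesses:
            return None
        word, count = self.guesses.most_common(1)[0]
        return word if count >= self.votes_needed else None


def check_answer(control_word, typed_control, unknown, typed_unknown):
    """Pass or fail the user on the control word alone.

    Only users who pass get their guess for the unknown word counted.
    """
    passed = typed_control.strip().lower() == control_word.lower()
    if passed:
        unknown.record(typed_unknown)
    return passed
```

In this sketch the site only ever grades the control word, so a spam robot gains nothing by guessing at the unreadable one, while each correct human entry adds a vote toward transcribing it.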
So far, reCAPTCHA has helped index back issues of the New York Times, according to program founder Luis von Ahn. Tying CAPTCHA technology to the training of optical book scanners is von Ahn’s attempt to put the time users spend answering CAPTCHA challenges to good use.
Von Ahn appears in this three-year-old Google corporate video discussing the project.
Part of the attraction of reCAPTCHA for end users is the sense that it helps both fight spam and turn old books into modern digital volumes. reCAPTCHA’s slogan, “Stop spam, read books,” is an easy concept to like.
reCAPTCHA and similar CAPTCHA programs have, thus far, been mostly successful in defeating robotic attacks. But the ability of hackers to create software that can recognize the distorted characters, even those taken from old newspapers and books, is a growing threat.
Until that threat is realized, and anti-CAPTCHA hacks become widely used, reCAPTCHA is a win for everyone, provided you think Google’s controversial book scanning is a good idea. And that, I am not so sure about.
David Coursey has used reCAPTCHA on his own Web site. He tweets as @techinciter and can be contacted via his site.