Absorbing reCAPTCHA, the word-verification Internet security company, was a natural progression for the Google Book Project. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) technology displays words as distorted images, which makes them difficult for computers to read. In reCAPTCHA's case, these words are taken from scanned print materials.
As a result, reCAPTCHA has a firm grasp of what it takes to read digitized print. For its book project, Google wants to convert these images into plain text "because plain text can be searched, easily rendered on mobile devices, and displayed to visually impaired users," Google wrote on its blog.
The underlying technology is optical character recognition (OCR), which extracts words from flat images. OCR has limits in the book scanning process, because many older works have been damaged, faded, or otherwise mangled beyond what a computer can recognize. Words that OCR cannot read are served up in CAPTCHA boxes, and once humans type them in, reCAPTCHA collects the data. This data is priceless; it fills in, in essence, the crevices and canyons of a computer's brain.
Google is also psyched about the security potential reCAPTCHA will bring to its group. The resource will surely tighten security. But won't using CAPTCHA for the purposes of Google Books ironically undermine its value as a security tool? If we teach computers to read text that computers aren't supposed to be able to read, CAPTCHA becomes useless. Because CAPTCHA is used as a security measure by a multitude of Web sites, those sites would have to dramatically overhaul their defenses to keep up with the Google Book Project's steamrolling ambition.
The acquisition has also raised privacy concerns. Many SEOs and Webmasters are dropping reCAPTCHA from their Web sites, according to Search Engine Roundtable. Commenters on WebmasterWorld said, "Now busily removing reCAPTCHA from all our sites. Google has enough beacons already. No need for more," and, "next month you're all going to get ads showing in your CAPTCHA box."
I'm quasi-confident that Google has already thought this one through and has a solution to the problem. Maybe it's not a problem at all. But Google's almost blind desire to see its Books Project come to fruition, and the possibility that it has disregarded the philosophy behind reCAPTCHA, make me a little wary of this project's potential for damage. Google may be getting too smart for its own good, and making computers "think" far too well.