We’ve all struggled with those often unreadable CAPTCHA security codes from time to time, but if new research out of Stanford is any indication, machines may soon be better at reading them than we are. Using machine vision algorithms, the Stanford team was able to defeat 66% of Visa’s CAPTCHAs, 70% of those used by Blizzard, and about a quarter of Wikipedia’s. This may spell trouble for many other sites as well.
If you’ve ever joined a website that uses reCAPTCHA, you’re familiar with the interface. You’re presented with two English words partially obscured by lines, and you must prove that you are not a robot by entering them correctly. One of the words is actually text from a scanned book that an OCR program couldn’t read. You’re just helpfully transcribing it, and your answer for that word has no effect on gaining access. Now Jonathan Wilkins of iSEC Partners says some robots may soon be slipping through as well.
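The two-word scheme described above can be modelled roughly as follows. This is only a toy sketch, not reCAPTCHA's actual implementation: the `Challenge` class, the `verify` function, and the choice to keep a transcription only when the known word was answered correctly are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    """Hypothetical two-word challenge: one known word, one unread scan."""
    control_word: str                      # word whose correct answer is already known
    transcriptions: list = field(default_factory=list)  # guesses for the unread word


def verify(challenge: Challenge, control_answer: str, unknown_answer: str) -> bool:
    """Grant access based on the control word only.

    The answer for the unread book word never affects access; it is simply
    stored as a candidate transcription (assumption: only when the control
    word was answered correctly).
    """
    passed = control_answer.strip().lower() == challenge.control_word.lower()
    if passed:
        challenge.transcriptions.append(unknown_answer.strip())
    return passed


challenge = Challenge(control_word="morning")
print(verify(challenge, "Morning", "upon"))   # control word matches
print(verify(challenge, "evening", "upon"))   # control word does not match
```

In this model a solver only has to get the control word right, which is why a partially correct answer can still count as a pass.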
In a series of tests, the iSEC automated system managed a 17.5% reCAPTCHA success rate. That may not sound like much, but those wishing to bypass reCAPTCHA authentication could have access to botnets of thousands of infected machines, so even a small success rate could spell big problems for website security. According to the report, the system solved 10% of the challenges outright and got one of the two words correct in an additional 25%; on the assumption that 50% of those single words were the unknown book text, iSEC arrives at the stated figure of 17.5%.
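To see why a modest success rate matters at botnet scale, here is a back-of-the-envelope calculation. Only the 17.5% rate comes from the article; the botnet size and the number of attempts per machine are hypothetical values chosen for illustration, and `expected_bypasses` is a made-up helper name.

```python
def expected_bypasses(machines: int, attempts_per_machine: int, success_rate: float) -> float:
    """Expected number of solved challenges across all machines."""
    return machines * attempts_per_machine * success_rate


# Hypothetical scale: 5,000 infected machines, 100 attempts each per day.
daily = expected_bypasses(machines=5_000, attempts_per_machine=100, success_rate=0.175)
print(f"Expected successful bypasses per day: {daily:,.0f}")
```

Under these assumed numbers the attacker gets through roughly 87,500 times a day, even though more than four out of five attempts fail.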
Google, which recently acquired reCAPTCHA, explained that the data was gathered in 2008 and doesn’t take into account changes made to the system since then. "Therefore, this study does not reflect the effectiveness of reCAPTCHA's current technology against machine solvers. We've found reCAPTCHA to be far more resilient while also striking a good balance with human usability, and we've received very positive feedback from customers," Google said in a statement. Whether or not reCAPTCHA is broken, the internet arms race is sure to continue.
Google announced today that it has acquired reCAPTCHA, the popular anti-bot service. reCAPTCHA offers a first line of defense against internet bots that exploit web forms with malicious intent. It is also widely known for helping to digitize print media. It is no surprise that Google would be interested in such a project.
reCAPTCHA advertises that it is currently helping to digitize old print editions of the New York Times. However, it’s not too far a leap to assume Google will use reCAPTCHAs to bolster its own text-scanning efforts (Google Books). Approximately 200 million CAPTCHAs are solved by humans each day, and each one moves digitization projects a step closer to improving the way computers recognize words on paper.
“Improving the availability and accessibility of all the information on the Internet is really important to us, so we're looking forward to advancing this technology with the reCAPTCHA team,” said Luis von Ahn (co-founder of reCAPTCHA) and Will Cathcart (Google product manager) on the official Google blog.