‘Spot the bot’ just got harder

Web admins who rely on reCAPTCHA v2 to protect their sites against automated account creation and spam might want to consider implementing the latest version of the Google service, as researchers have once again circumvented the tool’s anti-bot mechanism.

A team of academics from the University of Maryland made news last year when they launched UnCaptcha (PDF), a Python-based program that could solve the audio challenges posed by reCAPTCHA.

Using browser automation software, UnCaptcha could identify the spoken numbers of an audio version of the challenge and automatically re-enter them into the answer field. The proof-of-concept tool had an 85% success rate.
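As a rough illustration of the final step only, the sketch below turns a transcript of spoken digits into the string that would be typed into the answer field. It is not the researchers' code: the function name `transcript_to_digits`, the word table, and the sample transcript are assumptions made for this example, and the speech-to-text step is left out entirely.

```python
# Hypothetical lookup table; "oh" is included because speakers often say it for zero.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def transcript_to_digits(transcript: str) -> str:
    """Convert a transcript such as 'Seven, oh three one.' into '7031'.

    Tokens that are neither digit words nor numerals are ignored.
    """
    digits = []
    for token in transcript.lower().split():
        word = token.strip(".,!?")  # drop punctuation a transcriber may add
        if word in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[word])
        elif word.isdigit():
            digits.append(word)  # the transcriber already emitted a numeral
    return "".join(digits)


if __name__ == "__main__":
    print(transcript_to_digits("Seven, oh three one."))
```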

Soon after UnCaptcha was released, Google updated the reCAPTCHA audio challenges to include spoken phrases rather than digits, making the attack obsolete.

It seems, however, that these mitigations served to make solving the challenges easier than ever. UnCaptcha2 – a modified version designed to handle the new, phrase-based audio challenges – has a success rate of more than 90%.

“The reCAPTCHA update changed the audio CAPTCHA to use a short phrase instead – typically three to four words,” George Hughey of the University of Maryland’s Advanced Cybersecurity Honors College told The Daily Swig this week.

“While this rendered our original (digits-only) UnCaptcha design obsolete, the use of phrases is actually much more in line with the type of speech these services are designed to handle, making the attack easier to pull off.”

According to Hughey, this new attack works well with a range of speech-to-text services (ironically, including one developed by Google) without any post-processing.

Released on December 31, the tool also features a rudimentary screen clicker to imitate human interactions. Full details can be found in the UnCaptcha2 repo.

Controlling the flow

Hughey was quick to point out that, like the original version, UnCaptcha2 is intended to be a proof-of-concept demo rather than a hacking tool.

“We didn’t want the attack to be used for malicious purposes, which is why we published only a relatively crude attack, using a screen clicker to navigate,” he said.

According to the researcher, Google has formally classified the attack as “intended behavior” and as out of scope for its bug bounty program.

For Hughey, however, this is an oversight that could have “potentially widespread” impact on sites around the world.

“There are a number of sites that use reCAPTCHA as their primary (and sometimes sole) line of defense against attackers,” he said. “For example, some sites use CAPTCHAs to prevent DDoS attacks. Others use reCAPTCHA to prevent [automated] account creation, such as Reddit.”

He added: “Limiting account creation is very important, particularly in the age of using bots to influence public discourse. If an attacker could quickly create millions of Reddit [accounts], absent of other defense mechanisms, they could ostensibly control the flow of information on that website.”

V3 engine

Google launched v3 of reCAPTCHA last October. The latest version of the tech firm’s popular user verification tool shuns the ‘I’m not a robot’ checkbox authentication model in favor of a more sophisticated, score-based system.

While Hughey explained that UnCaptcha2 does not work against sites using reCAPTCHA v3, he noted that the earlier version is still widely used by sites of all sizes.

“ReCAPTCHA v3 is designed to monitor user activity on a site and give the user a programmatic score for the ‘legitimacy’ of the interaction for site administrators to use and take further action with,” he said.

“Because v3 doesn’t pose challenges directly to the user like v2 does, v3 is not susceptible to unCaptcha as it is written. ReCAPTCHA v2 is still very widely used across the internet, however, and has served as the inspiration for other similar CAPTCHA systems, and v3 is still relatively in its infancy.”
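For site administrators, acting on a v3 score might look something like the following sketch. It assumes the site has already sent the user's token to Google's verification endpoint and parsed the JSON reply, which Google documents as carrying `success`, `score` (0.0 to 1.0, with higher values meaning a more likely legitimate interaction) and `action` fields. The function name `classify_interaction` and the 0.7 and 0.3 thresholds are hypothetical choices, not values recommended by Google or the researchers.

```python
from typing import Any, Mapping


def classify_interaction(
    verify_response: Mapping[str, Any],
    expected_action: str,
    allow_at: float = 0.7,   # hypothetical threshold, tune per site
    review_at: float = 0.3,  # hypothetical threshold, tune per site
) -> str:
    """Map a parsed reCAPTCHA v3 verification reply to 'allow', 'review' or 'block'."""
    # A failed verification (bad or expired token) is never trusted.
    if not verify_response.get("success", False):
        return "block"
    # The action name should match the one the page requested the token for.
    if verify_response.get("action") != expected_action:
        return "block"
    score = float(verify_response.get("score", 0.0))
    if score >= allow_at:
        return "allow"
    if score >= review_at:
        return "review"
    return "block"


if __name__ == "__main__":
    reply = {"success": True, "score": 0.9, "action": "signup"}
    print(classify_interaction(reply, expected_action="signup"))
```

A signup handler could, for instance, let "allow" through, route "review" to an extra check such as email confirmation, and reject "block".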

The Daily Swig has approached Google for comment. This article will be updated as and when we receive a response.