Optional whitespace was ‘a recurring source of vulnerabilities’ in regex implementations

UPDATED A newly launched regex-scanning tool has been used by its architects to unearth multiple regular expression denial-of-service (ReDoS) vulnerabilities in popular NPM, Python, and Ruby dependencies.

Released yesterday (March 11), Regexploit extracts regular expressions and scans them for widespread security weaknesses that, if exploited, can “bring a server to its knees”, said Doyensec researcher Ben Caller in a technical blog post.

Upon finding a suspected ReDoS issue, researchers from the appsec firm manually tried to reach the developers of applications with dubious regexes that allowed untrusted input.

What is a ReDoS attack?

Web apps with a search function often make use of regular expressions, or ‘regex’, which allow the user (or developer) to define a search pattern.

In some scenarios, specially crafted strings can force computations that overwhelm an app’s regex engine, causing the underlying web servers to work themselves to a standstill.

This is known as a ‘regular expression denial-of-service’ (ReDoS) attack.

Unlike DDoS attacks, ReDoS can be achieved with as little as a single request.

“While ReDoS is certainly not new, many developers still remain unaware of the danger of computational-expensive regular expressions,” Luca Carettoni, co-founder of Doyensec, told The Daily Swig. “We've had success detecting exploitable regexes in all sorts of open-source software and during client engagements.”

Regexploit: Perfect match

Whereas similar hacking tools typically hunt for regexes with “exponential worst-case complexity” (eg, (a+)+b), Regexploit can also flag serious security risks in cubic-complexity regexes (such as a+a+a+b).
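The difference between the two pattern classes can be observed with a short timing experiment. The following is a minimal sketch, assuming Python's built-in backtracking re module; the helper name time_failed_match and the input lengths are illustrative choices, not part of Regexploit. Each input is a run of 'a' characters with no trailing 'b', so the match must fail and the engine is forced to backtrack.

```python
import re
import time

# The two example patterns quoted in the article.
EXPONENTIAL = re.compile(r"(a+)+b")  # nested quantifier
CUBIC = re.compile(r"a+a+a+b")       # three adjacent, overlapping quantifiers


def time_failed_match(pattern, text):
    """Return (match_result, seconds) for one match attempt anchored at the start.

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Input lengths are kept small so the script finishes quickly.
    for n in (12, 14, 16, 18):
        _, secs = time_failed_match(EXPONENTIAL, "a" * n)
        print(f"exponential pattern, n={n}: {secs:.4f}s")
    for n in (50, 100, 200):
        _, secs = time_failed_match(CUBIC, "a" * n)
        print(f"cubic pattern, n={n}: {secs:.4f}s")
```

On a backtracking engine, the time for the nested pattern should grow far faster with each extra character than the time for the cubic one, but the cubic pattern can still become expensive once the input runs to tens of thousands of characters.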

The tool, which has built-in support for extracting regexes from Python, JavaScript, TypeScript, C#, JSON, and YAML, “tries to find ambiguities where a single character could be captured by multiple repeating parts”.


It then attempts to make the regular expression not match in order to force the regex engine to backtrack, explained Caller.

Poorly designed regexes, “where input can be matched in different ways”, can mean that malicious input triggers resource-intensive backtracking loops of the sort that caused an outage at Cloudflare in 2019.

Mishandling optional whitespace

The mishandling of optional whitespace was “a recurring source of vulnerabilities”, as was the case with a cubic ReDoS bug in how CPython’s http.cookiejar processed cookie expiry dates with compatibility for certain deprecated date formats.

If a remote, malicious server responded to an HTTP request such as requests.get('http://evil.server') with Set-Cookie headers of a particular form, said Caller, Python’s limit of 65,506 spaces in an HTTP header line means “the client will take over a week to finish processing the header.”


The researchers also noticed that the “troublesome regexes” they uncovered “had mostly remained untouched since they first entered the codebase”.

This, Caller speculated, indicated that not only had they caused “no issues in normal conditions”, but were perhaps also “too illegible to maintain”.

Doyensec’s Luca Carettoni said “feedback and engagement on our social channels” in relation to Regexploit “have been overwhelmingly positive”.

After being contacted by The Daily Swig for comment, security researcher Somdev Sangwan tested Regexploit against three exploitable regexes that he had previously found in ModSecurity CRS and “it was able to flag two of them.

“This is a much-needed tool and it works well,” he added. “Being an open-source project, it will only get better with time.”

Mitigations

Caller said whitespace ambiguity could be addressed by using a simple regex and trimming spaces adjacent to the result.
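As a rough illustration of that advice, the sketch below parses a "key = value" pair with a deliberately simple pattern and trims the whitespace afterwards, rather than placing several optional-whitespace quantifiers next to each other. The pattern, the function name parse_pair, and the input format are assumptions made for this example, not code from Doyensec or any affected library.

```python
import re

# A simple pattern with no adjacent, overlapping quantifiers: the key may not
# contain "=", so there is only one way to split the input at the first "=".
SIMPLE_PAIR = re.compile(r"([^=]+)=(.*)")


def parse_pair(line):
    """Return (key, value) with surrounding whitespace trimmed, or None.

    Hypothetical helper: whitespace is handled by str.strip() after the
    match instead of by optional-whitespace groups inside the regex.
    """
    match = SIMPLE_PAIR.fullmatch(line)
    if match is None:
        return None
    key, value = match.groups()
    return key.strip(), value.strip()


if __name__ == "__main__":
    print(parse_pair("  name   =   value  "))
    print(parse_pair("no separator here"))
```

Because the regex never has to decide which of several whitespace groups should capture a given space, a long run of spaces gives the engine nothing to backtrack over.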

He also advised developers to consider using “‘possessive quantifiers’ to mark sections as non-backtrackable”, where practical, and to consider using a deterministic finite automaton (DFA) to ensure regex matching unfolds in “linear time regardless of input” (although this can entail a performance trade-off, as with Google’s RE2 regex engine).
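A possessive quantifier can be sketched as follows. Python's re module accepts the "++" syntax from version 3.11 onwards, so the example checks the interpreter version and otherwise falls back to an equivalent pattern rewritten without nesting. The function name build_pattern and the fallback are assumptions made for illustration.

```python
import re
import sys

# Possessive quantifiers ("a++") were added to Python's re module in 3.11.
SUPPORTS_POSSESSIVE = sys.version_info >= (3, 11)


def build_pattern():
    """Return a pattern for 'one or more a, then b' that avoids heavy backtracking.

    Hypothetical helper for illustration only.
    """
    if SUPPORTS_POSSESSIVE:
        # "a++" consumes every 'a' and never gives characters back, so the
        # outer repetition cannot try alternative ways of splitting the run.
        return re.compile(r"(?:a++)+b")
    # Assumed fallback for older interpreters: the same language written
    # without a nested quantifier.
    return re.compile(r"a+b")


if __name__ == "__main__":
    pattern = build_pattern()
    print(bool(pattern.match("aaab")))
    print(pattern.match("a" * 1000))
```

With the possessive form, a failed match against a long run of 'a' characters is abandoned after a single pass rather than after trying every possible split.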


This article was updated on March 12 with comments from researcher Somdev Sangwan, and again on March 15 with comments from Luca Carettoni of Doyensec.

