Researchers urge developers to secure code by disallowing non-ASCII characters

Security researchers have detailed how backdoors can be concealed within JavaScript using Unicode characters that are either invisible or readily confused with other characters.

As a result, they contend, malicious code can evade detection during even otherwise thorough code reviews.

Because Unicode lends itself so readily to obfuscating code, it “should be kept in mind when doing reviews of code from unknown or untrusted contributors” – something particularly applicable to open source projects, said Wolfgang Ettlinger, security researcher at Austrian cybersecurity firm Certitude Consulting, in a blog post.

Leave no trace

The hacking technique was inspired by a post on Reddit documenting a developer’s struggles to identify a syntax error resulting from an invisible Unicode character hidden in JavaScript source code.

Resolving to implant a backdoor that leaves no evidence of its presence, the researchers chose “ㅤ” (called a ‘HANGUL FILLER’) as their invisible Unicode character because it has the ID_Start property and can therefore appear in a JavaScript variable name.

The following line shows how the invisible character could pass unnoticed, with the character in question replaced by its escape sequence so that it is visible: const { timeout,\u3164} = req.query;
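
A reviewer can apply the same substitution mechanically. The sketch below is an illustration rather than anything from the researchers' post: the helper name revealNonAscii is hypothetical, and it simply rewrites every non-ASCII code point in a string as a JavaScript escape sequence.

```javascript
// Hypothetical helper (not from the researchers' post): rewrite every
// non-ASCII code point as a JavaScript escape sequence so it becomes visible.
function revealNonAscii(text) {
  // The `u` flag makes the regex match whole code points, not UTF-16 halves.
  return text.replace(/[^\x00-\x7F]/gu, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    // BMP characters fit the 4-digit \uXXXX form; others need \u{...}.
    return hex.length <= 4 ? '\\u' + hex.padStart(4, '0') : '\\u{' + hex + '}';
  });
}

// The destructuring line from the article, containing a real HANGUL FILLER.
const suspicious = 'const { timeout,\u3164} = req.query;';
console.log(revealNonAscii(suspicious));
```

Run against the line above, the helper prints the escaped form with \u3164 in place of the invisible character, while pure-ASCII input is returned unchanged.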

A destructuring assignment retrieves the timeout and “ㅤ” parameters from req.query; if a “ㅤ” parameter is passed, its value is assigned to the invisible variable, explained Ettlinger.

Similarly, when the checkCommands array is constructed, this hidden variable is included as one of its elements, and each element is passed to the exec function, which duly executes it as an OS command.

An attacker could execute arbitrary OS commands by passing the “ㅤ” parameter to the endpoint in its URL-encoded form, said Ettlinger.
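
The mechanism can be sketched without a web server. In the minimal example below, the function name buildCheckCommands, the two benign commands and the sample query string are assumptions made for illustration. The hidden identifier is written as \u3164 so that it is visible, and the commands are returned rather than executed.

```javascript
// Illustrative sketch only: names and commands are hypothetical, and the
// resulting commands are returned rather than passed to exec.
function buildCheckCommands(query) {
  // `\u3164` is HANGUL FILLER written as an escape; typed literally it is
  // invisible, so this line would look like `const { timeout, } = query;`.
  const { timeout, \u3164 } = query;
  const checkCommands = [
    'ping -c 1 example.com',
    'curl -s http://example.com/',
    \u3164, // typed literally, this looks like a harmless trailing comma
  ];
  // Drop the hidden slot when the attacker did not supply the parameter.
  return { timeout, commands: checkCommands.filter(Boolean) };
}

// URL-encoded, the invisible parameter name is the UTF-8 bytes E3 85 A4.
const attackQuery = Object.fromEntries(
  new URLSearchParams('timeout=5&%E3%85%A4=id')
);
console.log(buildCheckCommands(attackQuery).commands);
console.log(buildCheckCommands({ timeout: '5' }).commands);
```

With the hidden parameter present, the attacker-supplied string appears as a third command; without it, only the two expected commands remain.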

“This approach cannot be detected through syntax highlighting as invisible characters are not shown at all and therefore are not colorized by the IDE/text editor,” he added.

Homoglyph attacks

Meanwhile, the researchers found that homoglyph attacks could be mounted using Unicode ‘confusables’ – such as /, −, +, ⩵, ❨, ⫽, ꓿, and ∗ – that resemble operators.

Ettlinger posted a script in which the ‘ǃ’ character is not an exclamation mark but an ‘ALVEOLAR CLICK’ letter. Because that letter is part of the variable name, the relevant line – if(environmentǃ=ENV_PROD){ – is not a comparison: it assigns the ‘PRODUCTION’ string to the previously undefined environmentǃ variable.

“Thus, the expression within the ‘if’ statement is always true,” said Ettlinger.
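
A minimal sketch of that behavior follows, assuming the look-alike is code point U+01C3 and writing it as an escape so the two identifiers can be told apart. The function name and the values are hypothetical, and the extra declaration exists only so the example also runs in strict mode, where assigning to an undeclared variable throws a ReferenceError.

```javascript
const ENV_PROD = 'PRODUCTION';
const environment = 'PRODUCTION';

// Declared here only so the sketch also runs in strict mode; per the article,
// the original snippet never declares this variable.
let environment\u01C3;

function looksLikeNonProdCheck() {
  // Reads like `environment != ENV_PROD`, but `\u01C3` is a letter that is
  // part of the identifier, so this is an assignment to a second variable.
  if (environment\u01C3 = ENV_PROD) {
    return true; // 'PRODUCTION' is a truthy string, so this branch always runs
  }
  return false;
}

console.log(looksLikeNonProdCheck());  // the disguised assignment
console.log(environment !== ENV_PROD); // the comparison a reviewer expects
```

The disguised check succeeds even though the environment really is production, whereas the genuine comparison evaluates to false.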

Real-world probability

“We haven’t holistically studied what additional factors could prevent this approach, therefore we cannot make any definitive statement to the real-world probability of such attacks,” Ettlinger tells The Daily Swig.

He points out that unexpected behavior from auto-formatters might tip off developers, and that “the Webstorm IDE can at least highlight the ‘invisible’ characters”.

He also says that using certain tools and conducting regular security code reviews throughout the development lifecycle can expose backdoor code.

Stick to ASCII

The researcher suggested developers protect their code from such attacks by prohibiting the use of non-ASCII characters, which “are pretty rare in code” since development teams typically write code in English using ASCII characters.
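
One way such a rule might be enforced is with a small check run before code is merged. The sketch below is an assumption about how that could look rather than a tool the researchers describe: the helper name findNonAscii and the sample lines are hypothetical, and it reports every non-ASCII code point with its line and column.

```javascript
// Hypothetical review helper: list every non-ASCII code point in a source
// string with its 1-based line and column (columns count code points).
function findNonAscii(source) {
  const findings = [];
  source.split('\n').forEach((line, lineIndex) => {
    let column = 0;
    for (const ch of line) { // for...of iterates by code point
      column += 1;
      const codePoint = ch.codePointAt(0);
      if (codePoint > 0x7f) {
        findings.push({
          line: lineIndex + 1,
          column,
          codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
        });
      }
    }
  });
  return findings;
}

const sample = [
  'const { timeout,\u3164} = req.query;',
  'if(environment\u01C3=ENV_PROD){',
  'const ok = true;',
].join('\n');
console.log(findNonAscii(sample));
```

A build step could fail whenever the returned list is not empty. Projects that legitimately need non-ASCII text in string literals or comments would need an allowlist, which this sketch does not attempt.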

“Translation into other languages is often done using dedicated files,” said Ettlinger. “When we review German language code, we mostly see non-ASCII characters being substituted with ASCII characters (e.g. ä → ae, ß → ss).”

Coincidentally, researchers from the University of Cambridge recently documented a similar attack centred on the Unicode bidirectional mechanism (Bidi), proposing restricted use of Bidi Unicode characters as mitigation.
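
For teams that cannot ban non-ASCII characters outright, a narrower check for Bidi control characters is one possible starting point. The sketch below is not taken from the Cambridge research: the names are hypothetical, and it assumes the characters of interest are the embedding, override and isolate controls in the ranges U+202A to U+202E and U+2066 to U+2069.

```javascript
// Bidi embedding, override and isolate controls
// (U+202A-U+202E and U+2066-U+2069). No `g` flag, so test() is stateless.
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/;

// Hypothetical helper: true if the source contains any of those controls.
function containsBidiControl(source) {
  return BIDI_CONTROLS.test(source);
}

console.log(containsBidiControl('let admin = false;'));
console.log(containsBidiControl('let admin = false; // \u202E check'));
```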
