Update web scraping tool now
A security researcher has discovered a vulnerability in the web scraping tool Scrapy that abuses its telnet service to gain access to localhost and the local network.
Claudio Salazar, CEO of Alertot, says the issue affects Scrapy versions earlier than 1.5.2, which was released in January this year.
It includes a telnet service, enabled by default, that is designed to make debugging easier.
As such, the telnet service isn’t restricted to a set of functions; instead, the telnet console provides a Python shell in the context of the spider.
This, says Salazar, “makes it powerful for debugging and interesting if someone gets access to it”.
Because the console is available without any authentication, it’s open to any local user. And if there’s a spider running, it’s possible to define a reverse shell to gain access to the machine.
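To illustrate the exposure, here is a minimal sketch of talking to the console over a plain socket. It assumes a spider is running locally with the telnet console enabled on Scrapy’s default port, 6023; the function names are our own, not part of any library.

```python
import socket

DEFAULT_PORT = 6023  # Scrapy's default telnet console port

def build_payload(expr):
    """Encode a Python expression as one console line (bytes plus newline)."""
    return expr.encode() + b"\n"

def run_in_console(expr, host="127.0.0.1", port=DEFAULT_PORT):
    """Send expr to a running Scrapy telnet console; returns raw reply bytes.

    Because the console is an unauthenticated Python shell, any expression
    sent here is evaluated in the spider's process.
    """
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(build_payload(expr))
        return s.recv(4096)
```

Since anything sent to the console is evaluated as Python, an attacker who can reach the port needs nothing more than this to execute arbitrary code in the spider’s process.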
More worryingly, though, the vulnerability can also be exploited remotely, thanks to the way allowed_domains – the list of domains the spider is allowed to crawl – is enforced.
“An interesting behavior happens when there’s a request to a page in an allowed domain but redirects to a not allowed domain, since it won’t be filtered and will be processed by the spider,” Salazar explains in a blog post.
“Abusing [the] allowed_domains behavior, the malicious actor could [make it so] that the spider sends requests to domains of its interest."
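The behavior Salazar describes can be sketched in a few lines. This is a simplified stand-in for Scrapy’s offsite check, not its actual code: the check runs on requests the spider yields, but a redirect is followed during the download stage, so the new URL never goes back through it. The metadata-service address used below is purely illustrative.

```python
from urllib.parse import urlparse

def url_is_allowed(url, allowed_domains):
    """Simplified offsite check: the host must match, or be a
    subdomain of, one of the allowed domains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)

allowed = ["example.com"]

# The spider's own request passes the check...
assert url_is_allowed("https://example.com/page", allowed)

# ...but if that page answers "302 Location: http://169.254.169.254/",
# the redirect target is fetched without being re-checked, even though
# it would fail the same filter:
assert not url_is_allowed("http://169.254.169.254/", allowed)
```

A malicious site on an allowed domain can therefore bounce the spider toward internal hosts of the attacker’s choosing.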
Salazar has dubbed the potential attack ‘Spider Side Request Forgery’ – a play on Server Side Request Forgery (SSRF).
“It’s quite significant, because Scrapy spiders are client-side software and they are supposed to extract data from websites, but not the other way around. This raises new concerns about running this type of software and how to do it safely,” he tells The Daily Swig.
“There are web scraping projects in other programming languages, and they could have similar issues that compromise the user running the software. That’s part of our research this year – this is just the beginning.”
Salazar advises updating to Scrapy 1.5.2 or higher, and disabling the telnet service unless it’s truly necessary.
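Disabling the console is a one-line change in the project’s settings module. TELNETCONSOLE_ENABLED is a real Scrapy setting, enabled by default in the affected versions:

```python
# settings.py
# Turn off the telnet console unless it is genuinely needed for debugging.
TELNETCONSOLE_ENABLED = False
```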
“As a recommendation, you should run spiders in containers to avoid being affected by a malicious website,” he adds.
Mitigating risk
Ed Williams, director EMEA for SpiderLabs at Trustwave, says he’s unsurprised by the news.
“The advice is always to reduce the attack surface and ensure that you’re running the latest, stable release of software,” he told The Daily Swig.
“What is worrying is that we’re still seeing Telnet being used – as a community we should be in a position where any mention of ‘dear old Telnet’ is instantly understood as bad practice and removed.
“Sadly, I don’t believe we are there yet, and we continue to see security issues abound with its use.”
Jarno Niemela, principal researcher at F-Secure Labs, offered some advice on how to protect against web spider vulnerabilities.
He told The Daily Swig: “Being able to take over a web spider component is quite significant, especially as many companies have basically no internal security in their server backend.
“What needs to be done is the same that should always be done when working with a component that handles external content. Isolate both inside and outside the box.
“Any component that processes external content should run in a security container. For example, a well-configured Docker container or SELinux-hardened VM.
“Also the server or VM that does input fetching and processing should be very restricted on what connections it can do and where it can connect to.”