Since the release of browser-powered scanning back in Burp Suite Professional 2020.8.1, we have had a lot of customers asking about our motivation for integrating with Chromium, and for more detail about how the feature works.
In this post I aim to drill into some of the technical challenges we faced when building this feature, share some details about how the integration works under the hood and highlight a few of the cool features we plan to build in the coming year.
Finding the attack surface
Fifteen years ago, back when Netscape Navigator was still a thing, the web was largely synchronous. A single HTTP request would usually result in an HTML response containing all of a page's outgoing links, and once that response reached the client it very rarely changed. Under this model, finding and mapping the attack surface was relatively easy, as each location in the application usually mapped to a single HTTP request and response. Because the DOM was usually fully constructed on the server side, finding new locations to visit was as simple as scraping all the links from the DOM and adding them to a list to visit later.
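That classic crawl loop can be sketched in a few lines. This is an illustrative sketch, not Burp's actual crawler; the page content and URLs are made up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkScraper(HTMLParser):
    """Collects href targets from <a> tags in a server-rendered page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

# A fully server-rendered response: every link is already in the HTML.
html = '<html><body><a href="/about">About</a><a href="/contact">Contact</a></body></html>'
scraper = LinkScraper("https://example.com/")
scraper.feed(html)
print(scraper.links)  # new locations to add to the crawl queue
```

Every link the application exposes is visible in the raw response, so the crawl frontier grows with nothing more than HTML parsing.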
Along comes XHR, AJAX & microservices
How does this shift impact writing a crawler?
Before browser-powered scanning, web crawlers would generally just request the main document, parse the DOM, conclude that there are no links which could be followed, and treat the page as a dead end. Consider a typical single-page application, whose initial HTML is little more than an empty container and a script tag. To navigate it correctly, several things need to happen: we must load the initial DOM, fetch the react.js script, execute the script, and wait for the new links to appear in the DOM.
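To make this concrete, here is what a static scrape sees when pointed at a hypothetical SPA shell (the markup below is invented for illustration):

```python
from html.parser import HTMLParser

class AnchorCounter(HTMLParser):
    """Counts <a> tags present in the raw HTML, before any script runs."""
    def __init__(self):
        super().__init__()
        self.anchors = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors += 1

# Typical SPA shell: an empty container plus a script that builds the
# navigation client-side. No links exist in the raw HTML at all.
spa_shell = '<html><body><div id="root"></div><script src="/static/react.js"></script></body></html>'
counter = AnchorCounter()
counter.feed(spa_shell)
print(counter.anchors)  # 0 -- a naive crawler sees a dead end
```

The links only exist after the script has executed, which is exactly what a parse-only crawler cannot do.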
What’s the fix?
The Chromium codebase evolves at an alarming rate, and as a smaller independent firm there was no way we would be able to keep up to date with all the new features being added. Even Microsoft has admitted defeat in the browser wars: their latest incarnation of the Edge browser is essentially a skin on top of Chromium.
Burp Suite’s Chromium integration
Integrating Chromium with Burp Suite was far from trivial. Embedding the browser directly into Burp didn't work: any browser crash would also have crashed Burp, and the integration made our build process unacceptably complex. Security was also an issue: Burp needs to be secure, and with the browser embedded, applying Chromium's security patches in a timely manner was next to impossible.
To address these drawbacks we came up with a different approach. Instead of trying to embed the browser directly within Burp, we would control it remotely using the DevTools Protocol. This is the same technology that powers Chromium's Inspector tab, and it can be used to control almost every aspect of the browser.
This approach simplified our build processes significantly. We can now automatically trigger new builds of our embedded browser as soon as a new Chromium version is released, and can usually ship the update within Burp Suite the same day. We also regularly build the beta and development versions of Chromium, so that we get an early warning should there be any breaking changes to the DevTools Protocol.
Browser Powered Scanning
Now that we could easily keep up to date with the latest stable builds of Chromium, we set about integrating it with the crawler. The DevTools Protocol itself was relatively simple to set up and connect to: it consists of JSON commands sent to the browser over a WebSocket, plus responses and events describing what the browser is doing. All familiar technology.
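A sketch of what those wire messages look like. The shape is taken from the public DevTools Protocol (an `id` for matching responses, a `Domain.command` method name, and optional params); the URL is a placeholder:

```python
import json

# A DevTools command is a JSON object with an id (used to match the
# eventual response), a method name, and optional parameters.
command = {
    "id": 1,
    "method": "Page.navigate",
    "params": {"url": "https://example.com/"},
}
wire_message = json.dumps(command)  # sent to the browser as a WebSocket text frame

# The browser replies with a result keyed by the same id, and
# separately pushes unsolicited events describing what it is doing:
event = {"method": "Page.loadEventFired", "params": {"timestamp": 123.45}}
print(wire_message)
```

Commands flow one way, responses and events flow back, and everything is plain JSON over a WebSocket.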
Our integration in Burp Scanner looks similar to this:
1. Once we have started an instance of Chromium and connected to DevTools over the WebSocket, we instruct the browser to navigate to the target web application.
2. Chromium constructs the HTTP request it would make to the target and sends it back over the WebSocket to Burp Suite.
3. The request is sent to the target via Burp Suite's regular network stack, respecting scope and throttling rules as normal.
4. Burp Suite receives the response and passes it back over the WebSocket to Chromium.

Steps 2 to 4 repeat for every resource loaded on the page.

5. Chromium tells us via the WebSocket that the DOM is ready (roughly equivalent to the DOMContentLoaded event).
6. Chromium tells us via the WebSocket that the page has fully loaded (roughly equivalent to the onload DOM event).
7. Once Burp Suite thinks the DOM has settled down and is no longer mutating, we ask via the WebSocket for a snapshot of the DOM. This is quite a simplification of what turned out to be a very challenging problem; I could probably fill an entire blog post with the issues we hit here. As is common in software engineering, it was much more complex and took a lot longer than expected!
8. Chromium returns to Burp the exact DOM currently being displayed in the browser.
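The steps above can be pictured as a short sequence of DevTools frames. The sketch below uses the protocol's Fetch domain, which is one mechanism DevTools offers for pausing a browser's requests and supplying the responses yourself; whether Burp's routing works exactly this way under the hood is not something the post confirms, and the request id and body are invented:

```python
import json

def cmd(msg_id, method, params=None):
    """Build a DevTools command frame as a WebSocket text payload."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})

# 1. Ask the browser to pause every outgoing request at the Fetch stage.
enable = cmd(1, "Fetch.enable", {"patterns": [{"urlPattern": "*"}]})

# 2. Kick off navigation. The browser now emits a Fetch.requestPaused
#    event describing the request instead of sending it itself.
navigate = cmd(2, "Page.navigate", {"url": "https://example.com/"})

# 3. After issuing the request through its own network stack, the client
#    hands the response (body base64-encoded) back to the browser.
fulfill = cmd(3, "Fetch.fulfillRequest", {
    "requestId": "request-id-from-requestPaused",  # hypothetical id
    "responseCode": 200,
    "body": "PGh0bWw+Li4uPC9odG1sPg==",  # base64 of "<html>...</html>"
})

for frame in (enable, navigate, fulfill):
    print(frame)
```

Steps 2 and 3 then repeat for each subresource, after which the client waits for the page-lifecycle events and requests its DOM snapshot.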
Once we have a snapshot of the DOM, it is fed into the crawler and processed in the same way as in a non-browser-powered scan. When the crawler has decided which link it wishes to follow next, the process repeats itself, with one minor difference. Instead of instructing the browser to navigate to the next page (like typing a URL directly into the address bar), we use DevTools to simulate the user clicking the link. This way we ensure that any onclick handlers are invoked before any requests are sent to the target.
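At the protocol level, a simulated click is a mouse-press followed by a mouse-release dispatched through the browser's normal input pipeline, via `Input.dispatchMouseEvent`. The coordinates below are hypothetical and would need to be resolved from the link's position beforehand:

```python
import json

def mouse_event(msg_id, event_type, x, y):
    """Build an Input.dispatchMouseEvent command at page coordinates."""
    return json.dumps({
        "id": msg_id,
        "method": "Input.dispatchMouseEvent",
        "params": {
            "type": event_type,
            "x": x,
            "y": y,
            "button": "left",
            "clickCount": 1,
        },
    })

# A click is a press followed by a release at the link's location
# (120, 340 here are made-up coordinates). Because the events travel
# through the browser's real input pipeline, any onclick handlers fire
# exactly as they would for a human user.
press = mouse_event(10, "mousePressed", 120, 340)
release = mouse_event(11, "mouseReleased", 120, 340)
print(press)
print(release)
```

This is why clicking, rather than navigating, catches JavaScript-driven transitions that a bare URL load would skip.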
Now that we have the basics of the integration working, we are all genuinely excited about the possibilities it has opened up. We have ambitious plans to vastly improve our crawl and audit coverage for modern single-page applications and API endpoints throughout 2021, as highlighted in our public roadmap. We also hope to further leverage our browser integration in some new manual tools, and to keep blogging about Burp Scanner features.