Document My Pentest: you hack, the AI writes it up!


Tired of repeating yourself? Automate your web security audit trail. In this post I'll introduce a new Burp AI extension that takes the boring bits out of your pen test.

Web security testing can be a grind: documenting every step, writing the same notes over and over, and repeating it all across every engagement. But what if your workflow could document itself - while you hacked?

Meet "Document My Pentest", your silent co-analyst for security testing. It’s an open-source Burp Suite extension that watches your requests in real time, understands what you’re probing for, and automatically builds a clean, structured record of your findings - capturing exactly what you did and how you did it. When you’re ready, hand it off to AI and generate a report. No more boring note taking. Just results.

The concept

The PortSwigger research team has been exploring new AI extensions using Burp AI features, and it's surprisingly quick to get a functional prototype up and running. Within just a few days, I had a working extension.

I quickly learned that the AI isn't very good at analysing a whole request and response, especially for vulnerabilities like XSS. It was good at spotting Path Traversal, where the response gave a clear indication that the attack had worked because a directory listing was displayed.

With this in mind, I began to come up with a strategy to identify reflected data. My first thought was to use canaries and look at where the canary is reflected, but there are a couple of issues with that approach: a) we'd need to send an extra request for every request, and b) we'd have to alter the user-sent request. Then I thought: why not just use the tested input as the canary and translate it into a regular expression? It worked like this:

Consider the input <script>alert(1)</script>. It can be transformed in a plethora of ways, but the alphanumeric characters will often stay consistent, so I wrote an input-to-regex translator which transforms the input into:

.{1,6}script.{1,6}alert.{1,6}1.{1,6}.{1,6}.{1,6}script.{1,6}

This would mean it would match transformations like:

&lt;script&gt;alert(1)&lt;/script&gt;
%3Cscript%3Ealert(1)%3C%2Fscript%3E
%253Cscript%253Ealert(1)%253C%252Fscript%253E
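Here's a minimal Python sketch of how such a translation could work - alphanumeric characters are kept literally, and every other character becomes a tolerant `.{1,6}` wildcard so common encodings still match (`find_reflection` is an illustrative helper name, not the extension's actual implementation):

```python
import re

def input_to_regex(payload: str) -> str:
    """Keep alphanumeric characters literally; replace every other
    character with .{1,6} so common transformations of the payload
    (HTML entities, URL-encoding, double URL-encoding) still match."""
    return "".join(ch if ch.isalnum() else ".{1,6}" for ch in payload)

def find_reflection(payload: str, response_body: str):
    """Return the transformed reflection of the payload, or None."""
    m = re.search(input_to_regex(payload), response_body)
    return m.group(0) if m else None

payload = "<script>alert(1)</script>"
print(input_to_regex(payload))
# .{1,6}script.{1,6}alert.{1,6}1.{1,6}.{1,6}.{1,6}script.{1,6}

# All three encoded reflections above are matched by the same pattern:
for body in ["&lt;script&gt;alert(1)&lt;/script&gt;",
             "%3Cscript%3Ealert(1)%3C%2Fscript%3E",
             "%253Cscript%253Ealert(1)%253C%252Fscript%253E"]:
    print(find_reflection(payload, body))
```

The matched span can then be handed to the AI as the exact transformed reflection, rather than the whole response.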

This gives the AI the exact transformation and extracts a more focussed part of the reflection, enabling the AI to quote what the input was transformed to without any specific instructions. After a lot of testing this seemed to work pretty well, but we quickly found that it wasn't suitable for other attacks such as Request Smuggling. For cases like this, where parameter/header modifications couldn't be detected, we decided to send the whole request and response with a different AI prompt, which produced much better results.


Whilst building this extension I often found the AI would misidentify vulnerabilities, and this was due to the instructions given in the prompt. For example:

"*Note* if HTML entities are found they very rarely indicate an XSS vulnerability."

The problem with this instruction is that it leaves the AI uncertain whether a reflection is a vulnerability or not. My thinking was that entities inside "srcdoc" attributes can be used to cause XSS, but this vague language causes the LLM to label reflections as potential XSS even when the input is HTML-encoded. The solution is to use more precise language in the prompt:

"If the response reflection contains HTML-encoded input (e.g., &lt;script&gt;), that is not a vulnerability."

You can even get the LLM to analyse its own response and tell you why it thinks there's a vulnerability when there clearly isn't. Here's the prompt I used:

"Look at this LLM response and point out why it thinks there's XSS when there clearly isn't:" LLM RESPONSE GOES HERE

This returned a detailed analysis of why the LLM was misidentifying the issue and suggested ways to improve it. I then took the actual prompt and asked the LLM to improve it:

"How can I improve this prompt to prevent this kind of issue?"YOUR PROMPT GOES HERE

The LLM gave some very precise instructions on how to improve the prompt. This produced much better analysis and reduced false positives.
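This feedback loop can be sketched as a couple of prompt builders plus one refinement step. The `ask_llm` callable below is a hypothetical helper standing in for whatever LLM interface you use - it is not part of the extension:

```python
def critique_prompt(llm_response: str) -> str:
    """Ask the LLM to explain a false positive in its own output."""
    return ("Look at this LLM response and point out why it thinks "
            "there's XSS when there clearly isn't:\n\n" + llm_response)

def improvement_prompt(original_prompt: str) -> str:
    """Ask the LLM to harden the analysis prompt against the same mistake."""
    return ("How can I improve this prompt to prevent this kind of issue?\n\n"
            + original_prompt)

def refine(analysis_prompt: str, bad_response: str, ask_llm) -> str:
    """One iteration: critique the false positive, then feed that
    critique back in while asking for an improved prompt."""
    critique = ask_llm(critique_prompt(bad_response))
    return ask_llm(improvement_prompt(analysis_prompt)
                   + "\n\nAnalysis of the mistake:\n" + critique)
```

Running a few iterations of `refine` against real false positives is essentially the manual process described above, just made repeatable.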

This whole process highlighted just how important careful prompt engineering is when working with LLMs for security analysis. The underlying model can be powerful, but without clear, unambiguous instructions and tightly scoped input, it's prone to hallucinations or overly cautious responses. By iterating on prompts, experimenting with input formatting, and tuning what data the model sees, we were able to push its capabilities to find a wide range of vulnerabilities. It’s not perfect, but with the right setup, it can meaningfully assist in vulnerability triage and even explain its reasoning in ways that help refine both the AI and the human using it.

Installation instructions

In Burp Suite Professional, go to Extensions → BApp store and search for "Document My Pentest". Click the install button, then navigate to the Installed tab, select "Document My Pentest", and check the "Use AI" checkbox in the Extension tab.

How to use it

Just use Repeater like you normally would while testing a target. When you're ready to document your work, skip digging through Repeater history - simply right-click and select Extensions → Document My Pentest → Document my work. The AI will generate notes for you automatically.

You can also right-click on the proxy history and document a pen test as separate requests or as a collection of requests and responses.

Right-click on one or more proxy history items and select Extensions → Document My Pentest → Document my work (separately). This will create notes on each request and response as a separate attack. Extensions → Document My Pentest → Document my work (as collection) will create combined notes on all the requests and responses and put them into the last selected item. You can also configure Document My Pentest to automatically send notes to the Organizer as you hack the target by going to Document My Work → Settings → Auto invoke after Repeater requests and Document My Work → Settings → Auto send notes to Organizer.

Of course, AI isn't flawless - sometimes it gets things wrong. No problem: you can manually edit the notes and make corrections.

Feeling inspired? Try creating an AI-powered extension yourself using Burp's built-in Montoya API and its dedicated interfaces for handling traffic between your extension and PortSwigger's trusted AI platform.

AI security & privacy

We've updated our docs to reflect how we handle data sent to the AI. Please check out the detailed documentation and the blog post.
