Academics use reinforcement learning to automate the SQLi-exploitation process

Machine learning offers a fresh approach to tackling SQL injection vulnerabilities

UPDATED A new machine learning technique could make it easier for penetration testers to find SQL injection exploits in web applications.

Introduced in a recently published paper by researchers at the University of Oslo, the method uses reinforcement learning to automate the process of exploiting a known SQL injection vulnerability.

While the technique comes with quite a few caveats and assumptions, it provides a promising path toward developing machine learning models that can assist in penetration testing and security assessment tasks.

Reinforcement learning

Reinforcement learning is a branch of machine learning in which an AI model is given the possible actions and rewards of an environment and is left to find the best ways to apply those actions to maximize the reward.

“It's inevitable that AI and machine learning are also applied in offensive security,” Laszlo Erdodi, lead author of the paper and postdoctoral fellow at the department of informatics at the University of Oslo, told The Daily Swig.

“We decided to try machine learning for penetration testing and we found RL (reinforcement learning) to be a very promising approach. As expected it demonstrated that vulnerabilities to SQL injection can easily be exploited with RL.”



SQL injection (SQLi) is a web security vulnerability that allows an attacker to interfere with the queries that an application makes to its database. Attackers use SQLi exploits to view data that they are not normally able to retrieve or to make modifications to the database.

For their proof of concept, the researchers posed the problem as a capture-the-flag competition in which the reinforcement learning agent must obtain a piece of information from a target website through an SQLi vulnerability. The agent’s possible actions are the queries it sends to the system, and the reward is the flag token it must retrieve.
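This framing can be sketched as a toy environment. All class names, payload strings, and reward values below are illustrative stand-ins, not the researchers' actual action set or implementation:

```python
import random

# Toy stand-in for the paper's CTF setup (names and payloads are
# illustrative only). The agent chooses among query templates; the
# environment rewards it only when a query exploits the known injection
# point and returns the flag.

ACTIONS = [
    "' OR 1=1 --",                          # tautology probe
    "' UNION SELECT flag FROM secrets --",  # exfiltration attempt
    "'; DROP TABLE users --",               # destructive, never rewarded
    "' AND 1=2 --",                         # negative probe
]

class ToySqliCtf:
    """Static environment: one hidden winning query, reward +1 for the flag."""
    def __init__(self, winning_action=1):
        self.winning_action = winning_action

    def step(self, action_index):
        # Returns (reward, done): +1 and episode end if the flag is
        # captured, otherwise a small penalty for each wasted query.
        if action_index == self.winning_action:
            return 1.0, True
        return -0.1, False

env = ToySqliCtf()
random.seed(0)

# A purely random agent keeps querying until the flag comes back;
# a trained agent should need far fewer queries.
queries, done = 0, False
while not done:
    reward, done = env.step(random.randrange(len(ACTIONS)))
    queries += 1
print(queries)  # queries the random baseline needed
```

The learning problem is then to cut that random-baseline query count down by choosing actions according to past rewards.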

Deep Q-learning has shown early promise in enhancing SQLi research

The researchers started by sending large numbers of random queries and analyzing the rewards. Gradually, they trained a model that could stage a successful attack within an average of four to five queries.

Existing automated SQLi tools rely on static, predefined rules, Erdodi says, which can limit their application. “The big advantage of using RL for such problems is that the attack logic is not defined and not static.

“The agent only has an action set and it learns the optimal strategy through examples. In the beginning, the agent has to learn trivial things, but as the learning advances it can learn non-trivial or hidden characteristics of the SQL injection exploitation or consider additional characteristics for the exploitation such as manipulating the website content.”

‘Some limitations’

While the results of the research are seemingly impressive, the work is still in its preliminary stages and the researchers had to simplify the challenge to make it possible for the reinforcement learning agent to tackle it.

The challenge assumes a static environment that doesn’t change as the attacker sends queries. The agent also knows the SQL vulnerability and the target database schema beforehand and only needs to find the right query to exploit the flaw.

“The current solution has some limitations in terms of the assumptions we made,” Erdodi acknowledged. “On the other hand, these assumptions were only introduced to reduce the complexity of the problem for the first approach.

“Considering more and more options for the agent, it increases the action space and the state space.”

‘Promising results’

The researchers tested two variants of reinforcement learning on the problem.

First was Q-learning, a simple reinforcement learning algorithm that creates a table of different actions and reward values as the model explores the problem space. The model succeeded in finding a working solution but required several gigabytes of space to store the Q-table with all the possible actions and states.
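The mechanics of tabular Q-learning can be shown on a toy version of the problem. The three-stage task below is a hypothetical simplification (the paper's real state and action spaces are far larger, which is exactly why the table grew to gigabytes):

```python
import random

# Minimal tabular Q-learning on an illustrative exploitation task.
# State encodes what the agent has learned so far:
#   0 = nothing known, 1 = escape character found, 2 = column count found.
# Taking the "right" probe in state 2 captures the flag (+1) and ends
# the episode; any wrong probe is a wasted query (-0.1).

N_STATES, N_ACTIONS = 3, 3
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

def step(state, action):
    if action == state:              # the correct probe for this stage
        if state == N_STATES - 1:    # flag captured, episode over
            return state, 1.0, True
        return state + 1, 0.0, False
    return state, -0.1, False        # wasted query

random.seed(1)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # the Q-table

for _ in range(500):                 # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the table, sometimes explore.
        a = random.randrange(N_ACTIONS) if random.random() < EPS \
            else max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])     # Q-learning update
        s = s2

# The greedy policy now advances through the stages in order.
policy = [max(range(N_ACTIONS), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy)  # expected: [0, 1, 2]
```

Here the table has only nine entries; with realistic query vocabularies and partial knowledge of the response, the number of (state, action) pairs explodes, which is the storage problem the researchers ran into.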

The second technique was deep Q-learning, which combines Q-learning with deep neural networks. Deep Q-learning can tackle tasks with a more complicated set of states and actions.
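The key shift is that Q-values are computed by a parameterised model from state features instead of being stored per state, so memory is fixed by the parameter count. A minimal sketch using a linear model in place of the deep network (everything below is illustrative, not the paper's architecture):

```python
import random

# Sketch of the idea behind deep Q-learning: a parameterised function of
# state features replaces the lookup table. A single linear layer stands
# in for the deep network here; the task and all names are illustrative.

N_FEATURES, N_ACTIONS = 3, 3
W = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]  # the whole "network"

def features(state):                 # one-hot encoding of the toy state
    phi = [0.0] * N_FEATURES
    phi[state] = 1.0
    return phi

def q(phi, a):                       # Q(s, a) = w_a . phi(s)
    return sum(w * x for w, x in zip(W[a], phi))

ALPHA, GAMMA, EPS = 0.2, 0.9, 0.2

def step(state, action):             # same toy three-stage task as above
    if action == state:
        return (state, 1.0, True) if state == 2 else (state + 1, 0.0, False)
    return state, -0.1, False

random.seed(0)
for _ in range(1000):
    s, done = 0, False
    while not done:
        phi = features(s)
        a = random.randrange(N_ACTIONS) if random.random() < EPS \
            else max(range(N_ACTIONS), key=lambda i: q(phi, i))
        s2, r, done = step(s, a)
        target = r if done else \
            r + GAMMA * max(q(features(s2), i) for i in range(N_ACTIONS))
        err = target - q(phi, a)
        for j in range(N_FEATURES):  # semi-gradient step toward the TD target
            W[a][j] += ALPHA * err * phi[j]

policy = [max(range(N_ACTIONS), key=lambda i: q(features(s), i))
          for s in range(3)]
print(policy)  # greedy policy after training
```

With one-hot features this reduces to the tabular case, but the same update works with richer features and deeper networks, which is why the trained model can stay at a few hundred kilobytes while the equivalent table runs to gigabytes.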

In recent years, scientists have used deep Q-learning and related deep reinforcement learning techniques to create AI systems that can solve complex problems, from Atari video games to complicated strategy games like Go and StarCraft 2.



The deep Q-learning model reached a satisfactory performance level on the SQLi challenge much faster than the classic Q-learning agent. The final model was also much smaller, reaching no more than a few hundred kilobytes in size.

“Deep Q learning showed very promising results,” Erdodi says. “This approach can be used in the future for more realistic simulations, possibly without the limiting assumptions.

“We are currently working on these simulations, where the aim is to use RL in real-world pentesting.

“Our first approach was only a proof of concept. As the results are promising we think we are able to solve much more complex problems in the future.”

Other work in the field includes a machine learning model developed by the cybersecurity firm NCC Group.

NCC uses a support vector machine (SVM) algorithm to detect SQLi vulnerabilities on web pages based on responses received from servers.

Erdodi and his co-authors have submitted their paper for review to the 13th International Conference on Cyber Conflict (CyCon 2021). Their work is part of a wider research program at the University of Oslo that aims to develop intelligent agents for both automated pentesting and automated response.

“This perspective on the future of cybersecurity can be called ‘cyberwar of algorithms’, where both the cyber-attacks and the defenses become automated,” Erdodi says.


This article has been updated to reference other ML models being used to detect SQLi vulnerabilities.


RECOMMENDED ‘Triggerless’ backdoors can infect machine learning models without leaving a trace – research