Machine learning technique detects phishing sites based on markup visualization

Researchers showcase new method for improving the detection of fake websites

Machine learning models trained on the visual representation of website code can help improve the accuracy and speed of detecting phishing websites.

This is according to a paper (PDF) by security researchers at the University of Plymouth and the University of Portsmouth, UK.

The researchers aim to address the shortcomings of existing detection methods, which are either too slow or not accurate enough.

Turning web code into images

The technique developed by the researchers uses “binary visualization” libraries to transform the markup and code of web pages into images.
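
As a rough illustration of the idea, the sketch below maps each byte of a page's markup to one coloured pixel. It is not the researchers' code or the API of any particular binary visualization library. The function names, the colour scheme, and the simple row-by-row layout are all assumptions made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue)

def byte_to_pixel(b: int) -> Pixel:
    """Map one byte to a colour by byte class.

    The colour scheme is a hypothetical choice for illustration only.
    """
    if b == 0x00:
        return (0, 0, 0)          # null byte: black
    if b == 0xFF:
        return (255, 255, 255)    # 0xFF: white
    if 0x20 <= b <= 0x7E:
        return (0, 0, 255)        # printable ASCII: blue
    if b < 0x20 or b == 0x7F:
        return (0, 255, 0)        # control characters: green
    return (255, 0, 0)            # remaining high bytes: red

def markup_to_image(markup: str, width: int = 16) -> List[List[Pixel]]:
    """Turn markup into rows of pixels, one pixel per UTF-8 byte.

    Assumption: bytes are laid out row by row; a real tool may use a
    different layout. The last row is padded with black pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [byte_to_pixel(b) for b in markup.encode("utf-8")]
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([(0, 0, 0)] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

Calling `markup_to_image(page_source, width=64)` on a page's HTML would give a grid of pixels that could then be saved with any imaging library and compared visually with the grid produced from another page.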

Using this method, they created a dataset of images representing legitimate and phishing websites.

Image: Barlow, et al // 'A Novel Approach to Detect Phishing Attacks using Binary Visualisation and Machine Learning'
Visual differences between the legitimate PayPal login page and a phishing equivalent

The dataset was then used to train a machine learning model to classify legitimate and phishing websites based on the differences in their binary visualization.

To test a new website, the target webpage’s code is transformed through binary visualization and run through the trained model.
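
The train-then-classify flow described above can be sketched in a few lines. This example deliberately swaps in a much simpler technique than the paper's neural network: it summarises each page by the share of its bytes in three byte classes and assigns a new page to the nearest class average (a nearest-centroid classifier). All function names are hypothetical, and real training data would be collected pages, not short strings.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

def byte_class_histogram(markup: str) -> List[float]:
    """Share of bytes that are printable ASCII, control, or other."""
    data = markup.encode("utf-8")
    if not data:
        return [0.0, 0.0, 0.0]
    printable = sum(1 for b in data if 0x20 <= b <= 0x7E)
    control = sum(1 for b in data if b < 0x20 or b == 0x7F)
    other = len(data) - printable - control
    return [printable / len(data), control / len(data), other / len(data)]

def train_centroids(
    samples: Sequence[Tuple[str, str]]
) -> Dict[str, List[float]]:
    """Average the feature vectors of each label.

    `samples` is a sequence of (markup, label) pairs. This stands in
    for model training and is NOT the neural network from the paper.
    """
    grouped: Dict[str, List[List[float]]] = {}
    for markup, label in samples:
        grouped.setdefault(label, []).append(byte_class_histogram(markup))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def classify(markup: str, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest to the page's features."""
    features = byte_class_histogram(markup)
    return min(centroids, key=lambda label: dist(features, centroids[label]))
```

In use, `train_centroids` would be run once on labelled pages, and `classify(new_page_source, centroids)` would then be called for each page to be checked. The split mirrors the article's description: an offline training step, followed by a cheap per-page transformation and lookup.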

Accurate detection of phishing websites

According to the researchers’ experiments, the model reached 94% accuracy in detecting phishing websites. And since it uses a very small neural network, it can run on user devices and provide near-real-time results.

“We have tested the technique with actual phishing and legit sites,” Stavros Shiaeles, one of the paper’s co-authors, told The Daily Swig.

This is not the first time that binary visualization and machine learning have been used in cybersecurity. In 2019, Shiaeles, who is a cybersecurity lecturer at the University of Portsmouth, was among the co-authors of research on another technique that used ML and binary visualization to detect malware with promising results.

After testing the phishing website detection system, the team is now taking the next step to make the technique ready for adoption.

“We are working on a new extended method and we are trying to apply for a patent,” Shiaeles said. “Based on the results we initially have I don't see the point not to be adopted. The accuracy is 100%.”
