Researchers showcase new method for improving the detection of fake websites

Machine learning technique detects phishing sites based on markup visualization

Machine learning models trained on the visual representation of website code can help improve the accuracy and speed of detecting phishing websites.

This is according to a paper (PDF) by security researchers at the University of Plymouth and the University of Portsmouth, UK.

The researchers aim to address the shortcomings of existing detection methods, which are either too slow or not accurate enough.

Turning web code into images

The technique developed by the researchers uses “binary visualization” libraries to transform the markup and code of web pages into images.
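As a rough illustration of the idea, and not the researchers' actual tooling, the sketch below maps the raw bytes of a page's markup onto a grid of grayscale pixel values and serializes it as a plain-text PGM image. The function names, the row-major layout, the zero padding, and the default width are all assumptions made for this example; binary visualization tools may use other layouts and color schemes.

```python
import math


def markup_to_grid(markup: str, width: int = 16) -> list[list[int]]:
    """Map the UTF-8 bytes of a page's markup onto a 2D grid of pixel values.

    Hypothetical helper: each byte (0-255) becomes one grayscale pixel,
    laid out row by row. The last row is padded with zeros.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    data = markup.encode("utf-8")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(width * height - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(grid: list[list[int]]) -> str:
    """Serialize the grid as a plain-text PGM (P2) grayscale image."""
    height, width = len(grid), len(grid[0])
    rows = [" ".join(str(value) for value in row) for row in grid]
    return f"P2\n{width} {height}\n255\n" + "\n".join(rows) + "\n"


# Example: a tiny, made-up page turned into an 8-pixel-wide image.
grid = markup_to_grid("<html><body>Login</body></html>", width=8)
pgm_text = to_pgm(grid)
```

Writing `pgm_text` to a file with a `.pgm` extension gives an image that most viewers can open, which makes it easy to compare two pages side by side.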

Using this method, they created a dataset of images of legitimate and phishing websites.

Visual differences between the legitimate PayPal login page and a phishing equivalent

The dataset was then used to train a machine learning model to distinguish legitimate websites from phishing websites based on the differences in their binary visualizations.

To test a new website, the system transforms the target webpage’s code through binary visualization and runs the resulting image through the trained model.

To speed up the model, the researchers used MobileNet, a neural network architecture optimized to run on resource-constrained devices rather than on cloud servers.

The system also gradually builds up a database of legitimate and phishing websites to avoid excessive and unnecessary inferences.
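One way to picture this caching step is the minimal sketch below, which stores each verdict so that a page already seen is not classified again. The article does not say how the researchers key or store their database, so keying on a SHA-256 hash of the page source is an assumption, and `VerdictCache` and `toy_classifier` are hypothetical names. The classifier here is a placeholder string check that stands in for the trained model.

```python
import hashlib
from typing import Callable, Dict


class VerdictCache:
    """Remember past verdicts so repeated pages skip model inference.

    Hypothetical design: verdicts are keyed by a SHA-256 hash of the
    page source and kept in an in-memory dictionary.
    """

    def __init__(self, classify: Callable[[bytes], str]) -> None:
        self._classify = classify
        self._verdicts: Dict[str, str] = {}
        self.inferences = 0  # how many times the classifier was actually called

    def check(self, page_source: bytes) -> str:
        key = hashlib.sha256(page_source).hexdigest()
        if key not in self._verdicts:
            self.inferences += 1
            self._verdicts[key] = self._classify(page_source)
        return self._verdicts[key]


def toy_classifier(page_source: bytes) -> str:
    """Placeholder for the trained model; NOT a real phishing detector."""
    return "phishing" if b"paypa1" in page_source else "legitimate"


cache = VerdictCache(toy_classifier)
page = b"<form action='http://paypa1.example/login'>"
first = cache.check(page)   # runs the classifier
second = cache.check(page)  # answered from the cache
```

After the two calls, `cache.inferences` is 1, because the second lookup is served from the stored verdict.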

Overview of the proposed approach

Accurate detection of phishing websites

According to the researchers’ experiments, the model reached 94% accuracy in detecting phishing websites. Because it uses a very small neural network, it can run on user devices and provide near-real-time results.

“We have tested the technique with actual phishing and legit sites,” Stavros Shiaeles, one of the paper’s co-authors, told The Daily Swig.

This is not the first time that binary visualization and machine learning have been used in cybersecurity. In 2019, Shiaeles, who is a cybersecurity lecturer at the University of Portsmouth, was among the co-authors of another technique that used ML and binary visualization to detect malware with promising results.

After testing the phishing website detection system, the team is now taking the next step to make the technique ready for adoption.

“We are working on a new extended method and we are trying to apply for a patent,” Shiaeles said. “Based on the results we initially have I don't see the point not to be adopted. The accuracy is 100%.”
