Machine Learning, Frequent Pattern Mining and Statistical based Web Honeypot Attack Data Analysis

GitHub Repository: https://github.com/yehezkiel1086/ml-web-honeypot

Demo video URL: Web Honeypot Attack Data Analysis demo video

Full Thesis Report: https://drive.google.com/file/d/10RNNbaG8djuI5Evz_un2xzh37tWXANFz/view?usp=sharing

Description

Project Overview: I first set up a web honeypot on an Ubuntu VPS, then applied machine learning, frequent pattern mining, and a statistical approach to analyze the honeypot log data. The results are visualized in a web-based Metabase dashboard, which makes it easier for users to get an overview of attackers' methods and behavior.

Type: Research project (Bachelor's Thesis)

Technologies

Honeypot: Ubuntu 22.04 VPS, Docker
Analysis: Python
Python Libraries: MLxtend, NumPy, pandas, Matplotlib
Security Scanners: OWASP ZAP, Nikto, Nuclei

Challenges and Approach

Challenges:

Raw honeypot log data is not yet information; it has to be analyzed before it makes sense.
Trends in attackers' behavior aren't always obvious.

Approach:

Fixed the outdated honeypot library (its Docker configuration and Python scripts) so that the honeypot could run.
Set up the honeypot on the Ubuntu VPS using Docker Compose.
Generated training data using security scanners: OWASP ZAP, Nikto, and Nuclei.
Generated training, validation, and test data on the VPS from real attackers' traffic.
Developed the analysis pipeline, progressing from statistical analysis to frequent pattern mining to machine learning.
Visualized the results with Metabase.
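
To illustrate the first two analysis stages in the list above, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the project lists MLxtend for frequent pattern mining, whereas this example uses a brute-force support count in its place. The record fields (`ip`, `method`, `path`), the sample requests, and the helper names `top_paths`, `transactions_by_ip`, and `frequent_itemsets` are all hypothetical, since the honeypot's real log schema is not shown here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written records; the real honeypot log schema may differ.
requests = [
    {"ip": "198.51.100.7", "method": "GET", "path": "/wp-login.php"},
    {"ip": "198.51.100.7", "method": "GET", "path": "/.env"},
    {"ip": "203.0.113.9", "method": "POST", "path": "/wp-login.php"},
    {"ip": "203.0.113.9", "method": "GET", "path": "/.env"},
    {"ip": "192.0.2.44", "method": "GET", "path": "/.env"},
]


def top_paths(records, n=3):
    """Statistical stage: the n most frequently requested paths."""
    return Counter(r["path"] for r in records).most_common(n)


def transactions_by_ip(records):
    """Group the distinct paths requested by each source IP into one transaction."""
    sessions = {}
    for r in records:
        sessions.setdefault(r["ip"], set()).add(r["path"])
    return list(sessions.values())


def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Frequent pattern stage: brute-force support counting.

    Returns {itemset: support} for every itemset of up to max_size paths
    that appears in at least min_support of the transactions.
    """
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    total = len(transactions)
    return {
        combo: count / total
        for combo, count in counts.items()
        if count / total >= min_support
    }


if __name__ == "__main__":
    print(top_paths(requests))
    print(frequent_itemsets(transactions_by_ip(requests), min_support=0.6))
```

Grouping requests by source IP is only one possible definition of a transaction; grouping by time window or by session would yield different patterns. Enumerating every combination also becomes expensive as transactions grow, which is the cost that Apriori-style pruning is designed to avoid.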

Poster

ML Honeypot

Results

Web honeypot logs streaming on the left while Nuclei scans on the right: Honeypot Nuclei

One of the Metabase visualizations: Honeypot Metabase

Honeypot Attack Patterns

Statistical Analysis:

Attack Frequent Pattern Analysis: Honeypot Attack Patterns