Machine Learning, Frequent Pattern Mining and Statistics-Based Web Honeypot Attack Data Analysis

September 1, 2025

Github Repository: https://github.com/yehezkiel1086/ml-web-honeypot

Demo video URL: Web Honeypot Attack Data Analysis demo video

Full Thesis Report: https://drive.google.com/file/d/10RNNbaG8djuI5Evz_un2xzh37tWXANFz/view?usp=sharing

Description

Project Overview: I first set up a web honeypot on an Ubuntu VPS, then applied machine learning, frequent pattern mining, and a statistical approach to analyze the honeypot's log data. The results of the analysis are visualized in a web-based Metabase dashboard, making it easier for users to grasp the overall picture as well as the various methods and behaviors of attackers.

Type: Research project (Bachelor's thesis)

Technologies

  • Honeypot: Ubuntu 22.04 VPS, Docker
  • Analysis: Python
  • Python Libraries: MLxtend, NumPy, Pandas, Matplotlib
  • Security Scanners: OWASP ZAP, Nikto, Nuclei

Challenges and Approaches

Challenges:

  • Raw honeypot log data is not information by itself; it has to be analyzed before it makes sense.
  • Trends in attacker behavior are not always obvious.
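To illustrate the first challenge, here is a minimal Python sketch of the very first step: turning raw log lines into structured records. It assumes, hypothetically, that the honeypot writes one JSON object per line with the fields `src_ip`, `method`, `path`, and `user_agent`. The actual log format of the honeypot used in this project is not described here and may differ, and the names `Request` and `parse_log_lines` are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Request:
    """One HTTP request seen by the honeypot (hypothetical field set)."""
    src_ip: str
    method: str
    path: str
    user_agent: str


def parse_log_lines(lines: Iterable[str]) -> List[Request]:
    """Convert raw JSON-lines log text into structured Request records.

    Blank lines, invalid JSON and non-object entries are skipped rather than
    aborting the run, because honeypot logs often contain junk input.
    """
    records: List[Request] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log object
        records.append(
            Request(
                src_ip=str(entry.get("src_ip", "")),
                method=str(entry.get("method", "")).upper(),  # normalize case
                path=str(entry.get("path", "")),
                user_agent=str(entry.get("user_agent", "")),
            )
        )
    return records


if __name__ == "__main__":
    sample = [
        '{"src_ip": "203.0.113.5", "method": "get", "path": "/wp-login.php", "user_agent": "curl/8.0"}',
        "not json",
        '{"src_ip": "198.51.100.7", "method": "POST", "path": "/.env", "user_agent": "python-requests/2.31"}',
    ]
    for record in parse_log_lines(sample):
        print(record)
```

Once the logs are in this shape, they can be handed to Pandas, for example with `pd.DataFrame([vars(r) for r in records])`, for the later analysis steps. Skipping malformed lines instead of raising keeps a single garbage request from breaking the whole analysis run.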

Approach:

  • Fixed the outdated honeypot library (its Docker setup and Python scripts) so that the honeypot could run.
  • Set up the honeypot on the Ubuntu VPS using Docker Compose.
  • Generated training data with the security scanners OWASP ZAP, Nikto, and Nuclei.
  • Generated training, validation, and test data on the VPS from real attackers' traffic.
  • Developed the analysis pipeline, ranging from statistical analysis and frequent pattern mining to machine learning.
  • Visualized the results with Metabase.
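The write-up does not say how the analysis results reach Metabase. As one possibility, assuming the results are stored in a SQLite file (Metabase can connect to SQLite databases), a summary table could be rebuilt after each analysis run. The function `publish_path_counts` and the `path_counts` table below are hypothetical names for this sketch, not part of the project.

```python
import sqlite3
from collections import Counter
from contextlib import closing
from typing import Iterable, List, Tuple


def publish_path_counts(db_path: str, paths: Iterable[str]) -> List[Tuple[str, int]]:
    """Rebuild a hits-per-path summary table that a dashboard can query.

    Returns the rows that were written, most frequent path first.
    """
    rows = Counter(paths).most_common()
    # closing() guarantees the connection is closed; the inner "with conn"
    # commits the transaction on success and rolls it back on an exception.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS path_counts ("
                "path TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
            )
            conn.execute("DELETE FROM path_counts")  # full refresh on every run
            conn.executemany(
                "INSERT INTO path_counts (path, hits) VALUES (?, ?)", rows
            )
    return rows
```

For example, `publish_path_counts("honeypot_summary.db", ["/.env", "/wp-login.php", "/.env"])` writes two rows, after which a Metabase question can chart `path_counts` directly. Doing a full refresh keeps the table consistent with the latest run and avoids relying on SQLite's newer UPSERT syntax.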

Poster

[Figure: ML Honeypot poster]

Results

Web honeypot logs running on the left while Nuclei scans on the right: [Figure: Honeypot Nuclei]

One of the Metabase visualizations: [Figure: Honeypot Metabase]

[Figure: Honeypot Attack Patterns]

Statistical analysis: [Figure: Statistical Analysis]
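The statistical part of the analysis is not spelled out here, so the following is only a sketch of the kind of descriptive statistics such a dashboard typically shows: total requests, unique source IPs, the most active IPs, the HTTP method distribution, and traffic by hour of day. The event shape `(timestamp, source IP, method)` and the function name `summarize` are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical event shape: (ISO-8601 timestamp, source IP, HTTP method).
Event = Tuple[str, str, str]


def summarize(events: List[Event], top_n: int = 3) -> Dict[str, object]:
    """Compute simple descriptive statistics over honeypot requests."""
    ips = Counter(ip for _, ip, _ in events)
    methods = Counter(method.upper() for _, _, method in events)
    # Bucket requests by hour of day (0-23), aggregated across all days.
    hours = Counter(datetime.fromisoformat(ts).hour for ts, _, _ in events)
    per_hour = list(hours.values())
    return {
        "total_requests": len(events),
        "unique_ips": len(ips),
        "top_ips": ips.most_common(top_n),
        "methods": dict(methods),
        "busiest_hour": hours.most_common(1)[0][0] if hours else None,
        # Mean and population std. dev. over hours that saw any traffic.
        "mean_per_active_hour": mean(per_hour) if per_hour else 0.0,
        "stdev_per_active_hour": pstdev(per_hour) if per_hour else 0.0,
    }


if __name__ == "__main__":
    demo = [
        ("2025-08-01T03:15:00", "203.0.113.5", "get"),
        ("2025-08-01T03:40:00", "203.0.113.5", "POST"),
        ("2025-08-02T10:05:00", "198.51.100.7", "GET"),
    ]
    print(summarize(demo))
```

On the three demo events, hour 3 is the busiest (two requests) and the mean over active hours is 1.5. In practice the same aggregations map naturally onto Pandas `groupby`/`value_counts` calls once the data is in a DataFrame.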

Attack frequent pattern analysis: [Figure: Honeypot Attack Patterns]
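Frequent pattern mining looks for sets of items that often occur together. Below is a minimal, dependency-free sketch of the Apriori algorithm, assuming (hypothetically) that each transaction is the set of paths requested by one attacker session. The project lists MLxtend, whose `mlxtend.frequent_patterns` module provides `apriori` (on a one-hot encoded DataFrame) and `association_rules`; whether the thesis used exactly those functions, and how it defined transactions, is not stated here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Itemset = FrozenSet[str]


def apriori(transactions: Iterable[Iterable[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least min_support.

    Support = fraction of transactions that contain all items of the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        return {}

    def support(itemset: Itemset) -> float:
        return sum(1 for b in baskets if itemset <= b) / n

    def keep_frequent(candidates: Set[Itemset]) -> Dict[Itemset, float]:
        result: Dict[Itemset, float] = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                result[c] = s
        return result

    frequent: Dict[Itemset, float] = {}
    # Level 1: frequent single items.
    current = keep_frequent({frozenset([item]) for b in baskets for item in b})
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        pruned = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(pruned)
    return frequent


if __name__ == "__main__":
    # Hypothetical sessions: the set of paths each attacker IP requested.
    sessions = [
        {"/.env", "/wp-login.php", "/phpmyadmin/"},
        {"/.env", "/wp-login.php"},
        {"/.env", "/.git/config"},
        {"/wp-login.php", "/xmlrpc.php"},
    ]
    for itemset, sup in sorted(apriori(sessions, 0.5).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), sup)
```

With `min_support=0.5` on these four sessions, the sketch reports `{/.env}` and `{/wp-login.php}` (support 0.75 each) and the pair `{/.env, /wp-login.php}` (support 0.5), which reads as "attackers probing for environment files also tend to probe the WordPress login". The prune step is what makes Apriori efficient: a candidate is only counted if all of its smaller subsets already passed the support threshold.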