GitHub Repository: https://github.com/yehezkiel1086/ml-web-honeypot
Demo video URL: Web Honeypot Attack Data Analysis demo video
Full Thesis Report: https://drive.google.com/file/d/10RNNbaG8djuI5Evz_un2xzh37tWXANFz/view?usp=sharing
Description
Project Overview: I first set up a web honeypot on an Ubuntu VPS, then analyzed its log data using machine learning, frequent pattern mining, and a statistical approach. The analysis results are visualized in a web-based Metabase dashboard so that users can more easily understand attackers' methods and behavior at a glance.
Type: Research project (Bachelor's Thesis)
Technologies
- Honeypot: Ubuntu 22.04 VPS, Docker
- Analysis: Python
- Python Libraries: MLxtend, NumPy, Pandas, Matplotlib
- Security Scanners: OWASP ZAP, Nikto, Nuclei
Challenges and Approach
Challenges:
- Raw honeypot log data is not information by itself; it has to be analyzed before it makes sense.
- Trends in attacker behavior aren't always obvious.
Approach:
- Fixed the outdated honeypot library (its Docker configuration and Python scripts) so that the honeypot could run.
- Set up the Ubuntu VPS using Docker Compose.
- Generated training data using security scanners: OWASP ZAP, Nikto, and Nuclei.
- Generated training, validation, and test data on the VPS from real attackers' traffic.
- Developed the analysis pipeline, progressing from statistical analysis to frequent pattern mining to machine learning.
- Visualized the results with Metabase.
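The statistical and frequent-pattern steps above can be illustrated with a small sketch. This is not the project's actual code: the log fields (`ip`, `method`, `path`), the sample requests, and the helper names `top_values`, `to_transaction`, and `frequent_itemsets` are all assumptions made for illustration. The itemset search is a simple brute-force count written with the standard library only, standing in for the frequent pattern mining that the project performs with MLxtend.

```python
from collections import Counter
from itertools import combinations

# Hypothetical parsed honeypot requests; field names and values are
# illustrative assumptions, not the project's real log schema.
SAMPLE_REQUESTS = [
    {"ip": "203.0.113.5", "method": "GET", "path": "/wp-login.php"},
    {"ip": "203.0.113.5", "method": "POST", "path": "/wp-login.php"},
    {"ip": "198.51.100.7", "method": "GET", "path": "/.env"},
    {"ip": "203.0.113.5", "method": "GET", "path": "/wp-login.php"},
]


def top_values(requests, field, n=3):
    """Statistical step: return the n most common values of one log field."""
    return Counter(r[field] for r in requests).most_common(n)


def to_transaction(request):
    """Turn one request into a list of 'field=value' items."""
    return [f"method={request['method']}", f"path={request['path']}"]


def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Brute-force frequent itemsets (not Apriori): count every item
    combination up to max_size and keep those whose share of all
    transactions is at least min_support."""
    total = len(transactions)
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # sort so each itemset has one canonical key
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


if __name__ == "__main__":
    print(top_values(SAMPLE_REQUESTS, "ip"))
    transactions = [to_transaction(r) for r in SAMPLE_REQUESTS]
    for itemset, support in frequent_itemsets(transactions).items():
        print(itemset, support)
```

Counting every combination is only practical for small item sets like this one; on real honeypot logs a pruning algorithm such as Apriori or FP-Growth would likely be the better choice.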
Poster

Results
Web honeypot logs running on the left, Nuclei scanning on the right:

One of the Metabase visualizations:


Statistical Analysis:

Attack Frequent Pattern Analysis:
