🦅 Webhawk 2.0

Machine-learning-based web attack detection.

About

Webhawk is an open-source, machine-learning-powered web attack detection tool. It uses your web logs as training data. Webhawk offers a REST API that makes it easy to integrate into your SOC ecosystem. To train a detection model and use it as an extra layer of security in your organization, follow the steps below.

Setup

Using a Python virtual env

python -m venv webhawk_venv
source webhawk_venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Unsupervised detection Usage

Encode your http logs and save unsupervised detection results into a csv file

python encode.py -a -t apache -l ./SAMPLE_DATA/raw-http-logs-samples/aug_sep_oct_2021.log -d ./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv

Note that two pre-encoded data files are available in ./SAMPLE_DATA/labeled-encoded-data-samples/, in case you would like to skip directly to the next step.
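To give an idea of what encoding a log line involves, here is a minimal sketch that turns one Apache-style log line into the numeric features named in the settings.conf example (length, params_number, return_code, size, upper_cases, lower_cases, special_chars, url_depth). It is not the code in encode.py: the regular expression, the function name encode_log_line, and the exact definition of each feature (for instance, counting every non-alphanumeric URL character as a special character) are assumptions made for illustration.

```python
import re

# Apache combined log format. This pattern is an assumption about the input,
# not the parser used by encode.py.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<return_code>\d{3}) (?P<size>\d+|-)'
)


def encode_log_line(line):
    """Return a dict of numeric features, or None if the line does not parse.

    Hypothetical helper: the feature definitions below are guesses based on
    the feature names only.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    url = match.group("url")
    path, _, query = url.partition("?")
    size = match.group("size")
    return {
        "length": len(url),
        "params_number": len([p for p in query.split("&") if p]),
        "return_code": int(match.group("return_code")),
        "size": 0 if size == "-" else int(size),
        "upper_cases": sum(c.isupper() for c in url),
        "lower_cases": sum(c.islower() for c in url),
        "special_chars": sum(not c.isalnum() for c in url),
        "url_depth": path.count("/"),
    }


sample_line = (
    '198.72.227.213 - - [16/Dec/2018:00:39:22 -0800] '
    '"GET /self.logs/access.log.2016-07-20.gz HTTP/1.1" 404 340 "-" '
    '"python-requests/2.18.4"'
)
features = encode_log_line(sample_line)
print(features)
```

Returning None for unparseable lines lets a caller skip malformed entries instead of aborting a whole log file.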

Run the unsupervised detection script

Adapt the following example to your own data:

python unsup_hawk.py -l ./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv -j 50000 -v -e 5000 -s 5

Create a settings.conf file

Copy the settings_template.conf file to settings.conf and fill in the required parameters as follows:

[MODEL]
model:MODELS/the_model_you_will_train.pkl
[FEATURES]
features:length,params_number,return_code,size,upper_cases,lower_cases,special_chars,url_depth
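As a quick sanity check of the file format, the sketch below reads the same two sections with Python's standard configparser module, which accepts ":" as a key/value delimiter. How Webhawk itself loads settings.conf is not shown here, so treat the helper name load_settings and the parsing approach as assumptions.

```python
import configparser

# Same content as the settings.conf example above, inlined so the sketch
# runs without touching the filesystem.
SETTINGS_TEXT = """\
[MODEL]
model:MODELS/the_model_you_will_train.pkl
[FEATURES]
features:length,params_number,return_code,size,upper_cases,lower_cases,special_chars,url_depth
"""


def load_settings(text):
    """Hypothetical helper: return (model_path, feature_names) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    model_path = parser.get("MODEL", "model")
    raw_features = parser.get("FEATURES", "features")
    feature_names = [name.strip() for name in raw_features.split(",") if name.strip()]
    return model_path, feature_names


model_path, feature_names = load_settings(SETTINGS_TEXT)
print(model_path)
print(feature_names)
```

To check a real file, replace parser.read_string(text) with parser.read("settings.conf").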

Supervised detection Usage

Encode your http logs and save supervised detection results into a csv file

python encode.py -a -l ./SAMPLE_DATA/raw-http-logs-samples/aug_sep_oct_2021.log -d ./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv

Note that two pre-encoded data files are available in ./SAMPLE_DATA/labeled-encoded-data-samples/, in case you would like to skip directly to the next step.

Train a model and test the prediction

Use the HTTP log data from May to July 2021 to train a model, then test it with the data from August to October 2021.

python train.py -a 'dt' -t ./SAMPLE_DATA/labeled-encoded-data-samples/may_jun_jul_2021.csv -v ./SAMPLE_DATA/labeled-encoded-data-samples/aug_sep_oct_2021.csv

Make a prediction for a single log line

python predict.py -m 'MODELS/the_model_you_will_train.pkl' -l '198.72.227.213 - - [16/Dec/2018:00:39:22 -0800] "GET /self.logs/access.log.2016-07-20.gz HTTP/1.1" 404 340 "-" "python-requests/2.18.4"'

REST API

Launch the API server

To use the API, first launch its server as follows:

python -m uvicorn api:app --reload --host 0.0.0.0 --port 8000

Make a prediction request

You can use the following code, which is based on the Python 'requests' library (the same code as in test_api.py), to make a prediction using the REST API:

import requests
import json
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}
data = {
    'log_type':'apache',
    'http_log_line': '187.167.57.27 - - [15/Dec/2018:03:48:45 -0800] "GET /honeypot/Honeypot%20-%20Howto.pdf HTTP/1.1" 200 1279418 "http://www.secrepo.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/61.0.3163.128 Safari/534.24 XiaoMi/MiuiBrowser/9.6.0-Beta"'
}
response = requests.post('http://127.0.0.1:8000/predict', headers=headers, data=json.dumps(data))
print(response.text)

It returns a JSON response like the following:

{"prediction":"0","confidence":"0.9975490196078431","log_line":"187.167.57.27 - - [15/Dec/2018:03:48:45 -0800] \"GET /honeypot/Honeypot%20-%20Howto.pdf HTTP/1.1\" 200 1279418 \"http://www.secrepo.com/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/61.0.3163.128 Safari/534.24 XiaoMi/MiuiBrowser/9.6.0-Beta\""}
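Note that in the response above, "prediction" and "confidence" are JSON strings rather than numbers. The following minimal sketch converts them to numeric types before further processing. The helper name parse_prediction is hypothetical, the sample response uses a shortened log line for readability, and the meaning of each prediction label is not documented here, so the code does not interpret it.

```python
import json


def parse_prediction(response_text):
    """Hypothetical helper: convert the string fields of a /predict response.

    Assumes the response has the three keys shown in the example above.
    """
    payload = json.loads(response_text)
    return {
        "prediction": int(payload["prediction"]),
        "confidence": float(payload["confidence"]),
        "log_line": payload["log_line"],
    }


# Shortened stand-in for the response body shown above.
sample_response = (
    '{"prediction":"0","confidence":"0.9975490196078431",'
    '"log_line":"GET /honeypot/Honeypot%20-%20Howto.pdf"}'
)
result = parse_prediction(sample_response)
print(result["prediction"], result["confidence"])
```

With the requests example above, this would be called as parse_prediction(response.text).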

Using Docker

Launch the API server (with Docker)

To launch the prediction server using Docker:

docker compose build
docker compose up

Used sample data

The data in the SAMPLE_DATA folder comes from
https://www.secrepo.com.

Documentation

Details on how this tool is built can be found at
http://enigmater.blogspot.fr/2017/03/intrusion-detection-based-on-supervised.html

Todo

Extract/add more features (e.g. hour of the day, day of the week, week, month).
Find a better way to label training data.
Add the possibility to use unsupervised learning.

Contribution

All feedback, testing, and contributions are very welcome! If you would like to contribute, fork the project, add your contribution, and open a pull request.

powned/Intrusion-and-anomaly-detection-with-machine-learning
