This project implements a machine learning system for detecting anomalous (potentially illicit) transactions in the Bitcoin network. It utilizes the Elliptic Data Set, which contains features from Bitcoin transactions, some of which are labeled as licit or illicit.
- Features
- Requirements
- Installation
- Usage
- Project Structure
- Data Description
- Model
- Visualization
- Results
- Contributing
- License
- Data preprocessing and feature engineering
- Machine learning model for anomaly detection
- Visualization of transaction data and model results
- Performance evaluation metrics
- Python 3.7+
- pip (Python package installer)
- Clone or download the repository to your local machine.
- Navigate to the project directory: `cd path/to/bitcoin-anomaly-detection`
- Create a virtual environment (optional but recommended): run `python -m venv venv`, then `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`).
- Install the required packages: `pip install -r requirements.txt`
- Ensure your data files (`elliptic_txs_features.csv`, `elliptic_txs_classes.csv`, `elliptic_txs_edgelist.csv`) are in the `data` directory.
- Run the main script: `python main.py`
- View the results in the console output and the generated visualization files.
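Before running the main script, it can help to confirm that the three data files are where the steps above expect them. The following is a minimal sketch, not part of the project's code: the helper name `missing_data_files` is hypothetical, and it assumes the `data` directory sits next to `main.py`.

```python
from pathlib import Path

# File names taken from the usage steps above.
EXPECTED_FILES = (
    "elliptic_txs_features.csv",
    "elliptic_txs_classes.csv",
    "elliptic_txs_edgelist.csv",
)


def missing_data_files(data_dir="data"):
    """Return the expected data files that are absent from data_dir.

    Hypothetical helper for illustration; the project may perform
    this check differently, or not at all.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All data files found.")
```

An empty list means all three files are present and `python main.py` should be able to find them.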
- `main.py`: The entry point of the application
- `config.py`: Configuration settings for the project
- `data_loader.py`: Functions for loading and preprocessing data
- `feature_engineering.py`: Feature engineering and selection
- `model_training.py`: Machine learning model definition and training
- `anomaly_detector.py`: Functions for detecting anomalies in transactions
- `visualization.py`: Functions for creating visualizations
- `utils.py`: Utility functions used across the project
- `requirements.txt`: List of Python package dependencies
The Elliptic Data Set consists of:
- 203,769 node transactions
- 234,355 edges (flows between transactions)
- 166 features per transaction
- 49 time steps
Class distribution:
- Illicit: 2% (4,545 transactions)
- Licit: 21% (42,019 transactions)
- Unknown: 77% (157,205 transactions)
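The percentages above follow directly from the listed counts. As a quick worked check (plain arithmetic on the numbers in this section, no data files needed), the snippet below recomputes them and also derives the illicit share among labeled transactions only, which is the imbalance a supervised model actually sees:

```python
# Counts as listed in the class distribution above.
counts = {"illicit": 4545, "licit": 42019, "unknown": 157205}

total = sum(counts.values())
shares = {label: round(100 * n / total) for label, n in counts.items()}

# Share of illicit transactions among the labeled ones only.
labeled = counts["illicit"] + counts["licit"]
illicit_share_of_labeled = 100 * counts["illicit"] / labeled

print(total)   # 203769
print(shares)  # {'illicit': 2, 'licit': 21, 'unknown': 77}
print(f"{illicit_share_of_labeled:.1f}%")  # 9.8%
```

Roughly one in ten labeled transactions is illicit, so accuracy alone would be a misleading metric for this data.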
The project uses a Random Forest classifier for anomaly detection. The model is trained on labeled data and evaluated using metrics such as precision, recall, and F1-score.
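The training and evaluation flow described above might look roughly like the sketch below. It is not the project's actual `model_training.py`: it uses synthetic data from scikit-learn in place of the Elliptic features, and the hyperparameters (`n_estimators=100`, `class_weight="balanced"`), the 70/30 split, and the use of the positive-class probability as an anomaly score are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled transactions: about 10% positives,
# loosely mimicking the illicit/licit imbalance. Not the real data.
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" is an assumption, chosen to offset the imbalance.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Precision, recall, and F1 for the positive (minority) class.
y_pred = model.predict(X_test)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)

# Predicted probability of the positive class, usable as an anomaly score.
scores = model.predict_proba(X_test)[:, 1]

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Reporting the metrics for the minority class, rather than overall accuracy, keeps the evaluation focused on the transactions the system is meant to flag.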
The project generates several visualizations:
- t-SNE plot of transactions
- Feature importance bar chart
- Anomaly score distribution histogram
Results are output to the console and saved in `final_results.csv`. This includes classification reports and performance metrics for the training, validation, and test sets.
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.
This project is distributed under the MIT License.
For more detailed information about the project architecture and specifications, please refer to the `project_specification.md` file in the project root directory.