Detect unusual card transactions that may indicate fraud or money laundering. Fraud is very rare (approx. 0.17% of transactions). The goal was to test several anomaly detection methods and compare their results.
Dataset: Credit Card Fraud Detection (Europe, 2013).
- 284,807 transactions, 492 frauds (~0.17%)
- features: PCA components V1–V28, plus Time and Amount
- label: Class (0 = normal, 1 = fraud)
Methods:
- z-score baseline
- isolation forest
- DBSCAN

All methods were trained unsupervised; labels were used only for evaluation.
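A minimal sketch of this unsupervised setup, using isolation forest on synthetic data that stands in for the PCA features (the sizes and the contamination value here are illustrative, not the project's real settings):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# synthetic stand-in for the PCA features: 1000 normal points, 5 outliers
normal = rng.normal(0.0, 1.0, size=(1000, 4))
outliers = rng.normal(6.0, 1.0, size=(5, 4))
X = np.vstack([normal, outliers])

# contamination = expected anomaly fraction; the real data is ~0.0017,
# 0.01 is used here only because the synthetic sample is small
clf = IsolationForest(contamination=0.01, random_state=0)
pred = clf.fit_predict(X)        # -1 = anomaly, 1 = normal
is_anomaly = pred == -1
print(is_anomaly.sum(), "points flagged")
```

Note that no labels are passed to `fit_predict`; they would enter only afterwards, to score the flags.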
Summary table:

| method | precision | recall | F1 |
|---|---|---|---|
| z-score | 0.01 | 0.90 | 0.02 |
| isolation forest | 0.15 | 0.44 | 0.23 |
| DBSCAN | 0.00 | 0.24 | 0.01 |
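The metrics above come from comparing the unsupervised flags against the held-back labels; a toy sketch of that evaluation step (the arrays below are made up, not project output):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# toy ground truth and detector output; in the project, y_true is the
# `class` column and y_pred is the detector's fraud flag (1 = flagged)
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 1])

p = precision_score(y_true, y_pred)   # share of flagged that are real fraud
r = recall_score(y_true, y_pred)      # share of real fraud that was flagged
f1 = f1_score(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```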
A bar chart of precision vs recall is in /docs/precision_recall_bar.png.
Conclusion:
- z-score catches almost all fraud but raises far too many false alerts.
- isolation forest gives the most balanced precision/recall of the three.
- DBSCAN performs poorly here.
- Fraud is highly imbalanced, so accuracy is not a useful metric.
- Precision vs recall trade-off: false positives waste investigator time, false negatives miss real fraud.
- Why anomaly detection? It works when labeled fraud examples are scarce.
- Isolation forest likely worked best because rare, scattered points are isolated with fewer random splits, which matches how fraud sits in this data.
- Real world: feature engineering is needed (transaction velocity, geography, peer comparison).
- Real world: alert volume must stay manageable for investigators.
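The tension between alert volume and missed fraud can be seen by sweeping a z-score threshold; a sketch on synthetic transaction amounts (values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic amounts: ordinary spending plus 5 large fraudulent charges
amounts = np.concatenate([rng.normal(50, 10, 1000), rng.normal(200, 20, 5)])
labels = np.concatenate([np.zeros(1000), np.ones(5)])

z = (amounts - amounts.mean()) / amounts.std()
for k in (2.0, 3.0, 4.0):
    flagged = z > k
    caught = int((flagged & (labels == 1)).sum())
    print(f"threshold {k}: {int(flagged.sum())} alerts, {caught}/5 frauds caught")
```

Raising the threshold shrinks the alert queue; the question for investigators is how far it can rise before fraud starts slipping through.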
To run:
- clone the repo
- install the requirements:

```shell
pip install -r requirements.txt
```