This repository contains the implementation of *SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks* (USENIX Security '25).
## Repository Structure

```
mia_llms_benchmark/
├── README.md                  # This file
├── environment.yml            # Conda environment specification
├── config_finetune.yaml       # Training configuration
├── config_auc_tpr.yaml        # Evaluation configuration
├── finetune.py                # Main fine-tuning script
├── main.py                    # Evaluation script
├── utils.py                   # Utility functions
├── data/
│   ├── obfuscation.py         # Obfuscation implementations
│   └── prepare.py             # Dataset loading and tokenization
├── attacks/                   # MIA attack implementations
│   ├── __init__.py
│   ├── loss.py
│   ├── ratio.py
│   ├── mink.py
│   ├── minkplusplus.py
│   ├── zlib.py
│   ├── lowercase.py
│   ├── recall.py
│   ├── conrecall.py
│   ├── bag_of_words.py
│   ├── ensemble_classifier.py
│   └── utils.py               # Attack utilities
└── output/                    # Evaluation results
```
## Installation

```bash
# Create the Python environment
conda env create -f environment.yml
conda activate mia
```
## Fine-tuning

```bash
# Single-GPU training
python finetune.py --config config_finetune.yaml --select_ratio X

# Multi-GPU training with DeepSpeed
deepspeed --num_gpus=8 finetune.py --config config_finetune.yaml --select_ratio X
```
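The `--select_ratio` flag controls how much of the training data is obfuscated. As a rough mental model, the sketch below swaps a `select_ratio` fraction of original texts for their paraphrased counterparts. It is purely illustrative: `mix_obfuscated` is a hypothetical helper, and it picks samples at random, whereas the actual logic in `finetune.py` selects samples by its own criterion.

```python
import random

def mix_obfuscated(original, paraphrased, select_ratio, seed=0):
    """Illustrative sketch: replace a `select_ratio` fraction of the
    original training texts with their paraphrased counterparts.
    Selection here is random; the real implementation may rank
    samples instead of sampling uniformly."""
    assert len(original) == len(paraphrased)
    n_swap = int(len(original) * select_ratio)
    swap_idx = set(random.Random(seed).sample(range(len(original)), n_swap))
    return [paraphrased[i] if i in swap_idx else original[i]
            for i in range(len(original))]

# Example: obfuscate half of a toy corpus
mixed = mix_obfuscated(["a", "b", "c", "d"], ["A", "B", "C", "D"], select_ratio=0.5)
```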
## Evaluation

The metrics include AUC-ROC and TPR at fixed low false-positive rates (e.g., TPR@1%FPR and TPR@0.1%FPR).
```bash
python main.py \
    -c config_auc_tpr.yaml \
    --run-all \
    --output "./output/" \
    --target-model "checkpoints/Llama-3.2-X/epoch-X" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"
```
## Datasets

**Evaluation benchmark (Mimir)**

- Source: `iamgroot42/mimir`
- Description: Curated subset of The Pile dataset with membership labels
- Splits: Various n-gram and threshold combinations (e.g., `ngram_13_0.8`)
- Domains: ArXiv papers, Wikipedia, GitHub code, PubMed, and more
**Pre-obfuscated training data**

- Source: `LLM-MIA/editing-syn-pr0.5-mimir-arxiv-ngram_13_0.8`
- Description: Paraphrased version of the ArXiv subset, produced with LLM-based text rewriting
- Usage: Ready-to-use obfuscated data for immediate training
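Both datasets are hosted on the Hugging Face Hub and can be loaded with the `datasets` library. A minimal sketch, reusing the config/split names from the evaluation command above (the record schema, and whether `HF_TOKEN` is required, are assumptions):

```python
from datasets import load_dataset

# Mimir evaluation data: ArXiv domain, 13-gram / 0.8-threshold split
mimir = load_dataset("iamgroot42/mimir", "arxiv", split="ngram_13_0.8")

# Pre-paraphrased ArXiv subset for obfuscated fine-tuning
obfuscated = load_dataset("LLM-MIA/editing-syn-pr0.5-mimir-arxiv-ngram_13_0.8")

print(mimir[0])  # inspect the record schema (text plus membership label)
```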
## Data Obfuscation

The `data/obfuscation.py` module provides tools to create obfuscated datasets:
```bash
# Set up environment variables
export OPENAI_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"

# Using OpenAI API for paraphrasing
python data/obfuscation.py
```
The framework supports different prompts for various content types.

**Text Paraphrasing Prompt:**

```python
message = [
    {"role": "system", "content": "You are a helpful text rewriting assistant."},
    {"role": "user", "content":
        f"Rewrite the following paragraph by replacing every word with an alternative term "
        f"that does not share the same root or spelling. Preserve the same meaning and "
        f"sentence structure as much as possible.\n\"\"\"\n{original_text}\n\"\"\""},
]
```
**Code Obfuscation Prompt:**

```python
message = (
    "Rewrite the following code so it preserves the same functionality and flow, "
    "but changes all variable names, function names, and comments. Maintain the "
    "same input-output behavior. Keep it in the same programming language."
    f"\n\"\"\"\n{original_text}\n\"\"\""
)
```
## Supported Attacks

The framework implements 10+ state-of-the-art MIA attacks:

| Attack Method | Description | Key Parameters |
|---|---|---|
| Loss | Basic loss-based attack | - |
| Zlib | Compression-based attack | - |
| Lowercase | Case-sensitivity attack | - |
| Min-K% Prob | Minimum k-probability attack | `k` |
| Min-K%++ | Enhanced Min-K% with calibration | `k` |
| Ratio | Loss ratio with reference model | `reference_model_path` |
| Bag of Words | Feature-based ML attack | - |
| ReCall | Prefix-based recall attack | `n_shots`, `extra_non_member_dataset` |
| CON-ReCall | Conditional recall attack | `n_shots`, `extra_non_member_dataset` |
| Ensemble | Combination of multiple attacks | - |
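As a concrete example, the loss attack, the simplest entry above, scores each candidate by the target model's per-sample loss; members tend to have lower loss. A minimal self-contained sketch with Hugging Face `transformers` (the model name is a stand-in; see `attacks/loss.py` for the actual implementation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def loss_attack_score(model, tokenizer, text):
    """Membership score for the loss attack: higher = more member-like."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()  # negate the loss so members score higher

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in target model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
print(loss_attack_score(model, tokenizer, "Sample passage to score."))
```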
```bash
# Evaluate specific attacks only
python main.py \
    -c config_auc_tpr.yaml \
    --attacks "loss,ratio,mink" \
    --target-model "path/to/model" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"
```
## Citation

If you use this framework in your research, please cite:
```bibtex
@inproceedings{zhang2025soft,
  title     = {{SOFT}: Selective Data Obfuscation for Protecting {LLM} Fine-tuning against Membership Inference Attacks},
  author    = {Zhang, Kaiyuan and Cheng, Siyuan and Guo, Hanxi and Chen, Yuetian and Su, Zian and An, Shengwei and Du, Yuntao and Fleming, Charles and Kundu, Ashish and Zhang, Xiangyu and Li, Ninghui},
  booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
  year      = {2025},
  address   = {Seattle, WA},
  publisher = {USENIX Association}
}
```
## Acknowledgments

- The Mimir dataset for providing the evaluation benchmark
- The Pile for the underlying text corpus
- Hugging Face for the model and dataset hosting infrastructure