SOFT

Requires Python 3.10+ and PyTorch.

This is the implementation of SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks (USENIX Security'25).

Code Structure

mia_llms_benchmark/
├── README.md                       # This file
├── environment.yml                 # Conda environment specification
├── config_finetune.yaml            # Training configuration
├── config_auc_tpr.yaml             # Evaluation configuration
├── finetune.py                     # Main fine-tuning script
├── main.py                         # Evaluation script
├── utils.py                        # Utility functions
├── data/
│   ├── obfuscation.py              # Obfuscation implementations
│   └── prepare.py                  # Dataset loading and tokenization
├── attacks/                        # MIA attack implementations
│   ├── __init__.py                 
│   ├── loss.py                     
│   ├── ratio.py                    
│   ├── mink.py                     
│   ├── minkplusplus.py             
│   ├── zlib.py                     
│   ├── lowercase.py                
│   ├── recall.py                   
│   ├── conrecall.py                
│   ├── bag_of_words.py             
│   ├── ensemble_classifier.py      
│   └── utils.py                    # Attack utilities
└── output/                         # Evaluation results

Quick Start

1. Install Dependencies

# Create python environment
conda env create -f environment.yml
conda activate mia

2. Fine-tune Model with Defense

# Single GPU training
python finetune.py --config config_finetune.yaml --select_ratio X

# Multi-GPU training with DeepSpeed
deepspeed --num_gpus=8 finetune.py --config config_finetune.yaml --select_ratio X
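
The --select_ratio flag controls how much of the training data is protected. As a rough illustration only (the flag name comes from the commands above; the selection and mixing logic below is a hypothetical sketch, not SOFT's actual selection strategy), a ratio r could mean that a fraction r of the samples is swapped for obfuscated paraphrases:

import random

def mix_dataset(originals, obfuscated, select_ratio, seed=0):
    """Hypothetical sketch: replace a `select_ratio` fraction of training
    samples with their obfuscated counterparts. Selection is random here;
    SOFT itself selects samples deliberately rather than at random."""
    assert len(originals) == len(obfuscated)
    rng = random.Random(seed)
    n_select = int(select_ratio * len(originals))
    chosen = set(rng.sample(range(len(originals)), n_select))
    return [obfuscated[i] if i in chosen else originals[i]
            for i in range(len(originals))]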

3. Evaluate Privacy Protection

The metrics include AUC-ROC and the true-positive rate at fixed low false-positive rates (TPR@low-FPR); a minimal sketch of computing both from attack scores follows the command below.

python main.py \
    -c config_auc_tpr.yaml \
    --run-all \
    --output "./output/" \
    --target-model "checkpoints/Llama-3.2-X/epoch-X" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"

Dataset Information

Original Dataset

  • Source: iamgroot42/mimir
  • Description: Curated subset of The Pile dataset with membership labels
  • Splits: Various n-gram and threshold combinations (e.g., ngram_13_0.8)
  • Domains: ArXiv papers, Wikipedia, GitHub code, PubMed, and more
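
To experiment with one of these subsets directly, the MIMIR data can be pulled from the Hugging Face Hub. A minimal sketch, assuming the domain is the config name and the n-gram combination is the split name (check the dataset card for the exact identifiers):

from datasets import load_dataset

# Config/split names are assumptions based on the examples in this README;
# consult https://huggingface.co/datasets/iamgroot42/mimir for exact names.
ds = load_dataset("iamgroot42/mimir", "arxiv", split="ngram_13_0.8")
print(ds[0])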

Example of Obfuscated Dataset (figure not reproduced here)

Data Obfuscation

Generate Your Own Obfuscated Data

The data/obfuscation.py module provides tools to create obfuscated datasets:

# Set up environment variables
export OPENAI_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"

# Using OpenAI API for paraphrasing
python data/obfuscation.py

Obfuscation Prompts

The framework supports different prompts for various content types:

Text Paraphrasing Prompt:

message = [
    {"role": "system", "content": "You are a helpful text rewriting assistant."},
    {"role": "user", "content":
     f"Rewrite the following paragraph by replacing every word with an alternative term that does not share the same root or spelling. Preserve the same meaning and sentence structure as much as possible.\n\"\"\"\n{original_text}\n\"\"\""},
]
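
Under the hood, a message like this is sent to a chat-completion endpoint. A minimal sketch with the openai Python client (the model name is a placeholder assumption; data/obfuscation.py is the authoritative implementation):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def paraphrase(original_text, model="gpt-4o"):
    # `model` is a placeholder; use whatever data/obfuscation.py configures.
    message = [
        {"role": "system", "content": "You are a helpful text rewriting assistant."},
        {"role": "user", "content":
         f"Rewrite the following paragraph by replacing every word with an "
         f"alternative term that does not share the same root or spelling. "
         f"Preserve the same meaning and sentence structure as much as "
         f"possible.\n\"\"\"\n{original_text}\n\"\"\""},
    ]
    response = client.chat.completions.create(model=model, messages=message)
    return response.choices[0].message.content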

Code Obfuscation Prompt:

message = f"Rewrite the following code so it preserves the same functionality and flow, but changes all variable names, function names, and comments. Maintain the same input-output behavior. Keep it in the same programming language.\n\"\"\"\n{original_text}\n\"\"\""

Evaluation

Available Attack Methods

The framework implements 10+ state-of-the-art MIA attacks:

Attack Method   Description                       Key Parameters
--------------  --------------------------------  ---------------------------------
Loss            Basic loss-based attack           -
Zlib            Compression-based attack          -
Lowercase       Case-sensitivity attack           -
Min-K% Prob     Minimum k-probability attack      k
Min-K%++        Enhanced Min-K% with calibration  k
Ratio           Loss ratio with reference model   reference_model_path
Bag of Words    Feature-based ML attack           -
ReCall          Prefix-based recall attack        n_shots, extra_non_member_dataset
CON-ReCall      Conditional recall attack         n_shots, extra_non_member_dataset
Ensemble        Combination of multiple attacks   -

Custom Evaluation

# Evaluate specific attacks only
python main.py \
    -c config_auc_tpr.yaml \
    --attacks "loss,ratio,mink" \
    --target-model "path/to/model" \
    --dataset "arxiv" \
    --split "ngram_13_0.8"

Citation

If you use this framework in your research, please cite:

@inproceedings{zhang2025soft,
    title = {SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks},
    author = {Zhang, Kaiyuan and Cheng, Siyuan and Guo, Hanxi and Chen, Yuetian and Su, Zian and An, Shengwei and Du, Yuntao and Fleming, Charles and Kundu, Ashish and Zhang, Xiangyu and Li, Ninghui},
    booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
    year = {2025},
    address = {Seattle, WA},
    publisher = {USENIX Association}
}

Acknowledgments

  • Mimir Dataset for providing the evaluation benchmark
  • The Pile for the underlying text corpus
  • HuggingFace for the model and dataset hosting infrastructure
