BackdoorBenchER: Evaluating & Revisiting the Auxiliary Data in Backdoor Purification

English | 简体中文

Paper | Citation


📢 Announcements

Updated on 2025-04-02: Added support for two new clean-label attacks, COMBAT (AAAI 2024) and Narcissus (CCS 2023), and one new data-free defense, OTBR (AAAI 2025).

Updated on 2025-03-26: Based on the latest reviewer feedback, we have added support for the ImageNette dataset. ImageNette is a subset of ImageNet with significantly larger images than the CIFAR family, Tiny ImageNet, and GTSRB.

Updated on 2025-03-04: Added support for synthetic datasets based on MMGen. Follow the MMGen guidance to train a model or download the pretrained weights, then generate the synthetic dataset.

Updated on 2025-02-12: Initial release. It supports evaluating backdoor purification with auxiliary datasets categorized as Seen (Train), Reserved (Split), and OOD (Transformations & External from ImageNet).


📝 Introduction

Welcome to the official repository for the paper "Revisiting the Auxiliary Data in Backdoor Purification". This project aims to establish a framework for evaluating backdoor purification techniques under practical conditions using diverse auxiliary datasets, moving beyond the assumption of idealized, in-distribution data.


📊 Project Overview

Backdoor attacks exploit vulnerabilities during model training to induce specific behaviors when triggered. To counteract these threats, backdoor purification techniques are employed, often relying on a small clean dataset known as auxiliary data. Despite advancements, the impact of auxiliary data characteristics on purification efficacy remains understudied. This project investigates how different types of auxiliary datasets—ranging from in-distribution to synthetic or externally sourced—affect purification outcomes, providing insights crucial for selecting or constructing effective defense mechanisms.

(Figure: project overview)


🛠️ Getting Started

Follow these steps to set up the project:

  1. Clone Repository

    git clone https://github.com/shawkui/BackdoorBenchER.git
    cd BackdoorBenchER
  2. Install Dependencies

    bash sh/install.sh
  3. Initialize Folders

    bash sh/init_folders.sh

⚙️ Usage Instructions

🧪 Creating Auxiliary Datasets

For example, with CIFAR-10:

  1. Download the dataset into /data.
  2. Split it:
    python dataset/generate_split.py --dataset cifar10 --split_ratio 0.05 --random_seed 0
  3. Generate OOD auxiliary data:
    python dataset/generate_ood.py --dataset cifar10_split_5_seed_0 --ood_type brightness
  4. Create a CIFAR-10-like dataset from ImageNet:
    bash sh/cinic_download.sh
    python dataset/generate_cifar10_from_imagenet.py --dataset cifar10_split_5_seed_0 --ood_type imagenet
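For intuition, the split step reserves a small, class-balanced fraction of the training set as auxiliary data; with CIFAR-10's 50,000 training images, a 5% split reserves 2,500 images. The sketch below is only an illustration of such a stratified split, not the actual `dataset/generate_split.py`:

```python
import numpy as np

def stratified_split(labels, split_ratio=0.05, seed=0):
    """Reserve a class-balanced fraction of sample indices as auxiliary data."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    reserved = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # indices of class c
        rng.shuffle(idx)
        reserved.extend(idx[: int(len(idx) * split_ratio)])
    reserved = np.sort(np.array(reserved))
    remaining = np.setdiff1d(np.arange(len(labels)), reserved)
    return remaining, reserved

# CIFAR-10-like labels: 10 classes x 5,000 images -> 5% split reserves 2,500.
labels = np.repeat(np.arange(10), 5000)
train_idx, aux_idx = stratified_split(labels, split_ratio=0.05, seed=0)
print(len(train_idx), len(aux_idx))  # 47500 2500
```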

🛡️ Performing Attacks & Defenses

Simulate an attack:

python attack/badnet.py --save_folder_name badnet_demo --dataset cifar10_split_5_seed_0

Apply a defense:

python defense/ft.py --result_file badnet_demo --dataset cifar10_split_5_seed_0 --reserved_type reserved 

Customize configurations for all methods by editing sh/config_edit.py.
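Purification quality is conventionally summarized by clean accuracy (ACC) on benign inputs and attack success rate (ASR), the fraction of triggered inputs classified as the attacker's target. A minimal sketch of these two metrics, given model predictions as arrays (the function name and signature are illustrative, not the repository's API):

```python
import numpy as np

def acc_and_asr(clean_pred, clean_true, trig_pred, target_label):
    """ACC: agreement with true labels on clean data.
    ASR: fraction of triggered inputs predicted as the attack target."""
    acc = float(np.mean(np.asarray(clean_pred) == np.asarray(clean_true)))
    asr = float(np.mean(np.asarray(trig_pred) == target_label))
    return acc, asr

# Toy example: 4 clean predictions, 4 triggered predictions, target label 0.
acc, asr = acc_and_asr([1, 2, 3, 3], [1, 2, 3, 0], [0, 0, 1, 0], 0)
print(acc, asr)  # 0.75 0.75
```

A successful defense drives ASR down while keeping ACC close to the attacked model's clean accuracy.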

📄 Managing Results

All defense results are saved according to configurations specified in the --yaml_path argument.

For example,

python defense/ft.py --result_file badnet_demo --dataset cifar10_split_5_seed_0 --reserved_type reserved --yaml_path ./config/defense/ft/demo.yaml

will save the results in record/badnet_demo/defense/ft/demo/.
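The layout inferred from this example is record/&lt;result_file&gt;/defense/&lt;method&gt;/&lt;yaml stem&gt;/. A small sketch reconstructing that path from the CLI arguments (the helper is hypothetical, shown only to make the convention explicit):

```python
from pathlib import Path

def defense_record_dir(result_file, defense, yaml_path):
    """Reconstruct the save directory implied by the CLI arguments."""
    return Path("record") / result_file / "defense" / defense / Path(yaml_path).stem

out = defense_record_dir("badnet_demo", "ft", "./config/defense/ft/demo.yaml")
print(out.as_posix())  # record/badnet_demo/defense/ft/demo
```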


📋 TODO

📅 Upcoming Features:

  1. Release Code for Generating Synthetic Data: We will soon provide code to generate synthetic auxiliary datasets, expanding the variety of datasets available for testing and evaluation.

  2. Release Dataset: In addition to the code, we plan to release a curated dataset specifically designed for backdoor purification research.

  3. Release Guided Input Calibration: We plan to release Guided Input Calibration, the first attempt to align auxiliary datasets with in-distribution datasets, facilitating more effective backdoor purification.

  4. More Evaluations: Our long-term roadmap includes a broader evaluation framework, incorporating additional purification techniques, models, and tasks. Contributions from the research community are highly encouraged.

Stay tuned for updates!


📄 Citation

Please cite our work if you use it in your research:

@misc{wei2025revisitingauxiliarydatabackdoor,
      title={Revisiting the Auxiliary Data in Backdoor Purification}, 
      author={Shaokui Wei and Shanchao Yang and Jiayin Liu and Hongyuan Zha},
      year={2025},
      eprint={2502.07231},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2502.07231}, 
}

🎖️ Acknowledgments

Our work builds upon BackdoorBench and other prior works. Consider giving BackdoorBench a star if it is useful to you.


📞 Contact

For inquiries or feedback, open an issue or email [email protected].
