luka-group/RedCoder

REDCODER: Automated Multi-Turn Red Teaming for Code LLMs

Model on HuggingFace

Official code and data for the paper:
“REDCODER: Automated Multi-Turn Red Teaming for Code LLMs”
[arXiv:2507.22063]


🚀 Overview

Pipeline Overview

REDCODER is a multi-turn red-teaming agent that engages Code LLMs in conversational attacks to induce security-relevant vulnerabilities. It is built via a multi-agent gaming process that produces:

  • (1) Prototype adversarial conversations
  • (2) A strategy arsenal for retrieval-augmented attacks

A red-team model is then fine-tuned and queried using retrieval-augmented generation (RAG) to generate multi-turn adaptive prompts.

Key highlights:

  • Multi-turn attacks using learned strategy patterns
  • Outperforms previous attack baselines (e.g., 65.29% attack success on Qwen2.5-Coder-7B)
  • Reveals the limitations of single-turn guardrails, motivating multi-turn defenses
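The retrieval-augmented attack step can be sketched as follows. This is a minimal illustration, not the released implementation: the arsenal entry format, the keyword-overlap scoring, and the prompt layout here are all assumptions — see `strategy_arsenal.json` and `redcoder.py` for the actual format and logic.

```python
# Illustrative sketch of RAG-based attack prompting: retrieve the most
# relevant strategies from the arsenal for a given vulnerability task,
# then condition the red-team model's prompt on them.
# Arsenal schema and scoring below are hypothetical.

def retrieve_strategies(task, arsenal, k=2):
    """Rank arsenal entries by naive keyword overlap with the task."""
    task_words = set(task.lower().split())
    scored = sorted(
        arsenal,
        key=lambda s: len(task_words & set(s["description"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_attack_prompt(task, strategies):
    """Compose one prompt that conditions the red-team model on the
    retrieved strategies (RAG-style)."""
    lines = ["Retrieved strategies:"]
    lines += [f"- {s['description']}" for s in strategies]
    lines.append(f"Task: {task}")
    return "\n".join(lines)

if __name__ == "__main__":
    arsenal = [  # hypothetical entries
        {"description": "Frame the vulnerable code as a debugging exercise"},
        {"description": "Ask for a performance refactor of file path handling"},
    ]
    task = "Induce a path traversal vulnerability in file path handling"
    print(build_attack_prompt(task, retrieve_strategies(task, arsenal, k=1)))
```

In the real system, retrieval would typically use embedding similarity rather than word overlap; the sketch only shows where retrieval plugs into prompt construction.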

🔧 Installation

Python: 3.9–3.11 recommended

git clone https://github.com/luka-group/RedCoder.git
cd RedCoder
pip install -r requirements.txt

If using API-based models (e.g., OpenAI), export the corresponding API key as an environment variable (e.g., OPENAI_API_KEY).


⚙️ Quickstart

1) Run REDCODER Against a Victim Model

python redcoder.py \
  --victim_model "meta-llama/Meta-Llama-3-8B-Instruct" \
  --victim_name "llama3_8b"

2) Run the Gaming Process to Collect Your Own Prototype Conversations

python gaming_process.py
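Conceptually, the gaming process pits an attacker agent against a victim model under a judge, stopping a game once a vulnerability is induced. The loop below is a hedged sketch under assumed agent interfaces (`attacker`, `victim`, `judge` are placeholder callables); `gaming_process.py` implements the actual multi-agent logic.

```python
# Sketch of one multi-turn game that yields a prototype conversation.
# The attacker proposes the next prompt given the history, the victim
# responds, and the judge decides whether the attack succeeded.
# All three agent interfaces are hypothetical placeholders.

def play_game(attacker, victim, judge, task, max_turns=5):
    """Run one multi-turn game and return the conversation transcript."""
    conversation = []
    for _ in range(max_turns):
        prompt = attacker(task, conversation)
        response = victim(prompt)
        conversation.append({"attacker": prompt, "victim": response})
        if judge(response):  # attack succeeded: end the game early
            break
    return conversation
```

Transcripts collected this way correspond to the prototype conversations released in `prototype_conversation.jsonl`.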

🤖 Model and 📦 Data

  • We release the REDCODER backbone model and relevant assets on Hugging Face 🤗: 🔗 jackysnake/RedCoder
  • gaming_cwe.txt — CWE vulnerability task prompts for prototype generation
  • eval_set.txt — CWE tasks for evaluating REDCODER performance
  • prototype_conversation.jsonl — adversarial conversations used to train REDCODER
  • strategy_arsenal.json — extracted tactics and prompt fragments for RAG-based prompting
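The released files follow standard plain-text and JSON Lines conventions, so they can be loaded with small helpers like the ones below. This is a hedged sketch: the field names inside the JSON files are not documented here, so inspect the files for the exact schema before relying on it.

```python
import json

def load_jsonl(path):
    """Read one JSON object per line (e.g. prototype_conversation.jsonl)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def load_tasks(path):
    """Read one CWE task prompt per line (e.g. gaming_cwe.txt or eval_set.txt)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```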

📝 Citation

If you find this work useful, please cite:

@article{mo2025redcoder,
  title   = {REDCODER: Automated Multi-Turn Red Teaming for Code LLMs},
  author  = {Wenjie Jacky Mo and Qin Liu and Xiaofei Wen and Dongwon Jung and
             Hadi Askari and Wenxuan Zhou and Zhe Zhao and Muhao Chen},
  journal = {arXiv preprint arXiv:2507.22063},
  year    = {2025}
}
