ZeroDay.Tools - Gen AI Hardening & Attack Suite
Note: For per-integration logging & monitoring, see LatentSpace.Tools.
This repository provides an up-to-date AI/ML Hardening Framework and a Multimodal Attack Suite for Generative AI. It is built around the security notions of a Kill Chain x Defense Plan, primarily focusing on Gen AI, with illustrative examples from Discriminative ML and Deep Reinforcement Learning. This work is predicated on:
- The universal and transferable nature of attacks against Auto-Regressive models.
- The conserved efficiency of text-based attack modalities (see: Figure 3) even for multimodal models.
- The non-trivial nature of hardening GenAI systems.
Our approach to AI security is systematically structured around understanding, identifying, and mitigating threats across a defined AI Kill Chain. This framework enables a robust defense plan for Generative AI, Discriminative ML, and Deep Reinforcement Learning systems.
This GIF demonstrates an attack utilizing per-model templates to generate adversarial strings. It employs Greedy Coordinate Gradient optimization of target input/outputs, achieving results in minutes on consumer hardware when starting from a template.
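For orientation, a heavily simplified sketch of one Greedy Coordinate Gradient step is shown below; it is illustrative only, not this repository's attack code, and the model name, prompt, suffix, and target strings are placeholders.

```python
# Hedged sketch of a single GCG step against a Hugging Face causal LM (illustrative only).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; real attacks start from per-model templates
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
embed = model.get_input_embeddings()

prompt_ids = tok("Write the requested output:", return_tensors="pt").input_ids[0]
adv_ids = tok(" ! ! ! ! !", return_tensors="pt").input_ids[0]          # adversarial suffix
target_ids = tok(" Sure, here is", return_tensors="pt").input_ids[0]   # desired model output

def gcg_candidates(adv_ids, top_k=8):
    """Rank top-k replacement tokens per suffix position by gradient signal."""
    one_hot = torch.zeros(len(adv_ids), embed.num_embeddings)
    one_hot.scatter_(1, adv_ids.unsqueeze(1), 1.0)
    one_hot.requires_grad_(True)

    # Differentiable embedding of the suffix via the one-hot relaxation.
    full = torch.cat([embed(prompt_ids), one_hot @ embed.weight, embed(target_ids)])
    logits = model(inputs_embeds=full.unsqueeze(0)).logits[0]

    # Loss over the span where the model should emit the target tokens.
    tgt_start = len(prompt_ids) + len(adv_ids)
    loss = F.cross_entropy(logits[tgt_start - 1 : tgt_start - 1 + len(target_ids)], target_ids)
    loss.backward()

    # Tokens whose substitution most decreases the loss (largest negative gradient).
    return (-one_hot.grad).topk(top_k, dim=1).indices

# The full algorithm greedily evaluates these candidate swaps, keeps the best one,
# and iterates until the model emits the target string.
```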
The following checklist summarizes key exposures and core dependencies for each step in the AI kill chain. For detailed takeaways, mitigation strategies, and in-line citations, please refer to the links provided, which lead to the "Detailed Vulnerability Remediation & Mitigation Strategies" section.
Download the Observability PowerPoint for additional context on monitoring and defense.
🚨 Gen AI Vulnerabilities x Exposures
- Key Exposure: Brand Reputation Damage & Performance Degradation. Dependency: Requires specific API fields; no pre-processing.
- Key Exposure: Documentation & Distribution of System Vulnerabilities; Non-Compliance with AI Governance Standards. Dependency: Requires API Access over time; ‘time-based blind SQL injection’ for Multimodal Models.
- Key Exposure: Documentation & Distribution of Model-Specific Vulnerabilities. Dependency: API Access for context window retrieval; VectorDB Access for decoding embeddings.
- Key Exposure: Data Loss via Exploitation of Distributed Systems. Dependency: Whitebox Attacks require a localized target of either Language Models or Multimodal Models; multiple frameworks (e.g., SGA, VLAttack, etc.) are also designed to enable Transferable Multimodal Blackbox Attacks and evade 'Guard Models'.
- Key Exposure: Legal Liability from Data Licensure Breaches; Non-Compliance with AI Governance Standards. Dependency: Requires API Access over time; ‘rules’ defeated via prior system and model context extraction paired with optimized attacks.
- Key Exposure: IP Loss, Brand Reputational Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”. Dependency: System Access to GPU; net-new threat vector with myriad vulnerable platforms.
- Key Exposure: Brand Reputation Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”. Dependency: Target use of compromised data & models; integration of those vulnerabilities with CI/CD systems.
- Key Exposure: Documentation & Distribution of System Vulnerabilities; Brand Reputation Damage & Performance Degradation. Dependency: Lack of Active Assessment of Sensitive or External Systems.
This section provides in-depth information on the dependencies, key exposures, and mitigation takeaways for each vulnerability outlined in the checklist.
- Dependency: Requires specific API fields; no pre-processing.
- Key Exposure: Brand Reputation Damage & Performance Degradation.
- Takeaway: Mitigate low-complexity priming attacks via evaluation of input/output embeddings against moving windows of time, as well as limits on what data is available via API (e.g., Next-Token Probabilities aka Logits). This also mitigates DDoS attacks and indicates instances of poor generalization.
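A minimal sketch of that windowed evaluation follows; `embed_fn`, the window size, and the drift threshold are assumptions to be tuned per application, and the alerting hook is a placeholder.

```python
# Illustrative moving-window monitor over request embeddings (not a production design).
from collections import deque
import numpy as np

WINDOW = 500             # number of recent requests kept in the moving window (assumption)
DRIFT_THRESHOLD = 0.15   # arbitrary cosine-distance alarm level (assumption)

recent = deque(maxlen=WINDOW)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def check_request(text, embed_fn, baseline_centroid):
    """Embed a request, add it to the moving window, and flag drift from the baseline."""
    vec = embed_fn(text)
    recent.append(vec)
    window_centroid = np.mean(np.stack(recent), axis=0)
    drift = 1.0 - cosine(window_centroid, baseline_centroid)
    if drift > DRIFT_THRESHOLD:
        # In practice this would raise an alert, not print.
        print(f"embedding drift {drift:.3f} exceeds threshold")
    return drift
```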
- Dependency: Requires API Access over time; ‘time-based blind SQL injection’ for Multimodal Models.
- Key Exposure: Documentation & Distribution of System Vulnerabilities; Non-Compliance with AI Governance Standards.
- Takeaway: Mitigate retrieval of information about the system and application controls from Time-Based Blind Injection Attacks via Application-Specific Firewalls and Error Handling Best-Practices. Augment detection for sensitive systems by evaluating conformity of inputs/outputs against pre-embedded attack strings, and flagging long-running sessions for review.
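One hedged way to implement the conformity check and long-session flag is sketched below; the attack-string embedding bank, `embed_fn`, thresholds, and session store are all assumptions rather than components of this repository.

```python
# Illustrative screen: compare prompts against pre-embedded known attack strings
# and flag long-running sessions typical of blind-injection probing.
import time
import numpy as np

ATTACK_SIM_THRESHOLD = 0.85    # arbitrary similarity cutoff (assumption)
MAX_SESSION_SECONDS = 30 * 60  # arbitrary session-length cutoff (assumption)

session_start = {}  # session_id -> first-seen timestamp (placeholder store)

def screen_prompt(session_id, prompt, embed_fn, attack_bank):
    """attack_bank: (N, d) array of embeddings of known adversarial strings."""
    now = time.time()
    session_start.setdefault(session_id, now)
    flags = []
    vec = embed_fn(prompt)
    sims = attack_bank @ vec / (
        np.linalg.norm(attack_bank, axis=1) * np.linalg.norm(vec) + 1e-9
    )
    if sims.max() >= ATTACK_SIM_THRESHOLD:
        flags.append("resembles known attack string")
    if now - session_start[session_id] > MAX_SESSION_SECONDS:
        flags.append("long-running session")
    return flags
```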
- Dependency: API Access for context window; Access to Embeddings for Decoding (e.g., VectorDB).
- Key Exposure: Documentation & Distribution of Model Vulnerabilities & Data Access.
- Takeaway: Reduce the risk from discoverable rules, extractable context (e.g., persistently attached document-based system context), etc., via pre-defined rules. Prevent decodable embeddings (e.g., additional underlying data exposed via VectorDB & backups) by adding appropriate levels of noise or using customized embedding models for sensitive data.
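As an illustration of the noise-based mitigation, the sketch below perturbs embeddings before they are persisted to a vector store; the noise scale is an assumption that must be balanced against retrieval quality.

```python
# Illustrative sketch: add calibrated Gaussian noise to embeddings before persisting
# them, so raw stored vectors are harder to invert or decode.
import numpy as np

def noise_before_persist(embeddings, sigma=0.01, seed=None):
    """Return copies of `embeddings` (N, d) with isotropic Gaussian noise added."""
    rng = np.random.default_rng(seed)
    noisy = embeddings + rng.normal(0.0, sigma, size=embeddings.shape)
    # Re-normalise so cosine-similarity search behaves as before.
    return noisy / (np.linalg.norm(noisy, axis=1, keepdims=True) + 1e-9)
```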
- Dependency: Whitebox Attacks require a localized target; multiple frameworks (e.g., SGA, VLAttack, etc.) support Transferable Multimodal Blackbox Attacks and evade 'Guard Models'.
- Key Exposure: Data Loss via Exploitation of Distributed Systems.
- Takeaway: Defeat pre-processed optimization attacks by pre-defining embeddings for 'good' and 'bad' examples, logging, clustering, and flagging of non-conforming entries pre-output generation, as well as utilizing windowed evaluation of input/output embeddings against application-specific baselines.
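A minimal nearest-centroid version of that pre-output screen might look like the following; the centroids of 'good' and 'bad' examples and the decision margin are assumptions.

```python
# Illustrative nearest-centroid conformity check run before output is returned.
import numpy as np

def conformity_flag(vec, good_centroid, bad_centroid, margin=0.05):
    """Compare a candidate input/output embedding against pre-defined centroids."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    good_sim, bad_sim = cos(vec, good_centroid), cos(vec, bad_centroid)
    if bad_sim > good_sim:
        return "block"    # closer to known-bad examples
    if good_sim - bad_sim < margin:
        return "review"   # ambiguous: log and cluster for later triage
    return "allow"
```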
- Dependency: Requires API Access over time; ‘rules’ defeated via prior system and model context extraction paired with optimized attacks.
- Key Exposure: Legal Liability from Data Licensure Breaches; Non-Compliance with AI Governance Standards.
- Takeaway: Prevent disclosure of underlying data while mitigating membership or attribute inference attacks with pre-defined context rules (e.g., “no repetition”), whitelisting & monitoring of allowed topics, as well as DLP paired with active statistical monitoring via pre/post-processing of inputs/outputs.
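For illustration, a simple post-processing filter combining a topic whitelist with a 'no repetition' guard could be sketched as below; the allowlist, n-gram length, and repetition threshold are placeholders.

```python
# Illustrative post-filter: topic allowlist plus a repetition guard against
# divergence-style recall of training data.
from collections import Counter

ALLOWED_TOPICS = {"billing", "shipping", "product_info"}  # hypothetical allowlist
MAX_NGRAM_REPEATS = 5                                     # arbitrary threshold

def repetition_score(text, n=8):
    """Return the highest count of any repeated n-gram in the output."""
    tokens = text.split()
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 0)))
    return max(ngrams.values(), default=0)

def post_filter(output_text, topic):
    if topic not in ALLOWED_TOPICS:
        return None  # off-whitelist topic: suppress and log
    if repetition_score(output_text) > MAX_NGRAM_REPEATS:
        return None  # heavy n-gram repetition can indicate verbatim data recall
    return output_text
```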
- Dependency: System Access to GPU; net-new threat vector with myriad vulnerable platforms.
- Key Exposure: IP Loss, Brand Reputational Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”.
- Takeaway: Multiple Open-Source Attack frameworks are exploiting a previously underutilized data exfiltration vector in the form of GPU VRAM, which has traditionally been a shared resource without active monitoring. Secure virtualization and segmentation tooling exists for GPUs, but mitigating this vulnerability is an active area of research.
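As a starting point for visibility (not a mitigation), the sketch below uses NVIDIA's NVML bindings (`pynvml`) to enumerate VRAM usage and the processes holding GPU memory; comparing PIDs against an expected serving allowlist is left as an assumption.

```python
# Illustrative VRAM-occupancy watch via pynvml; surfaces which processes hold GPU
# memory so unexpected occupants can be investigated.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {mem.used / 2**20:.0f} MiB of {mem.total / 2**20:.0f} MiB in use")
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        # In practice, alert on PIDs that are not on the expected serving allowlist.
        print(f"  pid={proc.pid} usedGpuMemory={proc.usedGpuMemory}")
pynvml.nvmlShutdown()
```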
- Dependency: Target use of compromised data & models; integration of those vulnerabilities with CI/CD systems.
- Key Exposure: Brand Reputation Damage & Performance Degradation; Non-Compliance with AI Governance Standards, especially for “high-risk systems”.
- Takeaway: Mitigate Supply Chain & Data Poisoning attacks via use of Open-Source Foundation Models and Open-Source Data wherein Data Provenance/Lineage can be established, versions can be hashed, etc. Thereafter, enforce access and version control over fine-tuning data, contextual data (i.e., retrieval-augmented generation), etc.
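A hedged sketch of the hashing step follows: pin SHA-256 digests of model weights and fine-tuning data in a manifest and verify them in CI before deployment. The manifest name and format are assumptions, not this repository's convention.

```python
# Illustrative provenance gate: verify pinned SHA-256 digests of models and data.
import hashlib
import json
import pathlib
import sys

def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify(manifest_path="artifact_hashes.json"):
    """Manifest maps artifact paths to expected digests (hypothetical format)."""
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    for artifact, expected in manifest.items():
        if sha256_of(artifact) != expected:
            sys.exit(f"hash mismatch for {artifact}: supply-chain check failed")
    print("all pinned artifacts verified")
```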
- Dependency: Lack of Active Assessment of Sensitive or External Systems.
- Key Exposure: Documentation & Distribution of System Vulnerabilities; Brand Reputation Damage & Performance Degradation.
- Takeaway: Utilize a Defense in Depth approach (e.g., Purple Teaming), especially for Auto-Regressive Models, while staying up to date on the latest attack & defense paradigms. Utilize open-source code-generation and vulnerability assessment frameworks, contribute to the community, etc.
This framework and the accompanying attack suite can be utilized for:
- Manipulation of AI Systems: Targeting Self-Supervised Systems, AI Assistants, Agentic Frameworks, and connected tools/plugins. This is achieved via direct or indirect injection of adversarial strings optimized to make Models designed to call external functions or access tooling frameworks return specific arguments (see the sketch after this list).
  - Example Impact: Unauthorized IAM Actions, Internal Database Access, aiding in privilege escalation.
- Inference Attack Definition: Defining Membership & Attribute Inference Attacks for open-source, semi-closed, and closed-source models. This involves targeting behavior that elicits high-precision recall of underlying training data.
  - Example Application: Validation of GDPR-compliant data deletion (alongside layer validation), Red/Blue Teaming of LLM Architectures & Monitoring.
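As referenced above, a minimal guard against manipulated function calls is to validate the tool name and arguments a model emits against an explicit allowlist before anything is executed; the tool names, schemas, and call format below are hypothetical.

```python
# Illustrative guard for model-emitted function calls (hypothetical tools and schema).
ALLOWED_CALLS = {
    "lookup_order": {"order_id"},                 # tool -> permitted argument names
    "get_shipping_quote": {"zip", "weight_kg"},
}

def vet_tool_call(call):
    """`call` is the parsed function-call object, e.g. {"name": ..., "arguments": {...}}."""
    name, args = call.get("name"), call.get("arguments", {})
    if name not in ALLOWED_CALLS:
        raise PermissionError(f"model requested unapproved tool: {name!r}")
    extra = set(args) - ALLOWED_CALLS[name]
    if extra:
        raise PermissionError(f"unexpected arguments for {name!r}: {sorted(extra)}")
    return name, args
```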
While the primary focus is Generative AI, these security principles and vulnerabilities also extend to other AI paradigms.
🔍 Examples of Traditional ML and Deep/Reinforcement Learning Vulnerabilities
- Key Exposure: System-Specific Vulnerability & Performance Degradation.
- Dependency: Lack of Actively Monitored & Versioned RL Policies.
- Takeaway: Mitigate the compounding nature of poorly aligned & incentivized reward functions and resultant RL policies by actively logging, monitoring & alerting such that divergent policies are identified. While adversarial training increases robustness, these systems remain susceptible to attack.
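One hedged way to surface divergent policies is to compare the deployed policy's action distributions on a fixed probe set against a versioned reference using KL divergence, as sketched below; the probe states, policy interfaces, and alert threshold are assumptions.

```python
# Illustrative drift check between a versioned reference policy and the live policy.
import numpy as np

KL_ALERT = 0.5  # arbitrary alarm level (assumption)

def kl(p, q, eps=1e-9):
    p, q = np.asarray(p, dtype=float) + eps, np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def policy_drift(reference_policy, live_policy, probe_states):
    """Each policy maps a state to a probability vector over actions."""
    scores = [kl(reference_policy(s), live_policy(s)) for s in probe_states]
    mean_kl = float(np.mean(scores))
    if mean_kl > KL_ALERT:
        print(f"policy divergence {mean_kl:.3f} exceeds threshold; review before rollout")
    return mean_kl
```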
- Dependency: Requires Out-Of-Date Vulnerability Definitions and/or lack of image scanning when deploying previous builds.
- Key Exposure: Brand Reputation Damage & Performance Degradation.
- Takeaway: Mitigate risk from commonly exploited repos and analytics packages by establishing best practices for vulnerability management, repackaging, and image scanning.
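As one concrete example of such a gate, the sketch below fails a CI build when `pip-audit` (assuming it is installed) reports known vulnerabilities in the current environment; container image scanning with a tool such as Trivy would slot into the same hook.

```python
# Illustrative CI hook: fail the build if the Python environment has known CVEs.
import subprocess
import sys

result = subprocess.run(["pip-audit"], capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    sys.exit("known vulnerabilities found in installed packages; failing the build")
```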
- Explore the Attack Suite: Launch the Colab Notebook to see various attacks in action.
- Review the Hardening Checklist: Familiarize yourself with the Gen AI Vulnerabilities x Exposures to understand potential risks.
- Dive Deeper into Remediation: Use the Detailed Vulnerability Remediation & Mitigation Strategies section for specific guidance.
- Understand Observability: Download the Observability PowerPoint for broader context on AI system monitoring.
- Integrate with Monitoring Tools: For advanced per-integration logging & monitoring solutions, refer to LatentSpace.Tools.
Understanding and addressing the vulnerabilities outlined in this repository provides significant advantages:
- 🛡️ Enhanced Security Posture: Proactively identify and mitigate a wide range of AI-specific threats.
- 📉 Reduced Risk Exposure: Minimize potential brand reputation damage, data loss, intellectual property theft, and performance degradation.
- ⚖️ Improved Compliance & Governance: Better align with AI governance standards and legal requirements (e.g., GDPR, regulations for “high-risk AI systems”).
- 💡 Informed Defense Strategies: Develop more robust and effective defense mechanisms based on a clear understanding of evolving attack vectors.
- 🤝 Community Engagement & Knowledge: Stay updated with the latest attack and defense paradigms and contribute to a safer AI ecosystem.