Skip to content

fix(proxy): fix GCP Model Armor guardrail detection and circular reference issue #12991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

colesmcintosh
Copy link
Collaborator

Title

fix(proxy): fix GCP Model Armor guardrail detection and circular reference issue

Relevant issues

Fixes #12818

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes

This PR fixes issue #12818 where GCP Model Armor was always returning "Success" even when it should block harmful content (like bomb-making instructions). The logs showed "standard_logging_guardrail_information": "CircularReference Detected".

Root Cause

  1. The Model Armor implementation was checking for incorrect fields in the API response (blocked, action) that don't exist in the actual GCP Model Armor API
  2. The guardrail logging was creating circular references by storing the entire request data dict

Changes Made

  1. Fixed _should_block_content to check the correct API response fields:

    • Now checks filterMatchState and individual filter results instead of non-existent fields
    • Properly detects when content should be blocked based on RAI filters, prompt injection detection, etc.
  2. Fixed _get_sanitized_content to extract sanitized text from the correct response location

  3. Added _process_response override to prevent circular references:

    • Stores only the Model Armor API response instead of the entire data dict
    • Prevents the "CircularReference Detected" error in logging
  4. Updated all test cases to use the actual GCP Model Armor API response format

  5. Added specific tests:

    • test_model_armor_bomb_content_blocked: Tests that harmful content is correctly blocked
    • test_model_armor_no_circular_reference_in_logging: Verifies no circular references in logging
    • test_model_armor_success_case_serializable: Ensures success cases are properly serializable

Testing

All Model Armor tests pass:

poetry run pytest tests/test_litellm/proxy/guardrails/guardrail_hooks/test_model_armor.py -v
============================== 24 passed in 1.70s ==============================

The fix ensures that harmful content like bomb-making instructions will be correctly detected and blocked by Model Armor, and that the guardrail information is properly logged without circular references.

…rence issue

Fixes BerriAI#12818 where GCP Model Armor was always returning success even for harmful content.

Changes:
- Fix _should_block_content to check correct API response fields (filterMatchState instead of non-existent blocked/action fields)
- Fix _get_sanitized_content to extract sanitized text from correct response location
- Override _process_response to store only Model Armor API response, preventing circular references in logging
- Update test cases to use actual GCP Model Armor API response format
- Add specific tests for harmful content detection and circular reference prevention

The issue was that the implementation was checking for fields that don't exist in the actual GCP Model Armor API response, causing harmful content to never be detected as blocked.
Copy link

vercel bot commented Jul 25, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
litellm ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 26, 2025 10:13pm

@colesmcintosh colesmcintosh marked this pull request as ready for review July 25, 2025 23:33
@colesmcintosh
Copy link
Collaborator Author

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugbot free trial expires on July 29, 2025
Learn more in the Cursor dashboard.

…date logging mechanism

- Updated `guardrail_status` to support a new "blocked" state in `CustomGuardrail` and `StandardLoggingGuardrailInformation`.
- Modified `ModelArmorGuardrail` to store the Model Armor response and status in request metadata, preventing race conditions and ensuring accurate logging for concurrent requests.
- Enhanced the logic for determining the guardrail status based on the Model Armor response.
- Added type ignore comment to `guardrail_status` assignment in `ModelArmorGuardrail` to suppress mypy warnings regarding the use of `metadata.get`.
- Ensured that the guardrail status logic remains intact while maintaining type safety.
- Adjusted the placement of the type ignore comment for `guardrail_status` in `ModelArmorGuardrail` to improve clarity while maintaining mypy compatibility.
- Ensured that the logic for determining guardrail status remains consistent with previous implementations.
- Removed the unnecessary comment about literal extension at runtime for `guardrail_status` in `ModelArmorGuardrail` to enhance code clarity.
- Maintained mypy compatibility while ensuring the logic for guardrail status remains unchanged.
@krrishdholakia
Copy link
Contributor

@colesmcintosh let me know when this has been manually qa'ed + ready for review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: GCP Model armor always Success
2 participants