fix(proxy): fix GCP Model Armor guardrail detection and circular reference issue #12991
Open
colesmcintosh wants to merge 5 commits into BerriAI:main from colesmcintosh:fix/gcp-model-armor-detection
…rence issue

Fixes BerriAI#12818 where GCP Model Armor was always returning success even for harmful content.

Changes:
- Fix `_should_block_content` to check the correct API response fields (`filterMatchState` instead of the non-existent `blocked`/`action` fields)
- Fix `_get_sanitized_content` to extract sanitized text from the correct response location
- Override `_process_response` to store only the Model Armor API response, preventing circular references in logging
- Update test cases to use the actual GCP Model Armor API response format
- Add specific tests for harmful content detection and circular reference prevention

The issue was that the implementation was checking for fields that don't exist in the actual GCP Model Armor API response, so harmful content was never detected as blocked.
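The detection change described in this commit could look roughly like the sketch below. It is not the code from the PR: the function name `should_block_content_sketch` is hypothetical, and the response keys `sanitizationResult`, `filterResults`, and `matchState`, along with the value `"MATCH_FOUND"`, are assumptions about the Model Armor response shape that should be checked against the API reference. Only `filterMatchState` and the idea of checking individual filter results come from the commit message.

```python
from typing import Any, Dict


def should_block_content_sketch(response: Dict[str, Any]) -> bool:
    """Return True when the (assumed) Model Armor response reports a match.

    Assumed shape:
      {"sanitizationResult": {"filterMatchState": "...",
                              "filterResults": {name: {kind: {"matchState": "..."}}}}}
    """
    result = response.get("sanitizationResult", {})

    # Top-level verdict: the field the commit message says should be checked.
    if result.get("filterMatchState") == "MATCH_FOUND":
        return True

    # Fall back to the individual filter results (assumed nesting).
    for filter_entry in result.get("filterResults", {}).values():
        if not isinstance(filter_entry, dict):
            continue
        for detail in filter_entry.values():
            if isinstance(detail, dict) and detail.get("matchState") == "MATCH_FOUND":
                return True

    return False
```

Note that a response containing only legacy-style keys such as `blocked` or `action` is never treated as a block here, which mirrors the bug report: those keys are not part of the real response, so checking them always yielded "not blocked".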
bugbot run
litellm/proxy/guardrails/guardrail_hooks/model_armor/model_armor.py
…date logging mechanism

- Updated `guardrail_status` to support a new "blocked" state in `CustomGuardrail` and `StandardLoggingGuardrailInformation`.
- Modified `ModelArmorGuardrail` to store the Model Armor response and status in request metadata, preventing race conditions and ensuring accurate logging for concurrent requests.
- Enhanced the logic for determining the guardrail status based on the Model Armor response.
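A minimal sketch of why per-request metadata avoids the race mentioned in this commit, assuming a single guardrail object is shared by concurrent requests. The class `GuardrailSketch`, the metadata keys `_model_armor_response` and `_model_armor_status`, and the `"success"` status string are hypothetical names chosen for illustration; only the "blocked" state and the idea of storing the response and status in request metadata come from the commit message.

```python
import asyncio
from typing import Any, Dict, Tuple


class GuardrailSketch:
    """Hypothetical stand-in for a guardrail instance shared across requests."""

    async def check(self, data: Dict[str, Any], verdict: str) -> None:
        # Yield control as a real network call would, so the two requests interleave.
        await asyncio.sleep(0)

        # Write the result into this request's own metadata rather than onto
        # `self`, so concurrent requests cannot overwrite each other's result.
        metadata = data.setdefault("metadata", {})
        metadata["_model_armor_response"] = {
            "sanitizationResult": {"filterMatchState": verdict}  # assumed shape
        }
        metadata["_model_armor_status"] = (
            "blocked" if verdict == "MATCH_FOUND" else "success"
        )


async def run_two_requests() -> Tuple[Dict[str, Any], Dict[str, Any]]:
    guardrail = GuardrailSketch()
    req_a: Dict[str, Any] = {"messages": []}
    req_b: Dict[str, Any] = {"messages": []}
    await asyncio.gather(
        guardrail.check(req_a, "MATCH_FOUND"),
        guardrail.check(req_b, "NO_MATCH_FOUND"),
    )
    return req_a, req_b
```

Each request dictionary ends up carrying its own status, so a logging step that runs later can read the value from the request it is logging instead of from shared instance state.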
- Added type ignore comment to `guardrail_status` assignment in `ModelArmorGuardrail` to suppress mypy warnings regarding the use of `metadata.get`.
- Ensured that the guardrail status logic remains intact while maintaining type safety.

- Adjusted the placement of the type ignore comment for `guardrail_status` in `ModelArmorGuardrail` to improve clarity while maintaining mypy compatibility.
- Ensured that the logic for determining guardrail status remains consistent with previous implementations.

- Removed the unnecessary comment about literal extension at runtime for `guardrail_status` in `ModelArmorGuardrail` to enhance code clarity.
- Maintained mypy compatibility while ensuring the logic for guardrail status remains unchanged.
@colesmcintosh let me know when this has been manually QA'ed and is ready for review
Title
fix(proxy): fix GCP Model Armor guardrail detection and circular reference issue
Relevant issues
Fixes #12818
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- `tests/litellm/` directory, Adding at least 1 test is a hard requirement - see details
- `make test-unit`
Type
🐛 Bug Fix
Changes
This PR fixes issue #12818 where GCP Model Armor was always returning "Success" even when it should block harmful content (like bomb-making instructions). The logs showed `"standard_logging_guardrail_information": "CircularReference Detected"`.
Root Cause
- The implementation was checking for fields (`blocked`, `action`) that don't exist in the actual GCP Model Armor API
Changes Made
- Fixed `_should_block_content` to check the correct API response fields: `filterMatchState` and individual filter results instead of non-existent fields
- Fixed `_get_sanitized_content` to extract sanitized text from the correct response location
- Added `_process_response` override to prevent circular references by storing only the Model Armor API response
- Updated all test cases to use the actual GCP Model Armor API response format
- Added specific tests:
  - `test_model_armor_bomb_content_blocked`: Tests that harmful content is correctly blocked
  - `test_model_armor_no_circular_reference_in_logging`: Verifies no circular references in logging
  - `test_model_armor_success_case_serializable`: Ensures success cases are properly serializable
Testing
All Model Armor tests pass.
The fix ensures that harmful content like bomb-making instructions will be correctly detected and blocked by Model Armor, and that the guardrail information is properly logged without circular references.
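To illustrate the circular reference part of the fix, here is a small standalone sketch using only the standard library. It does not reproduce LiteLLM's logging code: `build_logging_payload` and the `guardrail_info` key are hypothetical, and the cycle shown is just one way such a loop can arise (a logged object that points back at the request it is stored in). The response shape is likewise an assumption.

```python
import json
from typing import Any, Dict


def build_logging_payload(api_response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the guardrail API response, copied into plain JSON data."""
    # The JSON round trip yields a detached copy with no references to the request.
    return {"guardrail_response": json.loads(json.dumps(api_response))}


# Anti-pattern: the logged object holds the request, which holds the logged object.
cyclic_request: Dict[str, Any] = {"metadata": {}}
cyclic_request["metadata"]["guardrail_info"] = {"request": cyclic_request}
try:
    json.dumps(cyclic_request)
    cycle_detected = False
except ValueError:
    # json.dumps raises ValueError when it detects a circular reference.
    cycle_detected = True

# Safer pattern: store only the API response (assumed shape) in the metadata.
api_response = {"sanitizationResult": {"filterMatchState": "NO_MATCH_FOUND"}}
safe_request: Dict[str, Any] = {"metadata": {}}
safe_request["metadata"]["guardrail_info"] = build_logging_payload(api_response)
serialized = json.dumps(safe_request)
```

A serializability check along these lines is what tests such as `test_model_armor_success_case_serializable` are described as covering, though the actual test bodies are not shown in this PR text.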