-
Notifications
You must be signed in to change notification settings - Fork 52
LCORE-816: Add new E2E tests for shields endpoint #682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LCORE-816: Add new E2E tests for shields endpoint #682
Conversation
WalkthroughServer configuration adds concrete shield entries and extends vector DB entries with embedding_dimension and provider_id. E2E tests add shield endpoint scenarios and a new step to validate shield response structure; models scenario adjusted to expect an error when llama-stack is unreachable and minor wording edits applied. Changes
Sequence Diagram(s)sequenceDiagram
participant TestClient as E2E Test Client
participant API as Server API
participant LS as Llama Stack
rect rgb(248,249,250)
TestClient->>API: GET /shields
API->>LS: request shield/provider info
alt LS reachable
LS-->>API: shield data (shield_id, provider_id, provider_shield_id, ...)
API-->>TestClient: 200 + shields array
else LS unreachable
LS-->>API: connection error
API-->>TestClient: 500 + error body
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/e2e/features/steps/info.py (1)
61-62: Consider enhancing the docstring.The docstring could be more informative by including an Args section to document the context parameter, following Google Python docstring style as specified in the coding guidelines.
Apply this diff:
-def check_shield_structure(context: Context) -> None: - """Check that the first shield has the correct structure and required fields.""" +def check_shield_structure(context: Context) -> None: + """Check that the first shield has the correct structure and required fields. + + Args: + context: Behave context containing the HTTP response to validate. + """As per coding guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
run.yaml(1 hunks)tests/e2e/configs/run-azure.yaml(1 hunks)tests/e2e/configs/run-ci.yaml(1 hunks)tests/e2e/configs/run-rhaiis.yaml(1 hunks)tests/e2e/features/info.feature(2 hunks)tests/e2e/features/steps/info.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/e2e/features/steps/info.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place behave step definitions under tests/e2e/features/steps/
Files:
tests/e2e/features/steps/info.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/e2e/features/steps/info.py
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Write E2E tests as Gherkin feature files for behave
Files:
tests/e2e/features/info.feature
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (8)
tests/e2e/configs/run-azure.yaml (1)
123-125: LGTM!The shields configuration is properly formatted and correctly references the llama-guard provider defined in the safety providers section.
tests/e2e/configs/run-rhaiis.yaml (1)
126-128: LGTM!The shields configuration is consistent with other environment configs and properly references the llama-guard provider.
tests/e2e/configs/run-ci.yaml (1)
134-136: LGTM!The shields configuration is properly formatted and consistent across all environment configurations.
tests/e2e/features/steps/info.py (1)
40-57: LGTM!The variable rename from
gpt_modeltollm_modelimproves clarity and better reflects the function's purpose of validating any LLM model structure, not just GPT models.tests/e2e/features/info.feature (3)
40-40: LGTM!The scenario title now accurately reflects that it tests the error case when llama-stack is unreachable, improving test documentation clarity.
50-54: LGTM!The new scenario for testing the shields endpoint follows the established pattern and properly validates both the response status and structure.
57-65: LGTM!The error handling scenario for the shields endpoint is well-structured and maintains consistency with other error scenarios in the feature file.
run.yaml (1)
134-136: LGTM!The shields configuration is properly formatted and aligns with the llama-guard provider defined in the safety providers section.
tisnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/e2e/features/steps/info.py (2)
85-85: Fix copy-paste error in assertion message.The assertion message references "LLM model" but should reference "shield".
Apply this diff:
- assert found_shield is not None, "No LLM model found in response" + assert found_shield is not None, "No shield found in response"
88-88: Fix copy-paste error in assertion message.The assertion message incorrectly states "type should be 'model'" when it should state "type should be 'shield'".
Apply this diff:
- assert found_shield["type"] == "shield", "type should be 'model'" + assert found_shield["type"] == "shield", "type should be 'shield'"
🧹 Nitpick comments (1)
tests/e2e/features/environment.py (1)
24-45: Consider parameterizing hostname and port.The function hardcodes
hostname="localhost"andport=8080. While this likely matches your test setup, consider:
- Using the same configuration values that tests will use (e.g., from environment variables or a config file)
- Accepting these as parameters with defaults
This would improve flexibility if you need to test against different environments.
Apply this diff to parameterize the function:
-def _fetch_models_from_service(hostname: str = "localhost", port: int = 8080) -> dict: +def _fetch_models_from_service(hostname: str = "localhost", port: int = 8080) -> dict: """Query /v1/models endpoint and return first LLM model. + + Args: + hostname: Service hostname (default: localhost) + port: Service port (default: 8080) Returns: Dict with model_id and provider_id, or empty dict if unavailableNote: The function signature already has default parameters, so this is just a documentation improvement unless you want to read these from the environment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
tests/e2e/features/conversations.feature(1 hunks)tests/e2e/features/environment.py(1 hunks)tests/e2e/features/info.feature(2 hunks)tests/e2e/features/query.feature(3 hunks)tests/e2e/features/steps/common_http.py(2 hunks)tests/e2e/features/steps/conversation.py(2 hunks)tests/e2e/features/steps/info.py(1 hunks)tests/e2e/features/steps/llm_query_response.py(4 hunks)tests/e2e/features/streaming_query.feature(3 hunks)tests/e2e/utils/utils.py(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/e2e/features/conversations.feature
🧰 Additional context used
📓 Path-based instructions (4)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Write E2E tests as Gherkin feature files for behave
Files:
tests/e2e/features/query.featuretests/e2e/features/streaming_query.featuretests/e2e/features/info.feature
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/e2e/features/steps/info.pytests/e2e/utils/utils.pytests/e2e/features/steps/common_http.pytests/e2e/features/steps/llm_query_response.pytests/e2e/features/steps/conversation.pytests/e2e/features/environment.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place behave step definitions under tests/e2e/features/steps/
Files:
tests/e2e/features/steps/info.pytests/e2e/features/steps/common_http.pytests/e2e/features/steps/llm_query_response.pytests/e2e/features/steps/conversation.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/e2e/features/steps/info.pytests/e2e/utils/utils.pytests/e2e/features/steps/common_http.pytests/e2e/features/steps/llm_query_response.pytests/e2e/features/steps/conversation.pytests/e2e/features/environment.py
🧬 Code graph analysis (3)
tests/e2e/features/steps/common_http.py (1)
tests/e2e/utils/utils.py (1)
replace_placeholders(147-160)
tests/e2e/features/steps/llm_query_response.py (1)
tests/e2e/utils/utils.py (1)
replace_placeholders(147-160)
tests/e2e/features/steps/conversation.py (1)
tests/e2e/utils/utils.py (1)
replace_placeholders(147-160)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (9)
tests/e2e/features/steps/common_http.py (1)
174-177: LGTM! Placeholder preprocessing enables dynamic test data.The placeholder substitution is correctly applied before JSON parsing, with a defensive fallback to
"{}". This enables test assertions to reference runtime-detected model and provider values.tests/e2e/features/query.feature (1)
45-45: LGTM! Consistent parameterization across scenarios.The placeholder usage for
{MODEL}and{PROVIDER}is consistent across request payloads and expected error responses, enabling runtime-based test parameterization.Also applies to: 79-79, 84-84, 92-92, 102-102
tests/e2e/features/steps/conversation.py (1)
113-116: LGTM! Consistent placeholder preprocessing pattern.The implementation mirrors the pattern used in other step files, correctly replacing placeholders before parsing the expected conversation data.
tests/e2e/features/streaming_query.feature (1)
54-54: LGTM! Parameterization aligns with other feature files.Placeholder usage is consistent with the pattern established in query.feature, covering both request payloads and expected validation errors.
Also applies to: 67-72, 79-79, 88-88
tests/e2e/features/info.feature (1)
50-65: LGTM! Shields endpoint coverage is comprehensive.The new scenarios for the shields endpoint mirror the existing patterns for the models endpoint, including both success and error cases. The error handling scenario ensures proper behavior when llama-stack is unreachable.
tests/e2e/features/steps/llm_query_response.py (1)
28-32: LGTM! Consistent placeholder preprocessing with helpful debug output.The placeholder replacement is correctly applied across all three step functions before JSON parsing. The added debug print statements will help troubleshoot test failures.
Also applies to: 43-47, 66-73
tests/e2e/features/environment.py (1)
53-63: Fallback aligns with test configuration. The fallback modelgpt-4-turbo/openaiis already defined in your e2e configs; no updates required.tests/e2e/utils/utils.py (1)
147-160: No issues:replace_placeholdersis safe to use afterbefore_all. Context attributes (default_model,default_provider) are set unconditionally in thebefore_allhook, so no changes required.tests/e2e/features/steps/info.py (1)
30-66: LGTM! Good generalization to runtime-discovered defaults.The refactored model structure validation correctly finds the first LLM model and validates it against runtime-discovered default model and provider from context. The logic, assertions, and error messages are clear and appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/e2e/features/steps/info.py (1)
91-99: Consider using context variables for shield provider values.The
provider_id(line 92) andidentifier(line 98) are hardcoded, which reduces test flexibility compared tocheck_model_structurethat usescontext.default_provider. If shield provider configuration changes, these hardcoded values would require code updates.Consider adding shield-specific context variables (e.g.,
context.default_shield_provider) inenvironment.pyand using them here, similar to how model validation works.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/features/steps/info.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/e2e/features/steps/info.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place behave step definitions under tests/e2e/features/steps/
Files:
tests/e2e/features/steps/info.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/e2e/features/steps/info.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
- GitHub Check: build-pr
🔇 Additional comments (2)
tests/e2e/features/steps/info.py (2)
68-76: LGTM! Function signature and initial validation are correct.The step definition follows behave conventions with proper type annotations and docstring. The initial JSON and shields field validation logic is sound.
78-85: LGTM! Shield finding logic is correct.The loop correctly filters shields by type and safely handles missing fields with
.get(). The assertion message properly references "shield" (previous issue was addressed).
da4bbc5 to
324c286
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
run.yaml (1)
134-153: Consider aligning shield and LLM model for consistency.The shield's
provider_shield_idreferencesgpt-3.5-turbo(line 137), while the LLM model isgpt-4-turbo(line 150). While this configuration is syntactically valid and demonstrates flexibility, the E2E test validation intests/e2e/features/steps/info.pyexpects the shield'sprovider_resource_idto match the default LLM model.For consistency with the E2E test expectations and to serve as a clearer example configuration, consider using the same model ID for both.
Apply this diff to align the configurations:
- provider_shield_id: "gpt-3.5-turbo" # Model to use for safety checks + provider_shield_id: "gpt-4-turbo" # Model to use for safety checks
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
run.yaml(1 hunks)tests/e2e/configs/run-azure.yaml(1 hunks)tests/e2e/configs/run-ci.yaml(1 hunks)tests/e2e/configs/run-rhaiis.yaml(1 hunks)tests/e2e/features/info.feature(2 hunks)tests/e2e/features/steps/info.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/e2e/configs/run-ci.yaml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/e2e/features/steps/info.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place behave step definitions under tests/e2e/features/steps/
Files:
tests/e2e/features/steps/info.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/e2e/features/steps/info.py
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Write E2E tests as Gherkin feature files for behave
Files:
tests/e2e/features/info.feature
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (4)
tests/e2e/configs/run-azure.yaml (1)
123-131: LGTM! Shield configuration is consistent with model setup.The shield configuration correctly references the Azure model
gpt-4o-mini, which is also defined in the models section, ensuring consistency between the shield and inference providers.tests/e2e/features/steps/info.py (1)
68-99: LGTM! Shield validation logic is comprehensive and well-structured.The function properly validates the shield structure by checking JSON validity, field presence, and specific shield attributes. The validation approach mirrors the existing
check_model_structurefunction, maintaining consistency across the test suite.tests/e2e/features/info.feature (1)
50-54: LGTM! Shield endpoint test scenario is well-structured.The scenario follows the established pattern for endpoint testing and correctly uses the new shield structure validation step.
tests/e2e/configs/run-rhaiis.yaml (1)
126-140: LGTM! Shield and model configuration is properly aligned.The shield configuration correctly references
meta-llama/Llama-3.1-8B-Instruct, which is defined as an LLM model in the configuration, ensuring consistency between safety and inference providers.
|
|
||
|
|
||
| Scenario: Check if models endpoint is working | ||
| Scenario: Check if models endpoint reports error when llama-stack in unreachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix grammar error in scenario title.
The scenario title contains a grammar error: "llama-stack in unreachable" should be "llama-stack is unreachable".
Apply this diff:
- Scenario: Check if models endpoint reports error when llama-stack in unreachable
+ Scenario: Check if models endpoint reports error when llama-stack is unreachable📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Scenario: Check if models endpoint reports error when llama-stack in unreachable | |
| Scenario: Check if models endpoint reports error when llama-stack is unreachable |
🤖 Prompt for AI Agents
In tests/e2e/features/info.feature around line 40, the scenario title contains a
grammar mistake ("llama-stack in unreachable"); change the word "in" to "is" so
the line reads "Scenario: Check if models endpoint reports error when
llama-stack is unreachable" and save the file.
| And The body of the response has proper shield structure | ||
|
|
||
|
|
||
| Scenario: Check if shields endpoint reports error when llama-stack in unreachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix grammar error in scenario title.
The scenario title contains the same grammar error: "llama-stack in unreachable" should be "llama-stack is unreachable".
Apply this diff:
- Scenario: Check if shields endpoint reports error when llama-stack in unreachable
+ Scenario: Check if shields endpoint reports error when llama-stack is unreachable📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Scenario: Check if shields endpoint reports error when llama-stack in unreachable | |
| Scenario: Check if shields endpoint reports error when llama-stack is unreachable |
🤖 Prompt for AI Agents
In tests/e2e/features/info.feature around line 57, the scenario title has a
grammar mistake "llama-stack in unreachable"; update the scenario line to read
"Scenario: Check if shields endpoint reports error when llama-stack is
unreachable" so the verb is correct.
tisnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit