RHAIIS e2e tests: Fix model change #728
Conversation
Walkthrough

This change configures the e2e testing infrastructure to support OpenAI models alongside the existing vLLM deployments. Environment variables for the OpenAI API key and model ID are added to the GitHub Actions workflow and the Docker Compose setup. The test configuration is updated to include an OpenAI provider, a new gpt-4-turbo model entry, and a shield configuration pointing to gpt-4-turbo.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (1)
docker-compose.yaml (1)
20-21: Consider making debug logging conditional in e2e tests.

`LLAMA_STACK_LOGGING=all=debug` is now permanent for all e2e test runs. Debug logging can impact performance and increase verbosity. Consider gating it behind a conditional environment variable so it can be toggled on or off per test run. Example:

`- LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-all=info}`

This allows tests to run at the default `info` level but enable `debug` when needed for diagnostics.
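A minimal sketch of how that default could be wired into docker-compose.yaml; the service name and surrounding keys are assumptions, not copied from the repository:

```yaml
services:
  llama-stack:
    environment:
      # Defaults to info; override per run, e.g.
      #   LLAMA_STACK_LOGGING=all=debug docker compose up
      - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-all=info}
```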
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/e2e_tests_rhaiis.yaml (1 hunks)
- docker-compose.yaml (2 hunks)
- tests/e2e/configs/run-rhaiis.yaml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (4)
.github/workflows/e2e_tests_rhaiis.yaml (1)
17-20: Verify GitHub secret exists and reconsider hardcoded model value.

Line 17 references `secrets.OPENAI_API_KEY`, which must pre-exist in your GitHub repository secrets. If it is missing, the workflow will fail. Verify this secret is configured.

Line 20 hardcodes `RHAIIS_MODEL: meta-llama/Llama-3.1-8B`, but the PR summary claims the model was "parametrized." If the intent is parametrization for flexibility, consider using a secret or input parameter instead of a hardcoded value. This would allow testing different models without modifying workflow code.
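A sketch of the parametrized alternative suggested here, using a `workflow_dispatch` input with the current model as default; the input name and surrounding workflow structure are assumptions:

```yaml
on:
  workflow_dispatch:
    inputs:
      rhaiis_model:
        description: "Model served by RHAIIS for the e2e tests"
        required: false
        default: "meta-llama/Llama-3.1-8B"

env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  # Falls back to the default model when the workflow runs on a non-dispatch trigger
  RHAIIS_MODEL: ${{ inputs.rhaiis_model || 'meta-llama/Llama-3.1-8B' }}
```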
tests/e2e/configs/run-rhaiis.yaml (3)

82-85: LGTM: OpenAI provider configuration.

The new OpenAI provider is properly configured with an environment-sourced API key. Structure and syntax are correct.
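For reference, a remote OpenAI inference provider entry in a Llama Stack run config typically takes this shape; treat it as a sketch rather than a copy of the lines under review:

```yaml
providers:
  inference:
    - provider_id: openai
      provider_type: remote::openai
      config:
        # Resolved from the environment at startup
        api_key: ${env.OPENAI_API_KEY}
```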
141-148: LGTM: Model parametrization and gpt-4-turbo configuration.

The main model is now correctly parametrized via `env.RHAIIS_MODEL` with the vllm provider. The new `gpt-4-turbo` model entry properly links to the OpenAI provider. The configuration structure is sound.
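A sketch of how those two model entries plausibly look, based on the identifiers quoted in this review (the `model_type` fields are assumptions):

```yaml
models:
  # Main model, parametrized via the RHAIIS_MODEL environment variable
  - model_id: ${env.RHAIIS_MODEL}
    provider_id: vllm
    model_type: llm
  # Safety model served through the OpenAI provider
  - model_id: gpt-4-turbo
    provider_id: openai
    model_type: llm
```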
130-133: Review concern about provider/model compatibility requires clarification of the intended shield architecture.

The review identifies a pattern where `provider_shield_id: gpt-4-turbo` (OpenAI) is paired with `provider_id: llama-guard` across multiple configurations. However, investigation reveals the following.

Key findings:
- Git history shows this was a deliberate change across all config files: `meta-llama/Llama-3.1-8B-Instruct` → `gpt-4-turbo`.
- The inline comment in run.yaml (line 137) says "# Model to use for safety checks" — phrasing that suggests provider-agnostic resource selection.
- The shields endpoint performs no local validation of provider/model compatibility; it delegates to the external Llama Stack service.
- The codebase contains no provider compatibility constraints.

Critical uncertainty: whether `provider_shield_id` is designed as provider-specific (your assumption) or provider-agnostic (as the pattern suggests). The shields architecture does not enforce this constraint locally — the external Llama Stack service handles the actual resolution.

To confirm whether this is an actual compatibility issue or intentional design, verify:
- Llama Stack documentation: does the `llama-guard` provider support arbitrary model identifiers?
- Whether the current configurations are actively tested and passing.
tisnik left a comment
LGTM
Description
Parametrized and updated the model used for RHAIIS, and changed the shield model to use OpenAI.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit

New Features
- OpenAI provider support and a gpt-4-turbo model entry in the e2e test configuration, with the shield configuration pointing to gpt-4-turbo.

Chores
- OpenAI API key and model ID environment variables added to the GitHub Actions workflow and Docker Compose setup.