-
Notifications
You must be signed in to change notification settings - Fork 691
feat: Deployment for Dynamo EPP - aware gateway #2633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We need to use the latest architecture through the use of PlugIns
WalkthroughUpdates EPP images and introduces a Dynamo-specific EndpointPickerConfig delivered via a Helm-managed ConfigMap. The deployment mounts the config and appends -configFile to the EPP container args. Templates consolidate image selection logic and add volumes/volumeMounts for the config. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant GW as Inference Gateway
participant EPP as EPP (custom image)
participant PL1 as dyn-pre (inject worker ID)
participant PL2 as dyn-kv (KV-aware scorer)
participant FE as Frontend (KV scoring URL)
participant PK as Max-score Picker
participant BE as Selected Backend
Note over EPP: Loads /etc/epp/epp-config-dynamo.yaml
C->>GW: Request
GW->>EPP: Schedule request
EPP->>PL1: Preprocess (inject worker ID)
PL1-->>EPP: Worker ID attached
EPP->>PL2: Score endpoints
PL2->>FE: POST /v1/chat/completions (timeout 10s)
FE-->>PL2: Scores
PL2-->>EPP: Endpoint scores
EPP->>PK: Pick highest score
PK-->>EPP: Selected endpoint
EPP-->>GW: Routing decision
GW->>BE: Forward request
BE-->>GW: Response
GW-->>C: Response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
deploy/inference-gateway/README.md (3)
112-120: Fix Helm flags: wrong image tag and list syntax will break install
- Image tag in docs uses v5-0 but chart values moved to v5-1.
--set imagePullSecrets='{docker-imagepullsecret}'passes a string, not a list; Helm will iterate chars and render brokenimagePullSecrets.Apply:
helm install dynamo-gaie ./helm/dynamo-gaie \ -n my-model \ -f ./vllm_agg_qwen.yaml \ --set eppAware.enabled=true \ - --set eppAware.eppImage=gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo/dynamo-custom-epp:v5-0 \ - --set imagePullSecrets='{docker-imagepullsecret}' \ + --set eppAware.eppImage=gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo/dynamo-custom-epp:v5-1 \ + --set imagePullSecrets={docker-imagepullsecret} \ --set-string epp.extraEnv[0].name=USE_STREAMING \ --set-string epp.extraEnv[0].value=true
172-175: Fix kubectl flags and include scheme in GATEWAY_URL
-o yaml -o jsonpathis invalid (duplicate-o). Also addhttp://so curl works.- GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o yaml -o jsonpath='{.spec.clusterIP}') + GATEWAY_URL="http://$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')"
220-222: Fix typos in sample promptMinor typos in user text.
- ... hinting at ests that Aeloria holds a secret ... + ... hinting that Aeloria holds a secret ... - ... a search for lost familt clue is hidden. + ... a search for lost family, where a clue is hidden.deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml (1)
1-11: Add SPDX header and quote templated names (fixes pipeline + yamllint)
- Pipeline flagged missing copyright header.
- YAML linter error at Line 4 is commonly triggered by unquoted Helm template expressions in metadata fields.
Apply:
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0 + # See http://www.apache.org/licenses/LICENSE-2.0 for license details. apiVersion: v1 kind: ConfigMap metadata: - name: {{ include "dynamo-gaie.fullname" . }}-epp-config + name: {{ printf "%s-epp-config" (include "dynamo-gaie.fullname" .) | quote }} labels: - app.kubernetes.io/name: {{ include "dynamo-gaie.name" . }} - app.kubernetes.io/instance: {{ .Release.Name }} + app.kubernetes.io/name: {{ include "dynamo-gaie.name" . | quote }} + app.kubernetes.io/instance: {{ .Release.Name | quote }} data: epp-config-dynamo.yaml: | {{ (.Files.Get "epp-config-dynamo.yaml") | indent 4 }}deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)
46-65: Config file omitted when argsOverride is usedCurrently
-configFile /etc/epp/epp-config-dynamo.yamlis only emitted in the default-args path. Ifepp.argsOverrideis set, the EPP will start without the config. Append the-configFilearguments in both paths, and only when epp-aware is enabled.Apply:
args: - {{- if .Values.epp.argsOverride }} - {{- toYaml .Values.epp.argsOverride | nindent 8 }} - {{- else }} + {{- if .Values.epp.argsOverride }} + {{- toYaml .Values.epp.argsOverride | nindent 8 }} + {{- else }} - -poolName - "{{ .Values.model.shortName }}-pool" - -poolNamespace - "{{ .Release.Namespace }}" - -v - "4" - --zap-encoder - "json" - -grpcPort - "9002" - -grpcHealthPort - "9003" - - -configFile - - "/etc/epp/epp-config-dynamo.yaml" - {{- end }} + {{- end }} + {{- if .Values.eppAware.enabled }} + - -configFile + - "/etc/epp/epp-config-dynamo.yaml" + {{- end }}
🧹 Nitpick comments (6)
deploy/inference-gateway/README.md (1)
160-168: Typos and clarity in minikube tunnel paragraphSmall fixes improve readability.
- a. User minikube tunnel to expose the gateway to the host - This requires `sudo` access to the host machine. alternatively, you can use port-forward to expose the gateway to the host as shown in alternateive (b). + a. Use `minikube tunnel` to expose the gateway to the host. + This requires `sudo` on the host. Alternatively, use port-forward as shown in alternative (b).deploy/inference-gateway/helm/dynamo-gaie/values.yaml (1)
76-76: Private sidecar image: ensure pull secrets and runtime compatibilityThe sidecar now points to a private registry. Ensure:
imagePullSecretsis set (README shows one; after the fix, it will render a list).- The sidecar’s
--router-mode kvmatches the EPP config’s frontend URL and port (currently 8000), which it does.No code changes required; just a heads-up.
deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)
31-41: Optional hardening: add securityContext and resources for EPP containerConsider adding a minimal
securityContext(runAsNonRoot, readOnlyRootFilesystem) and base resource requests/limits for the EPP container to mirror the sidecar hygiene.I can propose a follow-up patch if we want to standardize these across charts.
Also applies to: 92-112
deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml (3)
10-13: dyn-pre plugin is defined but never referenced in the scheduling profile.If dynamo-inject-workerid is required to mutate or annotate candidates before scoring, you need to include it in the profile chain; otherwise it’s a dead definition.
If the plugin is intended to run in the main pipeline, consider adding it before scoring:
- name: default plugins: + - pluginRef: dyn-pre - pluginRef: dyn-kv weight: 1 - pluginRef: pickerIf dyn-pre runs in a special phase (e.g., preFilter/prepare), wire it in that phase per EPP’s schema instead of the generic plugins list.
Also applies to: 21-23
15-17: Avoid hard-coding the KV scorer endpoint; template via values and confirm localhost is correct.127.0.0.1 targets the same container. That’s only valid if the target frontend runs in the same container/network namespace. If it’s a sidecar or a Service, prefer a pod-local address (localhost with the correct sidecar port) or cluster DNS (e.g., http://SERVICE.NAMESPACE.svc:PORT/PATH).
- Validate that the frontend actually listens on port 8000 in the EPP container’s namespace.
- For multi-env flexibility, make this a Helm value and render with tpl so envs can override it:
Example change in this file (only if the ConfigMap template uses tpl to render file contents):
- frontendURL: http://127.0.0.1:8000/v1/chat/completions + frontendURL: {{ .Values.dynamoGaie.epp.kvScorer.frontendURL | quote }}Then set a sensible default in values.yaml (dev → localhost; prod → cluster DNS). If your current ConfigMap template in templates/epp-configmap.yaml uses .Files.Get to embed the file verbatim, switch to tpl so Helm expressions inside are evaluated.
17-17: Make timeout configurable and add a sane default.Bake configurability into Helm values to avoid editing the config for each environment.
- timeoutMS: 10000 + timeoutMS: {{ .Values.dynamoGaie.epp.kvScorer.timeoutMS | default 10000 }}Note: This requires the ConfigMap template to render via tpl.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
deploy/inference-gateway/README.md(1 hunks)deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml(1 hunks)deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml(3 hunks)deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml(1 hunks)deploy/inference-gateway/helm/dynamo-gaie/values.yaml(1 hunks)deploy/inference-gateway/values-epp-aware.yaml(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Copyright Checks
deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml
[error] 1-1: Copyright header missing or invalid in file.
deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml
[error] 1-1: Copyright header missing or invalid in file.
🪛 YAMLlint (1.37.1)
deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml
[error] 4-4: syntax error: expected , but found ''
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)
deploy/inference-gateway/values-epp-aware.yaml (1)
16-19: Image bump looks good; verify registry access and tag availabilityThe move to v5-1 aligns with chart defaults. Please confirm:
- The tag v5-1 exists and is immutable.
- The
docker-imagepullsecretis present in the target namespace(s) used in the README commands.If desired, I can add a preflight note to the README for creating the secret.
deploy/inference-gateway/helm/dynamo-gaie/values.yaml (1)
69-69: Confirm EPP image default and immutability policyBumping to v5-1 is fine. Please verify this tag is immutable to avoid surprise rollouts, or switch to a digest-pinned image for reproducibility.
deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)
44-44: Image selection conditional is clean and correctThe inline conditional picks
.Values.eppAware.eppImagewhen provided and falls back to.Values.extension.imageotherwise. Good consolidation.deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml (4)
7-9: Picker wiring looks correct.The max-score-picker is referenced at the end of the profile, which is the expected position following scoring. No issues here.
Also applies to: 21-23
1-24: Paths Verified: ConfigMap Key, VolumeMount, and -configFile Argument AlignConfirmed that:
- In
deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml, the ConfigMap defines
epp-config-dynamo.yaml:as the key for the file contents.- In
deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml, the container args include
- -configFile
- "/etc/epp/epp-config-dynamo.yaml".- The corresponding
volumeMountmounts theepp-configvolume at/etc/eppand maps the key
epp-config-dynamo.yaml→path: epp-config-dynamo.yaml.All references align correctly—no changes required.
18-23: No changes needed: “default” correctly matches the single-profile-handler.
- deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml, line 5:
type: single-profile-handler- deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml, line 19:
- name: default
1-1: No changes needed: license header is already present
The filedeploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yamlalready includes the standard Apache 2.0 SPDX header at its top, matching the repository convention. You can safely ignore this comment—CI’s “missing header” error must be coming from another file or a linter configuration issue.Likely an incorrect or invalid review comment.
cleanup README
Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: atchernych <[email protected]>
Signed-off-by: atchernych <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: atchernych <[email protected]> Co-authored-by: hhzhang16 <[email protected]>
Signed-off-by: atchernych <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: Jason Zhou <[email protected]>
Signed-off-by: atchernych <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: atchernych <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: nnshah1 <[email protected]>
We need to use the latest architecture for the EPP.
It implies creating custom PlugIns and providing a config file.
See EPP: atchernych/gateway-api-inference-extension-dynamo#5
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Chores