Skip to content

Conversation

@atchernych
Copy link
Contributor

@atchernych atchernych commented Aug 22, 2025

We need to use the latest architecture for the EPP.
It implies creating custom PlugIns and providing a config file.
See EPP: atchernych/gateway-api-inference-extension-dynamo#5

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Introduces a configurable endpoint picker with KV-based scoring and max-score selection in a single-profile workflow.
    • EPP now loads a bundled configuration via a mounted ConfigMap (including worker ID injection and a 10s scoring timeout).
  • Documentation

    • Updated installation instructions to use the latest EPP container image.
  • Chores

    • Updated default container images for EPP and the runtime sidecar to newer internal builds.

We need to use the latest architecture
through the use of PlugIns
@atchernych atchernych changed the title featL Refactor Dynamo EPP customization feat: Refactor Dynamo EPP customization Aug 22, 2025
@github-actions github-actions bot added the feat label Aug 22, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

Walkthrough

Updates EPP images and introduces a Dynamo-specific EndpointPickerConfig delivered via a Helm-managed ConfigMap. The deployment mounts the config and appends -configFile to the EPP container args. Templates consolidate image selection logic and add volumes/volumeMounts for the config.

Changes

Cohort / File(s) Summary
Docs: EPP image reference
deploy/inference-gateway/README.md
Updates EPP image in helm install command to a GitLab-hosted custom EPP tag.
Values: Image updates
deploy/inference-gateway/helm/dynamo-gaie/values.yaml, deploy/inference-gateway/values-epp-aware.yaml
Changes eppAware.eppImage to GitLab-hosted custom EPP; updates sidecar.image tag in chart values.
Helm: EPP config and mounts
deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml
Collapses image selection to one expression; adds -configFile=/etc/epp/epp-config-dynamo.yaml when argsOverride used; mounts read-only ConfigMap at /etc/epp; defines Pod volume from ConfigMap.
Helm: ConfigMap for EPP config
deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml
Adds ConfigMap templating that injects chart file epp-config-dynamo.yaml as data key epp-config-dynamo.yaml.
Config: EndpointPickerConfig
deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml
Adds EndpointPickerConfig with single-profile handler, Dynamo worker ID injection, KV-aware scorer (frontendURL, timeoutMS), and max-score picker.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant GW as Inference Gateway
  participant EPP as EPP (custom image)
  participant PL1 as dyn-pre (inject worker ID)
  participant PL2 as dyn-kv (KV-aware scorer)
  participant FE as Frontend (KV scoring URL)
  participant PK as Max-score Picker
  participant BE as Selected Backend

  Note over EPP: Loads /etc/epp/epp-config-dynamo.yaml

  C->>GW: Request
  GW->>EPP: Schedule request
  EPP->>PL1: Preprocess (inject worker ID)
  PL1-->>EPP: Worker ID attached
  EPP->>PL2: Score endpoints
  PL2->>FE: POST /v1/chat/completions (timeout 10s)
  FE-->>PL2: Scores
  PL2-->>EPP: Endpoint scores
  EPP->>PK: Pick highest score
  PK-->>EPP: Selected endpoint
  EPP-->>GW: Routing decision
  GW->>BE: Forward request
  BE-->>GW: Response
  GW-->>C: Response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

In charts I hop, with configs neat,
A picker scores where routes now meet.
A KV breeze, a worker’s name,
The gateway knows the winning lane.
New jars, new tags, a tidy map—
I twitch my nose, deploy, then nap. 🐇🚀

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
deploy/inference-gateway/README.md (3)

112-120: Fix Helm flags: wrong image tag and list syntax will break install

  • Image tag in docs uses v5-0 but chart values moved to v5-1.
  • --set imagePullSecrets='{docker-imagepullsecret}' passes a string, not a list; Helm will iterate chars and render broken imagePullSecrets.

Apply:

 helm install dynamo-gaie ./helm/dynamo-gaie \
   -n my-model \
   -f ./vllm_agg_qwen.yaml \
   --set eppAware.enabled=true \
-  --set eppAware.eppImage=gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo/dynamo-custom-epp:v5-0 \
-  --set imagePullSecrets='{docker-imagepullsecret}' \
+  --set eppAware.eppImage=gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo/dynamo-custom-epp:v5-1 \
+  --set imagePullSecrets={docker-imagepullsecret} \
   --set-string epp.extraEnv[0].name=USE_STREAMING \
   --set-string epp.extraEnv[0].value=true

172-175: Fix kubectl flags and include scheme in GATEWAY_URL

-o yaml -o jsonpath is invalid (duplicate -o). Also add http:// so curl works.

- GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o yaml -o jsonpath='{.spec.clusterIP}')
+ GATEWAY_URL="http://$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')"

220-222: Fix typos in sample prompt

Minor typos in user text.

- ... hinting at ests that Aeloria holds a secret ...
+ ... hinting that Aeloria holds a secret ...
- ... a search for lost familt clue is hidden.
+ ... a search for lost family, where a clue is hidden.
deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml (1)

1-11: Add SPDX header and quote templated names (fixes pipeline + yamllint)

  • Pipeline flagged missing copyright header.
  • YAML linter error at Line 4 is commonly triggered by unquoted Helm template expressions in metadata fields.

Apply:

+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
+ # See http://www.apache.org/licenses/LICENSE-2.0 for license details.
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: {{ include "dynamo-gaie.fullname" . }}-epp-config
+  name: {{ printf "%s-epp-config" (include "dynamo-gaie.fullname" .) | quote }}
   labels:
-    app.kubernetes.io/name: {{ include "dynamo-gaie.name" . }}
-    app.kubernetes.io/instance: {{ .Release.Name }}
+    app.kubernetes.io/name: {{ include "dynamo-gaie.name" . | quote }}
+    app.kubernetes.io/instance: {{ .Release.Name | quote }}
 data:
   epp-config-dynamo.yaml: |
 {{ (.Files.Get "epp-config-dynamo.yaml") | indent 4 }}
deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)

46-65: Config file omitted when argsOverride is used

Currently -configFile /etc/epp/epp-config-dynamo.yaml is only emitted in the default-args path. If epp.argsOverride is set, the EPP will start without the config. Append the -configFile arguments in both paths, and only when epp-aware is enabled.

Apply:

         args:
-        {{- if .Values.epp.argsOverride }}
-        {{- toYaml .Values.epp.argsOverride | nindent 8 }}
-        {{- else }}
+        {{- if .Values.epp.argsOverride }}
+        {{- toYaml .Values.epp.argsOverride | nindent 8 }}
+        {{- else }}
           - -poolName
           - "{{ .Values.model.shortName }}-pool"
           - -poolNamespace
           - "{{ .Release.Namespace }}"
           - -v
           - "4"
           - --zap-encoder
           - "json"
           - -grpcPort
           - "9002"
           - -grpcHealthPort
           - "9003"
-          - -configFile
-          - "/etc/epp/epp-config-dynamo.yaml"
-        {{- end }}
+        {{- end }}
+        {{- if .Values.eppAware.enabled }}
+          - -configFile
+          - "/etc/epp/epp-config-dynamo.yaml"
+        {{- end }}
🧹 Nitpick comments (6)
deploy/inference-gateway/README.md (1)

160-168: Typos and clarity in minikube tunnel paragraph

Small fixes improve readability.

- a. User minikube tunnel to expose the gateway to the host
-    This requires `sudo` access to the host machine. alternatively, you can use port-forward to expose the gateway to the host as shown in alternateive (b).
+ a. Use `minikube tunnel` to expose the gateway to the host.
+    This requires `sudo` on the host. Alternatively, use port-forward as shown in alternative (b).
deploy/inference-gateway/helm/dynamo-gaie/values.yaml (1)

76-76: Private sidecar image: ensure pull secrets and runtime compatibility

The sidecar now points to a private registry. Ensure:

  • imagePullSecrets is set (README shows one; after the fix, it will render a list).
  • The sidecar’s --router-mode kv matches the EPP config’s frontend URL and port (currently 8000), which it does.

No code changes required; just a heads-up.

deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)

31-41: Optional hardening: add securityContext and resources for EPP container

Consider adding a minimal securityContext (runAsNonRoot, readOnlyRootFilesystem) and base resource requests/limits for the EPP container to mirror the sidecar hygiene.

I can propose a follow-up patch if we want to standardize these across charts.

Also applies to: 92-112

deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml (3)

10-13: dyn-pre plugin is defined but never referenced in the scheduling profile.

If dynamo-inject-workerid is required to mutate or annotate candidates before scoring, you need to include it in the profile chain; otherwise it’s a dead definition.

If the plugin is intended to run in the main pipeline, consider adding it before scoring:

   - name: default
     plugins:
+      - pluginRef: dyn-pre
       - pluginRef: dyn-kv
         weight: 1
       - pluginRef: picker

If dyn-pre runs in a special phase (e.g., preFilter/prepare), wire it in that phase per EPP’s schema instead of the generic plugins list.

Also applies to: 21-23


15-17: Avoid hard-coding the KV scorer endpoint; template via values and confirm localhost is correct.

127.0.0.1 targets the same container. That’s only valid if the target frontend runs in the same container/network namespace. If it’s a sidecar or a Service, prefer a pod-local address (localhost with the correct sidecar port) or cluster DNS (e.g., http://SERVICE.NAMESPACE.svc:PORT/PATH).

  • Validate that the frontend actually listens on port 8000 in the EPP container’s namespace.
  • For multi-env flexibility, make this a Helm value and render with tpl so envs can override it:

Example change in this file (only if the ConfigMap template uses tpl to render file contents):

-      frontendURL: http://127.0.0.1:8000/v1/chat/completions
+      frontendURL: {{ .Values.dynamoGaie.epp.kvScorer.frontendURL | quote }}

Then set a sensible default in values.yaml (dev → localhost; prod → cluster DNS). If your current ConfigMap template in templates/epp-configmap.yaml uses .Files.Get to embed the file verbatim, switch to tpl so Helm expressions inside are evaluated.


17-17: Make timeout configurable and add a sane default.

Bake configurability into Helm values to avoid editing the config for each environment.

-      timeoutMS: 10000
+      timeoutMS: {{ .Values.dynamoGaie.epp.kvScorer.timeoutMS | default 10000 }}

Note: This requires the ConfigMap template to render via tpl.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7076990 and 27ba464.

📒 Files selected for processing (6)
  • deploy/inference-gateway/README.md (1 hunks)
  • deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml (1 hunks)
  • deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (3 hunks)
  • deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml (1 hunks)
  • deploy/inference-gateway/helm/dynamo-gaie/values.yaml (1 hunks)
  • deploy/inference-gateway/values-epp-aware.yaml (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Copyright Checks
deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml

[error] 1-1: Copyright header missing or invalid in file.

deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml

[error] 1-1: Copyright header missing or invalid in file.

🪛 YAMLlint (1.37.1)
deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml

[error] 4-4: syntax error: expected , but found ''

(syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)
deploy/inference-gateway/values-epp-aware.yaml (1)

16-19: Image bump looks good; verify registry access and tag availability

The move to v5-1 aligns with chart defaults. Please confirm:

  • The tag v5-1 exists and is immutable.
  • The docker-imagepullsecret is present in the target namespace(s) used in the README commands.

If desired, I can add a preflight note to the README for creating the secret.

deploy/inference-gateway/helm/dynamo-gaie/values.yaml (1)

69-69: Confirm EPP image default and immutability policy

Bumping to v5-1 is fine. Please verify this tag is immutable to avoid surprise rollouts, or switch to a digest-pinned image for reproducibility.

deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml (1)

44-44: Image selection conditional is clean and correct

The inline conditional picks .Values.eppAware.eppImage when provided and falls back to .Values.extension.image otherwise. Good consolidation.

deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml (4)

7-9: Picker wiring looks correct.

The max-score-picker is referenced at the end of the profile, which is the expected position following scoring. No issues here.

Also applies to: 21-23


1-24: Paths Verified: ConfigMap Key, VolumeMount, and -configFile Argument Align

Confirmed that:

  • In deploy/inference-gateway/helm/dynamo-gaie/templates/epp-configmap.yaml, the ConfigMap defines
    epp-config-dynamo.yaml: as the key for the file contents.
  • In deploy/inference-gateway/helm/dynamo-gaie/templates/dynamo-epp.yaml, the container args include
    - -configFile
    - "/etc/epp/epp-config-dynamo.yaml".
  • The corresponding volumeMount mounts the epp-config volume at /etc/epp and maps the key
    epp-config-dynamo.yamlpath: epp-config-dynamo.yaml.

All references align correctly—no changes required.


18-23: No changes needed: “default” correctly matches the single-profile-handler.

  • deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml, line 5: type: single-profile-handler
  • deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml, line 19: - name: default

1-1: No changes needed: license header is already present
The file deploy/inference-gateway/helm/dynamo-gaie/epp-config-dynamo.yaml already includes the standard Apache 2.0 SPDX header at its top, matching the repository convention. You can safely ignore this comment—CI’s “missing header” error must be coming from another file or a linter configuration issue.

Likely an incorrect or invalid review comment.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 22, 2025
@atchernych atchernych changed the title feat: Refactor Dynamo EPP customization feat: Deployment for Dynamo EPP - aware gateway Aug 22, 2025
Co-authored-by: hhzhang16 <[email protected]>
Signed-off-by: atchernych <[email protected]>
@atchernych atchernych merged commit 490cdc1 into main Aug 26, 2025
10 of 12 checks passed
@atchernych atchernych deleted the DEP-315-tok-k8s branch August 26, 2025 23:19
hhzhang16 added a commit that referenced this pull request Aug 27, 2025
Signed-off-by: atchernych <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
krishung5 pushed a commit that referenced this pull request Aug 27, 2025
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: atchernych <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: atchernych <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: atchernych <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants