refactor: vLLM to new Python UX #1983

alec-flowers · 2025-07-17T07:08:40Z

python -m dynamo.vllm --help

Summary by CodeRabbit

New Features
- Added a requirements file specifying dependencies for the vLLM backend.
- Introduced a new script for launching router and worker processes in the vLLM backend.
- Added an entry point for running vLLM as a Python module.
Bug Fixes
- Updated launch scripts and documentation to use Python module invocation for starting backend services.
Documentation
- Updated documentation and command examples to match the new directory structure and execution commands.
- Removed redundant Apache License headers, retaining SPDX identifiers.
Chores
- Adjusted build configuration to include the vLLM backend package.
- Updated test configurations to reflect new directory paths.
- Removed obsolete launch script from the examples directory.

coderabbitai · 2025-07-22T17:47:25Z

Walkthrough

This change removes Apache 2.0 license headers from multiple files, updates documentation and test paths to reflect a new directory structure, standardizes launch scripts to use Python module invocations, adds a requirements file, and introduces a Python module entry point for the vLLM backend. One obsolete launch script is deleted.

Changes

File(s)	Change Summary
.../README.md, .../multi-node.md, .../deepseek-r1.md	Removed Apache 2.0 license headers and updated paths/commands to match new structure
.../deploy/*.yaml	Removed Apache 2.0 license headers from deployment YAML files
.../launch/agg.sh, .../launch/dep.sh, .../launch/disagg.sh, .../launch/disagg_router.sh, .../launch/dsr1_dep.sh	Standardized launch commands to use Python module invocations instead of direct script/binary calls
.../launch/agg_router.sh	Added new launch script orchestrating router and worker processes using Python modules
.../src/dynamo/vllm/args.py, .../handlers.py, .../main.py, .../protocol.py, .../publisher.py	Removed Apache 2.0 license headers; updated imports to relative paths where needed
.../src/dynamo/vllm/main.py	Refactored to add a `main()` function, updated async runner to use `uvloop.run`, and changed import style
.../src/dynamo/vllm/__main__.py	Added new module entry point that runs the main function
.../requirements.txt	Added new requirements file with `uvloop` and a specific `vllm` GitHub dependency
.../examples/vllm/launch/agg_router.sh	Deleted obsolete launch script
pyproject.toml	Added vLLM backend source directory to wheel build targets
tests/serve/test_vllm.py	Updated test configuration paths to new directory structure

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LaunchScript
    participant PythonModule
    participant WorkerProcess

    User->>LaunchScript: Run agg.sh/dep.sh/disagg.sh/agg_router.sh
    LaunchScript->>PythonModule: python -m dynamo.frontend [--router-mode]
    LaunchScript->>PythonModule: python -m dynamo.vllm [args]
    PythonModule->>WorkerProcess: Start vLLM worker(s)

Estimated code review effort

2 (~15 minutes)

Possibly related PRs

refactor: Migrate to new UX2 for python launch #2003: Refactored TRTLLM backend launch scripts to use Python module invocations, similar to this PR's changes for vLLM.

Poem

In the warren of code, a rabbit did leap,
Sweeping old licenses, tidy and neat.
Launch scripts now hop with Pythonic delight,
Paths realigned, everything just right.
With a twitch of the nose and a flick of the ear,
vLLM’s backend is crystal clear!
🐇✨

coderabbitai

Actionable comments posted: 2

🔭 Outside diff range comments (9)

components/backends/vllm/src/dynamo/vllm/args.py (1)

131-160: Race window after releasing the socket
You reserve the OS port, write to ETCD, then close the socket before the vLLM process binds. Another local process can theoretically grab it in the interval. Mitigate by re-binding immediately after ETCD success (kept open until worker starts) or by setting SO_EXCLUSIVEADDRUSE / SO_REUSEPORT semantics where available.
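A minimal sketch of the "keep the socket open" mitigation, using only the standard library. The helper names reserved_port and port_is_free are hypothetical and do not come from args.py; the hand-off to the process that finally binds the port is outside this sketch.

```python
import socket
from contextlib import contextmanager


@contextmanager
def reserved_port(host: str = "127.0.0.1"):
    """Bind an OS-assigned port and keep the socket open while the caller works."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        yield sock.getsockname()[1]
    finally:
        # Closing here is where the race window opens again.
        sock.close()


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a fresh socket can bind the port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True
```

While the with block is active, other local processes that try a plain bind on the same address and port should fail, so registration work (for example the ETCD write) can happen inside the block. The window only reappears once the socket is closed.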
components/backends/vllm/deploy/disagg.yaml (1)
84-88: Launch command still points to components/main.py
The PR description says launch scripts are now module-based (python -m dynamo.vllm …). Update the manifest to avoid shipping a stale path.
-          args:
-            - "python3 components/main.py --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.log"
+          args:
+            - "python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.log"
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
142-147: Possible AttributeError – blindly calling .data() on the response

The result of prefill_response = await anext(...) is immediately passed to prefill_response.data(), but it’s unclear that the returned object actually exposes a data() method. If it is already bytes / str, this will crash.
-                prefill_response = MyRequestOutput.model_validate_json(
-                    prefill_response.data()
-                )
+                payload = (
+                    prefill_response
+                    if isinstance(prefill_response, (bytes, str))
+                    else prefill_response.data()
+                )
+                prefill_response = MyRequestOutput.model_validate_json(payload)
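The same guard can be factored into a small helper so the type check lives in one place. This is a sketch under assumptions: extract_payload and FakeResponse are hypothetical names, and the real response type in handlers.py may differ.

```python
from typing import Any, Union


def extract_payload(response: Any) -> Union[bytes, str]:
    """Return the raw JSON payload whether or not the response wraps it."""
    if isinstance(response, (bytes, str)):
        return response
    data = getattr(response, "data", None)
    if callable(data):
        return data()
    # Fail loudly instead of letting an AttributeError surface later.
    raise TypeError(f"unsupported response type: {type(response).__name__}")


class FakeResponse:
    """Hypothetical stand-in for a response object that exposes data()."""

    def __init__(self, payload: str) -> None:
        self._payload = payload

    def data(self) -> str:
        return self._payload
```

With such a helper, the call site would reduce to passing extract_payload(prefill_response) to model_validate_json, and an unexpected type would produce a descriptive TypeError.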
components/backends/vllm/deploy/agg_router.yaml (1)
40-50: Deployment paths still point to removed examples/vllm directory

workingDir: /workspace/examples/vllm and the command under args will no longer resolve after the refactor. Update to the new layout or use module invocation, e.g.:
-  workingDir: /workspace/examples/vllm
+  workingDir: /workspace/components/backends/vllm
 …
-  - "python3 components/main.py …"
+  - "python -m dynamo.vllm …"
components/backends/vllm/deploy/disagg_router.yaml (2)

40-50: Same stale directory reference as in agg_router.yaml

workingDir and ingress command still use examples/vllm; containers will crash. Align with the new path (components/backends/vllm) or switch to module execution.

124-130: Prefill-worker command outdated

The python3 components/main.py … entry point no longer exists. Switch to python -m dynamo.vllm … or use the correct absolute path.
tests/serve/test_vllm.py (1)
200-206: agg-router still references the old examples path

directory="/workspace/examples/vllm" was not migrated. If the examples tree was removed in this refactor, this will raise FileNotFoundError and fail the pytest matrix.
-        directory="/workspace/examples/vllm",
+        directory="/workspace/components/backends/vllm",
components/backends/vllm/launch/disagg_router.sh (1)
8-20: Add wait logic to surface failures.

Same failure-propagation issue as the other scripts. A minimal addition after spawning all jobs:
wait -n
exit $?
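For launchers written in Python rather than shell, the same fail-fast idea could look roughly like the sketch below. The helper names wait_for_first_exit and spawn are hypothetical, and unlike the shell snippet this version also terminates the remaining children, which is an assumption about the desired behaviour.

```python
import subprocess
import sys
import time


def wait_for_first_exit(procs, poll_interval=0.05):
    """Block until any child exits, stop the rest, and return that child's exit code."""
    while True:
        for proc in procs:
            code = proc.poll()
            if code is not None:
                # One child finished: shut down the others so nothing lingers.
                for other in procs:
                    if other is not proc and other.poll() is None:
                        other.terminate()
                        other.wait()
                return code
        time.sleep(poll_interval)


def spawn(snippet):
    """Start a child Python process running a one-line snippet (placeholder for a real service)."""
    return subprocess.Popen([sys.executable, "-c", snippet])
```

A launcher would then finish with sys.exit(wait_for_first_exit(procs)) so that a crashed frontend or worker is reflected in the launcher's own exit status.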
components/backends/vllm/launch/dsr1_dep.sh (1)
79-83: Create the log directory before piping output into tee

$LOG_DIR is referenced on L79, but mkdir -p $LOG_DIR is executed only on L82.
If the directory does not exist, tee fails and the ingress process terminates silently, breaking the launch sequence.
-# run ingress if it's node 0
-if [ $NODE_RANK -eq 0 ]; then
-    DYN_LOG=debug python -m dynamo.frontend --router-mode kv 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
-fi
-
-mkdir -p $LOG_DIR
+# ensure the log directory exists early
+mkdir -p "$LOG_DIR"
+
+# run ingress if it's node 0
+if [ "$NODE_RANK" -eq 0 ]; then
+    DYN_LOG=debug python3 -m dynamo.frontend --router-mode kv 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
+fi

♻️ Duplicate comments (3)

components/backends/vllm/deploy/disagg_planner.yaml (1)

84-88: Same stale entry-point as above
Replicate the module-style invocation here to stay consistent with the new UX.

components/backends/vllm/deploy/agg.yaml (1)

84-88: Stale script path – switch to module execution
See earlier comment; update to python -m dynamo.vllm ….

components/backends/vllm/multi-node.md (1)

76-83: Same foreground/background concern for disaggregated ingress

Consider the same tweak (&) or explicit instructions.

🧹 Nitpick comments (16)

components/backends/vllm/src/dynamo/vllm/args.py (3)
22-24: DEFAULT_MODEL is declared but never used
The constant adds dead code and may confuse readers about where the model default is enforced.
-DEFAULT_MODEL = "Qwen/Qwen3-0.6B"
26-43: Consider turning Config into a real dataclass
The class is currently a mutable “bag-of-attrs” created via Config(); config.foo = ….
Using @dataclass (or pydantic) makes required / optional fields explicit, provides defaults, and catches typos at runtime & in type-checkers.
+from dataclasses import dataclass, field
+
+@dataclass
+class Config:
+    # Dynamo-specific
+    namespace: str = ""
+    component: str = ""
+    endpoint: str = ""
+    is_prefill_worker: bool = False
+    kv_port: int | None = None
+    side_channel_port: int | None = None
+
+    # vLLM
+    model: str = ""
+    served_model_name: str | None = None
+
+    # Engine args
+    engine_args: AsyncEngineArgs | None = field(default=None)
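To illustrate what the dataclass buys at runtime, here is a trimmed-down sketch. WorkerConfig is a hypothetical stand-in with only a few of the fields from the suggestion above; it is not the actual Config class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerConfig:
    """Hypothetical, trimmed-down stand-in for the Config class discussed above."""

    namespace: str = ""
    component: str = ""
    is_prefill_worker: bool = False
    kv_port: Optional[int] = None


# Declared fields get their defaults unless overridden.
cfg = WorkerConfig(namespace="dynamo", kv_port=20080)

# A misspelled keyword is rejected by the generated __init__.
try:
    WorkerConfig(namespce="dynamo")
except TypeError as exc:
    typo_error = str(exc)
```

Note that a plain dataclass only rejects unknown names at construction time. Assignments such as cfg.kv_prt = 1 after construction are still accepted at runtime and are caught only by type-checkers; on Python 3.10 and later, @dataclass(slots=True) additionally makes such assignments raise AttributeError.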
101-105: Hard-coded block_size = 16 may not fit all models
Large models frequently want 32/64 to keep KV-cache fragmentation low. Consider exposing this tweak as --default-block-size or piggy-backing on vLLM’s internal default instead of overriding unconditionally.
components/backends/vllm/deepseek-r1.md (1)
8-8: Minor typos in documentation
“seperate” → “separate”; also “…variety of different configurations”.
-Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component
+Dynamo supports running Deepseek R1 with data-parallel attention and wide expert parallelism. Each data-parallel attention rank is a separate Dynamo component
components/backends/vllm/src/dynamo/vllm/__main__.py (1)
4-7: Use an explicit relative import for robustness

Because __main__.py is executed with python -m dynamo.vllm, __package__ is already set to dynamo.vllm. A relative import (from .main import main) avoids hard-coding the full package path, insulating you from future top-level package renames or vendoring.
-from dynamo.vllm.main import main
+from .main import main
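The behaviour described above can be checked with a throwaway package. This sketch builds a hypothetical demo_pkg in a temporary directory (the name and file contents are illustrative, not taken from the PR) and runs it with python -m, confirming that the relative import in __main__.py resolves.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo_package() -> str:
    """Create a throwaway package and run it with `python -m`, returning its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "main.py").write_text(
            "def main():\n"
            "    print(f'package={__package__}')\n"
        )
        # __main__.py uses a relative import, which resolves because
        # `python -m demo_pkg` sets __package__ to 'demo_pkg'.
        (pkg / "__main__.py").write_text(
            "from .main import main\n"
            "\n"
            "if __name__ == '__main__':\n"
            "    main()\n"
        )
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp,
            env={**os.environ, "PYTHONPATH": tmp},  # make the package importable
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```

Running the file directly (python demo_pkg/__main__.py) would instead fail on the relative import, which is one reason module invocation is the safer entry point.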
components/backends/vllm/launch/dep.sh (1)
8-15: Allow GPU count to be parameterised

Hard-coding {0..3} ties this script to exactly four GPUs. Making it dynamic improves portability, e.g.:
-for i in {0..3}; do
+GPU_IDS=${GPU_IDS:-0 1 2 3}
+for i in ${GPU_IDS}; do
Callers can then do GPU_IDS="0 1" ./dep.sh.
components/backends/vllm/multi-node.md (1)

108-109: Minor: keep backgrounding consistent

The large-model section backgrounds ingress (&). Update the earlier snippets to match once a decision is made.
components/backends/vllm/launch/agg.sh (1)
8-12: Tiny formatting nit

Double space after vllm – not harmful, but trimming keeps scripts tidy.
-python -m dynamo.vllm  --model
+python -m dynamo.vllm --model
components/backends/vllm/launch/disagg.sh (1)
4-6: Harden the script with pipefail/nounset.
set -e alone will not detect failures in pipelines or unset variables. Tighten fail-fast behaviour:
-set -e
+set -euo pipefail
components/backends/vllm/README.md (2)
56-60: Docs still reference the removed dynamo run CLI.

The launch scripts in this PR now invoke the ingress via
python -m dynamo.frontend --router-mode kv, but the README paragraph (lines 58-60) tells users that “each shell script runs dynamo run …”. Please update to avoid confusion.

80-83: Path not updated to new directory structure.

Line 81 still instructs: cd examples/vllm, but the examples were moved to components/backends/vllm. Adjust for consistency:
-cd examples/vllm
+cd components/backends/vllm
components/backends/vllm/launch/agg_router.sh (1)
4-6: Adopt strict shell options.

Same rationale as in disagg.sh—add -u and pipefail:
-set -e
+set -euo pipefail
components/backends/vllm/src/dynamo/vllm/main.py (1)
203-205: Wrap worker in top-level coroutine for clearer control flow.

uvloop.run(worker()) works, yet hides the logical distinction between “parse args & set-up” and “start worker”. Consider:
-uvloop.run(worker())
+async def _entry():
+    await worker()   # allows future pre-run setup / error handling
+
+uvloop.run(_entry())
Not mandatory but improves readability and lets you insert pre-flight checks without editing two files (main.py & __main__.py).
components/backends/vllm/launch/disagg_router.sh (1)
4-6: Mirror strict mode in all launch scripts.

For consistency with other suggestions:
-set -e
+set -euo pipefail
components/backends/vllm/launch/dsr1_dep.sh (2)
79-80: Inconsistent interpreter (python vs python3) may pick different environments

python -m dynamo.frontend (L79) and python3 -m dynamo.vllm (L92) might resolve to different binaries/venvs, leading to mismatched package sets. Pick one and expose it via a single variable, e.g.:
PYTHON=${PYTHON:-python3}
…
DYN_LOG=debug "$PYTHON" -m dynamo.frontend …
…
"$PYTHON" -m dynamo.vllm …
Also applies to: 92-93

88-101: Quote variable expansions that can contain spaces

$LOG_DIR, $MASTER_ADDR, and others are unquoted when used in commands (tee, --data-parallel-address, etc.).
Quoting prevents word-splitting and unexpected globbing:
tee "$LOG_DIR/dsr1_dep_${dp_rank}.log" &
…
--data-parallel-address "$MASTER_ADDR" \

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d65ce1b and 875941b.

📒 Files selected for processing (24)

components/backends/vllm/README.md (5 hunks)
components/backends/vllm/deepseek-r1.md (1 hunks)
components/backends/vllm/deploy/agg.yaml (1 hunks)
components/backends/vllm/deploy/agg_router.yaml (1 hunks)
components/backends/vllm/deploy/disagg.yaml (1 hunks)
components/backends/vllm/deploy/disagg_planner.yaml (1 hunks)
components/backends/vllm/deploy/disagg_router.yaml (1 hunks)
components/backends/vllm/launch/agg.sh (1 hunks)
components/backends/vllm/launch/agg_router.sh (1 hunks)
components/backends/vllm/launch/dep.sh (1 hunks)
components/backends/vllm/launch/disagg.sh (1 hunks)
components/backends/vllm/launch/disagg_router.sh (1 hunks)
components/backends/vllm/launch/dsr1_dep.sh (2 hunks)
components/backends/vllm/multi-node.md (5 hunks)
components/backends/vllm/requirements.txt (1 hunks)
components/backends/vllm/src/dynamo/vllm/__main__.py (1 hunks)
components/backends/vllm/src/dynamo/vllm/args.py (1 hunks)
components/backends/vllm/src/dynamo/vllm/handlers.py (1 hunks)
components/backends/vllm/src/dynamo/vllm/main.py (2 hunks)
components/backends/vllm/src/dynamo/vllm/protocol.py (0 hunks)
components/backends/vllm/src/dynamo/vllm/publisher.py (0 hunks)
examples/vllm/launch/agg_router.sh (0 hunks)
pyproject.toml (1 hunks)
tests/serve/test_vllm.py (2 hunks)

🧠 Learnings (19)

components/backends/vllm/deploy/disagg.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/src/dynamo/vllm/handlers.py (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/deploy/agg_router.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/src/dynamo/vllm/args.py (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/deploy/disagg_router.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/deploy/disagg_planner.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

pyproject.toml (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/src/dynamo/vllm/__main__.py (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/deploy/agg.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/requirements.txt (1)

Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

components/backends/vllm/launch/dep.sh (2)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/vllm/README.md (4)

Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.

Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.

Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.

components/backends/vllm/launch/disagg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/vllm/multi-node.md (6)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
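The readiness-versus-liveness distinction in the last learning can be illustrated with a minimal sketch. Only the log pattern comes from the learning above; the function name and sample log lines are hypothetical.

```python
import re

# Pattern quoted in the learning above; everything else here is illustrative.
READY_PATTERN = re.compile(r"VllmWorker.*has been initialized")


def is_ready(log_text: str) -> bool:
    """Return True once any log line reports that the worker was initialized."""
    return any(READY_PATTERN.search(line) for line in log_text.splitlines())


# Hypothetical log contents at three points in a worker's life.
starting = "loading model weights\n"
started = starting + "VllmWorker for some-model has been initialized\n"
crashed_later = started + "fatal error: engine loop died\n"

ready_flags = [is_ready(starting), is_ready(started), is_ready(crashed_later)]
```

Because the startup line stays in the log forever, the check still passes after the worker has died, which is why it suits a readiness probe but not a liveness probe.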

tests/serve/test_vllm.py (2)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.

Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

components/backends/vllm/launch/agg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/vllm/launch/disagg_router.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/vllm/launch/agg_router.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/vllm/src/dynamo/vllm/main.py (2)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
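The injection behaviour described in this learning can be sketched roughly as follows. This is not the real `@dynamo_worker()` implementation: `inject_runtime` and `FakeRuntime` are hypothetical stand-ins that only show how a wrapper can prepend an argument.

```python
import asyncio
import functools


class FakeRuntime:
    """Hypothetical stand-in for the runtime object; not the real Dynamo class."""

    name = "fake-runtime"


def inject_runtime():
    """Illustrative decorator factory that prepends a runtime argument."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            # Assumption: the real decorator would create or fetch a runtime here.
            runtime = FakeRuntime()
            return await func(runtime, *args, **kwargs)

        return wrapper

    return decorator


@inject_runtime()
async def get_metrics(runtime, log_dir):
    # The function declares `runtime` first, but callers never pass it.
    return f"{runtime.name}:{log_dir}"


# Callers supply only the non-runtime parameters.
result = asyncio.run(get_metrics("/tmp/logs"))
```

The call site passes one argument even though the signature has two, which matches the `get_metrics(log_dir)` usage mentioned in the learning.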

🧬 Code Graph Analysis (2)

components/backends/vllm/src/dynamo/vllm/handlers.py (1)

components/backends/vllm/src/dynamo/vllm/protocol.py (1)

MyRequestOutput (11-34)

components/backends/vllm/src/dynamo/vllm/__main__.py (1)

components/backends/vllm/src/dynamo/vllm/main.py (1)

main (203-204)

💤 Files with no reviewable changes (3)

components/backends/vllm/src/dynamo/vllm/publisher.py
components/backends/vllm/src/dynamo/vllm/protocol.py
examples/vllm/launch/agg_router.sh

🧰 Additional context used

🧠 Learnings (19)

components/backends/vllm/deploy/disagg.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/src/dynamo/vllm/handlers.py (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/deploy/agg_router.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/src/dynamo/vllm/args.py (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/deploy/disagg_router.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/deploy/disagg_planner.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

pyproject.toml (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/src/dynamo/vllm/__main__.py (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/vllm/deploy/agg.yaml (3)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.

components/backends/vllm/requirements.txt (1)

Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

components/backends/vllm/launch/dep.sh (2)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/vllm/README.md (4)

Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.

Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.

Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.

components/backends/vllm/launch/disagg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Mirror Repository to GitLab
GitHub Check: pre-merge-rust (.)
GitHub Check: Build and Test - vllm
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (lib/runtime/examples)

🔇 Additional comments (10)

components/backends/vllm/src/dynamo/vllm/args.py (2)

65-70: Silent defaulting can mask user intent
If enable_prefix_caching isn’t passed, you transparently flip it to True. That behaviour differs from vanilla vLLM and may surprise callers. At least log at INFO, or expose an explicit --enable-prefix-caching default in help text.
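One way to make the default visible, sketched with the standard library only. The parser below is a hypothetical illustration and not the actual args.py code; it assumes Python 3.9+ for `argparse.BooleanOptionalAction`.

```python
import argparse
import logging

logger = logging.getLogger("prefix_caching_example")


def parse_args(argv):
    """Parse a tri-state flag so an omitted flag can be told apart from an explicit one."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--enable-prefix-caching",
        action=argparse.BooleanOptionalAction,  # also adds --no-enable-prefix-caching
        default=None,
        help="Enable prefix caching (enabled when the flag is omitted).",
    )
    args = parser.parse_args(argv)
    if args.enable_prefix_caching is None:
        # The user said nothing, so announce the default instead of applying it silently.
        logger.info("--enable-prefix-caching not set; defaulting to True")
        args.enable_prefix_caching = True
    return args


omitted = parse_args([]).enable_prefix_caching
explicit_off = parse_args(["--no-enable-prefix-caching"]).enable_prefix_caching
explicit_on = parse_args(["--enable-prefix-caching"]).enable_prefix_caching
```

Using `None` as the parser default keeps "not passed" distinct from an explicit `False`, so the override is applied and logged only when the user expressed no preference.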

230-238: Fail fast if vLLM adds new required CLI attrs
hasattr(config.engine_args, key) guards unknown keys but raises at runtime. Unit-test this list (task, skip_tokenizer_init, …) against upstream vLLM versions to avoid sudden breakage after upgrades.
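A check along these lines could be sketched as below. `FakeEngineArgs` is a hypothetical stand-in for the real vLLM engine-args object, and only the two attribute names mentioned in the comment are used; a real test would build the actual object from the installed vLLM version.

```python
from dataclasses import dataclass


@dataclass
class FakeEngineArgs:
    """Hypothetical stand-in for vLLM's engine-args object."""

    task: str = "generate"
    skip_tokenizer_init: bool = False


def missing_keys(engine_args, required_keys):
    """Return the keys that the engine-args object does not define."""
    return [key for key in required_keys if not hasattr(engine_args, key)]


def apply_overrides(engine_args, overrides):
    """Apply overrides, failing with one clear error if any key is unknown."""
    missing = missing_keys(engine_args, overrides)
    if missing:
        raise ValueError(f"engine args missing expected attributes: {missing}")
    for key, value in overrides.items():
        setattr(engine_args, key, value)
    return engine_args


updated = apply_overrides(FakeEngineArgs(), {"skip_tokenizer_init": True})
```

Running `missing_keys` in a unit test against each supported vLLM version would surface a renamed or removed attribute at test time rather than at worker startup.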

components/backends/vllm/src/dynamo/vllm/handlers.py (1)

17-18: Relative import is the right fix

Using .protocol prevents unintended shadowing when handlers.py is executed via python -m dynamo.vllm.
Looks good.

pyproject.toml (1)

82-82: Packaging update looks good

Including components/backends/vllm/src/dynamo in the wheel ensures the new backend ships with the project.

tests/serve/test_vllm.py (2)

186-192: 👍 Path update looks good

The aggregated configuration now points to the relocated backend directory and should resolve correctly.

213-217: Consistent directory for disaggregated config

Path updated correctly here. No action needed.

components/backends/vllm/multi-node.md (2)

52-58: Ingress command blocks the terminal

In the previous scripts, ingress was backgrounded (&). Here it runs in the foreground, preventing the worker command below from executing in the same shell. Either append & or add a note that a separate terminal is required.

-python -m dynamo.frontend --router-mode kv
+python -m dynamo.frontend --router-mode kv &

90-96: Flag name seems inverted

The decode worker example passes --is-prefill-worker. Double-check the flag name and semantics.

components/backends/vllm/README.md (1)

24-32: Container run command can drop -it when launched from scripts.

If the goal is a detached container for automated deployments, consider:

-./container/run.sh -it --framework VLLM [--mount-workspace]
+./container/run.sh --framework VLLM [--mount-workspace]

Keeping -it is fine for local testing; just verify the intent.

components/backends/vllm/launch/agg_router.sh (1)

10-13: Propagate background errors & forward signals cleanly.

Only the second worker runs in the foreground, so failures in the ingress or the first worker are ignored.
The pattern is identical to disagg.sh: add wait -n and return its status, or exec the last worker so signals reach it directly.
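The same "return the status of whichever process exits first" idea can be sketched in Python as a rough analogue of the shell `wait -n` suggestion. The helper name and the demo commands are hypothetical; the launch scripts themselves are shell and are not shown here.

```python
import subprocess
import sys
import time


def run_until_first_exit(commands, poll_interval=0.05):
    """Start every command, return the exit code of the first one to finish,
    and stop the remaining processes so nothing is left orphaned."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Also runs on KeyboardInterrupt, so Ctrl-C cleans up the children too.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()


# Demo with placeholder commands: a long-running "worker" and a process that fails.
demo_commands = [
    [sys.executable, "-c", "import time; time.sleep(30)"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]
exit_code = run_until_first_exit(demo_commands)
```

With this shape, a failure in any launched process ends the whole group and surfaces that process's exit status, instead of being hidden behind the last foreground command.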

components/backends/vllm/launch/disagg.sh

components/backends/vllm/requirements.txt

components/backends/vllm/deploy/agg.yaml

components/backends/vllm/deploy/disagg_router.yaml

grahamking

Excellent work.

components/backends/vllm/src/dynamo/vllm/handlers.py

tedzhouhk

should frontend in k8s also be using python -m ... (those yaml in deploy folder)?

grahamking · 2025-07-22T19:33:56Z

should frontend in k8s also be using python -m ... (those yaml in deploy folder)?

Yes! I will put up a new PR for that.

grahamking · 2025-07-22T19:44:05Z

should frontend in k8s also be using python -m ... (those yaml in deploy folder)?

Yes! I will put up a new PR for that.

#2055

biswapanda · 2025-07-22T19:55:49Z

Lgtm 👍

pull-request-size bot added the size/XXL label Jul 17, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 17, 2025 07:08 Inactive

github-actions bot added the refactor label Jul 17, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 17, 2025 07:09 Inactive

pull-request-size bot added size/L and removed size/XXL labels Jul 17, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 17, 2025 19:00 Inactive

copy-pr-bot bot temporarily deployed to GITLAB July 17, 2025 19:01 Inactive

copy-pr-bot bot temporarily deployed to GITLAB July 17, 2025 19:03 Inactive

copy-pr-bot bot temporarily deployed to GITLAB July 17, 2025 19:06 Inactive

alec-flowers added 3 commits July 22, 2025 13:22

first pass

0cd9fcd

finishing up

a601b6f

clean up licenses

875941b

grahamking force-pushed the aflowers/vllm-python-ux branch from 0069741 to 875941b Compare July 22, 2025 17:39

copy-pr-bot bot temporarily deployed to GITLAB July 22, 2025 17:40 Inactive

grahamking marked this pull request as ready for review July 22, 2025 17:40

grahamking requested review from GuanLuo, PeaBrane, biswapanda, grahamking, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25, tedzhouhk and tmonty12 as code owners July 22, 2025 17:40

grahamking requested review from a team, ishandhanani and jthomson04 as code owners July 22, 2025 17:40

copy-pr-bot bot temporarily deployed to GITLAB July 22, 2025 17:44 Inactive

coderabbitai bot reviewed Jul 22, 2025

components/backends/vllm/launch/disagg.sh (resolved)

components/backends/vllm/requirements.txt (outdated, resolved)

Pin vllm to 0.9.2 and update last of the docs.

25f903e

copy-pr-bot bot temporarily deployed to GITLAB July 22, 2025 18:10 Inactive

alec-flowers commented Jul 22, 2025

components/backends/vllm/deploy/agg.yaml (outdated, resolved)

alec-flowers commented Jul 22, 2025

components/backends/vllm/deploy/disagg_router.yaml (outdated, resolved)

Feedback

3e8c0d8

copy-pr-bot bot temporarily deployed to GITLAB July 22, 2025 18:18 Inactive

grahamking approved these changes Jul 22, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 22, 2025 18:24 Inactive

biswapanda reviewed Jul 22, 2025

components/backends/vllm/src/dynamo/vllm/handlers.py (resolved)

grahamking mentioned this pull request Jul 22, 2025

feat: deploy SLA profiler to k8s #2030

Merged

grahamking enabled auto-merge (squash) July 22, 2025 18:59

tedzhouhk reviewed Jul 22, 2025

PeaBrane approved these changes Jul 22, 2025

grahamking disabled auto-merge July 22, 2025 19:32

grahamking merged commit f3e3d94 into main Jul 22, 2025
12 of 13 checks passed

grahamking deleted the aflowers/vllm-python-ux branch July 22, 2025 19:33

This was referenced Jul 23, 2025

fix: vllm deployment examples #2062

Merged

fix: agg router test #2123

Merged

coderabbitai bot mentioned this pull request Aug 14, 2025

feat: Support python -m dynamo.frontend --version #2449

Merged

This was referenced Aug 29, 2025

feat: Dynamo deployment and benchmarking recipe for llama3-70b and oss-gpt-120b #2792

Merged

test: Metric labels unit test for vLLM. #2820

Closed

alec-flowers commented Jul 17, 2025 • edited by grahamking

6 participants
