refactor: vLLM to new Python UX #1983
Walkthrough
This change removes Apache 2.0 license headers from multiple files, updates documentation and test paths to reflect a new directory structure, standardizes launch scripts to use Python module invocations, adds a requirements file, and introduces a Python module entry point for the vLLM backend. One obsolete launch script is deleted.
Sequence Diagram(s)
sequenceDiagram
participant User
participant LaunchScript
participant PythonModule
participant WorkerProcess
User->>LaunchScript: Run agg.sh/dep.sh/disagg.sh/agg_router.sh
LaunchScript->>PythonModule: python -m dynamo.frontend [--router-mode]
LaunchScript->>PythonModule: python -m dynamo.vllm [args]
PythonModule->>WorkerProcess: Start vLLM worker(s)
Estimated code review effort: 2 (~15 minutes)
Actionable comments posted: 2
🔭 Outside diff range comments (9)
components/backends/vllm/src/dynamo/vllm/args.py (1)
131-160: Race window after releasing the socket
You reserve the OS port, write to etcd, then close the socket before the vLLM process binds. Another local process can theoretically grab the port in that interval. Mitigate by re-binding immediately after the etcd write succeeds (keeping the socket open until the worker starts), or by using SO_EXCLUSIVEADDRUSE (Windows) / SO_REUSEPORT semantics where available.
components/backends/vllm/deploy/disagg.yaml (1)
84-88: Launch command still points to components/main.py
The PR description says launch scripts are now module-based (python -m dynamo.vllm …). Update the manifest to avoid shipping a stale path.
-args:
-  - "python3 components/main.py --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.log"
+args:
+  - "python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager 2>&1 | tee /tmp/vllm.log"
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
142-147: Possible AttributeError: blindly calling .data() on the response
prefill_response = await anext(...) is immediately passed to prefill_response.data(), but it is unclear that the returned object actually exposes a data() method. If it is already bytes/str, this will crash.
-prefill_response = MyRequestOutput.model_validate_json(
-    prefill_response.data()
-)
+payload = (
+    prefill_response
+    if isinstance(prefill_response, (bytes, str))
+    else prefill_response.data()
+)
+prefill_response = MyRequestOutput.model_validate_json(payload)
components/backends/vllm/deploy/agg_router.yaml (1)
40-50: Deployment paths still point to removed examples/vllm directory
workingDir: /workspace/examples/vllm and the command under args will no longer resolve after the refactor. Update to the new layout or use module invocation, e.g.:
-workingDir: /workspace/examples/vllm
+workingDir: /workspace/components/backends/vllm
…
-  - "python3 components/main.py …"
+  - "python -m dynamo.vllm …"
components/backends/vllm/deploy/disagg_router.yaml (2)
40-50: Same stale directory reference as in agg_router.yaml
workingDir and the ingress command still use examples/vllm, so the containers will crash. Align with the new path (components/backends/vllm) or switch to module execution.
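The same stale references (examples/vllm and components/main.py) recur across several manifests, the README and tests/serve/test_vllm.py. A quick scan can confirm that none remain after the fix. This is a minimal sketch using only the Python standard library. The function name, the pattern list and the directories passed in are assumptions for illustration, not part of the PR:

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Paths that the review comments flag as removed by the refactor (assumed list).
STALE_PATTERNS = [re.compile(r"examples/vllm"), re.compile(r"components/main\.py")]
SCANNED_SUFFIXES = {".yaml", ".yml", ".py", ".sh", ".md"}


def find_stale_refs(roots: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (file, line number, stripped line) for each line that mentions a stale path."""
    hits: List[Tuple[str, int, str]] = []
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue  # skip roots that do not exist in this checkout
        for path in sorted(base.rglob("*")):
            if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if any(p.search(line) for p in STALE_PATTERNS):
                    hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical invocation from the repository root.
    for file, lineno, line in find_stale_refs(["components/backends/vllm", "tests/serve"]):
        print(f"{file}:{lineno}: {line}")
```

A non-empty result would flag leftover paths of the kind described in these comments. The scan also matches prose that mentions the old paths on purpose, so hits still need a human look.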
124-130: Prefill-worker command outdated
python3 components/main.py … no longer exists. Switch to python -m dynamo.vllm … or correct the absolute path.
tests/serve/test_vllm.py (1)
200-206: agg-router still references the old examples path
directory="/workspace/examples/vllm" was not migrated. If the examples tree was removed in this refactor, this will raise FileNotFoundError and fail the pytest matrix.
-    directory="/workspace/examples/vllm",
+    directory="/workspace/components/backends/vllm",
components/backends/vllm/launch/disagg_router.sh (1)
8-20: Add wait logic to surface failures.
Same failure-propagation issue as the other scripts. A minimal addition after spawning all jobs (wait -n requires bash 4.3 or newer):
wait -n
exit $?
components/backends/vllm/launch/dsr1_dep.sh (1)
79-83: Create the log directory before piping output into tee
$LOG_DIR is referenced on L79, but mkdir -p $LOG_DIR is executed only on L82.
If the directory does not exist, tee cannot create the log file, so the ingress output never reaches $LOG_DIR and launch problems become hard to diagnose.
-# run ingress if it's node 0
-if [ $NODE_RANK -eq 0 ]; then
-    DYN_LOG=debug python -m dynamo.frontend --router-mode kv 2>&1 | tee $LOG_DIR/dsr1_dep_ingress.log &
-fi
-
-mkdir -p $LOG_DIR
+# ensure the log directory exists early
+mkdir -p "$LOG_DIR"
+
+# run ingress if it's node 0
+if [ "$NODE_RANK" -eq 0 ]; then
+    DYN_LOG=debug python3 -m dynamo.frontend --router-mode kv 2>&1 | tee "$LOG_DIR/dsr1_dep_ingress.log" &
+fi
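Two of the shell fixes above carry over directly if a launcher is ever written in Python:

- create $LOG_DIR before anything writes to it;
- exit with the status of the first job that stops, as wait -n does.

This is a rough sketch using only the standard library. launch_all and the proc_<i>.log naming are hypothetical. Polling is a simple stand-in for bash's job control:

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import IO, List, Sequence


def launch_all(cmds: Sequence[Sequence[str]], log_dir: str, poll_s: float = 0.1) -> int:
    """Start every command with output in log_dir/proc_<i>.log and return the exit
    code of the first process seen to exit (a rough analogue of `wait -n`)."""
    logs = Path(log_dir)
    logs.mkdir(parents=True, exist_ok=True)  # create the directory before any log file is opened
    procs: List[subprocess.Popen] = []
    handles: List[IO[bytes]] = []
    try:
        for i, cmd in enumerate(cmds):
            fh = open(logs / f"proc_{i}.log", "wb")
            handles.append(fh)
            procs.append(subprocess.Popen(list(cmd), stdout=fh, stderr=subprocess.STDOUT))
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code  # first exit wins; the rest are stopped in `finally`
            time.sleep(poll_s)
    finally:
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
        for fh in handles:
            fh.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        code = launch_all(
            [
                [sys.executable, "-c", "import time; time.sleep(30)"],  # long-running stand-in
                [sys.executable, "-c", "import sys; sys.exit(3)"],  # stand-in for a crashing worker
            ],
            str(Path(tmp) / "nested" / "logs"),
        )
        print("first exit code:", code)
```

wait -n blocks until a job changes state. This sketch instead polls every poll_s seconds. If two processes exit within the same interval, it reports the one listed first.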
♻️ Duplicate comments (3)
components/backends/vllm/deploy/disagg_planner.yaml (1)
84-88: Same stale entry point as above
Replicate the module-style invocation here to stay consistent with the new UX.
components/backends/vllm/deploy/agg.yaml (1)
84-88: Stale script path: switch to module execution
See earlier comment; update the command to python -m dynamo.vllm …
components/backends/vllm/multi-node.md (1)
76-83: Same foreground/background concern for disaggregated ingress
Consider the same tweak (backgrounding the ingress with &) or explicitly tell users to run it in a separate terminal.
🧹 Nitpick comments (16)
components/backends/vllm/src/dynamo/vllm/args.py (3)
22-24: DEFAULT_MODEL is declared but never used
The constant adds dead code and may confuse readers about where the model default is enforced.
-DEFAULT_MODEL = "Qwen/Qwen3-0.6B"
26-43: Consider turning Config into a real dataclass
The class is currently a mutable "bag-of-attrs" created via Config(); config.foo = ….
Using @dataclass (or pydantic) makes required/optional fields explicit and provides defaults. Type-checkers can then catch misspelled attribute names, and @dataclass(slots=True) (Python 3.10+) also rejects them at runtime.
+from dataclasses import dataclass, field
+
+@dataclass
+class Config:
+    # Dynamo-specific
+    namespace: str = ""
+    component: str = ""
+    endpoint: str = ""
+    is_prefill_worker: bool = False
+    kv_port: int | None = None
+    side_channel_port: int | None = None
+
+    # vLLM
+    model: str = ""
+    served_model_name: str | None = None
+
+    # Engine args
+    engine_args: AsyncEngineArgs | None = field(default=None)
101-105: Hard-coded block_size = 16 may not fit all models
Some large models may want 32 or 64 instead. Larger blocks generally increase, rather than reduce, internal KV-cache fragmentation, so this is a per-model trade-off. Consider exposing it as --default-block-size or relying on vLLM's internal default instead of overriding unconditionally.
components/backends/vllm/deepseek-r1.md (1)
8-8: Minor typos in documentation
“seperate” → “separate”; also “…variety of different configurations”.
-Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component
+Dynamo supports running Deepseek R1 with data-parallel attention and wide expert parallelism. Each data-parallel attention rank is a separate Dynamo component
components/backends/vllm/src/dynamo/vllm/__main__.py (1)
4-7: Use an explicit relative import for robustness
Because __main__.py is executed with python -m dynamo.vllm, __package__ is already set to dynamo.vllm. A relative import (from .main import main) avoids hard-coding the full package path. This insulates you from future top-level package renames or vendoring.
-from dynamo.vllm.main import main
+from .main import main
components/backends/vllm/launch/dep.sh (1)
8-15: Allow GPU count to be parameterised
Hard-coding {0..3} ties this script to exactly four GPUs. Making it dynamic improves portability, e.g.:
-GPU_IDS=${GPU_IDS:-0 1 2 3}
-for i in {0..3}; do
+GPU_IDS=${GPU_IDS:-0 1 2 3}
+for i in ${GPU_IDS}; do
Callers can then run GPU_IDS="0 1" ./dep.sh.
components/backends/vllm/multi-node.md (1)
108-109: Minor: keep backgrounding consistent
The large-model section backgrounds the ingress (&). Update the earlier snippets to match once a decision is made.
components/backends/vllm/launch/agg.sh (1)
8-12: Tiny formatting nit
Double space after vllm: not harmful, but trimming keeps scripts tidy.
-python -m dynamo.vllm  --model
+python -m dynamo.vllm --model
components/backends/vllm/launch/disagg.sh (1)
4-6: Harden the script with pipefail/nounset.
set -e alone ignores failures in any pipeline stage except the last, and it does not catch unset variables. Tighten fail-fast behaviour:
-set -e
+set -euo pipefail
components/backends/vllm/README.md (2)
56-60: Docs still reference the removed dynamo run CLI.
The launch scripts in this PR now invoke the ingress via python -m dynamo.frontend --router-mode kv. However, the README paragraph (lines 58-60) tells users that "each shell script runs dynamo run …". Please update it to avoid confusion.
80-83: Path not updated to new directory structure.
Line 81 still instructs cd examples/vllm, but the examples were moved to components/backends/vllm. Adjust for consistency:
-cd examples/vllm
+cd components/backends/vllm
components/backends/vllm/launch/agg_router.sh (1)
4-6: Adopt strict shell options.
Same rationale as in disagg.sh: add -u and pipefail:
-set -e
+set -euo pipefail
components/backends/vllm/src/dynamo/vllm/main.py (1)
203-205: Wrap worker in top-level coroutine for clearer control flow.
uvloop.run(worker()) works, yet hides the logical distinction between "parse args & set-up" and "start worker". Consider:
-uvloop.run(worker())
+async def _entry():
+    await worker()  # allows future pre-run setup / error handling
+
+uvloop.run(_entry())
This is not mandatory, but it improves readability. It also lets you insert pre-flight checks without editing two files (main.py & __main__.py).
components/backends/vllm/launch/disagg_router.sh (1)
4-6: Mirror strict mode in all launch scripts.
For consistency with other suggestions:
-set -e
+set -euo pipefail
components/backends/vllm/launch/dsr1_dep.sh (2)
79-80: Inconsistent interpreter (python vs python3) may pick different environments
python -m dynamo.frontend (L79) and python3 -m dynamo.vllm (L92) might resolve to different binaries/venvs, leading to mismatched package sets. Pick one and expose it via a single variable, e.g.:
PYTHON=${PYTHON:-python3}
…
DYN_LOG=debug "$PYTHON" -m dynamo.frontend …
…
"$PYTHON" -m dynamo.vllm …
Also applies to: 92-93
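When commands are launched from Python instead of bash (for example from a test harness), sys.executable plays the role of a single PYTHON variable. Building every command from it keeps dynamo.frontend and dynamo.vllm on the same interpreter and virtualenv as the caller. This is a minimal sketch. module_cmd is a hypothetical helper, and the flags are copied from the launch scripts and manifests discussed in this review:

```python
import sys
from typing import List


def module_cmd(module: str, *args: str) -> List[str]:
    """Build an argv list that runs `python -m <module> ...` with the current interpreter."""
    return [sys.executable, "-m", module, *args]


frontend = module_cmd("dynamo.frontend", "--router-mode", "kv")
worker = module_cmd("dynamo.vllm", "--model", "Qwen/Qwen3-0.6B", "--enforce-eager")
print(frontend)
print(worker)
```

The lists can be passed straight to subprocess.Popen without shell=True, so no shell word-splitting is involved.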
88-101: Quote variable expansions that can contain spaces
$LOG_DIR, $MASTER_ADDR, and others are unquoted when used in commands (tee, --data-parallel-address, etc.).
Quoting prevents word-splitting and unexpected globbing:
tee "$LOG_DIR/dsr1_dep_${dp_rank}.log" &
…
--data-parallel-address "$MASTER_ADDR" \
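Some of these command lines may be generated from Python, for instance when templating a launch script. In that case the standard library's shlex.quote gives the same protection. It leaves safe strings untouched and single-quotes anything containing spaces or shell metacharacters. This is a small sketch; tee_cmd is a hypothetical helper and the paths are illustrative:

```python
import shlex


def tee_cmd(log_dir: str, dp_rank: int) -> str:
    """Shell fragment that tees into a per-rank log, quoted so spaces or globs in log_dir are inert."""
    return "tee " + shlex.quote(f"{log_dir}/dsr1_dep_{dp_rank}.log")


print(tee_cmd("/tmp/logs", 1))         # → tee /tmp/logs/dsr1_dep_1.log
print(tee_cmd("/tmp/dynamo logs", 0))  # → tee '/tmp/dynamo logs/dsr1_dep_0.log'
```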
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
- components/backends/vllm/README.md (5 hunks)
- components/backends/vllm/deepseek-r1.md (1 hunks)
- components/backends/vllm/deploy/agg.yaml (1 hunks)
- components/backends/vllm/deploy/agg_router.yaml (1 hunks)
- components/backends/vllm/deploy/disagg.yaml (1 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (1 hunks)
- components/backends/vllm/deploy/disagg_router.yaml (1 hunks)
- components/backends/vllm/launch/agg.sh (1 hunks)
- components/backends/vllm/launch/agg_router.sh (1 hunks)
- components/backends/vllm/launch/dep.sh (1 hunks)
- components/backends/vllm/launch/disagg.sh (1 hunks)
- components/backends/vllm/launch/disagg_router.sh (1 hunks)
- components/backends/vllm/launch/dsr1_dep.sh (2 hunks)
- components/backends/vllm/multi-node.md (5 hunks)
- components/backends/vllm/requirements.txt (1 hunks)
- components/backends/vllm/src/dynamo/vllm/__main__.py (1 hunks)
- components/backends/vllm/src/dynamo/vllm/args.py (1 hunks)
- components/backends/vllm/src/dynamo/vllm/handlers.py (1 hunks)
- components/backends/vllm/src/dynamo/vllm/main.py (2 hunks)
- components/backends/vllm/src/dynamo/vllm/protocol.py (0 hunks)
- components/backends/vllm/src/dynamo/vllm/publisher.py (0 hunks)
- examples/vllm/launch/agg_router.sh (0 hunks)
- pyproject.toml (1 hunks)
- tests/serve/test_vllm.py (2 hunks)
🧠 Learnings (19)
components/backends/vllm/deploy/disagg.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/deploy/agg_router.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/src/dynamo/vllm/args.py (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/deploy/disagg_router.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/deploy/disagg_planner.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
pyproject.toml (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/src/dynamo/vllm/__main__.py (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/deploy/agg.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/requirements.txt (1)
Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
components/backends/vllm/launch/dep.sh (2)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/README.md (4)
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.
components/backends/vllm/launch/disagg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/multi-node.md (6)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
tests/serve/test_vllm.py (2)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.
Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
components/backends/vllm/launch/agg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/launch/disagg_router.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/launch/agg_router.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/src/dynamo/vllm/main.py (2)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
🧬 Code Graph Analysis (2)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
components/backends/vllm/src/dynamo/vllm/protocol.py (1)
MyRequestOutput(11-34)
components/backends/vllm/src/dynamo/vllm/__main__.py (1)
components/backends/vllm/src/dynamo/vllm/main.py (1)
main(203-204)
💤 Files with no reviewable changes (3)
- components/backends/vllm/src/dynamo/vllm/publisher.py
- components/backends/vllm/src/dynamo/vllm/protocol.py
- examples/vllm/launch/agg_router.sh
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/src/dynamo/vllm/args.py (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/deploy/disagg_router.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/deploy/disagg_planner.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
pyproject.toml (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/src/dynamo/vllm/__main__.py (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/vllm/deploy/agg.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.445Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.482Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
components/backends/vllm/requirements.txt (1)
Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
components/backends/vllm/launch/dep.sh (2)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/README.md (4)
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.
components/backends/vllm/launch/disagg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/multi-node.md (6)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: biswapanda
PR: #1890
File: examples/vllm/deploy/agg.yaml:63-70
Timestamp: 2025-07-14T23:01:16.218Z
Learning: In vLLM worker deployments, grep-based log checks for "VllmWorker.*has been initialized" are appropriate for readiness probes to verify worker startup, but should not be used for liveness probes which need to detect ongoing worker health.
tests/serve/test_vllm.py (2)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with /workspace/ prefix are intentional and should not be changed to relative paths.
Learnt from: ptarasiewiczNV
PR: #2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.951Z
Learning: The --torch-backend=auto flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
components/backends/vllm/launch/agg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/launch/disagg_router.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/launch/agg_router.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/vllm/src/dynamo/vllm/main.py (2)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
🧬 Code Graph Analysis (2)
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
components/backends/vllm/src/dynamo/vllm/protocol.py (1)
MyRequestOutput(11-34)
components/backends/vllm/src/dynamo/vllm/__main__.py (1)
components/backends/vllm/src/dynamo/vllm/main.py (1)
main(203-204)
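The readiness-probe learning recorded above for README.md and multi-node.md relies on a log check for `VllmWorker.*has been initialized`. The sketch below shows the same check in Python. The log lines are made up for illustration, and `is_ready` is a hypothetical helper, not part of the Dynamo codebase.

```python
import re

# Pattern quoted in the learning. Without re.DOTALL, "." does not match a newline,
# so both parts must appear on the same log line, just as they must for grep.
READY_PATTERN = re.compile(r"VllmWorker.*has been initialized")


def is_ready(log_text: str) -> bool:
    """Return True once the worker's one-time startup message appears in its log.

    Appropriate for a readiness probe only. A liveness probe needs a signal that
    reflects ongoing health, which a single startup line cannot provide.
    """
    return READY_PATTERN.search(log_text) is not None


print(is_ready("INFO VllmWorker 3f2a has been initialized"))  # True
print(is_ready("INFO loading model weights"))  # False
```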
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (10)
components/backends/vllm/src/dynamo/vllm/args.py (2)
65-70: Silent defaulting can mask user intent
If `enable_prefix_caching` isn’t passed, you transparently flip it to `True`. That behaviour differs from vanilla vLLM and may surprise callers. At least log at `INFO`, or expose an explicit `--enable-prefix-caching` default in the help text.
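Below is a minimal sketch of the reviewer's suggestion. It assumes plain `argparse`, although the actual args.py may build its parser differently. `None` serves as a sentinel for "not passed", so the `True` default is applied explicitly, logged at `INFO`, and stated in the help text. The logger name and the `parse_args` helper are hypothetical.

```python
import argparse
import logging

logger = logging.getLogger("dynamo.vllm.args_sketch")  # hypothetical logger name


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # default=None distinguishes "flag not passed" from an explicit on/off choice.
    # BooleanOptionalAction (Python 3.9+) also generates --no-enable-prefix-caching.
    parser.add_argument(
        "--enable-prefix-caching",
        action=argparse.BooleanOptionalAction,
        default=None,
        help="Enable prefix caching (defaults to enabled when neither flag is given).",
    )
    args = parser.parse_args(argv)
    if args.enable_prefix_caching is None:
        # Make the implicit default visible instead of flipping it silently.
        logger.info("--enable-prefix-caching not set; defaulting to True")
        args.enable_prefix_caching = True
    return args
```

With this shape, an explicit `--no-enable-prefix-caching` is always respected, and the default is never applied without a log line.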
230-238: Fail fast if vLLM adds new required CLI attrs
`hasattr(config.engine_args, key)` guards unknown keys but raises at runtime. Unit-test this list (`task`, `skip_tokenizer_init`, …) against upstream vLLM versions to avoid sudden breakage after upgrades.
components/backends/vllm/src/dynamo/vllm/handlers.py (1)
17-18: Relative import is the right fix
Using `.protocol` prevents unintended shadowing when `handlers.py` is executed via `python -m dynamo.vllm`.
Looks good.
pyproject.toml (1)
82-82: Packaging update looks good
Including `components/backends/vllm/src/dynamo` in the wheel ensures the new backend ships with the project.
tests/serve/test_vllm.py (2)
186-192: 👍 Path update looks good
The `aggregated` configuration now points to the relocated backend directory and should resolve correctly.
213-217: Consistent directory for `disaggregated` config
Path updated correctly here. No action needed.
components/backends/vllm/multi-node.md (2)
52-58: Ingress command blocks the terminal
In the previous scripts, ingress was backgrounded (`&`). Here it runs in the foreground, which prevents the worker command below it from executing in the same shell. Either append `&` or add a note that a separate terminal is required.
```diff
-python -m dynamo.frontend --router-mode kv
+python -m dynamo.frontend --router-mode kv &
```
90-96: Flag name seems inverted
The decode worker example passes `--is-prefill-worker`. Double-check the flag name and semantics.
components/backends/vllm/README.md (1)
24-32: Container run command can drop `-it` when launched from scripts.
If the goal is a detached container for automated deployments, consider:
```diff
-./container/run.sh -it --framework VLLM [--mount-workspace]
+./container/run.sh --framework VLLM [--mount-workspace]
```
Keeping `-it` is fine for local testing; just verify the intent.
components/backends/vllm/launch/agg_router.sh (1)
10-13: Propagate background errors & forward signals cleanly.
Only the second worker runs in the foreground, so failures in the ingress or the first worker are ignored.
The pattern is identical to disagg.sh: add `wait -n` and return its status, or `exec` the last worker so that signals reach it directly.
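The reviewer's bash fix is `wait -n` plus returning its status. For launches driven from Python, a rough analogue is shown below. It uses a hypothetical `run_all` helper, not something the launch scripts contain. Every process starts without blocking (like `&`). The helper returns the exit status of whichever process exits first and terminates the rest. It polls rather than blocking, and it installs no signal handlers. A SIGTERM sent to the parent is therefore not relayed to the children, which the bash `exec` variant does handle.

```python
import subprocess
import sys
import time


def run_all(commands, poll_interval=0.1):
    """Start every command, wait for the first to exit, stop the rest, return its status.

    Each command is an argv list. This is a rough analogue of `wait -n` in bash.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]  # non-blocking, like `&`
    try:
        while True:
            for proc in procs:
                code = proc.poll()  # None while the process is still running
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Runs before the return completes: stop any survivors and reap them.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()


if __name__ == "__main__":
    # Demo: a long-running child plus one that fails immediately with status 3.
    status = run_all([
        [sys.executable, "-c", "import time; time.sleep(30)"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ])
    print(status)  # 3
```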
grahamking
left a comment
Excellent work.
tedzhouhk
left a comment
Should the frontend in k8s also be using `python -m ...` (the YAML files in the deploy folder)?
Yes! I will put up a new PR for that.
Lgtm 👍
python -m dynamo.vllm --help
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores