-
Notifications
You must be signed in to change notification settings - Fork 676
docs: Examples README/restructuring #2174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…stributed runtime docs
WalkthroughThis update introduces new and significantly expanded documentation across several directories. It restructures and enriches the README files for the TensorRT-LLM and vLLM backends, adds an overview README to the examples directory, and provides step-by-step deployment guides for Dynamo on Amazon EKS with EFS and for deploying vLLM on Kubernetes. The distributed runtime architecture documentation is also clarified and updated. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Documentation
participant Kubernetes Cluster
participant AWS Services (EKS/EFS/ECR)
participant Docker/Helm
User->>Documentation: Reads deployment/setup instructions
User->>AWS Services (EKS/EFS/ECR): Provisions resources per guide
User->>Docker/Helm: Builds and pushes images as instructed
User->>Kubernetes Cluster: Applies manifests/Helm charts
Kubernetes Cluster->>AWS Services (EKS/EFS): Integrates with EFS and pulls images from ECR
User->>Kubernetes Cluster: Verifies deployment and tests inference per documentation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
components/backends/trtllm/README.md (1)
201-204: Broken link to KV-cache transfer guideFile name is misspelled (
kv-cache-tranfer.md).-see the [KV cache transfer guide](./kv-cache-tranfer.md) +see the [KV cache transfer guide](./kv-cache-transfer.md)
🧹 Nitpick comments (16)
examples/deployments/EKS/Deploy_VLLM_example.md (1)
28-31: Escape JSON newlines or use@data.jsonThe multi-line
curlbody relies on Bash line-continuation with back-slashes.
Beginners often copy the snippet verbatim into a single line shell which fails.
Recommend either:curl -X POST localhost:8080/v1/chat/completions \ -H 'Content-Type: application/json' \ --data @request.jsonor wrap the entire JSON in a single line.
examples/README.md (2)
67-70: Fix typo & clarify Python version requirement
Python 3.9++is ambiguous. If the intent is “3.9 or later” use the conventional3.9+.- - **Python 3.9++** - For client scripts and utilities + - **Python 3.9+** – for client scripts and utilities
76-79: Correct list indentation (markdown-lint MD007)The nested list under Framework Support is indented with four spaces; two are enough for proper nesting and to satisfy md-lint rules.
- - **[vLLM](../components/backends/vllm/)** – vLLM-specific deployment and configuration - - **[SGLang](../components/backends/sglang/)** – SGLang integration examples and workflows - - **[TensorRT-LLM](../components/backends/trtllm/)** – TensorRT-LLM workflows and optimizations + - **[vLLM](../components/backends/vllm/)** – vLLM-specific deployment and configuration + - **[SGLang](../components/backends/sglang/)** – SGLang integration examples and workflows + - **[TensorRT-LLM](../components/backends/trtllm/)** – TensorRT-LLM workflows and optimisationsexamples/deployments/EKS/Deploy_Dynamo_Cloud.md (3)
5-7: Spelling typo
repositoriy→repository.-Create 1 ECR repositoriy +Create one ECR repository
44-50: Namespace creation duplicated
kubectl create namespace ${NAMESPACE}appears here and again at Line 69.
Invoke it once or guard the second call with|| trueto avoid “already exists” noise.-kubectl create namespace ${NAMESPACE} +kubectl create namespace ${NAMESPACE} 2>/dev/null || true
60-64: CRDs are cluster-scoped – namespace flag not needed
helm install dynamo-crds … --namespace defaultis harmless but misleading: CRDs ignore namespaces.
Omit the flag or add a note that any namespace is acceptable.examples/deployments/EKS/Create_EKS_EFS.md (2)
112-114: Tighten wording“inside of” → “inside”.
-... running inside of said pods ... +... running inside said pods ...
125-126: Simplify phrasingReplace “are able to access” with “can access”.
-... whether your nodes are able to access the created EFS file system. +... whether your nodes can access the created EFS file system.components/backends/vllm/README.md (1)
55-55: Grammar – duplicate word“all of our the” → “all of the”.
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. +Below we provide a guide that lets you run all of the common deployment patterns on a single node.docs/architecture/distributed_runtime.md (5)
22-29: Capitalize “Dynamo” and fix plural-singular mismatchSmall wording issues reduce credibility of technical docs:
-Dynamo's `DistributedRuntime` ... -... each dynamo components typically are deployed with its own process and thus has its own `DistributedRuntime` object. +Dynamo’s `DistributedRuntime` ... +... each Dynamo component is typically deployed in its own process and thus has its own `DistributedRuntime` object.
31-38: Example namespace and wording are confusing
- Example shows
runtime.namespace("dynamo")while earlier you state real deployments use namespaces likevllm-v1-agg. Using a placeholder that contradicts the preceding paragraph can mislead readers.- “Then, it route the request to the
Worker.” → “it routes”.- “Worker components create components with names like
worker,decode, orprefilland register endpoints like …” – consider breaking this very long sentence into two for readability.No code change needed if the doc is clear, but tightening the language will avoid reader confusion.
58-60: Typo “Componenet”-`Namespace`, `Componenet`, and `Endpoint` +`Namespace`, `Component`, and `Endpoint`
63-64: Typo “objected”-When a `Client` objected is created +When a `Client` object is created
78-79: Dangling back-tick & incomplete sentenceThe sentence ends abruptly after the back-tick.
-... `/components/backends`. ` +... `/components/backends` for full implementation details.components/backends/trtllm/README.md (2)
69-73: Minor grammar – double article-Below we provide a guide that lets you run all of our the common deployment patterns +Below we provide a guide that lets you run all of our common deployment patterns
150-156: Env-var explanation repeats “worker”Paragraph reads “… which can be set on the Backend that tells the Frontend how many times a request can be migrated…”. The wording is clear, but consider adding a concrete CLI example mirroring earlier examples:
./container/run.sh --framework tensorrtllm --migration-limit 3Not blocking, but improves discoverability.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
components/backends/trtllm/README.md(8 hunks)components/backends/vllm/README.md(3 hunks)docs/architecture/distributed_runtime.md(3 hunks)examples/README.md(1 hunks)examples/deployments/EKS/Create_EKS_EFS.md(1 hunks)examples/deployments/EKS/Deploy_Dynamo_Cloud.md(1 hunks)examples/deployments/EKS/Deploy_VLLM_example.md(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
examples/README.md (1)
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
examples/deployments/EKS/Deploy_Dynamo_Cloud.md (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
examples/deployments/EKS/Deploy_VLLM_example.md (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
components/backends/trtllm/README.md (4)
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: nnshah1
PR: #2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
docs/architecture/distributed_runtime.md (4)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2174/merge) by athreesh.
examples/README.md
[error] 19-19: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'.
components/backends/vllm/README.md
[error] 52-52: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'.
docs/architecture/distributed_runtime.md
[error] 31-31: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'.
🪛 LanguageTool
examples/deployments/EKS/Create_EKS_EFS.md
[style] ~112-~112: This phrase is redundant. Consider using “inside”.
Context: ... and then accessed by processes running inside of said pods as if they were part of the p...
(OUTSIDE_OF)
[style] ~125-~125: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ly. This will affect whether your nodes are able to access the created EFS file system. ##...
(BE_ABLE_TO)
🪛 markdownlint-cli2 (0.17.2)
examples/deployments/EKS/Create_EKS_EFS.md
49-49: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
50-50: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
components/backends/trtllm/README.md
39-39: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (2)
examples/deployments/EKS/Create_EKS_EFS.md (1)
135-149: Explicitly mark thefileSystemIdplaceholderReaders may apply the manifest verbatim and accidentally commit your sample ID.
Recommend making the placeholder obvious:- fileSystemId: fs-01e72da3fcdbf8a4d + fileSystemId: <YOUR_EFS_FILESYSTEM_ID>components/backends/vllm/README.md (1)
65-70: Ensure the compose path is correctThe text links to
../../../deploy/docker-compose.ymlbut the command usesdeploy/docker-compose.yml.
If the compose file resides at repo root, suggest aligning both:-docker compose -f deploy/docker-compose.yml up -d +docker compose -f ../../../deploy/docker-compose.yml up -d(or vice-versa) to avoid a “No such file” error.
Resolved conflicts in: - components/backends/trtllm/README.md: Fixed table of contents anchor and table formatting - components/backends/vllm/README.md: Updated feature support status from main - docs/architecture/distributed_runtime.md: Kept backend-agnostic description, used cleaner Python examples text - examples/deployments/EKS/Deploy_Dynamo_Cloud.md: Removed trailing slash from ECR login command
nealvaidya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: Anish <[email protected]>
Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: Anish <[email protected]>
Overview:
Added a readme to /examples and moved directories around to make the current structure clear to folks new to Dynamo
Additionally, merged with another branch that had approved README changes for each of the frameworks in /components/backends
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit