fix: creating a quickstart.md, readme, and making other updates #3197
Signed-off-by: athreesh <[email protected]>
- Update backend READMEs with correct KVBM status
- Simplify Kubernetes README with cleaner structure
- Fix multinode example to use correct dynamo.sglang command
- Add missing --skip-tokenizer-init flags

Signed-off-by: athreesh <[email protected]>
Walkthrough
Docs were extensively reorganized: top-level README rewritten, Kubernetes guide updated, and a new Quickstart added. Backend README feature matrices updated for KVBM status. Multinode example commands adjusted for SGLang module entrypoint and pkill pattern. No code/API changes.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/kubernetes/README.md (2)
84-97: Namespace mismatch in example. You set NAMESPACE=dynamo-kubernetes above, but here it is reset to dynamo-cloud, which will cause confusion.
-export NAMESPACE=dynamo-cloud
+# Use the same namespace as platform install
+export NAMESPACE=dynamo-kubernetes
109-116: Leading-slash links break on GitHub. Links like /docs/kubernetes/api_reference.md resolve to github.com/docs/… rather than this repo. Make them relative.
-- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
-- **[Operator Guide](/docs/kubernetes/dynamo_operator.md)** - Dynamo operator configuration and management
-- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation examples
+- **[API Reference](./api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
+- **[Operator Guide](./dynamo_operator.md)** - Dynamo operator configuration and management
+- **[Create Deployment](./create_deployment.md)** - Step-by-step deployment creation examples
Also change the leading-slash links in “Additional Resources” to relative links inside docs/kubernetes.
docs/quickstart.md (1)
299-301: Broken Support Matrix link. From docs/, drop the “docs/” prefix.
-For detailed compatibility information, see the [Support Matrix](docs/support_matrix.md).
+For detailed compatibility information, see the [Support Matrix](support_matrix.md).
🧹 Nitpick comments (11)
components/backends/sglang/README.md (2)
50-51: Typo: “does not router to DP worker”. Change “router” to “route”.
- | **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker |
+ | **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not route to DP worker |
190-191: Typo: “conjuction”. Change to “conjunction”.
- ... tokenizer_manager) is used in conjuction with NIXL ...
+ ... tokenizer_manager) is used in conjunction with NIXL ...
components/backends/trtllm/README.md (1)
191-201: Duplicate “Client”/“Benchmarking” sections. “Client” appears twice (Lines 191–196 and 231–236) and “Benchmarking” twice (197–201 and 237–241). Consolidate them to reduce redundancy.
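To locate duplicated sections like these before consolidating them, a small script can list every heading text that occurs more than once. This is a minimal sketch with a hypothetical helper name (`duplicate_headings` is not part of the repo); it assumes ATX-style `#` headings and skips lines inside ``` fences, so Setext headings and `~~~` fences are not handled.

```python
import re
from collections import defaultdict
from typing import Dict, List

# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def duplicate_headings(markdown: str) -> Dict[str, List[int]]:
    """Map each heading text that appears more than once to its 1-based line numbers."""
    seen: Dict[str, List[int]] = defaultdict(list)
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            # '#' lines inside code blocks are shell comments, not headings.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            seen[match.group(2)].append(lineno)
    return {text: lines for text, lines in seen.items() if len(lines) > 1}
```

Fed the contents of components/backends/trtllm/README.md, a helper like this would be expected to report the two “Client” and two “Benchmarking” headings, assuming they are written as ATX headings.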
examples/basics/multinode/README.md (1)
156-161: Minor formatting issues in the info note. There are stray “>” artifacts in “different > GPUs” / “token > generation”. Remove the stray symbols.
- > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different > GPUs)
+ > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different GPUs)
- > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token > generation)
+ > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token generation)
docs/kubernetes/README.md (2)
73-81: Typo: “replicaes”. Change to “replicas”.
-### **Multi-node Deployment** (Model replicaes across multiple nodes)
+### **Multi-node Deployment** (Model replicas across multiple nodes)
167-187: Worker command examples — fix sglang to consolidated entrypoint. Use `python3 -m dynamo.sglang` consistently.
- - >-
- python3 -m dynamo.sglang
+ - >-
+ python3 -m dynamo.sglang
(If any legacy `dynamo.sglang.worker` invocations exist elsewhere, update them.)
docs/quickstart.md (3)
27-34: Pin extras, but avoid a hard version pin unless required. Pinning ai-dynamo to 0.5.0 is fine for reproducibility. Consider adding a brief note that users can omit the pin to get the latest release.
41-44: Consider using the local compose file if running from a clone. If users cloned the repo, `docker compose -f deploy/docker-compose.yml up -d` avoids curl. Optionally add this as an alternative.
206-208: Service name in port-forward is deployment-specific. Using `svc/agg-vllm-frontend` assumes the aggregated vLLM sample. Consider adding a note that the service name varies by chosen manifest.
README.md (2)
150-155: TRT-LLM run command should use --model-path. Align with backend README examples.
-| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](docs/quickstart.md#tensorrt-llm-backend) for setup. |
+| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](docs/quickstart.md#tensorrt-llm-backend) for setup. |
164-166: markdownlint warnings: heading/emphasis. Consider replacing the emphasized “For contributors and advanced users” with a proper heading and ensure heading levels increment by one. Since markdownlint expects an h2 at line 166, the new heading should be an h2 so the h3 that follows remains valid.
-**For contributors and advanced users**
+## For contributors and advanced users
Also confirm surrounding headings maintain the increment rule.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
README.md (6 hunks)
components/backends/sglang/README.md (1 hunks)
components/backends/trtllm/README.md (1 hunks)
components/backends/vllm/README.md (1 hunks)
docs/kubernetes/README.md (2 hunks)
docs/quickstart.md (1 hunks)
examples/basics/multinode/README.md (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
README.md
🪛 GitHub Check: Check for broken markdown links
docs/quickstart.md
[failure] 299-299:
Broken link: Support Matrix - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L299
[failure] 243-243:
Broken link: Multi-node Deployment - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L243
[failure] 242-242:
Broken link: Logging Configuration - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L242
[failure] 241-241:
Broken link: Monitoring Setup - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L241
[failure] 240-240:
Broken link: Installation Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L240
[failure] 239-239:
Broken link: API Reference - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L239
[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232
[failure] 231-231:
Broken link: Runtime Examples - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L231
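Most of these failures come down to how a relative link resolves: it is interpreted against the directory of the Markdown file, not the repository root, which is why `docs/support_matrix.md` written inside `docs/quickstart.md` points at `docs/docs/support_matrix.md`. The sketch below models that resolution for plain relative links only. The function names are hypothetical, and URLs, in-page anchors, and root-relative (`/...`) links are deliberately skipped because their handling depends on the renderer.

```python
import posixpath
from typing import Iterable, List, Optional, Set, Tuple


def resolve_relative_link(doc_path: str, target: str) -> Optional[str]:
    """Resolve a Markdown link target against the directory of doc_path.

    Both paths are repo-relative POSIX paths. Returns None for targets this
    sketch does not handle: URLs, mailto links, pure anchors, and "/..." links.
    """
    if "://" in target or target.startswith(("/", "#", "mailto:")):
        return None
    path = target.split("#", 1)[0]  # drop an in-page anchor such as "#gpus"
    return posixpath.normpath(posixpath.join(posixpath.dirname(doc_path), path))


def broken_links(
    doc_path: str, targets: Iterable[str], repo_files: Set[str]
) -> List[Tuple[str, str]]:
    """Return (target, resolved_path) pairs whose resolved path is not a known repo file."""
    broken = []
    for target in targets:
        resolved = resolve_relative_link(doc_path, target)
        if resolved is not None and resolved not in repo_files:
            broken.append((target, resolved))
    return broken
```

With `repo_files` built from `git ls-files` output, checking `docs/quickstart.md` against both `docs/support_matrix.md` and `support_matrix.md` should flag only the first, reporting it as `docs/docs/support_matrix.md`.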
🪛 markdownlint-cli2 (0.18.1)
README.md
164-164: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
166-166: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
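Both warnings are mechanical to check. The sketch below is a simplified, line-based approximation of markdownlint's MD001 (a heading may be at most one level deeper than the previous heading) and MD036 (a line made up only of emphasized text, not ending in punctuation, reads as a heading). It is not markdownlint's implementation: it ignores Setext headings, multi-line paragraphs, and `~~~` fences, and `lint_headings` is a hypothetical name.

```python
import re
from typing import List, Tuple

ATX_HEADING = re.compile(r"^(#{1,6})\s")
# Whole line wrapped in one emphasis marker: **text**, __text__, *text*, or _text_.
EMPHASIS_ONLY = re.compile(r"^(\*\*|__|\*|_)([^*_]+)\1$")
TRAILING_PUNCTUATION = tuple(".,;:!?")


def lint_headings(markdown: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for approximate MD001/MD036 violations."""
    problems: List[Tuple[int, str]] = []
    prev_level = 0  # 0 means no heading seen yet, so the first heading is never flagged
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        heading = ATX_HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            if prev_level and level > prev_level + 1:
                problems.append((lineno, f"MD001: expected h{prev_level + 1}, got h{level}"))
            prev_level = level
            continue
        emphasis = EMPHASIS_ONLY.match(line.strip())
        if emphasis and not emphasis.group(2).rstrip().endswith(TRAILING_PUNCTUATION):
            problems.append((lineno, "MD036: emphasis used instead of a heading"))
    return problems
```

On a README shaped like the one flagged here (an h1, then an emphasized line, then an h3), this reports both warnings; turning the emphasized line into an h2 would appear to clear both.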
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (13)
components/backends/vllm/README.md (2)
43-43: Confirm KVBM status and cross-doc consistency. KVBM flipped to ✅. Please confirm this aligns with the backend’s current capabilities and that linked docs (kvbm_architecture.md) accurately describe any limitations.
169-169: Version-pinned vLLM CLI docs link may drift. The link pins vLLM docs to v0.9.2. If our supported vLLM version differs, adjust the link or note the required version to avoid confusion.
components/backends/sglang/README.md (1)
43-44: KVBM moved to WIP — LGTM. Status adjustment to WIP looks good and matches broader PR intent.
Please ensure other references (quickstart, k8s docs) don’t overstate feature readiness for SGLang.
components/backends/trtllm/README.md (1)
60-60: KVBM marked as available — LGTM. Matches the PR theme, and the linked guide exists below. Consider adding a brief note about minimum TRT-LLM/Dynamo versions if applicable.
examples/basics/multinode/README.md (3)
134-153: Module entrypoint change to `python3 -m dynamo.sglang` — LGTM. Matches the repo-wide shift to the top-level package entrypoint. The flags also align with disagg prefill/decode.
168-188: Replica 2 commands — LGTM. Consistent with Replica 1 changes and the disagg pattern.
476-477: Updated pkill pattern — LGTM. Matches the broadened module path change.
docs/kubernetes/README.md (2)
27-41: Versioning and Grove/KAI options — LGTM. RELEASE_VERSION bump to 0.5.0 and optional Grove/KAI flags look fine.
92-97: CRD kind casing in kubectl commands. Using `kubectl get dynamoGraphDeployment` is unconventional; CRD resources are usually referenced by their plural, lowercase name (e.g., dynamographdeployments). Verify the resource name and update both the “get” and later “delete” examples accordingly.
- kubectl get dynamoGraphDeployment -n ${NAMESPACE}
+ kubectl get dynamographdeployments -n ${NAMESPACE}
Apply a similar change to the delete command later in the doc.
docs/quickstart.md (1)
146-158: K8s install section — LGTM. Consistent with the k8s README platform install.
README.md (3)
28-45: Crisp positioning and intro — LGTM. Clear problem statement and value prop; good.
67-85: Quickstart link and local steps — LGTM. Good cross-link to the new quickstart and concise local flow.
132-133: Kubernetes docs link — LGTM. Points to the updated k8s README.
| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

### **Disaggregated Serving**
Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

### **Multi-node Deployment** (Model replicaes across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
Fix broken relative links to backend deploy manifests.
These links are relative to docs/kubernetes. They should traverse up two directories.
Apply:
-| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated](../../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
-| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated + Router](../../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
-| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated](../../components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
-| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated + Router](../../components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
-| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated](../../components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
-| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated + Router](../../components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
Repeat the same ../../ prefix for all Disaggregated and Multi-node tables in this section.
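The prefix does not have to be worked out by hand: `posixpath.relpath` computes it from the Markdown file's directory. The sketch below is a hypothetical helper, not something in the repo. It rewrites only the `](components/...)` link targets and leaves the `kubectl apply -f` paths alone, assuming (consistent with the diff above, which keeps the Deploy Command column unchanged) that readers run those commands from the repository root.

```python
import posixpath
import re

# Target of a Markdown link that starts at the repo-relative "components/" tree.
REPO_LINK = re.compile(r"\]\((components/[^)\s]+)\)")


def link_from_doc(doc_path: str, repo_target: str) -> str:
    """Relative link from the directory of doc_path to repo_target (both repo-relative)."""
    start = posixpath.dirname(doc_path) or "."
    return posixpath.relpath(repo_target, start=start)


def fix_table_row(row: str, doc_path: str) -> str:
    """Prefix Configuration link targets; kubectl paths are not inside "](...)" so they stay as-is."""
    return REPO_LINK.sub(lambda m: "](" + link_from_doc(doc_path, m.group(1)) + ")", row)
```

With `doc_path="docs/quickstart.md"` the same helper yields the single `../` prefix needed for the quickstart tables, and running it twice is harmless because already-prefixed targets no longer start with `components/`.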
📝 Committable suggestion
| **vLLM** | [Aggregated](../../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](../../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](../../components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](../../components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](../../components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](../../components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

### **Disaggregated Serving**
Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

### **Multi-node Deployment** (Model replicaes across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
🤖 Prompt for AI Agents
In docs/kubernetes/README.md around lines 52 to 80, the relative links to backend deploy manifests in the Disaggregated and Multi-node tables are incorrect: they resolve relative to docs/kubernetes and need to traverse up two directories. Update every Configuration link target in the Disaggregated and Multi-node sections to prepend "../../" so it points to ../../components/backends/... (i.e., change components/backends/... to ../../components/backends/... in the Configuration link of each row). Leave the kubectl apply paths in the Deploy Command column unchanged, as in the committable suggestion above: they are resolved from the reader's working directory, not from the Markdown file.
```bash
uv pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
uv pip install "cuda-python>=12,<13"
sudo apt-get -y install libopenmpi-dev

# 3. Install
uv pip install "ai-dynamo[trtllm]"

# 4. Run
python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```
🧩 Analysis chain
PyTorch version/cuda wheel looks invalid and may break installs.
torch==2.7.1 with --index-url .../cu128 is likely not published. Recommend removing manual torch pin or pointing users to official TRT-LLM container instructions instead.
-uv pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
-uv pip install "cuda-python>=12,<13"
-sudo apt-get -y install libopenmpi-dev
+# Prefer NVIDIA PyTorch container (NGC) or follow TensorRT-LLM’s official install guide.
+# If installing via pip, refer to PyTorch’s official compatibility matrix for your CUDA version:
+# https://pytorch.org/get-started/locally/
Would you like me to update this subsection to a container-first flow consistent with the TRT-LLM backend README?
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== docs/quickstart.md lines 117-126 ==="
if [ -f docs/quickstart.md ]; then
sed -n '117,126p' docs/quickstart.md || true
else
echo "docs/quickstart.md not found"
fi
echo
echo "=== repo matches for 'torch==' or 'cu128' ==="
rg -n -C2 --hidden --no-ignore 'torch==|cu128|download.pytorch.org/whl/cu128' || true
Length of output: 2820
Fix PyTorch wheel spec in quickstart: use +cu128 build or prefer container-first install
docs/quickstart.md pins torch==2.7.1 with --index-url .../whl/cu128 — torch==2.7.1 will not match the cu128 wheels; the repo expects the build-tagged version 2.7.1+cu128 (see container/deps/vllm/install_vllm.sh:143 and deploy/dynamo_check.py:53).
Action (choose one): replace the quickstart line with torch==2.7.1+cu128 (and pin matching torchvision/torchaudio) OR remove the manual torch pin and direct users to PyTorch’s compatibility matrix / the TRT‑LLM container install flow.
Location: docs/quickstart.md lines 117–119.
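Before acting on this, the matching rule may be worth re-checking. PEP 440 states that when a version specifier has no local version label, any local label on a candidate (such as `+cu128`) is ignored during matching, so under PEP 440 a plain `torch==2.7.1` pin would normally be satisfied by a `2.7.1+cu128` wheel served from the cu128 index. The sketch below is a deliberately simplified, stdlib-only model of that one rule for `==` pins on plain release versions (no epochs, pre/post/dev segments, or wildcards). The function names are hypothetical, and real tooling should rely on the `packaging` library instead.

```python
from typing import Tuple


def split_local(version: str) -> Tuple[str, str]:
    """Split '2.7.1+cu128' into ('2.7.1', 'cu128'); the local label is '' when absent."""
    public, _, local = version.partition("+")
    return public, local


def _release(public: str) -> Tuple[int, ...]:
    """Numeric release tuple with trailing zeros dropped, so '2.7' and '2.7.0' compare equal."""
    parts = [int(p) for p in public.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def exact_pin_matches(pin: str, candidate: str) -> bool:
    """Simplified PEP 440 '==' check for plain release versions.

    If the pin has no local label, the candidate's local label is ignored.
    If the pin has one, the candidate's label must match it (case-insensitively).
    """
    pin_public, pin_local = split_local(pin)
    cand_public, cand_local = split_local(candidate)
    if _release(pin_public) != _release(cand_public):
        return False
    if not pin_local:
        return True
    return pin_local.lower() == cand_local.lower()
```

Whichever reading holds in practice, running `python -c "import torch; print(torch.__version__)"` after installation shows which build was actually selected.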
🤖 Prompt for AI Agents
In docs/quickstart.md around lines 117–126, the PyTorch install pins "torch==2.7.1", which will not match the cu128 wheels. Update the quickstart to either (1) use the build-tagged packages (e.g., replace torch==2.7.1 with torch==2.7.1+cu128 and pin matching torchvision/torchaudio build tags) so the --index-url cu128 wheel pattern matches, or (2) remove the hard torch pin entirely and replace it with a brief instruction directing users to PyTorch’s compatibility matrix or the TRT‑LLM container-first install flow. Implement one of these two options and ensure the text and any package pins remain consistent with container/deps/vllm/install_vllm.sh and deploy/dynamo_check.py.
| | **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` | | ||
| | **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` | | ||
| | **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` | | ||
| | **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` | | ||
| | **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` | | ||
| | **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` | | ||
|
|
||
| #### **Disaggregated Serving** (Multi-node, specialized workers) | ||
| Prefill and decode phases run on separate workers - higher performance and scalability. | ||
|
|
||
| | Backend | Configuration | Deploy Command | | ||
| |---------|---------------|----------------| | ||
| | **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` | | ||
| | **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` | | ||
| | **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | ||
| | **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` | | ||
| | **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | ||
| | **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` | | ||
| | **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` | | ||
| | **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | ||
| #### **Multi-node Deployment** (Distributed across multiple nodes) | ||
| Scale disaggregated serving across multiple Kubernetes nodes for maximum performance. | ||
| | Backend | Configuration | Deploy Command | | ||
| |---------|---------------|----------------| | ||
| | **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | ||
| | **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | ||
| | **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | ||
Fix relative links to backend deploy manifests (broken).
From docs/quickstart.md, the “components/backends/…” links need a “../”.
- [Aggregated](components/backends/vllm/deploy/agg.yaml)
+ [Aggregated](../components/backends/vllm/deploy/agg.yaml)
- [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml)
+ [Aggregated + Router](../components/backends/vllm/deploy/agg_router.yaml)
- [Disaggregated](components/backends/vllm/deploy/disagg.yaml)
+ [Disaggregated](../components/backends/vllm/deploy/disagg.yaml)
- [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml)
+ [Disaggregated + Router](../components/backends/vllm/deploy/disagg_router.yaml)
- [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml)
+ [Disaggregated + Planner](../components/backends/vllm/deploy/disagg_planner.yaml)
Repeat the same “../” fix for the SGLang and TensorRT-LLM rows in the Aggregated, Disaggregated, and Multi-node tables in this section.
📝 Committable suggestion
| | **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` | | |
| #### **Disaggregated Serving** (Multi-node, specialized workers) | |
| Prefill and decode phases run on separate workers - higher performance and scalability. | |
| | Backend | Configuration | Deploy Command | | |
| |---------|---------------|----------------| | |
| | **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | |
| #### **Multi-node Deployment** (Distributed across multiple nodes) | |
| Scale disaggregated serving across multiple Kubernetes nodes for maximum performance. | |
| | Backend | Configuration | Deploy Command | | |
| |---------|---------------|----------------| | |
| | **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Aggregated](../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Aggregated + Router](../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` | | |
| #### **Disaggregated Serving** (Multi-node, specialized workers) | |
| Prefill and decode phases run on separate workers - higher performance and scalability. | |
| | Backend | Configuration | Deploy Command | | |
| |---------|---------------|----------------| | |
| | **vLLM** | [Disaggregated](../components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Disaggregated + Router](../components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` | | |
| | **vLLM** | [Disaggregated + Planner](../components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` | | |
| #### **Multi-node Deployment** (Distributed across multiple nodes) | |
| Scale disaggregated serving across multiple Kubernetes nodes for maximum performance. | |
| | Backend | Configuration | Deploy Command | | |
| |---------|---------------|----------------| | |
| | **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | |
| | **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | | |
| | **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` | |
🤖 Prompt for AI Agents
In docs/quickstart.md around lines 169 to 198 the Markdown links to backend
deploy manifests are broken because they point to components/backends/...
relative to the current file; update every such link in the Aggregated,
Disaggregated and Multi-node tables to prepend "../" (e.g. change
components/backends/... to ../components/backends/...) for vLLM, SGLang and
TensorRT-LLM rows so the links correctly resolve to the manifests; apply the
same ../ prefix consistently across all three tables in this section.
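A minimal sketch of applying the same `../` fix to every row at once, including the SGLang and TensorRT-LLM rows the comment asks to fix the same way, assuming the tables live in docs/quickstart.md as described. `add_parent_prefix` is a hypothetical helper, not part of the Dynamo repo. It rewrites only Markdown link targets, so the backticked `kubectl apply -f components/...` commands, which the shell resolves against its working directory (the repo root) rather than against the Markdown file, stay unchanged.

```python
import re

# Matches the start of a Markdown link target that begins with "components/",
# i.e. the literal sequence "](components/". Backticked shell commands such as
# `kubectl apply -f components/...` never contain "](" and are left alone.
_LINK_TARGET = re.compile(r"\]\(components/")


def add_parent_prefix(markdown: str) -> str:
    """Prefix every components/... link target with ../ (hypothetical helper)."""
    return _LINK_TARGET.sub("](../components/", markdown)


row = (
    "| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) "
    "| `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |"
)
print(add_parent_prefix(row))
```

Because the pattern requires `](components/`, a second pass over already-fixed text finds nothing to change, so the rewrite is safe to rerun.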
| - **[Runtime Examples](lib/bindings/python/README.md)** - Low-level Python<>Rust bindings | ||
| - **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing | ||
Broken relative link to runtime examples.
From docs/, link to lib/ should go up one directory.
-- **[Runtime Examples](lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
+- **[Runtime Examples](../lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
📝 Committable suggestion
| - **[Runtime Examples](../lib/bindings/python/README.md)** - Low-level Python<>Rust bindings | |
| - **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing |
🧰 Tools
🪛 GitHub Check: Check for broken markdown links
[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232
[failure] 231-231:
Broken link: Runtime Examples - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L231
🤖 Prompt for AI Agents
In docs/quickstart.md around lines 231 to 233 the relative link to the runtime
examples is incorrect (it points to lib/bindings/python/README.md from docs/
instead of up one level); update the link to reference the correct relative path
by prepending "../" so it resolves from docs/ to lib/bindings/python/README.md.
| - **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing | ||
Broken link to KV-Aware Routing.
From docs/, use “architecture/…”, not “docs/architecture/…”.
-- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing
+- **[KV-Aware Routing](architecture/kv_cache_routing.md)** - Understand intelligent request routing
📝 Committable suggestion
| - **[KV-Aware Routing](architecture/kv_cache_routing.md)** - Understand intelligent request routing |
🧰 Tools
🪛 GitHub Check: Check for broken markdown links
[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232
🤖 Prompt for AI Agents
In docs/quickstart.md around lines 232 to 233, the link to KV-Aware Routing is
broken because it uses "docs/architecture/kv_cache_routing.md" instead of the
correct relative path "architecture/kv_cache_routing.md"; update the markdown
link target to **architecture/kv_cache_routing.md** (i.e., change the href to
remove the leading "docs/") so the link resolves correctly from the docs/
directory.
| - **[API Reference](docs/kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications | ||
| - **[Installation Guide](docs/kubernetes/installation_guide.md)** - Detailed platform setup | ||
| - **[Monitoring Setup](docs/kubernetes/metrics.md)** - Observability and metrics | ||
| - **[Logging Configuration](docs/kubernetes/logging.md)** - Centralized logging setup | ||
| - **[Multi-node Deployment](docs/kubernetes/multinode-deployment.md)** - Scale across K8s nodes | ||
| - **[Performance Tuning](docs/benchmarks/)** - Optimize for your workload | ||
| --- |
Broken links from docs root to docs subpaths.
These should be relative to docs/, not prefixed with “docs/…”.
-- **[API Reference](docs/kubernetes/api_reference.md)**
+- **[API Reference](kubernetes/api_reference.md)**
-- **[Installation Guide](docs/kubernetes/installation_guide.md)**
+- **[Installation Guide](kubernetes/installation_guide.md)**
-- **[Monitoring Setup](docs/kubernetes/metrics.md)**
+- **[Monitoring Setup](kubernetes/metrics.md)**
-- **[Logging Configuration](docs/kubernetes/logging.md)**
+- **[Logging Configuration](kubernetes/logging.md)**
-- **[Multi-node Deployment](docs/kubernetes/multinode-deployment.md)**
+- **[Multi-node Deployment](kubernetes/multinode-deployment.md)**
📝 Committable suggestion
| - **[API Reference](kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications | |
| - **[Installation Guide](kubernetes/installation_guide.md)** - Detailed platform setup | |
| - **[Monitoring Setup](kubernetes/metrics.md)** - Observability and metrics | |
| - **[Logging Configuration](kubernetes/logging.md)** - Centralized logging setup | |
| - **[Multi-node Deployment](kubernetes/multinode-deployment.md)** - Scale across K8s nodes | |
| - **[Performance Tuning](docs/benchmarks/)** - Optimize for your workload | |
| --- |
🧰 Tools
🪛 GitHub Check: Check for broken markdown links
[failure] 243-243:
Broken link: Multi-node Deployment - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L243
[failure] 242-242:
Broken link: Logging Configuration - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L242
[failure] 241-241:
Broken link: Monitoring Setup - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L241
[failure] 240-240:
Broken link: Installation Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L240
[failure] 239-239:
Broken link: API Reference - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L239
🤖 Prompt for AI Agents
In docs/quickstart.md around lines 239 to 246 the documentation links are
incorrectly prefixed with "docs/"; update each link target to be relative to the
docs/ root (remove the leading "docs/") so they point to the correct subpaths
(e.g., change "docs/kubernetes/api_reference.md" to
"kubernetes/api_reference.md", and similarly for installation_guide.md,
metrics.md, logging.md, multinode-deployment.md and benchmarks/).
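The failures above follow one rule: a relative link in docs/quickstart.md is resolved from the docs/ directory, so a leading `docs/` doubles the directory (docs/docs/...) and a repo-root path such as components/... needs `../`. A minimal sketch of a local check built on that rule follows. `resolve_link`, `broken_links`, and the `repo` set are hypothetical names; this is not the repository's "Check for broken markdown links" workflow, and treating a leading `/` as the repository root is an assumption that may not match every renderer or link checker.

```python
import posixpath
import re
from typing import List, Optional, Set

# Inline Markdown links of the form [text](target). Deliberately simple:
# reference-style links and targets with spaces or parentheses are ignored.
_INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def resolve_link(source_file: str, target: str) -> Optional[str]:
    """Map a link target to a repo-relative path, or None if it is not checked."""
    if target.startswith(("http://", "https://", "mailto:", "#")):
        return None  # external URLs and same-page anchors are skipped
    path = target.split("#", 1)[0]  # drop any "#section" fragment
    if path.startswith("/"):
        # Assumption: a leading "/" means the repository root.
        return posixpath.normpath(path.lstrip("/"))
    # Everything else is relative to the directory of the file containing it.
    base = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base, path))


def broken_links(source_file: str, markdown: str, repo_paths: Set[str]) -> List[str]:
    """Return link targets in `markdown` that do not resolve to a known path."""
    broken = []
    for target in _INLINE_LINK.findall(markdown):
        resolved = resolve_link(source_file, target)
        if resolved is not None and resolved not in repo_paths:
            broken.append(target)
    return broken


# Hypothetical stand-in for a real file listing of the repository.
repo = {
    "docs/quickstart.md",
    "docs/kubernetes/api_reference.md",
    "docs/architecture/kv_cache_routing.md",
    "lib/bindings/python/README.md",
}

snippet = """
- **[API Reference](docs/kubernetes/api_reference.md)** - CRD specifications
- **[KV-Aware Routing](architecture/kv_cache_routing.md)** - request routing
- **[Runtime Examples](../lib/bindings/python/README.md)** - Python<>Rust bindings
"""

print(broken_links("docs/quickstart.md", snippet, repo))
# prints ['docs/kubernetes/api_reference.md']
```

Run over this snippet, only the `docs/`-prefixed target is flagged; the corrected `architecture/...` and `../lib/...` forms resolve to existing paths.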
| | [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ | | ||
| | [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 | | ||
| | [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ | | ||
| | [**Load Based Planner**](/docs/architecture/load_planner.md) | 🚧 | 🚧 | 🚧 | | ||
| | [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ | | ||
| | [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ | | ||
Leading slash links break on GitHub.
Change /docs/... to docs/....
-| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
-| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
-| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
-| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
-| [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
+| [**Disaggregated Serving**](docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
+| [**Conditional Disaggregation**](docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
+| [**KV-Aware Routing**](docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
+| [**SLA-Based Planner**](docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
+| [**KVBM**](docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
📝 Committable suggestion
| | [**Disaggregated Serving**](docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ | | |
| | [**Conditional Disaggregation**](docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 | | |
| | [**KV-Aware Routing**](docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ | | |
| | [**SLA-Based Planner**](docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ | | |
| | [**KVBM**](docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ | |
🤖 Prompt for AI Agents
In README.md around lines 54 to 59 the internal links use a leading slash (e.g.
/docs/...), which breaks on GitHub; remove the leading slash for each link in
that block so they read docs/... instead, updating all occurrences in those
lines to relative paths.
| python -m dynamo.frontend --http-port 8000 | ||
| # Terminal 2: Start backend worker | ||
| python -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
Minor, but a 0.6B model will download much faster. Less waiting around. I believe Qwen3 0.6B is also vLLM's default nowadays.
| @@ -0,0 +1,300 @@ | |||
| # Dynamo Quickstart Guide | |||
I feel like we have a lot of doc duplication.
Why can't we just point to kubernetes/README.md in the Kubernetes section of this quickstart?
Agreed -- we have a couple of docs that could reasonably be described as quickstarts, including examples/basics/quickstart, README.md#3-run-dynamo, docs/kubernetes/README.md, etc. If we want to create a new one here, let's delete at least one of the others.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.
UX Documentation Improvements
Enhanced README structure - Streamlined content with clearer navigation and improved quickstart links
Added comprehensive quickstart guide - New docs/quickstart.md with step-by-step local and Kubernetes deployment instructions
Fixed backend documentation - Updated KVBM status indicators and corrected deprecated SGLang commands in multinode examples
Simplified Kubernetes docs - Cleaner deployment patterns and consolidated backend configuration references
Summary by CodeRabbit