fix: creating a quickstart.md, readme, and making other updates #3197

athreesh · 2025-09-24T10:14:47Z

UX Documentation Improvements

Enhanced README structure - Streamlined content with clearer navigation and improved quickstart links
Added comprehensive quickstart guide - New docs/quickstart.md with step-by-step local and Kubernetes deployment instructions
Fixed backend documentation - Updated KVBM status indicators and corrected deprecated SGLang commands in multinode examples
Simplified Kubernetes docs - Cleaner deployment patterns and consolidated backend configuration references

Summary by CodeRabbit

Documentation
- Overhauled main README with structured Quick Start, local development, and deployment guides.
- Added comprehensive Quickstart covering local and Kubernetes paths, with backend-specific steps.
- Expanded Kubernetes docs with version update (0.5.0), deployment patterns (Aggregated/Disaggregated/Multi-node), and explicit commands.
- Updated engine feature matrices: KVBM now WIP for SGLang; complete for vLLM and TensorRT-LLM.
- Enhanced cross-links, testing workflows, and next steps.
Examples
- Updated multinode example commands to use unified sglang entry point and adjusted process termination patterns.

Signed-off-by: athreesh <[email protected]>

- Update backend READMEs with correct KVBM status
- Simplify Kubernetes README with cleaner structure
- Fix multinode example to use correct dynamo.sglang command
- Add missing --skip-tokenizer-init flags

Signed-off-by: athreesh <[email protected]>

copy-pr-bot · 2025-09-24T10:14:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-24T10:21:38Z

Walkthrough

Docs were extensively reorganized: top-level README rewritten, Kubernetes guide updated, and a new Quickstart added. Backend README feature matrices updated for KVBM status. Multinode example commands adjusted for SGLang module entrypoint and pkill pattern. No code/API changes.

Changes

Cohort / File(s)	Summary of Changes
Top-level README overhaul `README.md`	Rewrote and expanded structure: new Quick Start and Local Development, stage-based install/run/test, Kubernetes deployment with Helm, concise engine table and links, contributor and testing workflows. Removed legacy sections.
Backend KVBM status updates `components/backends/vllm/README.md`, `components/backends/trtllm/README.md`, `components/backends/sglang/README.md`	Updated KVBM feature row: vLLM from WIP to ✅; TRT-LLM from Planned to ✅; SGLang from Planned ❌ to WIP 🚧 in relevant tables.
Kubernetes docs reorg `docs/kubernetes/README.md`	Bumped RELEASE_VERSION to 0.5.0. Replaced single backend section with detailed “Backend and Deployment Pattern” tables, added aggregated/disaggregated/multi-node flows and commands, and a step-by-step deploy example.
New Quickstart guide `docs/quickstart.md`	Added comprehensive quickstart covering local and Kubernetes paths, per-backend (vLLM/SGLang/TRT-LLM) steps, testing, cleanup, and troubleshooting.
SGLang multinode example updates `examples/basics/multinode/README.md`	Switched worker invocations to `python3 -m dynamo.sglang` (from module-specific workers) and broadened pkill pattern to `dynamo.sglang.*prefill`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Update README.md #2938 — Updates KVBM status in framework support matrices, overlapping with backend README badge changes.
docs: add multinode example #2155 — Modifies the same multinode SGLang example README, affecting invocation patterns.
docs: Refactor README.md and add components/README.md #2141 — Large restructure of the top-level README, similar scope to the current README overhaul.

Poem

I thump my paws—docs freshly tilled,
Paths mapped clean, the backlog filled.
KVBM sprouts: ✅s bloom bright,
SGLang buds—🚧 in sight.
From local runs to k8s skies,
I hop through guides—quickstart-wise.
Carrot-shaped commits—ship with pride! 🥕

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The provided PR description gives a short summary of documentation changes but does not follow the repository's required template: it lacks the explicit "Overview", "Details", "Where should the reviewer start?", and "Related Issues" sections, so required information for reviewers is missing.	Please update the PR description to the repository template by adding an Overview and Details that list the key file changes (for example README.md, docs/quickstart.md, docs/kubernetes/README.md, components/backends/*/README.md, and examples/basics/multinode/README.md), a "Where should the reviewer start?" section pointing reviewers to the most important files, and a Related Issues line using the required format (e.g., closes GitHub issue: #xxx).

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title mentions creating quickstart.md and README updates which match the primary documentation changes in the diff, so it is related to the main change; however the phrase "and making other updates" is vague and could be tightened to better describe scope.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).


coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

docs/kubernetes/README.md (2)
84-97: Namespace mismatch in example.

You set NAMESPACE=dynamo-kubernetes above, but here reset it to dynamo-cloud. This will cause confusion.
-export NAMESPACE=dynamo-cloud
+# Use the same namespace as platform install
+export NAMESPACE=dynamo-kubernetes
109-116: Leading slash links break on GitHub.

Links like /docs/kubernetes/api_reference.md resolve to github.com/docs/… not this repo. Make them relative.
-- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
-- **[Operator Guide](/docs/kubernetes/dynamo_operator.md)** - Dynamo operator configuration and management
-- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation examples
+- **[API Reference](./api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
+- **[Operator Guide](./dynamo_operator.md)** - Dynamo operator configuration and management
+- **[Create Deployment](./create_deployment.md)** - Step-by-step deployment creation examples
Also fix leading slashes in “Additional Resources” to relative links inside docs/kubernetes.
docs/quickstart.md (1)
299-301: Broken Support Matrix link.

From docs/, drop the “docs/” prefix.
-For detailed compatibility information, see the [Support Matrix](docs/support_matrix.md).
+For detailed compatibility information, see the [Support Matrix](support_matrix.md).
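
To see why the `docs/` prefix breaks this link, it can help to resolve the target the way a relative Markdown link is resolved, against the directory of the file that contains it. The following is a minimal sketch using only the standard library; the helper name `resolve_relative_link` is hypothetical and not part of the repository.

```python
import posixpath


def resolve_relative_link(source_file: str, target: str) -> str:
    """Resolve a relative link target against the directory of source_file."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, target))


# The link as written in docs/quickstart.md points one "docs/" too deep.
broken = resolve_relative_link("docs/quickstart.md", "docs/support_matrix.md")
# The suggested fix resolves to the intended file.
fixed = resolve_relative_link("docs/quickstart.md", "support_matrix.md")

print(broken)  # docs/docs/support_matrix.md
print(fixed)   # docs/support_matrix.md
```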

🧹 Nitpick comments (11)

components/backends/sglang/README.md (2)
50-51: Typo: “does not router to DP worker”.

Change “router” to “route”.
- | **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
+ | **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |
190-191: Typo: “conjuction”.

Change to “conjunction”.
- ... tokenizer_manager) is used in conjuction with NIXL ...
+ ... tokenizer_manager) is used in conjunction with NIXL ...
components/backends/trtllm/README.md (1)

191-201: Duplicate “Client”/“Benchmarking” sections.

“Client” appears twice (Lines 191–196 and 231–236) and “Benchmarking” twice (197–201 and 237–241). Consolidate to reduce redundancy.
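
A quick way to spot repeated sections like these is to count heading titles. The sketch below is an illustration rather than a tool used by this repository: the function name `duplicate_headings` and the sample text are hypothetical, and it only handles ATX (`#`) headings while skipping fenced code blocks.

```python
import re
from collections import Counter

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example fence-safe


def duplicate_headings(markdown_text: str) -> dict:
    """Return {title: count} for heading titles that occur more than once."""
    titles = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a fenced code block
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            titles.append(match.group(2))
    counts = Counter(titles)
    return {title: n for title, n in counts.items() if n > 1}


# Hypothetical README outline that mirrors the duplication described above.
sample = "\n".join([
    "## Client",
    "some text",
    "## Benchmarking",
    "## Client",
    "## Benchmarking",
    "## Logging",
])
print(duplicate_headings(sample))  # {'Client': 2, 'Benchmarking': 2}
```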
examples/basics/multinode/README.md (1)
156-161: Minor formatting issues in the info note.

There are stray “>” artifacts in “different > GPUs” / “token > generation”. Remove the stray symbols.
- > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different > GPUs)
+ > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different GPUs)
- > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token > generation)
+ > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token generation)
docs/kubernetes/README.md (2)
73-81: Typo: “replicaes”.

Change to “replicas”.
-### **Multi-node Deployment** (Model replicaes across multiple nodes)
+### **Multi-node Deployment** (Model replicas across multiple nodes)
167-187: Worker command examples — fix sglang to consolidated entrypoint.

Use python3 -m dynamo.sglang consistently.
-  - >-
-    python3 -m dynamo.sglang
+  - >-
+    python3 -m dynamo.sglang
(If any legacy dynamo.sglang.worker invocations exist elsewhere, update them.)
docs/quickstart.md (3)

27-34: Pin extras but avoid hard version pin unless required.

Pinning ai-dynamo to 0.5.0 is fine for reproducibility. Consider adding a brief note that users can omit the pin for latest.

41-44: Consider using the local compose file if running from a clone.

If users cloned the repo, docker compose -f deploy/docker-compose.yml up -d avoids curl. Optionally add as an alternative.

206-208: Service name in port-forward is deployment-specific.

Using svc/agg-vllm-frontend assumes the aggregated vLLM sample. Consider adding a note that the service name varies by chosen manifest.
README.md (2)
150-155: TRT-LLM run command should use --model-path.

Align with backend README examples.
-| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](docs/quickstart.md#tensorrt-llm-backend) for setup. |
+| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](docs/quickstart.md#tensorrt-llm-backend) for setup. |
164-166: markdownlint warnings: heading/emphasis.

Consider replacing the emphasized “For contributors and advanced users” with a proper heading and ensure heading levels increment by one.
-**For contributors and advanced users**
+### For contributors and advanced users
Also confirm surrounding headings maintain the increment rule.
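
The increment rule can be checked locally before pushing. The sketch below approximates the idea behind MD001 (a heading may be at most one level deeper than the previous heading); it is not markdownlint's implementation, the function name `heading_jumps` is hypothetical, and it ignores fenced code blocks for brevity.

```python
import re


def heading_jumps(markdown_text: str) -> list:
    """Return (line_number, previous_level, level) for headings that skip a level."""
    problems = []
    previous = 0  # 0 means no heading has been seen yet
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = re.match(r"^(#{1,6})\s", line)
        if not match:
            continue
        level = len(match.group(1))
        if previous and level > previous + 1:
            problems.append((number, previous, level))
        previous = level
    return problems


# Hypothetical document: an h3 directly under an h1 skips h2.
sample = "# Title\n\n### For contributors and advanced users\n\n## Next steps\n"
print(heading_jumps(sample))  # [(3, 1, 3)]
```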

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae2010 and 14c6c35.

📒 Files selected for processing (7)

README.md (6 hunks)
components/backends/sglang/README.md (1 hunks)
components/backends/trtllm/README.md (1 hunks)
components/backends/vllm/README.md (1 hunks)
docs/kubernetes/README.md (2 hunks)
docs/quickstart.md (1 hunks)
examples/basics/multinode/README.md (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-30T20:43:49.632Z

Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

README.md

🪛 GitHub Check: Check for broken markdown links

docs/quickstart.md

[failure] 299-299:
Broken link: Support Matrix - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L299

[failure] 243-243:
Broken link: Multi-node Deployment - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L243

[failure] 242-242:
Broken link: Logging Configuration - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L242

[failure] 241-241:
Broken link: Monitoring Setup - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L241

[failure] 240-240:
Broken link: Installation Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L240

[failure] 239-239:
Broken link: API Reference - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L239

[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232

[failure] 231-231:
Broken link: Runtime Examples - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L231

🪛 markdownlint-cli2 (0.18.1)

README.md

164-164: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

166-166: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (13)

components/backends/vllm/README.md (2)

43-43: Confirm KVBM status and cross-doc consistency.

KVBM flipped to ✅. Please confirm this aligns with the backend’s current capabilities and that linked docs (kvbm_architecture.md) accurately describe any limitations.

169-169: Version-pinned vLLM CLI docs link may drift.

The link pins vLLM docs to v0.9.2. If our supported vLLM version differs, adjust the link or note the required version to avoid confusion.

components/backends/sglang/README.md (1)

43-44: KVBM moved to WIP — LGTM.

Status adjustment to WIP looks good and matches broader PR intent.

Please ensure other references (quickstart, k8s docs) don’t overstate feature readiness for SGLang.

components/backends/trtllm/README.md (1)

60-60: KVBM marked as available — LGTM.

Matches the PR theme and linked guide exists below. Consider adding a brief note about minimum TRT-LLM/Dynamo versions if applicable.

examples/basics/multinode/README.md (3)

134-153: Module entrypoint change to python3 -m dynamo.sglang — LGTM.

Matches repo-wide shift to top-level package entrypoint. The flags also align with disagg prefill/decode.

168-188: Replica 2 commands — LGTM.

Consistent with Replica 1 changes and disagg pattern.

476-477: Updated pkill pattern — LGTM.

Matches the broadened module path change.
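
To illustrate what the broadened pattern covers, the sketch below applies `dynamo.sglang.*prefill` to a few command lines with Python's `re.search`. It assumes the README passes the pattern to `pkill -f`, which matches against the full command line; the command lines themselves are hypothetical, and Python's regex dialect is used here only as a stand-in for the extended regular expressions that `pkill` uses.

```python
import re

# Pattern quoted in the review summary.
PKILL_PATTERN = r"dynamo.sglang.*prefill"

# Hypothetical command lines, for illustration only.
commands = {
    "prefill": "python3 -m dynamo.sglang --disaggregation-mode prefill",
    "decode": "python3 -m dynamo.sglang --disaggregation-mode decode",
    "legacy_prefill": "python3 -m dynamo.sglang.worker --disaggregation-mode prefill",
}

# A command would be selected if the pattern occurs anywhere in it.
matches = {name: bool(re.search(PKILL_PATTERN, cmd)) for name, cmd in commands.items()}
print(matches)  # {'prefill': True, 'decode': False, 'legacy_prefill': True}
```

Because `.*` spans any text between the module name and `prefill`, the pattern selects prefill workers started through either entrypoint and leaves decode workers alone.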
docs/kubernetes/README.md (2)
27-41: Versioning and Grove/KAI options — LGTM.

RELEASE_VERSION bump to 0.5.0 and optional Grove/KAI flags look fine.

92-97: CRD kind casing in kubectl commands.

Using kubectl get dynamoGraphDeployment typically fails. CRDs are usually plural, lowercase (e.g., dynamographdeployments). Verify the resource name and update both “get” and later “delete” examples accordingly.
- kubectl get dynamoGraphDeployment -n ${NAMESPACE}
+ kubectl get dynamographdeployments -n ${NAMESPACE}
Apply similar change to the delete command later in the doc.
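
As a rough illustration of the naming convention mentioned above, the sketch below derives a lowercase plural from a CRD kind. This is an assumption-laden shortcut: the authoritative plural is whatever the CRD declares in `spec.names.plural`, so the result should be confirmed against the cluster (for example with `kubectl api-resources`) before updating the docs. The helper name `conventional_plural` is hypothetical.

```python
def conventional_plural(kind: str) -> str:
    """Guess the plural resource name for a kind: lowercase it and append 's'.

    Naive by design; the real plural is defined by the CRD itself.
    """
    return kind.lower() + "s"


for kind in ("DynamoGraphDeployment", "DynamoComponentDeployment"):
    print(kind, "->", conventional_plural(kind))
```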
docs/quickstart.md (1)

146-158: K8s install section — LGTM.

Consistent with the k8s README platform install.

README.md (3)

28-45: Crisp positioning and intro — LGTM.

Clear problem statement and value prop; good.

67-85: Quickstart link and local steps — LGTM.

Good cross-link to the new quickstart and concise local flow.

132-133: Kubernetes docs link — LGTM.

Points to the updated k8s README.

coderabbitai · 2025-09-24T10:21:41Z

docs/kubernetes/README.md

+| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+
+### **Disaggregated Serving**
+Prefill and decode phases run on separate workers - higher performance and scalability.
+
+| Backend | Configuration | Deploy Command |
+|---------|---------------|----------------|
+| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
+
+### **Multi-node Deployment** (Model replicaes across multiple nodes)
+Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.
+
+| Backend | Configuration | Deploy Command |
+|---------|---------------|----------------|
+| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |


⚠️ Potential issue

Fix broken relative links to backend deploy manifests.

These links are relative to docs/kubernetes. They should traverse up two directories.

Apply:

-| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated](../../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
-| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated + Router](../../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
-| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated](../../components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
-| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated + Router](../../components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
-| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated](../../components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
-| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated + Router](../../components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

Repeat the same ../../ prefix for all Disaggregated and Multi-node tables in this section.
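
The `../../` prefix can be derived rather than counted by hand. The sketch below computes the link a document needs in order to reach a path given relative to the repository root; the helper name `link_from` is hypothetical. Note that the diff above changes only the link target: the `kubectl apply -f` argument is a filesystem path relative to the shell's working directory, and the diff leaves it as is.

```python
import posixpath


def link_from(doc_file: str, repo_path: str) -> str:
    """Return the relative link from doc_file's directory to repo_path.

    Both arguments are paths relative to the repository root.
    """
    return posixpath.relpath(repo_path, start=posixpath.dirname(doc_file))


link = link_from(
    "docs/kubernetes/README.md",
    "components/backends/vllm/deploy/agg.yaml",
)
print(link)  # ../../components/backends/vllm/deploy/agg.yaml
```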

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

### **Disaggregated Serving**
Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

### **Multi-node Deployment** (Model replicaes across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |

| **vLLM** | [Aggregated](../../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](../../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](../../components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](../../components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](../../components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](../../components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

### **Disaggregated Serving**
Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

### **Multi-node Deployment** (Model replicaes across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |

🤖 Prompt for AI Agents

In docs/kubernetes/README.md around lines 52 to 80, the relative links to backend deploy manifests in the Disaggregated and Multi-node tables are incorrect because they are relative to docs/kubernetes and need to traverse up two directories; update every link target and the corresponding kubectl apply command for all entries in the Disaggregated and Multi-node sections to prepend "../../" so they point to ../../components/backends/... (i.e., change components/backends/... to ../../components/backends/... for both the Configuration link and the Deploy Command for each row).

coderabbitai · 2025-09-24T10:21:41Z

docs/quickstart.md

+uv pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
+uv pip install "cuda-python>=12,<13"
+sudo apt-get -y install libopenmpi-dev
+
+# 3. Install
+uv pip install "ai-dynamo[trtllm]"
+
+# 4. Run
+python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+```


⚠️ Potential issue

🧩 Analysis chain

PyTorch version/cuda wheel looks invalid and may break installs.

torch==2.7.1 with --index-url .../cu128 is likely not published. Recommend removing manual torch pin or pointing users to official TRT-LLM container instructions instead.

-uv pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
-uv pip install "cuda-python>=12,<13"
-sudo apt-get -y install libopenmpi-dev
+# Prefer NVIDIA PyTorch container (NGC) or follow TensorRT-LLM’s official install guide.
+# If installing via pip, refer to PyTorch’s official compatibility matrix for your CUDA version:
+# https://pytorch.org/get-started/locally/

Would you like me to update this subsection to a container-first flow consistent with the TRT-LLM backend README?

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== docs/quickstart.md lines 117-126 ==="
if [ -f docs/quickstart.md ]; then
  sed -n '117,126p' docs/quickstart.md || true
else
  echo "docs/quickstart.md not found"
fi

echo
echo "=== repo matches for 'torch==' or 'cu128' ==="
rg -n -C2 --hidden --no-ignore 'torch==|cu128|download.pytorch.org/whl/cu128' || true

Length of output: 2820

Fix PyTorch wheel spec in quickstart: use +cu128 build or prefer container-first install

docs/quickstart.md pins torch==2.7.1 with --index-url .../whl/cu128 — torch==2.7.1 will not match the cu128 wheels; the repo expects the build-tagged version 2.7.1+cu128 (see container/deps/vllm/install_vllm.sh:143 and deploy/dynamo_check.py:53).

Action (choose one): replace the quickstart line with torch==2.7.1+cu128 (and pin matching torchvision/torchaudio) OR remove the manual torch pin and direct users to PyTorch’s compatibility matrix / the TRT‑LLM container install flow.

Location: docs/quickstart.md lines 117–119.

🤖 Prompt for AI Agents

In docs/quickstart.md around lines 117–126, the PyTorch install pins "torch==2.7.1" which will not match the cu128 wheels; update the quickstart to either (1) use the build-tagged packages (e.g., replace torch==2.7.1 with torch==2.7.1+cu128 and pin matching torchvision/torchaudio build tags) so the --index-url cu128 wheel pattern matches, or (2) remove the hard torch pin entirely and replace with a brief instruction directing users to PyTorch’s compatibility matrix or the TRT‑LLM container-first install flow; implement one of these two options and ensure the text and any package pins remain consistent with container/deps/vllm/install_vllm.sh and deploy/dynamo_check.py.
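The comment above turns on the difference between a plain version (`2.7.1`) and a build-tagged one (`2.7.1+cu128`). As a rough illustration only, and not code from the repository, the Python sketch below splits a version string into its public part and its local build tag. The helper names `split_build_tag` and `has_cuda_tag` are hypothetical, and the sketch does plain string handling; it does not query pip or PyTorch.

```python
from typing import Optional, Tuple


def split_build_tag(version: str) -> Tuple[str, Optional[str]]:
    """Split 'X.Y.Z+tag' into ('X.Y.Z', 'tag'); the tag is None when no '+' is present."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def has_cuda_tag(version: str, expected_tag: str) -> bool:
    """Return True if the version string carries exactly the expected local tag."""
    _, local = split_build_tag(version)
    return local == expected_tag


if __name__ == "__main__":
    # Hypothetical inputs chosen to mirror the strings discussed in the review.
    for candidate in ("2.7.1", "2.7.1+cu128", "2.7.1+cpu"):
        print(candidate, split_build_tag(candidate), has_cuda_tag(candidate, "cu128"))
```

In an environment where PyTorch is installed, the input string would typically come from `torch.__version__`.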

coderabbitai · 2025-09-24T10:21:41Z

docs/quickstart.md

+| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+
+#### **Disaggregated Serving** (Multi-node, specialized workers)
+Prefill and decode phases run on separate workers - higher performance and scalability.
+
+| Backend | Configuration | Deploy Command |
+|---------|---------------|----------------|
+| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
+
+#### **Multi-node Deployment** (Distributed across multiple nodes)
+Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.
+
+| Backend | Configuration | Deploy Command |
+|---------|---------------|----------------|
+| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
+


⚠️ Potential issue

Fix relative links to backend deploy manifests (broken).

From docs/quickstart.md, the “components/backends/…” links need a “../”.

- [Aggregated](components/backends/vllm/deploy/agg.yaml)
+ [Aggregated](../components/backends/vllm/deploy/agg.yaml)
- [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml)
+ [Aggregated + Router](../components/backends/vllm/deploy/agg_router.yaml)
- [Disaggregated](components/backends/vllm/deploy/disagg.yaml)
+ [Disaggregated](../components/backends/vllm/deploy/disagg.yaml)
- [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml)
+ [Disaggregated + Router](../components/backends/vllm/deploy/disagg_router.yaml)
- [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml)
+ [Disaggregated + Planner](../components/backends/vllm/deploy/disagg_planner.yaml)

Repeat the same “../” fix for SGLang and TensorRT-LLM rows in Aggregated/Disaggregated/Multi-node tables in this section.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| **vLLM** | [Aggregated](../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |

| **vLLM** | [Aggregated + Router](../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |

| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |

| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |

| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

#### **Disaggregated Serving** (Multi-node, specialized workers)

Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |

|---------|---------------|----------------|

| **vLLM** | [Disaggregated](../components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |

| **vLLM** | [Disaggregated + Router](../components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |

| **vLLM** | [Disaggregated + Planner](../components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |

| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |

| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |

| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

#### **Multi-node Deployment** (Distributed across multiple nodes)

Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |

|---------|---------------|----------------|

| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |

| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |

| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |

🤖 Prompt for AI Agents

In docs/quickstart.md around lines 169 to 198 the Markdown links to backend deploy manifests are broken because they point to components/backends/... relative to the current file; update every such link in the Aggregated, Disaggregated and Multi-node tables to prepend "../" (e.g. change components/backends/... to ../components/backends/...) for vLLM, SGLang and TensorRT-LLM rows so the links correctly resolve to the manifests; apply the same ../ prefix consistently across all three tables in this section.
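To see why the prefix depends on where the Markdown file lives, here is a minimal Python sketch that resolves a relative link target against the file containing it. It assumes repo-relative paths with forward slashes; the helper name `resolve_link` is hypothetical and not part of any tool used in this PR.

```python
import posixpath


def resolve_link(source_file: str, target: str) -> str:
    """Resolve a relative Markdown link target against the file that contains it.

    Both arguments are repo-relative paths using forward slashes.
    """
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, target))


if __name__ == "__main__":
    manifest = "components/backends/vllm/deploy/agg.yaml"
    # Unprefixed target: resolves inside docs/, which is not where the manifest lives.
    print(resolve_link("docs/quickstart.md", manifest))
    # One "../" climbs out of docs/ to the repository root.
    print(resolve_link("docs/quickstart.md", "../" + manifest))
    # docs/kubernetes/README.md sits one level deeper, so it needs "../../".
    print(resolve_link("docs/kubernetes/README.md", "../../" + manifest))
```

The same function can be pointed at any of the table rows in this section to check a proposed link target before committing it.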

coderabbitai · 2025-09-24T10:21:42Z

docs/quickstart.md

+- **[Runtime Examples](lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
+- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing
+


⚠️ Potential issue

Broken relative link to runtime examples.

From docs/, link to lib/ should go up one directory.

-- **[Runtime Examples](lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
+- **[Runtime Examples](../lib/bindings/python/README.md)** - Low-level Python<>Rust bindings

📝 Committable suggestion

Suggested change

- **[Runtime Examples](../lib/bindings/python/README.md)** - Low-level Python<>Rust bindings

- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing

🧰 Tools

🪛 GitHub Check: Check for broken markdown links

[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232

[failure] 231-231:
Broken link: Runtime Examples - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L231

🤖 Prompt for AI Agents

In docs/quickstart.md around lines 231 to 233 the relative link to the runtime examples is incorrect (it points to lib/bindings/python/README.md from docs/ instead of up one level); update the link to reference the correct relative path by prepending "../" so it resolves from docs/ to lib/bindings/python/README.md.

coderabbitai · 2025-09-24T10:21:42Z

docs/quickstart.md

+- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing
+


⚠️ Potential issue

Broken link to KV-Aware Routing.

From docs/, use “architecture/…”, not “docs/architecture/…”.

-- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing
+- **[KV-Aware Routing](architecture/kv_cache_routing.md)** - Understand intelligent request routing

📝 Committable suggestion

Suggested change

- **[KV-Aware Routing](architecture/kv_cache_routing.md)** - Understand intelligent request routing

🧰 Tools

🪛 GitHub Check: Check for broken markdown links

[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232

🤖 Prompt for AI Agents

In docs/quickstart.md around lines 232 to 233, the link to KV-Aware Routing is broken because it uses "docs/architecture/kv_cache_routing.md" instead of the correct relative path "architecture/kv_cache_routing.md"; update the markdown link target to **architecture/kv_cache_routing.md** (i.e., change the href to remove the leading "docs/") so the link resolves correctly from the docs/ directory.

coderabbitai · 2025-09-24T10:21:42Z

docs/quickstart.md

+- **[API Reference](docs/kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications
+- **[Installation Guide](docs/kubernetes/installation_guide.md)** - Detailed platform setup
+- **[Monitoring Setup](docs/kubernetes/metrics.md)** - Observability and metrics
+- **[Logging Configuration](docs/kubernetes/logging.md)** - Centralized logging setup
+- **[Multi-node Deployment](docs/kubernetes/multinode-deployment.md)** - Scale across K8s nodes
+- **[Performance Tuning](docs/benchmarks/)** - Optimize for your workload
+
+---


⚠️ Potential issue

Broken links from docs root to docs subpaths.

These should be relative to docs/, not prefixed with “docs/…”.

-- **[API Reference](docs/kubernetes/api_reference.md)**
+- **[API Reference](kubernetes/api_reference.md)**
-- **[Installation Guide](docs/kubernetes/installation_guide.md)**
+- **[Installation Guide](kubernetes/installation_guide.md)**
-- **[Monitoring Setup](docs/kubernetes/metrics.md)**
+- **[Monitoring Setup](kubernetes/metrics.md)**
-- **[Logging Configuration](docs/kubernetes/logging.md)**
+- **[Logging Configuration](kubernetes/logging.md)**
-- **[Multi-node Deployment](docs/kubernetes/multinode-deployment.md)**
+- **[Multi-node Deployment](kubernetes/multinode-deployment.md)**

📝 Committable suggestion

Suggested change

- **[API Reference](kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications

- **[Installation Guide](kubernetes/installation_guide.md)** - Detailed platform setup

- **[Monitoring Setup](kubernetes/metrics.md)** - Observability and metrics

- **[Logging Configuration](kubernetes/logging.md)** - Centralized logging setup

- **[Multi-node Deployment](kubernetes/multinode-deployment.md)** - Scale across K8s nodes

- **[Performance Tuning](docs/benchmarks/)** - Optimize for your workload

---

🧰 Tools

🪛 GitHub Check: Check for broken markdown links

[failure] 243-243:
Broken link: Multi-node Deployment - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L243

[failure] 242-242:
Broken link: Logging Configuration - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L242

[failure] 241-241:
Broken link: Monitoring Setup - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L241

[failure] 240-240:
Broken link: Installation Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L240

[failure] 239-239:
Broken link: API Reference - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L239

🤖 Prompt for AI Agents

In docs/quickstart.md around lines 239 to 246 the documentation links are incorrectly prefixed with "docs/"; update each link target to be relative to the docs/ root (remove the leading "docs/") so they point to the correct subpaths (e.g., change "docs/kubernetes/api_reference.md" to "kubernetes/api_reference.md", and similarly for installation_guide.md, metrics.md, logging.md, multinode-deployment.md and benchmarks/).
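The rewrite described above is mechanical, so it can be sketched in a few lines of Python. This is a rough illustration rather than a tool from the repository: it assumes the file being edited lives directly in `docs/` and that every inline link target starting with `docs/` was written relative to the repository root by mistake. The names `DOCS_PREFIXED_LINK` and `strip_docs_prefix` are hypothetical.

```python
import re

# Matches the start of an inline Markdown link target that begins with "docs/".
# Only targets immediately following "](" are affected, so "../docs/..." and
# absolute URLs are left untouched.
DOCS_PREFIXED_LINK = re.compile(r"(\]\()docs/")


def strip_docs_prefix(markdown: str) -> str:
    """Drop a leading 'docs/' from inline link targets in the given Markdown text."""
    return DOCS_PREFIXED_LINK.sub(r"\1", markdown)


if __name__ == "__main__":
    line = "- **[API Reference](docs/kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications"
    print(strip_docs_prefix(line))
```

Because the pattern is purely textual, the resulting diff should still be reviewed by hand, for example to confirm that each rewritten target exists under `docs/`.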

coderabbitai · 2025-09-24T10:21:42Z

README.md

 | [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
 | [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
 | [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
-| [**Load Based Planner**](/docs/architecture/load_planner.md) | 🚧 | 🚧 | 🚧 |
 | [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
 | [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |



⚠️ Potential issue

Leading slash links break on GitHub.

Change /docs/... to docs/....

-| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
-| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
-| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
-| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
-| [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
+| [**Disaggregated Serving**](docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
+| [**Conditional Disaggregation**](docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
+| [**KV-Aware Routing**](docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
+| [**SLA-Based Planner**](docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
+| [**KVBM**](docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |

📝 Committable suggestion

Suggested change

| [**Disaggregated Serving**](docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |

| [**Conditional Disaggregation**](docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |

| [**KV-Aware Routing**](docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |

| [**SLA-Based Planner**](docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |

| [**KVBM**](docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |

🤖 Prompt for AI Agents

In README.md around lines 54 to 59 the internal links use a leading slash (e.g. /docs/...), which breaks on GitHub; remove the leading slash for each link in that block so they read docs/... instead, updating all occurrences in those lines to relative paths.

grahamking · 2025-09-24T12:44:08Z

README.md

+python -m dynamo.frontend --http-port 8000

+# Terminal 2: Start backend worker
+python -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B


Minor, but a 0.6B model will download much faster. Less waiting around. I believe Qwen3 0.6B is also vllm's default nowadays.

julienmancuso · 2025-09-24T13:55:21Z

docs/quickstart.md

@@ -0,0 +1,300 @@
+# Dynamo Quickstart Guide


I feel like we have a lot of doc duplication.

Why can't we just point to kubernetes/README.md in this quickstart's Kubernetes section?

nealvaidya · 2025-09-24

Agreed -- we already have several docs that could reasonably be described as quickstarts, including examples/basics/quickstart, README.md#3-run-dynamo, docs/kubernetes/README.md, etc. If we want to create a new one here, let's delete at least one of the others.

github-actions · 2025-10-25T09:32:30Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2025-10-31T09:35:04Z

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

athreesh added 4 commits September 24, 2025 17:55

Apply UX fixes: improved README and quickstart documentation

dec35a5

Add quickstart.md with comprehensive UX improvements

b22f94c

Move quickstart.md to docs/ and update all references

59ba6c2

Signed-off-by: athreesh <[email protected]>

athreesh requested review from grahamking, hutm and nnshah1 September 24, 2025 10:14

athreesh requested review from a team as code owners September 24, 2025 10:14

pull-request-size bot added the size/XL label Sep 24, 2025

github-actions bot added the fix label Sep 24, 2025

coderabbitai bot reviewed Sep 24, 2025

grahamking reviewed Sep 24, 2025

grahamking approved these changes Sep 24, 2025

julienmancuso reviewed Sep 24, 2025

github-actions bot added the Stale label Oct 25, 2025

github-actions bot closed this Oct 31, 2025

github-actions bot deleted the clean-ux-fixes branch October 31, 2025 09:35


nealvaidya Sep 24, 2025
