@athreesh commented Sep 24, 2025

UX Documentation Improvements

- Enhanced README structure - Streamlined content with clearer navigation and improved quickstart links
- Added comprehensive quickstart guide - New docs/quickstart.md with step-by-step local and Kubernetes deployment instructions
- Fixed backend documentation - Updated KVBM status indicators and corrected deprecated SGLang commands in multinode examples
- Simplified Kubernetes docs - Cleaner deployment patterns and consolidated backend configuration references

Summary by CodeRabbit

  • Documentation
    • Overhauled main README with structured Quick Start, local development, and deployment guides.
    • Added comprehensive Quickstart covering local and Kubernetes paths, with backend-specific steps.
    • Expanded Kubernetes docs with version update (0.5.0), deployment patterns (Aggregated/Disaggregated/Multi-node), and explicit commands.
    • Updated engine feature matrices: KVBM now WIP for SGLang; complete for vLLM and TensorRT-LLM.
    • Enhanced cross-links, testing workflows, and next steps.
  • Examples
    • Updated multinode example commands to use unified sglang entry point and adjusted process termination patterns.

- Update backend READMEs with correct KVBM status
- Simplify Kubernetes README with cleaner structure
- Fix multinode example to use correct dynamo.sglang command
- Add missing --skip-tokenizer-init flags

Signed-off-by: athreesh <[email protected]>
@athreesh requested review from a team as code owners September 24, 2025 10:14

copy-pr-bot bot commented Sep 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot commented Sep 24, 2025

Walkthrough

Docs were extensively reorganized: top-level README rewritten, Kubernetes guide updated, and a new Quickstart added. Backend README feature matrices updated for KVBM status. Multinode example commands adjusted for SGLang module entrypoint and pkill pattern. No code/API changes.

Changes

| Cohort / File(s) | Summary of Changes |
|------------------|--------------------|
| **Top-level README overhaul**<br>`README.md` | Rewrote and expanded structure: new Quick Start and Local Development, stage-based install/run/test, Kubernetes deployment with Helm, concise engine table and links, contributor and testing workflows. Removed legacy sections. |
| **Backend KVBM status updates**<br>`components/backends/vllm/README.md`, `components/backends/trtllm/README.md`, `components/backends/sglang/README.md` | Updated KVBM feature row: vLLM from WIP to ✅; TRT-LLM from Planned to ✅; SGLang from Planned ❌ to WIP 🚧 in relevant tables. |
| **Kubernetes docs reorg**<br>`docs/kubernetes/README.md` | Bumped RELEASE_VERSION to 0.5.0. Replaced single backend section with detailed “Backend and Deployment Pattern” tables, added aggregated/disaggregated/multi-node flows and commands, and a step-by-step deploy example. |
| **New Quickstart guide**<br>`docs/quickstart.md` | Added comprehensive quickstart covering local and Kubernetes paths, per-backend (vLLM/SGLang/TRT-LLM) steps, testing, cleanup, and troubleshooting. |
| **SGLang multinode example updates**<br>`examples/basics/multinode/README.md` | Switched worker invocations to `python3 -m dynamo.sglang` (from module-specific workers) and broadened pkill pattern to `dynamo.sglang.*prefill`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

I thump my paws—docs freshly tilled,
Paths mapped clean, the backlog filled.
KVBM sprouts: ✅s bloom bright,
SGLang buds—🚧 in sight.
From local runs to k8s skies,
I hop through guides—quickstart-wise.
Carrot-shaped commits—ship with pride! 🥕

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Description Check | ⚠️ Warning | The provided PR description gives a short summary of documentation changes but does not follow the repository's required template: it lacks the explicit "Overview", "Details", "Where should the reviewer start?", and "Related Issues" sections, so required information for reviewers is missing. | Please update the PR description to the repository template by adding an Overview and Details that list the key file changes (for example `README.md`, `docs/quickstart.md`, `docs/kubernetes/README.md`, `components/backends/*/README.md`, and `examples/basics/multinode/README.md`), a "Where should the reviewer start?" section pointing reviewers to the most important files, and a Related Issues line using the required format (e.g., closes GitHub issue: #xxx). |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title Check | ✅ Passed | The title mentions creating quickstart.md and README updates which match the primary documentation changes in the diff, so it is related to the main change; however the phrase "and making other updates" is vague and could be tightened to better describe scope. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
docs/kubernetes/README.md (2)

84-97: Namespace mismatch in example.

You set NAMESPACE=dynamo-kubernetes above but reset it to dynamo-cloud here, so the later commands target a different namespace than the platform install. This will confuse readers.

-export NAMESPACE=dynamo-cloud
+# Use the same namespace as platform install
+export NAMESPACE=dynamo-kubernetes
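
A quick way to catch this kind of drift is to scan a doc for repeated `export` assignments of the same variable. The sketch below is a hypothetical helper: the function name and the sample text are illustrative and not part of the repo. It lists every value assigned to `NAMESPACE` and flags the doc when more than one distinct value appears.

```python
import re

def exported_values(markdown_text, variable):
    """Return (line_number, value) pairs for each `export VAR=value` line."""
    pattern = re.compile(r"^\s*export\s+" + re.escape(variable) + r"=(\S+)")
    found = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            found.append((number, match.group(1)))
    return found

# Illustrative sample that mirrors the mismatch described above.
sample = "\n".join([
    "export NAMESPACE=dynamo-kubernetes",
    "# platform install commands would go here",
    "export NAMESPACE=dynamo-cloud",
])

values = exported_values(sample, "NAMESPACE")
distinct = {value for _, value in values}
if len(distinct) > 1:
    print("conflicting NAMESPACE values:", sorted(distinct))
```

The check is deliberately simple: it does not distinguish intentional overrides from mistakes, so a hit only means the section deserves a second look.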

109-116: Leading slash links break on GitHub.

Links like /docs/kubernetes/api_reference.md resolve to github.com/docs/… not this repo. Make them relative.

-- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
-- **[Operator Guide](/docs/kubernetes/dynamo_operator.md)** - Dynamo operator configuration and management
-- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation examples
+- **[API Reference](./api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
+- **[Operator Guide](./dynamo_operator.md)** - Dynamo operator configuration and management
+- **[Create Deployment](./create_deployment.md)** - Step-by-step deployment creation examples

Also fix leading slashes in “Additional Resources” to relative links inside docs/kubernetes.

docs/quickstart.md (1)

299-301: Broken Support Matrix link.

From docs/, drop the “docs/” prefix.

-For detailed compatibility information, see the [Support Matrix](docs/support_matrix.md).
+For detailed compatibility information, see the [Support Matrix](support_matrix.md).
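
Both link comments above come down to how a relative target is joined to the directory of the file that contains it. The following is a minimal sketch of that resolution using only `posixpath`; `resolve_link` is a hypothetical helper, and treating leading-slash targets as unresolvable simply follows the comment above rather than any documented GitHub behavior.

```python
import posixpath

def resolve_link(source_file, target):
    """Resolve a markdown link target to a repo-relative path.

    Returns None for targets with a leading slash, which this sketch
    treats as not resolvable inside the repository.
    """
    if target.startswith("/"):
        return None
    base = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base, target))

# The broken and fixed Support Matrix links from docs/quickstart.md.
print(resolve_link("docs/quickstart.md", "docs/support_matrix.md"))
print(resolve_link("docs/quickstart.md", "support_matrix.md"))

# The suggested relative form and the original leading-slash form.
print(resolve_link("docs/kubernetes/README.md", "./api_reference.md"))
print(resolve_link("docs/kubernetes/README.md", "/docs/kubernetes/api_reference.md"))
```

The first call shows why the `docs/` prefix breaks: the target is joined onto `docs/` a second time.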
🧹 Nitpick comments (11)
components/backends/sglang/README.md (2)

50-51: Typo: “does not router to DP worker”.

Change “router” to “route”.

- | **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
+ | **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |

190-191: Typo: “conjuction”.

Change to “conjunction”.

- ... tokenizer_manager) is used in conjuction with NIXL ...
+ ... tokenizer_manager) is used in conjunction with NIXL ...
components/backends/trtllm/README.md (1)

191-201: Duplicate “Client”/“Benchmarking” sections.

“Client” appears twice (Lines 191–196 and 231–236) and “Benchmarking” twice (197–201 and 237–241). Consolidate to reduce redundancy.
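
Repeated section titles like these can be found mechanically. The sketch below is a rough, hypothetical check: the helper name, the sample document, and its heading levels are assumptions. It counts heading titles and reports any that occur more than once.

```python
import re
from collections import Counter

def duplicate_headings(markdown_text):
    """Return the sorted heading titles that appear more than once."""
    titles = [
        match.group(1).strip()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+)$", markdown_text, re.MULTILINE)
    ]
    counts = Counter(titles)
    return sorted(title for title, count in counts.items() if count > 1)

# Illustrative document shaped like the duplication described above.
sample = "\n".join([
    "## Client",
    "See the client guide.",
    "## Benchmarking",
    "See the benchmarking guide.",
    "## Client",
    "See the client guide.",
    "## Benchmarking",
    "See the benchmarking guide.",
])

print(duplicate_headings(sample))
```

Note that this does not skip fenced code blocks, so a shell comment starting with `#` inside a fence would be counted as a heading.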

examples/basics/multinode/README.md (1)

156-161: Minor formatting issues in the info note.

There are stray “>” artifacts in “different > GPUs” / “token > generation”. Remove the stray symbols.

- > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different > GPUs)
+ > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different GPUs)
- > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token > generation)
+ > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token generation)
docs/kubernetes/README.md (2)

73-81: Typo: “replicaes”.

Change to “replicas”.

-### **Multi-node Deployment** (Model replicaes across multiple nodes)
+### **Multi-node Deployment** (Model replicas across multiple nodes)

167-187: Worker command examples — fix sglang to consolidated entrypoint.

Use python3 -m dynamo.sglang consistently.

-  - >-
-    python3 -m dynamo.sglang
+  - >-
+    python3 -m dynamo.sglang

(If any legacy dynamo.sglang.worker invocations exist elsewhere, update them.)

docs/quickstart.md (3)

27-34: Pin extras but avoid hard version pin unless required.

Pinning ai-dynamo to 0.5.0 is fine for reproducibility. Consider adding a brief note that users can omit the pin for latest.


41-44: Consider using the local compose file if running from a clone.

If users cloned the repo, docker compose -f deploy/docker-compose.yml up -d avoids curl. Optionally add as an alternative.


206-208: Service name in port-forward is deployment-specific.

Using svc/agg-vllm-frontend assumes the aggregated vLLM sample. Consider adding a note that the service name varies by chosen manifest.

README.md (2)

150-155: TRT-LLM run command should use --model-path.

Align with backend README examples.

-| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](docs/quickstart.md#tensorrt-llm-backend) for setup. |
+| **TensorRT-LLM** | `uv pip install ai-dynamo[trtllm]` | `python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | Requires NVIDIA PyTorch container. See [TensorRT-LLM Quickstart](docs/quickstart.md#tensorrt-llm-backend) for setup. |

164-166: markdownlint warnings: heading/emphasis.

Consider replacing the emphasized “For contributors and advanced users” with a proper heading and ensure heading levels increment by one.

-**For contributors and advanced users**
+### For contributors and advanced users

Also confirm surrounding headings maintain the increment rule.
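
To confirm the increment rule across the file, a small script can walk the headings and report any that skip a level. This is a simplified sketch of the idea behind MD001, not markdownlint's own implementation. The helper name and the sample headings (`# Dynamo`, `### Building from source`) are hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")
FENCE = "`" * 3  # code fence marker; headings inside fences are ignored

def heading_jumps(markdown_text):
    """Return (line_number, previous_level, level) for headings that skip a level."""
    jumps = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match is None:
            continue
        level = len(match.group(1))
        # The first heading may be any level; later ones may go at most one deeper.
        if previous and level > previous + 1:
            jumps.append((number, previous, level))
        previous = level
    return jumps

# Emphasis stands in for the h2, so the following h3 skips a level.
sample = "\n".join([
    "# Dynamo",
    "",
    "**For contributors and advanced users**",
    "",
    "### Building from source",
])

print(heading_jumps(sample))
```

In the sample, the bold line is not a heading, so the h3 follows an h1 directly, which matches the "Expected: h2; Actual: h3" shape of the warning.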

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae2010 and 14c6c35.

📒 Files selected for processing (7)
  • README.md (6 hunks)
  • components/backends/sglang/README.md (1 hunks)
  • components/backends/trtllm/README.md (1 hunks)
  • components/backends/vllm/README.md (1 hunks)
  • docs/kubernetes/README.md (2 hunks)
  • docs/quickstart.md (1 hunks)
  • examples/basics/multinode/README.md (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • README.md
🪛 GitHub Check: Check for broken markdown links
docs/quickstart.md

[failure] 299-299: Broken link: Support Matrix - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L299
[failure] 243-243: Broken link: Multi-node Deployment - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L243
[failure] 242-242: Broken link: Logging Configuration - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L242
[failure] 241-241: Broken link: Monitoring Setup - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L241
[failure] 240-240: Broken link: Installation Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L240
[failure] 239-239: Broken link: API Reference - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L239
[failure] 232-232: Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232
[failure] 231-231: Broken link: Runtime Examples - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L231

🪛 markdownlint-cli2 (0.18.1)
README.md

164-164: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)

166-166: Heading levels should only increment by one level at a time. Expected: h2; Actual: h3 (MD001, heading-increment)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (13)
components/backends/vllm/README.md (2)

43-43: Confirm KVBM status and cross-doc consistency.

KVBM flipped to ✅. Please confirm this aligns with the backend’s current capabilities and that linked docs (kvbm_architecture.md) accurately describe any limitations.


169-169: Version-pinned vLLM CLI docs link may drift.

The link pins vLLM docs to v0.9.2. If our supported vLLM version differs, adjust the link or note the required version to avoid confusion.

components/backends/sglang/README.md (1)

43-44: KVBM moved to WIP — LGTM.

Status adjustment to WIP looks good and matches broader PR intent.

Please ensure other references (quickstart, k8s docs) don’t overstate feature readiness for SGLang.

components/backends/trtllm/README.md (1)

60-60: KVBM marked as available — LGTM.

Matches the PR theme and linked guide exists below. Consider adding a brief note about minimum TRT-LLM/Dynamo versions if applicable.

examples/basics/multinode/README.md (3)

134-153: Module entrypoint change to python3 -m dynamo.sglang — LGTM.

Matches repo-wide shift to top-level package entrypoint. The flags also align with disagg prefill/decode.


168-188: Replica 2 commands — LGTM.

Consistent with Replica 1 changes and disagg pattern.


476-477: Updated pkill pattern — LGTM.

Matches the broadened module path change.
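
Assuming the example uses `pkill -f`, the pattern is matched as a regular expression against each process's full command line. The sketch below approximates that with Python's `re.search` to show which invocations `dynamo.sglang.*prefill` would select. The command lines are illustrative, built from the entrypoint and the `--disaggregation-mode` flag mentioned in this review, and `dynamo.example` is a made-up module name.

```python
import re

# Same pattern text as the updated pkill example; the unescaped dots match any character.
PATTERN = re.compile(r"dynamo.sglang.*prefill")

# Illustrative command lines, not copied from the README.
command_lines = [
    "python3 -m dynamo.sglang --disaggregation-mode prefill",
    "python3 -m dynamo.sglang --disaggregation-mode decode",
    "python3 -m dynamo.sglang.worker --disaggregation-mode prefill",
    "python3 -m dynamo.example --mode prefill",
]

matched = [line for line in command_lines if PATTERN.search(line)]
for line in matched:
    print("would be signalled:", line)
```

Because the pattern only requires `dynamo.sglang` followed somewhere later by `prefill`, it covers both the consolidated entrypoint and a legacy `dynamo.sglang.worker` invocation, while leaving decode workers alone.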

docs/kubernetes/README.md (2)

27-41: Versioning and Grove/KAI options — LGTM.

RELEASE_VERSION bump to 0.5.0 and optional Grove/KAI flags look fine.


92-97: CRD kind casing in kubectl commands.

Using kubectl get dynamoGraphDeployment typically fails. CRDs are usually plural, lowercase (e.g., dynamographdeployments). Verify the resource name and update both “get” and later “delete” examples accordingly.

- kubectl get dynamoGraphDeployment -n ${NAMESPACE}
+ kubectl get dynamographdeployments -n ${NAMESPACE}

Apply similar change to the delete command later in the doc.

docs/quickstart.md (1)

146-158: K8s install section — LGTM.

Consistent with the k8s README platform install.

README.md (3)

28-45: Crisp positioning and intro — LGTM.

Clear problem statement and value prop; good.


67-85: Quickstart link and local steps — LGTM.

Good cross-link to the new quickstart and concise local flow.


132-133: Kubernetes docs link — LGTM.

Points to the updated k8s README.

Comment on lines +52 to +80
| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

### **Disaggregated Serving**
Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

### **Multi-node Deployment** (Model replicaes across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |

⚠️ Potential issue

Fix broken relative links to backend deploy manifests.

These links are relative to docs/kubernetes. They should traverse up two directories.

Apply:

-| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated](../../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
-| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **vLLM** | [Aggregated + Router](../../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
-| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated](../../components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
-| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **SGLang** | [Aggregated + Router](../../components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
-| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated](../../components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
-| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
+| **TensorRT-LLM** | [Aggregated + Router](../../components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

Repeat the same ../../ prefix for all Disaggregated and Multi-node tables in this section.

🤖 Prompt for AI Agents
In docs/kubernetes/README.md around lines 52 to 80, the relative links to
backend deploy manifests in the Aggregated, Disaggregated, and Multi-node
tables are incorrect because they resolve relative to docs/kubernetes and need
to traverse up two directories; update the Configuration link target in every
row of those tables to prepend "../../" so it points to
../../components/backends/... (i.e., change components/backends/... to
../../components/backends/...). Leave the kubectl apply paths in the Deploy
Command column unchanged, as in the suggested diff above.

Comment on lines +117 to +126
uv pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
uv pip install "cuda-python>=12,<13"
sudo apt-get -y install libopenmpi-dev

# 3. Install
uv pip install "ai-dynamo[trtllm]"

# 4. Run
python -m dynamo.trtllm --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```

⚠️ Potential issue

🧩 Analysis chain

PyTorch version/cuda wheel looks invalid and may break installs.

torch==2.7.1 with --index-url .../cu128 is likely not published. Recommend removing manual torch pin or pointing users to official TRT-LLM container instructions instead.

-uv pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
-uv pip install "cuda-python>=12,<13"
-sudo apt-get -y install libopenmpi-dev
+# Prefer NVIDIA PyTorch container (NGC) or follow TensorRT-LLM’s official install guide.
+# If installing via pip, refer to PyTorch’s official compatibility matrix for your CUDA version:
+# https://pytorch.org/get-started/locally/

Would you like me to update this subsection to a container-first flow consistent with the TRT-LLM backend README?


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== docs/quickstart.md lines 117-126 ==="
if [ -f docs/quickstart.md ]; then
  sed -n '117,126p' docs/quickstart.md || true
else
  echo "docs/quickstart.md not found"
fi

echo
echo "=== repo matches for 'torch==' or 'cu128' ==="
rg -n -C2 --hidden --no-ignore 'torch==|cu128|download.pytorch.org/whl/cu128' || true

Length of output: 2820


Fix PyTorch wheel spec in quickstart: use +cu128 build or prefer container-first install

docs/quickstart.md pins torch==2.7.1 with --index-url .../whl/cu128, but torch==2.7.1 will not match the cu128 wheels; the repo expects the build-tagged version 2.7.1+cu128 (see container/deps/vllm/install_vllm.sh:143 and deploy/dynamo_check.py:53).

Action (choose one): replace the quickstart line with torch==2.7.1+cu128 (and pin matching torchvision/torchaudio) OR remove the manual torch pin and direct users to PyTorch’s compatibility matrix / the TRT‑LLM container install flow.

Location: docs/quickstart.md lines 117–119.

🤖 Prompt for AI Agents
In docs/quickstart.md around lines 117–126, the PyTorch install pins
"torch==2.7.1" which will not match the cu128 wheels; update the quickstart to
either (1) use the build-tagged packages (e.g., replace torch==2.7.1 with
torch==2.7.1+cu128 and pin matching torchvision/torchaudio build tags) so the
--index-url cu128 wheel pattern matches, or (2) remove the hard torch pin
entirely and replace with a brief instruction directing users to PyTorch’s
compatibility matrix or the TRT‑LLM container-first install flow; implement one
of these two options and ensure the text and any package pins remain consistent
with container/deps/vllm/install_vllm.sh and deploy/dynamo_check.py.

Comment on lines +169 to +198
| **vLLM** | [Aggregated](components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |

#### **Disaggregated Serving** (Multi-node, specialized workers)
Prefill and decode phases run on separate workers - higher performance and scalability.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |

#### **Multi-node Deployment** (Distributed across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.

| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |


⚠️ Potential issue

Fix relative links to backend deploy manifests (broken).

From docs/quickstart.md, the “components/backends/…” links need a “../”.

- [Aggregated](components/backends/vllm/deploy/agg.yaml)
+ [Aggregated](../components/backends/vllm/deploy/agg.yaml)
- [Aggregated + Router](components/backends/vllm/deploy/agg_router.yaml)
+ [Aggregated + Router](../components/backends/vllm/deploy/agg_router.yaml)
- [Disaggregated](components/backends/vllm/deploy/disagg.yaml)
+ [Disaggregated](../components/backends/vllm/deploy/disagg.yaml)
- [Disaggregated + Router](components/backends/vllm/deploy/disagg_router.yaml)
+ [Disaggregated + Router](../components/backends/vllm/deploy/disagg_router.yaml)
- [Disaggregated + Planner](components/backends/vllm/deploy/disagg_planner.yaml)
+ [Disaggregated + Planner](../components/backends/vllm/deploy/disagg_planner.yaml)

Repeat the same “../” fix for SGLang and TensorRT-LLM rows in Aggregated/Disaggregated/Multi-node tables in this section.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| **vLLM** | [Aggregated](../components/backends/vllm/deploy/agg.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Aggregated + Router](../components/backends/vllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Aggregated + Router](components/backends/sglang/deploy/agg_router.yaml) | `kubectl apply -f components/backends/sglang/deploy/agg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated](components/backends/trtllm/deploy/agg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Aggregated + Router](components/backends/trtllm/deploy/agg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/agg_router.yaml -n ${NAMESPACE}` |
#### **Disaggregated Serving** (Multi-node, specialized workers)
Prefill and decode phases run on separate workers - higher performance and scalability.
| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Disaggregated](../components/backends/vllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Router](../components/backends/vllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **vLLM** | [Disaggregated + Planner](../components/backends/vllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated](components/backends/sglang/deploy/disagg.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg.yaml -n ${NAMESPACE}` |
| **SGLang** | [Disaggregated + Planner](components/backends/sglang/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated](components/backends/trtllm/deploy/disagg.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Router](components/backends/trtllm/deploy/disagg_router.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_router.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Disaggregated + Planner](components/backends/trtllm/deploy/disagg_planner.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg_planner.yaml -n ${NAMESPACE}` |
#### **Multi-node Deployment** (Distributed across multiple nodes)
Scale disaggregated serving across multiple Kubernetes nodes for maximum performance.
| Backend | Configuration | Deploy Command |
|---------|---------------|----------------|
| **vLLM** | [Multi-node](components/backends/vllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/vllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **SGLang** | [Multi-node](components/backends/sglang/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/sglang/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
| **TensorRT-LLM** | [Multi-node](components/backends/trtllm/deploy/disagg-multinode.yaml) | `kubectl apply -f components/backends/trtllm/deploy/disagg-multinode.yaml -n ${NAMESPACE}` |
🤖 Prompt for AI Agents
In docs/quickstart.md around lines 169 to 198 the Markdown links to backend
deploy manifests are broken because they point to components/backends/...
relative to the current file; update every such link in the Aggregated,
Disaggregated and Multi-node tables to prepend "../" (e.g. change
components/backends/... to ../components/backends/...) for vLLM, SGLang and
TensorRT-LLM rows so the links correctly resolve to the manifests; apply the
same ../ prefix consistently across all three tables in this section.
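
Applying the same prefix to every row by hand is error-prone, so the rewrite described above could be scripted. This is a rough sketch, not a tool from the repository: the function name `prefix_backend_links` and the regular expression are assumptions, and it deliberately touches only Markdown link targets that begin with `components/backends/`, leaving the `kubectl` commands in backticks unchanged.

```python
import re

# Matches the target of a Markdown link that starts with components/backends/.
# Already-prefixed targets such as ../components/... do not match, so the
# rewrite can be run more than once without stacking prefixes.
LINK_TARGET = re.compile(r"\]\((components/backends/[^)\s]+)\)")


def prefix_backend_links(markdown: str, prefix: str = "../") -> str:
    """Prepend `prefix` to backend manifest link targets in `markdown`."""
    return LINK_TARGET.sub(lambda m: f"]({prefix}{m.group(1)})", markdown)


row = (
    "| **SGLang** | [Aggregated](components/backends/sglang/deploy/agg.yaml) | "
    "`kubectl apply -f components/backends/sglang/deploy/agg.yaml -n ${NAMESPACE}` |"
)
fixed = prefix_backend_links(row)
print(fixed)
```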

Comment on lines +231 to +233
- **[Runtime Examples](lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing


⚠️ Potential issue

Broken relative link to runtime examples.

From docs/, link to lib/ should go up one directory.

-- **[Runtime Examples](lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
+- **[Runtime Examples](../lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
📝 Committable suggestion

Suggested change
- **[Runtime Examples](../lib/bindings/python/README.md)** - Low-level Python<>Rust bindings
- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing
🧰 Tools
🪛 GitHub Check: Check for broken markdown links

[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232


[failure] 231-231:
Broken link: Runtime Examples - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L231

🤖 Prompt for AI Agents
In docs/quickstart.md around lines 231 to 233 the relative link to the runtime
examples is incorrect (it points to lib/bindings/python/README.md from docs/
instead of up one level); update the link to reference the correct relative path
by prepending "../" so it resolves from docs/ to lib/bindings/python/README.md.

Comment on lines +232 to +233
- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing


⚠️ Potential issue

Broken link to KV-Aware Routing.

From docs/, use “architecture/…”, not “docs/architecture/…”.

-- **[KV-Aware Routing](docs/architecture/kv_cache_routing.md)** - Understand intelligent request routing
+- **[KV-Aware Routing](architecture/kv_cache_routing.md)** - Understand intelligent request routing
📝 Committable suggestion

Suggested change
- **[KV-Aware Routing](architecture/kv_cache_routing.md)** - Understand intelligent request routing
🧰 Tools
🪛 GitHub Check: Check for broken markdown links

[failure] 232-232:
Broken link: KV-Aware Routing - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L232

🤖 Prompt for AI Agents
In docs/quickstart.md around lines 232 to 233, the link to KV-Aware Routing is
broken because it uses "docs/architecture/kv_cache_routing.md" instead of the
correct relative path "architecture/kv_cache_routing.md"; update the markdown
link target to **architecture/kv_cache_routing.md** (i.e., change the href to
remove the leading "docs/") so the link resolves correctly from the docs/
directory.

Comment on lines +239 to +246
- **[API Reference](docs/kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications
- **[Installation Guide](docs/kubernetes/installation_guide.md)** - Detailed platform setup
- **[Monitoring Setup](docs/kubernetes/metrics.md)** - Observability and metrics
- **[Logging Configuration](docs/kubernetes/logging.md)** - Centralized logging setup
- **[Multi-node Deployment](docs/kubernetes/multinode-deployment.md)** - Scale across K8s nodes
- **[Performance Tuning](docs/benchmarks/)** - Optimize for your workload

---

⚠️ Potential issue

Broken links from docs root to docs subpaths.

These should be relative to docs/, not prefixed with “docs/…”.

-- **[API Reference](docs/kubernetes/api_reference.md)**
+- **[API Reference](kubernetes/api_reference.md)**
-- **[Installation Guide](docs/kubernetes/installation_guide.md)**
+- **[Installation Guide](kubernetes/installation_guide.md)**
-- **[Monitoring Setup](docs/kubernetes/metrics.md)**
+- **[Monitoring Setup](kubernetes/metrics.md)**
-- **[Logging Configuration](docs/kubernetes/logging.md)**
+- **[Logging Configuration](kubernetes/logging.md)**
-- **[Multi-node Deployment](docs/kubernetes/multinode-deployment.md)**
+- **[Multi-node Deployment](kubernetes/multinode-deployment.md)**
📝 Committable suggestion

Suggested change
- **[API Reference](kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specifications
- **[Installation Guide](kubernetes/installation_guide.md)** - Detailed platform setup
- **[Monitoring Setup](kubernetes/metrics.md)** - Observability and metrics
- **[Logging Configuration](kubernetes/logging.md)** - Centralized logging setup
- **[Multi-node Deployment](kubernetes/multinode-deployment.md)** - Scale across K8s nodes
- **[Performance Tuning](docs/benchmarks/)** - Optimize for your workload
---
🧰 Tools
🪛 GitHub Check: Check for broken markdown links

[failure] 243-243:
Broken link: Multi-node Deployment - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L243


[failure] 242-242:
Broken link: Logging Configuration - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L242


[failure] 241-241:
Broken link: Monitoring Setup - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L241


[failure] 240-240:
Broken link: Installation Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L240


[failure] 239-239:
Broken link: API Reference - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/docs/quickstart.md?plain=1#L239

🤖 Prompt for AI Agents
In docs/quickstart.md around lines 239 to 246 the documentation links are
incorrectly prefixed with "docs/"; update each link target to be relative to the
docs/ root (remove the leading "docs/") so they point to the correct subpaths
(e.g., change "docs/kubernetes/api_reference.md" to
"kubernetes/api_reference.md", and similarly for installation_guide.md,
metrics.md, logging.md, multinode-deployment.md and benchmarks/).
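
The failures reported by the link check above all come from resolving a target against the directory of the file that contains it. The sketch below illustrates that resolution rule on a throwaway directory tree; it is not the actual CI check, and the `broken_links` helper and its regular expression are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of an inline Markdown link, without any #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)(?:#[^)\s]*)?\)")


def broken_links(doc: Path) -> list[str]:
    """Return relative link targets in `doc` that do not exist on disk."""
    missing = []
    for target in LINK.findall(doc.read_text(encoding="utf-8")):
        if "://" in target or target.startswith("mailto:"):
            continue  # external links are out of scope for this sketch
        if not (doc.parent / target).exists():
            missing.append(target)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "kubernetes").mkdir(parents=True)
    (root / "docs" / "kubernetes" / "api_reference.md").write_text("# API\n")
    quickstart = root / "docs" / "quickstart.md"
    quickstart.write_text(
        "- [API Reference](docs/kubernetes/api_reference.md)\n"
        "- [API Reference](kubernetes/api_reference.md)\n"
    )
    result = broken_links(quickstart)
    print(result)
```

Only the `docs/`-prefixed target is reported, because from `docs/quickstart.md` it would resolve to `docs/docs/kubernetes/api_reference.md`.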

Comment on lines 54 to 59
| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) ||||
| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) ||||
| [**Load Based Planner**](/docs/architecture/load_planner.md) | 🚧 | 🚧 | 🚧 |
| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) ||||
| [**KVBM**](/docs/architecture/kvbm_architecture.md) || 🚧 ||


⚠️ Potential issue

Leading slash links break on GitHub.

Change /docs/... to docs/....

-| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
-| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
-| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
-| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
-| [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
+| [**Disaggregated Serving**](docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
+| [**Conditional Disaggregation**](docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
+| [**KV-Aware Routing**](docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
+| [**SLA-Based Planner**](docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
+| [**KVBM**](docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
📝 Committable suggestion

Suggested change
| [**Disaggregated Serving**](docs/architecture/disagg_serving.md) ||||
| [**Conditional Disaggregation**](docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
| [**KV-Aware Routing**](docs/architecture/kv_cache_routing.md) ||||
| [**SLA-Based Planner**](docs/architecture/sla_planner.md) ||||
| [**KVBM**](docs/architecture/kvbm_architecture.md) || 🚧 ||
🤖 Prompt for AI Agents
In README.md around lines 54 to 59 the internal links use a leading slash (e.g.
/docs/...), which breaks on GitHub; remove the leading slash for each link in
that block so they read docs/... instead, updating all occurrences in those
lines to relative paths.

python -m dynamo.frontend --http-port 8000

# Terminal 2: Start backend worker
python -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Minor, but a 0.6B model will download much faster. Less waiting around. I believe Qwen3 0.6B is also vLLM's default nowadays.

@@ -0,0 +1,300 @@
# Dynamo Quickstart Guide

I feel like we have a lot of doc duplication.

Why can't we just point to kubernetes/README.md in this quickstart's Kubernetes section?


Agreed -- we have a couple of docs that could reasonably be described as quickstarts, including examples/basics/quickstart, README.md#3-run-dynamo, and docs/kubernetes/README.md. If we want to create a new one here, let's delete at least one of the others.

@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 25, 2025
@github-actions

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

@github-actions github-actions bot closed this Oct 31, 2025
@github-actions github-actions bot deleted the clean-ux-fixes branch October 31, 2025 09:35