doc: Refactor documents and examples of disaggregated serving and wide ep #6054
Conversation
Pull Request Overview
This PR refactors and consolidates SLURM scripts and accompanying documentation for both wide EP and disaggregated serving examples, updating parameter calculations, streamlining READMEs, and removing outdated script references.
- Use a shared `ntasks_per_node` variable in `wide_ep/submit.sh` to calculate total tasks
- Simplify the wide EP README to point to relocated disaggregated scripts
- Introduce new `examples/disaggregated/slurm` scripts and adjust defaults (`time`, `multi_round`, `streaming`)
- Refresh the disaggregated serving README and purge obsolete docs under `docs/source/scripts/disaggregated`
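The total-task arithmetic summarized above can be sketched as follows. The concrete values (`ntasks_per_node=4`, `b=512`) are illustrative assumptions taken from the wide EP example discussed later in this review, not authoritative defaults:

```shell
#!/bin/sh
# Sketch of the submit.sh arithmetic; ntasks_per_node=4 and b=512 are assumed example values.
ntasks_per_node=4
b=512
concurrency=$((b * 32))                    # dep32: batch size times 32-way parallelism
ctx_num=$(((concurrency + 5499) / 5500))   # ceiling division: context servers needed
total_node_num=$((ctx_num + 8))            # +8 additional nodes, per the example script
ntasks=$((total_node_num * ntasks_per_node))
echo "$ctx_num $total_node_num $ntasks"
```

With these assumed inputs the script prints `3 11 44`: three context-server nodes, eleven nodes total, and 44 SLURM tasks.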
Reviewed Changes
Copilot reviewed 11 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
examples/wide_ep/slurm_scripts/submit.sh | Compute total tasks via ntasks_per_node instead of hard-coded |
examples/wide_ep/slurm_scripts/README.md | Trim core scripts list and update usage to refer to disaggregated |
examples/disaggregated/slurm/submit.sh | Add standalone SLURM submit template for disaggregated benchmarks |
examples/disaggregated/slurm/disaggr_torch.slurm | Increase time, rounds, enable streaming, and expose nsys_on |
examples/disaggregated/README.md | Rename and reformat disaggregated serving instructions |
Comments suppressed due to low confidence (4)
examples/wide_ep/slurm_scripts/README.md:21
- The core scripts list has been reduced to only two items but still uses numbered bullets, which could confuse readers. Consider restoring the full list or renumbering to reflect only the remaining scripts.
2. **`process_gen_iterlog.py`** - Processes benchmark results and generates reports
examples/disaggregated/slurm/disaggr_torch.slurm:19
- [nitpick] Since the default for `streaming` was changed to `true`, it would be helpful to add a brief comment explaining the impact of this flag on job behavior.

  streaming=true
examples/disaggregated/README.md:1
- [nitpick] The title was generalized to “Disaggregated Serving,” but elsewhere the project is called TensorRT-LLM. Consider renaming the heading to “TensorRT-LLM Disaggregated Serving” for consistency.
# Disaggregated Serving
LGTM, we will need to update this link though: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md?plain=1#L280
Force-pushed 5168e43 to 6ded898
@pcastonguay Updated, thanks.
/bot run
PR_Github #12013 [ run ] triggered by Bot
PR_Github #12013 [ run ] completed with state
Force-pushed 619b993 to 82cb10d
/bot run
PR_Github #12083 [ run ] triggered by Bot
PR_Github #12083 [ run ] completed with state
Signed-off-by: Kaiyu Xie <[email protected]>
Force-pushed 82cb10d to c486ce3
Walkthrough

This update removes legacy SLURM scripts and related configuration generators for disaggregated serving, migrating all cluster launch and benchmarking scripts to `examples/disaggregated/slurm`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant submit.sh
    participant Slurm
    participant disaggr_torch.slurm
    participant gen_yaml.py
    User->>submit.sh: Run SLURM job submission script
    submit.sh->>Slurm: sbatch disaggr_torch.slurm with params
    Slurm->>disaggr_torch.slurm: Launch batch job
    disaggr_torch.slurm->>gen_yaml.py: Generate YAML config
    disaggr_torch.slurm->>Slurm: Start worker/server processes
    disaggr_torch.slurm->>User: Job logs/results
```
Estimated code review effort: 4 (60–120 minutes)
Actionable comments posted: 0
🔭 Outside diff range comments (1)
examples/disaggregated/README.md (1)
108-109: Typo: “refersh_interval” → “refresh_interval”.

```diff
-refersh_interval: 10.0
+refresh_interval: 10.0
```
♻️ Duplicate comments (2)
examples/disaggregated/slurm/submit.sh (1)
1-2: Add error handling to improve script robustness.

The script would benefit from adding `set -euo pipefail` after the shebang to ensure it exits on errors, undefined variables, or pipeline failures.

```diff
 #!/bin/bash
+set -euo pipefail
```
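For readers unfamiliar with the flag, here is a tiny illustration (throwaway commands, not part of the PR) of what `pipefail` changes:

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit status is the last command's (`true` -> 0),
# so the failure of `false` would be silently swallowed.
# With pipefail, any failing stage makes the whole pipeline fail.
set -o pipefail
if false | true; then
    echo "pipeline ok"
else
    echo "pipeline failed"
fi
```

Under `set -o pipefail` this prints `pipeline failed`; without it, the same pipeline would report success.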
examples/wide_ep/slurm_scripts/submit.sh (1)
9-9: Add documentation for the magic number.

The number 5500 should be documented as it represents the maximum number of requests each DEP4 context server can handle before sending them to generation servers for 1k-1k benchmarking.

```diff
-    ctx_num=$(((concurrency + 5499)/5500))
+    # 5500 is the max requests per DEP4 context server for 1k-1k benchmarking
+    ctx_num=$(((concurrency + 5499)/5500))
```
🧹 Nitpick comments (8)
examples/disaggregated/slurm/submit.sh (1)
27-32: Consider improving readability of the long command.

The sbatch command with 14 positional arguments is quite long and could benefit from line breaks for better readability.

```diff
-sbatch --nodes=${total_node_num} \
-    --ntasks=${ntasks} \
-    --ntasks-per-node=${ntasks_per_node} \
-    --segment=${total_node_num} \
-    disaggr_torch.slurm \
-    ${ctx_num} 4 4 4480 true 1 8 1024 1024 true "0.8" 0 0 "$concurrency"
+sbatch --nodes=${total_node_num} \
+    --ntasks=${ntasks} \
+    --ntasks-per-node=${ntasks_per_node} \
+    --segment=${total_node_num} \
+    disaggr_torch.slurm \
+    ${ctx_num} \
+    4 \
+    4 \
+    4480 \
+    true \
+    1 \
+    8 \
+    1024 \
+    1024 \
+    true \
+    "0.8" \
+    0 \
+    0 \
+    "$concurrency"
```

examples/wide_ep/slurm_scripts/submit.sh (1)
27-33: Consider refactoring single-iteration loop.

The static analysis tool correctly identifies that this loop will only ever run once. Consider whether this should be a loop or a simple command.

If this is intended to be a single execution, you could simplify it:

```diff
 # dep32 eplb288
-for b in 512; do
-    concurrency=$((b * 32))
-    ctx_num=$(((concurrency + 5499)/5500))
-    total_node_num=$((ctx_num + 8))
-    ntasks=$((total_node_num * ntasks_per_node))
-    sbatch --nodes=${total_node_num} --ntasks=${ntasks} --ntasks-per-node=${ntasks_per_node} --segment=${total_node_num} disaggr_torch.slurm ${ctx_num} 4 4 4480 true 1 32 1024 1024 true "0.7" 288 "$mtp_size" "$concurrency"
-done
+b=512
+concurrency=$((b * 32))
+ctx_num=$(((concurrency + 5499)/5500))
+total_node_num=$((ctx_num + 8))
+ntasks=$((total_node_num * ntasks_per_node))
+sbatch --nodes=${total_node_num} --ntasks=${ntasks} --ntasks-per-node=${ntasks_per_node} --segment=${total_node_num} disaggr_torch.slurm ${ctx_num} 4 4 4480 true 1 32 1024 1024 true "0.7" 288 "$mtp_size" "$concurrency"
```

examples/disaggregated/README.md (5)
9-13: Add explicit language identifier to fenced block.

markdownlint flags MD040 – fenced code blocks should declare a language. Mark this snippet as YAML for proper highlighting/lint compliance:

```yaml
cache_transceiver_config:
  backend: <str>
  max_tokens_in_buffer: <int>
```
22-32: Label shell snippet and silence MD040.

All three server-launch commands sit in a single fenced block without a language tag. Add `bash` so docs render correctly and pass lint:

```bash
echo -e "disable_overlap_scheduler: True\ncache_transceiver_config:\n  backend: UCX\n  max_tokens_in_buffer: 2048" > context_extra-llm-api-config.yml
...
CUDA_VISIBLE_DEVICES=2 trtllm-serve ...
```
38-56: Missing YAML language tag in `disagg_config.yaml` example.

Same MD040 issue; add `yaml` for consistency/readability:

```yaml
hostname: localhost
...
```
68-76: Add `bash` tag to cURL example.

```bash
curl http://localhost:8000/v1/completions \
...
```
170-172: Heading typo – “Know Issues” should be “Known Issues”.

Minor doc polish.
```diff
-## Know Issues
+## Known Issues
```

examples/wide_ep/slurm_scripts/README.md (1)
38-40: Inline comment risks breaking copy-paste.

The two comment lines inside the fenced `bash` block will be executed if copied verbatim. Consider moving them outside the code block or prefixing them with `#` inside a dedicated explanatory snippet.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md (2 hunks)
- docs/source/scripts/disaggregated/disaggr_torch.slurm (0 hunks)
- docs/source/scripts/disaggregated/gen_yaml.py (0 hunks)
- docs/source/scripts/disaggregated/run_benchmark.sh (0 hunks)
- docs/source/scripts/disaggregated/start_worker.sh (0 hunks)
- docs/source/scripts/disaggregated/submit.sh (0 hunks)
- examples/disaggregated/README.md (3 hunks)
- examples/disaggregated/slurm/disaggr_torch.slurm (1 hunks)
- examples/disaggregated/slurm/gen_yaml.py (2 hunks)
- examples/disaggregated/slurm/slurm_populate_urls.py (0 hunks)
- examples/disaggregated/slurm/submit.sh (1 hunks)
- examples/wide_ep/slurm_scripts/README.md (2 hunks)
- examples/wide_ep/slurm_scripts/submit.sh (1 hunks)
💤 Files with no reviewable changes (6)
- docs/source/scripts/disaggregated/start_worker.sh
- docs/source/scripts/disaggregated/submit.sh
- examples/disaggregated/slurm/slurm_populate_urls.py
- docs/source/scripts/disaggregated/run_benchmark.sh
- docs/source/scripts/disaggregated/disaggr_torch.slurm
- docs/source/scripts/disaggregated/gen_yaml.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
examples/disaggregated/README.md
67-67: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.10.0)
examples/wide_ep/slurm_scripts/submit.sh
[warning] 27-27: This loop will only ever run once. Bad quoting or missing glob/expansion?
(SC2043)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (11)
examples/disaggregated/slurm/gen_yaml.py (2)
184-187: Configuration update looks good.

The cache transceiver configuration has been updated to use the new API with explicit backend specification and increased buffer size. The change from `max_num_tokens` to `max_tokens_in_buffer` and the increase from 4608 to 8320 tokens should improve performance.
206-209: Consistent configuration applied to generation servers.

The same cache transceiver configuration updates are correctly applied to both context and generation servers, maintaining consistency.
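Pieced together from the review text, the config fragment emitted by `gen_yaml.py` plausibly looks like the following. The nesting and the `UCX` backend value are assumptions (UCX appears in the README example elsewhere in this review); only the field names and the 8320 value come from the diff:

```yaml
cache_transceiver_config:
  backend: UCX                 # explicit backend selection under the new API (value assumed)
  max_tokens_in_buffer: 8320   # renamed from max_num_tokens; raised from 4608
```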
docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md (2)
280-280: Documentation link updated correctly.

The link has been properly updated to reflect the new location of the disaggregated serving scripts, which aligns with the consolidation efforts described in the PR objectives.
5-25: Table of contents standardized.

The anchor links have been properly standardized to lowercase and hyphenated format, improving markdown consistency and compatibility.
examples/disaggregated/slurm/disaggr_torch.slurm (4)
7-7: Time limit increase is appropriate.

Increasing the job time limit from 1 to 2 hours makes sense given the change to multiple rounds of testing.
12-12: Multiple rounds will provide better performance data.

Changing from a single round to 10 rounds will give more comprehensive benchmarking results.
19-19: Streaming mode enabled for realistic testing.

Enabling streaming mode better reflects real-world usage scenarios.
102-113: Script dependencies verified.

All referenced helper scripts (`start_worker.sh`, `start_server.sh`, `run_benchmark.sh`) are present in `examples/disaggregated/slurm/`. No changes are needed.

examples/wide_ep/slurm_scripts/submit.sh (1)
4-4: Good parameterization improvements.

The introduction of `ntasks_per_node` and calculated values for `total_node_num` and `ntasks` makes the script more maintainable and clearer about resource allocation.

Also applies to: 10-12
examples/disaggregated/README.md (1)
15-17: Verify spelling of backend option “NIXL”.

Common back-ends are `UCX`, `NCCL`, `MPI`, etc. “NIXL” looks suspicious – please confirm it is intentional and supported throughout the codebase / CLI help.

examples/wide_ep/slurm_scripts/README.md (1)
20-23: Clarify relocation note and check link accuracy.

Good to point users to the consolidated scripts, but please ensure `examples/disaggregated/slurm/submit.sh` (and `disaggr_torch.slurm`) actually exist after the move; stale paths will confuse users.
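A quick way to catch stale paths like these locally or in CI is a small existence check. This is a sketch; the path list is taken from the comment above and the script is assumed to run from the repository root:

```shell
#!/bin/sh
# Report whether each relocated script exists relative to the repo root.
for f in examples/disaggregated/slurm/submit.sh \
         examples/disaggregated/slurm/disaggr_torch.slurm; do
    if [ -e "$f" ]; then
        echo "ok: $f"
    else
        echo "missing: $f"
    fi
done
```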
/bot run
PR_Github #12545 [ run ] triggered by Bot
PR_Github #12545 [ run ] completed with state
/bot run
PR_Github #12564 [ run ] triggered by Bot
PR_Github #12564 [ run ] completed with state
…e ep (NVIDIA#6054)
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Shreyas Misra <[email protected]>

…e ep (NVIDIA#6054)
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Ransiki Zhang <[email protected]>
This PR consolidates all disaggregated serving SLURM scripts under `examples/disaggregated/slurm`, including the scripts that were added in `examples/wide_ep/slurm_scripts`.

Summary by CodeRabbit
- New Features
- Bug Fixes
- Documentation
- Chores