[None] [chore] Update wide-ep genonly scripts #6995
Conversation
📝 Walkthrough

Adds a new ctx_gpu_memory_fraction positional argument and benchmark_mode across Slurm benchmark scripts, reorders downstream positional arguments, derives and passes ctx_free_gpu_memory_fraction to the YAML generator, updates start_worker.sh to accept benchmark_mode and concurrency with mode-specific environment exports, adjusts log-path parsing, and adds a gen-only submission script.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant User
participant Submit as submit_*.sh
participant Slurm as disaggr_torch.slurm
participant YAML as YAML Generator
participant Worker as start_worker.sh
participant TRT as trtllm-serve
User->>Submit: Build args (incl. benchmark_mode, ctx_gpu_memory_fraction)
Submit->>Slurm: sbatch call with reordered/expanded args
Slurm->>Slurm: validate benchmark_mode (gen_only/e2e)
Slurm->>YAML: generate config (--ctx_free_gpu_memory_fraction)
Slurm->>Worker: launch (config, enable_pdl, ctx_gpus, benchmark_mode, concurrency, work_dir)
alt benchmark_mode == gen_only
Worker->>Worker: export TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP=1
Worker->>Worker: export TLLM_BENCHMARK_REQ_QUEUES_SIZE=concurrency
else e2e
Worker->>Worker: no extra env changes
end
Worker->>TRT: start trtllm-serve (nsys optional)
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Actionable comments posted: 0
🔭 Outside diff range comments (1)
examples/wide_ep/slurm_scripts/process_gen_iterlog.py (1)
1-7: Missing NVIDIA copyright header and function docstrings.

Per repo guidelines, Python sources must include the NVIDIA copyright header. Also add docstrings for externally used functions.

Apply this header at the top of the file:

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```

Would you like me to draft Google-style docstrings for process_files?
🧹 Nitpick comments (12)
examples/wide_ep/slurm_scripts/process_gen_iterlog.py (6)
21-22: Regex update aligns with new path schema; consider precompiling outside the loop.

Looks good for the new ctxX_genY_… pattern. Minor nit: this runs re.search for every file. Precompile once before the loop for small perf/readability win.

Example (apply outside the selected range):

```python
# above the for-loop
path_re = re.compile(r'ctx\d+_gen\d+_(tep|dep)(\d+)_batch(\d+)_eplb(\d+)(?:_mtp(\d+))?')

# inside the loop
match = path_re.search(file)
```
27-31: Make concurrency extraction robust to avoid false positives across full paths.

Matching any “concurrency_” substring in the entire path can lead to accidental matches. Anchor to path segment boundaries.

Apply this diff:

```diff
-        concurrency_match = re.search(r'concurrency_(\d+)', file)
-        if not concurrency_match:
-            print(f"No concurrency match found for file {file}")
-            continue
+        concurrency_match = re.search(r'(?:^|/)concurrency_(\d+)(?:/|$)', file)
+        if not concurrency_match:
+            print(f"No concurrency match found for file {file}")
+            continue
```

Alternatively, use pathlib to scan path components (apply the import outside this hunk):

```python
from pathlib import Path

parts = Path(file).parts
concurrency = next(
    (int(p.split('_', 1)[1]) for p in parts if p.startswith('concurrency_')),
    None
)
if concurrency is None:
    print(f"No concurrency match found for file {file}")
    continue
```
36-37: Remove the no-op conversion or assign it to a named variable.

int(match.group(3)) does nothing and may confuse readers. Either remove it or store it as a descriptive variable (e.g., gen_batch_size) if you plan to use it later.

Apply this diff to remove:

```diff
-        int(match.group(3))
```

If you need it later, prefer:

```diff
+        gen_batch_size = int(match.group(3))
```
100-101: Derive output_file robustly for filenames with multiple dots.

file.split('.')[0] truncates too aggressively when directories or filenames contain dots. Use os.path.splitext.

Apply this diff:

```diff
-    output_file = file.split('.')[0] + '.csv'
+    import os
+    output_file = os.path.splitext(file)[0] + '.csv'
```

Note: Move the import os to the top-level imports if you prefer.
146-152: Bug: printing “Total records processed” uses the last file’s data.

At Line 151, len(data) refers to the last loop’s local variable, not the overall processed count. Either sum row counts or print the number of summary rows.

Apply this diff:

```diff
-    print(f"Total records processed: {len(data)}")
+    print(f"Total records processed: {len(df)}")
```

Or, if you want the total number of per-iter rows across all files, maintain a running counter (outside this hunk).
156-163: Make --dir_prefix required and document expectations.

If the argument is omitted, the f-string becomes 'None*/concurrency_*/gen_only.txt' and no files are found. Explicitly require it or provide a sensible default + docstring for process_files.

Apply this diff:

```diff
-    parser.add_argument('--dir_prefix',
-                        help='Directory prefix to search for benchmark files')
+    parser.add_argument('--dir_prefix',
+                        required=True,
+                        help='Directory prefix (e.g., /logs/run123/) used to search */concurrency_*/gen_only.txt')
```

Optional: Add a short Google-style docstring to process_files.

examples/disaggregated/slurm/benchmark/start_worker.sh (2)
6-8: Signature change looks good; include new args in logging for traceability.

You added benchmark_mode and concurrency as positional args; reflect them in the startup echo for easier debugging.

Apply this diff:

```diff
-echo "config_file: ${config_file}, enable_pdl: ${enable_pdl}, ctx_gpus: ${ctx_gpus}, work_dir: ${work_dir}"
+echo "config_file: ${config_file}, enable_pdl: ${enable_pdl}, ctx_gpus: ${ctx_gpus}, benchmark_mode: ${benchmark_mode}, concurrency: ${concurrency}, work_dir: ${work_dir}"
```

Also consider renaming work_dir to nsys_on for clarity, since it gates nsys profiling rather than representing a directory; see the sketch below.
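If you do rename it, a minimal sketch of what the argument handling could look like, assuming the positional order described in this review (config_file, enable_pdl, ctx_gpus, benchmark_mode, concurrency, then the nsys flag); the nsys_on name and the echo/if bodies are illustrative only, not the script's actual contents:

```bash
#!/bin/bash
# Hypothetical excerpt of start_worker.sh after renaming the last positional
# argument from work_dir to nsys_on (argument order taken from this review).
config_file=$1
enable_pdl=$2
ctx_gpus=$3
benchmark_mode=$4
concurrency=$5
nsys_on=$6   # non-empty value means "wrap the server command in nsys profiling"

echo "config_file: ${config_file}, enable_pdl: ${enable_pdl}, ctx_gpus: ${ctx_gpus}," \
     "benchmark_mode: ${benchmark_mode}, concurrency: ${concurrency}, nsys_on: ${nsys_on}"

if [ -n "${nsys_on}" ]; then
    echo "nsys profiling enabled"   # the real script would prefix trtllm-serve with nsys here
fi
```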
19-23: Gen-only gating is appropriate; consider quoting and strict-mode.

The env toggles for gen-only are correct. Minor: quote variable expansion to be safe and add strict-mode for robustness.

Example diff:

```diff
-if [ "${benchmark_mode}" = "gen_only" ]; then
+if [ "${benchmark_mode}" = "gen_only" ]; then
     export TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP=1
-    export TLLM_BENCHMARK_REQ_QUEUES_SIZE=${concurrency}
+    export TLLM_BENCHMARK_REQ_QUEUES_SIZE="${concurrency}"
 fi
```

Optional at file top:

```diff
+set -euo pipefail
```

examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (2)
16-16: New ctx_gpu_memory_fraction plumbed—consider sanity checks.

Good addition. Add a quick check to ensure 0 < fraction <= 1 to avoid pathological configs.

Example (apply near argument parsing):

```bash
awk_prog='BEGIN{exit !('"${ctx_gpu_memory_fraction} > 0 && ${ctx_gpu_memory_fraction} <= 1"' )}'
awk "$awk_prog" </dev/null || { echo "ctx_gpu_memory_fraction must be in (0,1]. Got: ${ctx_gpu_memory_fraction}"; exit 1; }
```
108-119: Quoting and container options—minor hardening.

Consider quoting container-related vars to handle spaces and adding --container-writable only if needed; also redirect install logs robustly.

Example:

```diff
-srun -l --container-image=${container_image} \
-    --container-name=${container_name} \
-    --container-mounts=${mounts} \
+srun -l --container-image="${container_image}" \
+    --container-name="${container_name}" \
+    --container-mounts="${mounts}" \
     --mpi=pmix \
     echo "Container up."
```

Apply similar quoting in subsequent srun lines.
examples/disaggregated/slurm/benchmark/submit.sh (1)
24-24: Context servers GPU memory fraction (0.75) wired—confirm consistent with profiling needs.

Value seems reasonable; if you plan to sweep this, consider making it a top-level variable like the gen fraction, as sketched below.
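A minimal sketch of that hoisting, assuming a plain top-of-file assignment in submit.sh; the variable names and env-override pattern are illustrative, and only the 0.75 context value comes from this review:

```bash
# Hypothetical top-of-file knobs for submit.sh.
ctx_gpu_memory_fraction=${CTX_GPU_MEMORY_FRACTION:-0.75}   # context servers (value from this PR)
gen_gpu_memory_fraction=${GEN_GPU_MEMORY_FRACTION:-0.7}    # generation servers (default assumed)

# Then pass "${ctx_gpu_memory_fraction}" in the context-servers slot and
# "${gen_gpu_memory_fraction}" in the generation-servers slot of the sbatch args.
```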
examples/wide_ep/slurm_scripts/submit.sh (1)
33-33: Add ctx GPU memory fraction (0.85) — verify it aligns with available headroom.

Looks fine; ensure this matches the target GPU SKU (GB200 vs B200) expectations. If you support both, consider parameterizing per-SKU, as sketched below.
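If both SKUs need to be supported, a per-SKU default could look roughly like this sketch; the GPU_SKU variable and the 0.80 B200 value are assumptions, only the 0.85 figure comes from this PR:

```bash
# Hypothetical per-SKU default for the context GPU memory fraction.
if [ "${GPU_SKU:-GB200}" = "B200" ]; then
    ctx_gpu_memory_fraction=0.80   # placeholder; tune for B200 headroom
else
    ctx_gpu_memory_fraction=0.85   # GB200 value used in this PR
fi
```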
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (3 hunks)
- examples/disaggregated/slurm/benchmark/start_worker.sh (2 hunks)
- examples/disaggregated/slurm/benchmark/submit.sh (2 hunks)
- examples/wide_ep/slurm_scripts/process_gen_iterlog.py (1 hunks)
- examples/wide_ep/slurm_scripts/submit.sh (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
examples/wide_ep/slurm_scripts/process_gen_iterlog.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
examples/wide_ep/slurm_scripts/process_gen_iterlog.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (2)
121-144: No action needed: gen_yaml.py already supports --ctx_free_gpu_memory_fraction.

The --ctx_free_gpu_memory_fraction flag is defined in the argument parser, mapped to the ctx_free_gpu_memory_fraction field, and correctly emitted into the generated YAML (both at the top level and within kv_cache_config).
155-156: All start_worker.sh invocations are up to date
I searched the entire repo and found only:
- The updated call in examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (lines 155–156)
- Documentation references in examples/disaggregated/slurm/benchmark/README.md

There are no other code call sites needing updates.
examples/disaggregated/slurm/benchmark/submit.sh (2)
21-21: benchmark_mode introduced (e2e) — ensure downstream workers honor it.

This will flow into disaggr_torch.slurm and then start_worker.sh. Looks consistent with the updated worker signature.

36-36: Args ordering is consistent with disaggr_torch.slurm’s updated interface.

Good catch appending benchmark_mode before repo_dir.
examples/wide_ep/slurm_scripts/submit.sh (4)
19-22: Switching to single round and disabling streaming is sensible for gen-only benchmarking.

These defaults should reduce noise and stabilize throughput measurements.

45-46: Propagating benchmark_mode to the slurm entrypoint is consistent with the worker changes.

Invocation order matches disaggr_torch.slurm. LGTM.

71-71: Same ctx fraction update carried to dep32 block — consistent.

No issues spotted.

83-84: Args ordering in dep32 block remains correct with benchmark_mode addition.

Matches the dep16 block; good consistency.
Actionable comments posted: 1
🧹 Nitpick comments (4)
examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (4)
51-51: Confirm semantics: mapping ctx_gpu_memory_fraction to --ctx_free_gpu_memory_fraction (+ add range validation)

You log ctx_gpu_memory_fraction, then pass the same value to gen_yaml.py as --ctx_free_gpu_memory_fraction. Please confirm this is intentional (i.e., the input is the free fraction, not the used/reserved fraction). Either way, adding a numeric range check will prevent hard-to-debug misconfigurations.

Apply this diff to validate input and fail fast on invalid values (keeps the current mapping intact):

```diff
 ctx_max_seq_len=$((isl + 1))
 gen_max_seq_len=$((isl + osl))
-ctx_gpu_frac=${ctx_gpu_memory_fraction}
+# Validate ctx_gpu_memory_fraction in (0,1]
+if ! awk 'BEGIN{v='"${ctx_gpu_memory_fraction}"'; if (v>0 && v<=1) exit 0; else exit 1}'; then
+    echo "Invalid ctx_gpu_memory_fraction '${ctx_gpu_memory_fraction}'. Expected a float in (0,1]."
+    exit 64
+fi
+ctx_gpu_frac=${ctx_gpu_memory_fraction}
```

If the intended meaning is “fraction of total memory to be used (not free)”, then the mapping should likely be inverted before passing to gen_yaml.py:

```bash
# Alternative mapping if the input is a 'used' fraction:
ctx_gpu_frac=$(awk 'BEGIN{v='"${ctx_gpu_memory_fraction}"'; printf "%.6f", (1.0 - v)}')
```

Follow-up: Please confirm gen_yaml.py indeed expects --ctx_free_gpu_memory_fraction in the target branch to avoid a silent mismatch.

Also applies to: 76-77, 138-139
100-105: Normalize benchmark_mode to lowercase before validation

Minor polish: normalize to lowercase so values like “Gen_Only” or “E2E” are accepted without surprising rejections.

```diff
-# check benchmark_mode
-if [ "${benchmark_mode}" != "gen_only" ] && [ "${benchmark_mode}" != "e2e" ]; then
+# check benchmark_mode (normalize to lowercase)
+benchmark_mode=$(echo "${benchmark_mode}" | tr '[:upper:]' '[:lower:]')
+if [ "${benchmark_mode}" != "gen_only" ] && [ "${benchmark_mode}" != "e2e" ]; then
     echo "benchmark_mode: ${benchmark_mode} is not supported, change to default value: e2e"
     benchmark_mode="e2e"
 fi
```
161-161: Quote start_worker.sh arguments to preserve positions when nsys_on is empty and avoid word-splitting

Without quotes, an empty ${nsys_on} drops an argument entirely and any spaces in paths/mounts will split. Quoting is safer and keeps the positional interface stable.

```diff
-    bash ${workdir}/start_worker.sh ${full_logdir}/config.yaml "${enable_pdl}" ${ctx_gpus} ${benchmark_mode} ${concurrency} ${nsys_on} &> ${full_logdir}/output_workers.log &
+    bash "${workdir}/start_worker.sh" "${full_logdir}/config.yaml" "${enable_pdl}" "${ctx_gpus}" "${benchmark_mode}" "${concurrency}" "${nsys_on}" &> "${full_logdir}/output_workers.log" &
```
128-131: Limit YAML generation to a single task to avoid duplicate/conflicting writes

This srun call doesn’t constrain tasks; with ntasks=8 the YAML step likely runs multiple times. Align it with server/worker invocations by pinning to one task.

```diff
 srun -l --container-name=${container_name} \
     --container-mounts=${mounts} \
-    --mpi=pmix --overlap \
+    --mpi=pmix --overlap -N 1 -n 1 \
     python3 ${workdir}/gen_yaml.py --config ${full_logdir}/config.yaml \
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (4 hunks)
Actionable comments posted: 1
🧹 Nitpick comments (8)
examples/wide_ep/slurm_scripts/submit_e2e.sh (2)
19-21: Parameterize multi_round/streaming/benchmark_mode for flexibility.

Hard-coding these defaults can surprise users. Make them overridable via env vars to keep this script reusable across scenarios.

Apply within this hunk:

```diff
-multi_round=1
-streaming=false
-benchmark_mode=e2e
+multi_round=${MULTI_ROUND:-1}
+streaming=${STREAMING:-false}
+benchmark_mode=${BENCHMARK_MODE:-e2e}
```
33-35: Hoist GPU memory fractions into named variables

Avoid embedding magic fractions directly in the server-launch arguments. At the top of examples/wide_ep/slurm_scripts/submit_e2e.sh, add tunable variables:

```bash
# GPU memory fractions (tunable)
ctx_gpu_memory_fraction=${CTX_GPU_MEMORY_FRACTION:-0.85}
gen_gpu_memory_fraction=${GEN_GPU_MEMORY_FRACTION:-0.7}
```

Then update the two server-argument blocks (lines 33–35 and 71–73) to use these variables:

```diff
--- a/examples/wide_ep/slurm_scripts/submit_e2e.sh
+++ b/examples/wide_ep/slurm_scripts/submit_e2e.sh
@@ 33,35c33,35
-        ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
+        ${ctx_num} 4 4 4480 true ${ctx_gpu_memory_fraction}   # Context servers arguments
-        1 16 1024 1024 true "0.7"         # Generation servers arguments
+        1 16 1024 1024 true ${gen_gpu_memory_fraction}        # Generation servers arguments
@@ 71,73c71,73
-        ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
+        ${ctx_num} 4 4 4480 true ${ctx_gpu_memory_fraction}   # Context servers arguments
-        1 32 1024 1024 true "0.7"         # Generation servers arguments
+        1 32 1024 1024 true ${gen_gpu_memory_fraction}        # Generation servers arguments
```

I’ve confirmed in examples/disaggregated/slurm/benchmark/disaggr_torch.slurm that the 6th positional parameter is bound to ctx_gpu_memory_fraction=${6}, so these changes will be correctly propagated.

examples/wide_ep/slurm_scripts/submit_gen_only.sh (6)
1-3: Enable strict mode early for safer failures.

Add shell strict mode to catch unset vars and abort on errors. This reduces hard-to-debug partial submissions.

```diff
 #!/bin/bash
+set -euo pipefail
```
32-47: Quote array elements to avoid word-splitting (mount paths, repo paths, etc.).

Unquoted expansions inside array literals are split on IFS. If any of these contain spaces, they’ll be broken into multiple args. Quote all variable expansions in the args array.

```diff
-    args=(
-        ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
-        1 16 1024 1024 true "0.7"         # Generation servers arguments
-        $eplb_num_slots $mtp_size         # Other arguments
-        $concurrency                      # Benchmarking arguments
-        $isl
-        $osl
-        $multi_round
-        $streaming
-        $container_image                  # User specific arguments
-        $mounts
-        $workdir
-        $model_dir
-        $benchmark_mode
-        $repo_dir
-    )
+    args=(
+        "${ctx_num}" 4 4 4480 true "0.85"   # Context servers arguments
+        1 16 1024 1024 true "0.7"           # Generation servers arguments
+        "${eplb_num_slots}" "${mtp_size}"   # Other arguments
+        "${concurrency}"                    # Benchmarking arguments
+        "${isl}"
+        "${osl}"
+        "${multi_round}"
+        "${streaming}"
+        "${container_image}"                # User specific arguments
+        "${mounts}"
+        "${workdir}"
+        "${model_dir}"
+        "${benchmark_mode}"
+        "${repo_dir}"
+    )
```
49-57: Quote sbatch arguments and add a unique job-name suffix per config.

Quoting avoids surprises when values contain special chars. Adding a suffix improves traceability across the submission grid.

```diff
-    sbatch --nodes=${total_node_num} \
-        --ntasks=${ntasks} \
-        --ntasks-per-node=${ntasks_per_node} \
-        --partition=${partition} \
-        --account=${account} \
-        --job-name=${job_name} \
-        --gres=gpu:${ntasks_per_node} \
-        --segment=${total_node_num} \
-        ${workdir}/disaggr_torch.slurm "${args[@]}"
+    sbatch --nodes="${total_node_num}" \
+        --ntasks="${ntasks}" \
+        --ntasks-per-node="${ntasks_per_node}" \
+        --partition="${partition}" \
+        --account="${account}" \
+        --job-name="${job_name}-b${b}-eplb${eplb_num_slots}-dep16" \
+        --gres="gpu:${ntasks_per_node}" \
+        --segment="${total_node_num}" \
+        "${workdir}/disaggr_torch.slurm" "${args[@]}"
```
70-85: Mirror quoting fixes in the second args array.

```diff
-    args=(
-        ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
-        1 32 1024 1024 true "0.7"         # Generation servers arguments
-        $eplb_num_slots $mtp_size         # Other arguments
-        $concurrency                      # Benchmarking arguments
-        $isl
-        $osl
-        $multi_round
-        $streaming
-        $container_image                  # User specific arguments
-        $mounts
-        $workdir
-        $model_dir
-        $benchmark_mode
-        $repo_dir
-    )
+    args=(
+        "${ctx_num}" 4 4 4480 true "0.85"   # Context servers arguments
+        1 32 1024 1024 true "0.7"           # Generation servers arguments
+        "${eplb_num_slots}" "${mtp_size}"   # Other arguments
+        "${concurrency}"                    # Benchmarking arguments
+        "${isl}"
+        "${osl}"
+        "${multi_round}"
+        "${streaming}"
+        "${container_image}"                # User specific arguments
+        "${mounts}"
+        "${workdir}"
+        "${model_dir}"
+        "${benchmark_mode}"
+        "${repo_dir}"
+    )
```
87-95: Quote sbatch params and differentiate job names for dep32.

```diff
-    sbatch --nodes=${total_node_num} \
-        --ntasks=${ntasks} \
-        --ntasks-per-node=${ntasks_per_node} \
-        --partition=${partition} \
-        --account=${account} \
-        --job-name=${job_name} \
-        --gres=gpu:${ntasks_per_node} \
-        --segment=${total_node_num} \
-        ${workdir}/disaggr_torch.slurm "${args[@]}"
+    sbatch --nodes="${total_node_num}" \
+        --ntasks="${ntasks}" \
+        --ntasks-per-node="${ntasks_per_node}" \
+        --partition="${partition}" \
+        --account="${account}" \
+        --job-name="${job_name}-b${b}-eplb${eplb_num_slots}-dep32" \
+        --gres="gpu:${ntasks_per_node}" \
+        --segment="${total_node_num}" \
+        "${workdir}/disaggr_torch.slurm" "${args[@]}"
```
23-31: Optional: clarify dep-specific constants or gate them on GPU type.

Comments hint at B200 vs GB200 adjustments. Consider gating via an env (e.g., GPU_ARCH=B200|GB200) to avoid accidental misconfigurations when switching clusters; a sketch follows.
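A minimal sketch of such gating, assuming the GPU_ARCH variable suggested above; the gated constant and its values are placeholders, not the script's real settings:

```bash
# Hypothetical GPU-type gate for dep-specific constants in submit_gen_only.sh.
gpu_arch=${GPU_ARCH:-GB200}

case "${gpu_arch}" in
    GB200) concurrency_list="1024 2048" ;;   # placeholder GB200-tuned values
    B200)  concurrency_list="512 1024"  ;;   # placeholder B200-tuned values
    *)
        echo "Unsupported GPU_ARCH '${gpu_arch}' (expected B200 or GB200)" >&2
        exit 1
        ;;
esac
```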
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- examples/wide_ep/slurm_scripts/submit_e2e.sh (5 hunks)
- examples/wide_ep/slurm_scripts/submit_gen_only.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
examples/wide_ep/slurm_scripts/submit_gen_only.sh
[error] 5-5: Couldn't parse this variable assignment. Fix to allow more checks.
(SC1073)
[error] 5-5: Fix any mentioned problems and try again.
(SC1072)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
examples/wide_ep/slurm_scripts/submit_e2e.sh (1)
45-46: Arg ordering confirmed: benchmark_mode is correctly captured as $24.

- In disaggr_torch.slurm, you have benchmark_mode=${24} and a guard ensuring it’s either gen_only or e2e.
- Downstream uses ${benchmark_mode} in the expected slot when invoking start_worker.sh.

No off-by-one issues detected.
examples/wide_ep/slurm_scripts/submit_gen_only.sh (2)
61-69: LGTM on the dep32 grid setup.

The concurrency and node-count derivations align with the dep32 variant and fixed eplb=288.
32-47: Positional args verified — submit_gen_only.sh matches disaggr_torch.slurm

Brief: I inspected the slurm interface and the submit script. disaggr_torch.slurm consumes the ctx/gen GPU-fraction args and benchmark_mode at the positions you suspected, and submit_gen_only.sh passes the hard-coded "0.85"/"0.7" and benchmark_mode in those same slots — no silent mis-wiring found.
Files / mapping (positional):
- examples/disaggregated/slurm/benchmark/disaggr_torch.slurm
- $1 num_ctx_servers
- $2 ctx_tp_size
- $3 ctx_batch_size
- $4 ctx_max_num_tokens
- $5 ctx_enable_attention_dp
- $6 ctx_gpu_memory_fraction
- $7 num_gen_servers
- $8 gen_tp_size
- $9 gen_batch_size
- $10 gen_max_num_tokens
- $11 gen_enable_attention_dp
- $12 gen_gpu_memory_fraction
- ...
- $24 benchmark_mode
- $25 trtllm_repo
- examples/wide_ep/slurm_scripts/submit_gen_only.sh
- The args array passes the context GPU fraction ("0.85") in the ctx slot and the generation GPU fraction ("0.7") in the gen slot, and benchmark_mode is passed at the user-args position that maps to $24.
Action: No changes required here — the positional order in submit_gen_only.sh matches disaggr_torch.slurm.
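Read as shell, the mapping above corresponds to bindings like the following sketch; only the positions named in this review are shown, and the omitted slots are left as a comment rather than guessed:

```bash
# Hypothetical excerpt of how disaggr_torch.slurm binds the positions listed above.
num_ctx_servers=${1}
ctx_tp_size=${2}
ctx_batch_size=${3}
ctx_max_num_tokens=${4}
ctx_enable_attention_dp=${5}
ctx_gpu_memory_fraction=${6}
num_gen_servers=${7}
gen_tp_size=${8}
gen_batch_size=${9}
gen_max_num_tokens=${10}
gen_enable_attention_dp=${11}
gen_gpu_memory_fraction=${12}
# ... positions 13-23 (eplb, mtp, benchmarking and user-specific args) omitted here ...
benchmark_mode=${24}
trtllm_repo=${25}
```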
Signed-off-by: Xianjie <[email protected]>
Signed-off-by: Xianjie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
Signed-off-by: Kaiyu Xie <[email protected]>
/bot skip --comment "slurm scripts are not tested in the pipeline"

PR_Github #15777 [ skip ] triggered by Bot

PR_Github #15777 [ skip ] completed with state
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- --post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.