
Conversation

@qiaoxj07 (Collaborator) commented Aug 18, 2025

Summary by CodeRabbit

  • New Features

    • Selectable benchmark modes (e2e, gen_only) with validation, logging, and mode-specific environment settings.
    • Configurable context GPU memory fraction exposed and logged; passed into job YAML generation.
    • New generate-only submission workflow to run and submit gen-only benchmark grids.
  • Refactor

    • Reordered script interfaces to accept benchmark_mode and concurrency as visible arguments.
    • Log processing updated to extract concurrency from directory paths and support new naming patterns.
  • Documentation

    • README updated with E2E and gen-only run instructions and post-processing guidance.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
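
For example, a couple of illustrative invocations using the flags documented above (the pipeline ID is a placeholder):

/bot run --stage-list "A10-PyTorch-1"
/bot run --reuse-test 12345 --disable-fail-fast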

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping validation without due care can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since skipping validation without due care can break the top of tree.

@qiaoxj07 qiaoxj07 requested a review from kaiyux August 18, 2025 12:13
@qiaoxj07 qiaoxj07 requested review from a team as code owners August 18, 2025 12:13
@coderabbitai bot (Contributor) commented Aug 18, 2025

Warning

Rate limit exceeded

@kaiyux has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 27 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between b820ab6 and 362bf19.

📒 Files selected for processing (7)
  • examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (4 hunks)
  • examples/disaggregated/slurm/benchmark/start_worker.sh (2 hunks)
  • examples/disaggregated/slurm/benchmark/submit.sh (2 hunks)
  • examples/wide_ep/slurm_scripts/README.md (2 hunks)
  • examples/wide_ep/slurm_scripts/process_gen_iterlog.py (1 hunks)
  • examples/wide_ep/slurm_scripts/submit_e2e.sh (5 hunks)
  • examples/wide_ep/slurm_scripts/submit_gen_only.sh (1 hunks)
📝 Walkthrough

Walkthrough

Adds a new ctx_gpu_memory_fraction positional argument and a benchmark_mode setting across the Slurm benchmark scripts, reordering the downstream positional arguments. The derived ctx_free_gpu_memory_fraction is passed to the YAML generator, and start_worker.sh now accepts benchmark_mode and concurrency with mode-specific environment exports. Log-path parsing is adjusted accordingly, and a gen-only submission script is added.
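
In shell terms, the new plumbing looks roughly like this (a sketch assembled from the summaries below, not verbatim from the diff):

ctx_gpu_memory_fraction=${6}              # new positional argument
ctx_gpu_frac=${ctx_gpu_memory_fraction}   # derived value forwarded to the YAML generator
python3 gen_yaml.py --config "${full_logdir}/config.yaml" \
    --ctx_free_gpu_memory_fraction "${ctx_gpu_frac}"
bash start_worker.sh "${full_logdir}/config.yaml" "${enable_pdl}" "${ctx_gpus}" \
    "${benchmark_mode}" "${concurrency}" "${nsys_on}"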

Changes

Cohort / File(s) Summary
Benchmark Orchestration (disaggregated)
examples/disaggregated/slurm/benchmark/disaggr_torch.slurm
Adds positional arg ctx_gpu_memory_fraction; introduces and validates benchmark_mode (gen_only/e2e); derives ctx_gpu_frac and passes it as --ctx_free_gpu_memory_fraction; logs ctx_gpu_memory_fraction and benchmark_mode; renumbers downstream args and updates the start_worker.sh invocation to include benchmark_mode and concurrency.
Worker Launcher
examples/disaggregated/slurm/benchmark/start_worker.sh
Changes signature to (config_file, enable_pdl, ctx_gpus, benchmark_mode, concurrency, work_dir); in gen_only mode exports TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP=1 and TLLM_BENCHMARK_REQ_QUEUES_SIZE=${concurrency}; otherwise behavior unchanged.
Submit (disaggregated)
examples/disaggregated/slurm/benchmark/submit.sh
Adds benchmark_mode=e2e; inserts a literal ctx GPU-fraction (0.75) into context-server args; passes $benchmark_mode into args array before repo_dir.
E2E Submit (wide_ep)
examples/wide_ep/slurm_scripts/submit_e2e.sh
Adds benchmark_mode=e2e; inserts context GPU-fraction "0.85" into context args; propagates $benchmark_mode into run args.
Gen-only Submit (wide_ep)
examples/wide_ep/slurm_scripts/submit_gen_only.sh
New script: submits gen-only benchmarks across parameter grids, computes concurrency/node/task counts, builds args including benchmark_mode, and calls disaggr_torch.slurm via sbatch.
Log Processing
examples/wide_ep/slurm_scripts/process_gen_iterlog.py
Updates path regex to ctxX_genX_..._batch..., extracts concurrency from concurrency_(N) path segment (with guard and error message) instead of previous capture group; keeps parsing/aggregation logic unchanged.
Docs / README
examples/wide_ep/slurm_scripts/README.md
Replaces single submit.sh entry with submit_e2e.sh and new submit_gen_only.sh; documents new gen-only workflow and post-processing with process_gen_iterlog.py; updates prerequisites.
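
Taken together, the documented gen-only workflow is roughly as follows (paths are placeholders; the scripts' built-in defaults are assumed):

cd examples/wide_ep/slurm_scripts
bash submit_gen_only.sh                                      # submits the gen-only grid via sbatch
# after the jobs finish, aggregate the per-iteration logs:
python3 process_gen_iterlog.py --dir_prefix /path/to/logs/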

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant Submit as submit_*.sh
  participant Slurm as disaggr_torch.slurm
  participant YAML as YAML Generator
  participant Worker as start_worker.sh
  participant TRT as trtllm-serve

  User->>Submit: Build args (incl. benchmark_mode, ctx_gpu_memory_fraction)
  Submit->>Slurm: sbatch call with reordered/expanded args
  Slurm->>Slurm: validate benchmark_mode (gen_only/e2e)
  Slurm->>YAML: generate config (--ctx_free_gpu_memory_fraction)
  Slurm->>Worker: launch (config, enable_pdl, ctx_gpus, benchmark_mode, concurrency, work_dir)

  alt benchmark_mode == gen_only
    Worker->>Worker: export TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP=1
    Worker->>Worker: export TLLM_BENCHMARK_REQ_QUEUES_SIZE=concurrency
  else e2e
    Worker->>Worker: no extra env changes
  end

  Worker->>TRT: start trtllm-serve (nsys optional)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested labels

Documentation

Suggested reviewers

  • Tabrizian
  • QiJune
  • chuangz0
  • qiaoxj07

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
examples/wide_ep/slurm_scripts/process_gen_iterlog.py (1)

1-7: Missing NVIDIA copyright header and function docstrings.

Per repo guidelines, Python sources must include the NVIDIA copyright header. Also add docstrings for externally used functions.

Apply this header at the top of the file:

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.

Would you like me to draft Google-style docstrings for process_files?
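
As a sketch, such a docstring might read as follows (the function signature is assumed; the glob pattern matches the --dir_prefix handling discussed below):

def process_files(dir_prefix):
    """Aggregate gen-only iteration logs found under ``dir_prefix``.

    Args:
        dir_prefix: Directory prefix used to glob ``*/concurrency_*/gen_only.txt``.

    Returns:
        None. Writes a CSV next to each parsed log and prints a summary.
    """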

🧹 Nitpick comments (12)
examples/wide_ep/slurm_scripts/process_gen_iterlog.py (6)

21-22: Regex update aligns with new path schema; consider precompiling outside the loop.

Looks good for the new ctxX_genY_… pattern. Minor nit: this runs re.search for every file; precompile the pattern once before the loop for a small perf/readability win.

Example (apply outside the selected range):

# above the for-loop
path_re = re.compile(r'ctx\d+_gen\d+_(tep|dep)(\d+)_batch(\d+)_eplb(\d+)(?:_mtp(\d+))?')

# inside the loop
match = path_re.search(file)

27-31: Make concurrency extraction robust to avoid false positives across full paths.

Matching any “concurrency_” substring in the entire path can lead to accidental matches. Anchor to path segment boundaries.

Apply this diff:

-        concurrency_match = re.search(r'concurrency_(\d+)', file)
-        if not concurrency_match:
-            print(f"No concurrency match found for file {file}")
-            continue
+        concurrency_match = re.search(r'(?:^|/)concurrency_(\d+)(?:/|$)', file)
+        if not concurrency_match:
+            print(f"No concurrency match found for file {file}")
+            continue

Alternatively, use pathlib to scan path components (apply import outside this hunk):

from pathlib import Path

parts = Path(file).parts
concurrency = next(
    (int(p.split('_', 1)[1]) for p in parts if p.startswith('concurrency_')), 
    None
)
if concurrency is None:
    print(f"No concurrency match found for file {file}")
    continue

36-37: Remove the no-op conversion or assign it to a named variable.

int(match.group(3)) does nothing and may confuse readers. Either remove it or store as a descriptive variable (e.g., gen_batch_size) if you plan to use it later.

Apply this diff to remove:

-        int(match.group(3))

If you need it later, prefer:

+        gen_batch_size = int(match.group(3))

100-101: Derive output_file robustly for filenames with multiple dots.

file.split('.')[0] truncates too aggressively when directories or filenames contain dots. Use os.path.splitext.

Apply this diff:

-            output_file = file.split('.')[0] + '.csv'
+            import os
+            output_file = os.path.splitext(file)[0] + '.csv'

Note: Move the import os to the top-level imports if you prefer.


146-152: Bug: printing “Total records processed” uses the last file’s data.

At Line 151, len(data) refers to the last loop’s local variable, not the overall processed count. Either sum row counts or print the number of summary rows.

Apply this diff:

-        print(f"Total records processed: {len(data)}")
+        print(f"Total records processed: {len(df)}")

Or, if you want the total number of per-iter rows across all files, maintain a running counter (outside this hunk).
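
For instance, a minimal sketch of that counter (the variable name is illustrative):

# before the for-loop
total_records = 0
# inside the loop, after building each file's `data`
total_records += len(data)
# after the loop
print(f"Total records processed: {total_records}")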


156-163: Make --dir_prefix required and document expectations.

If the argument is omitted, the f-string becomes 'None*/concurrency_*/gen_only.txt' and no files are found. Explicitly require it or provide a sensible default + docstring for process_files.

Apply this diff:

-    parser.add_argument('--dir_prefix',
-                        help='Directory prefix to search for benchmark files')
+    parser.add_argument('--dir_prefix',
+                        required=True,
+                        help='Directory prefix (e.g., /logs/run123/) used to search */concurrency_*/gen_only.txt')

Optional: Add a short Google-style docstring to process_files.

examples/disaggregated/slurm/benchmark/start_worker.sh (2)

6-8: Signature change looks good; include new args in logging for traceability.

You added benchmark_mode and concurrency as positional args; reflect them in the startup echo for easier debugging.

Apply this diff:

-echo "config_file: ${config_file}, enable_pdl: ${enable_pdl}, ctx_gpus: ${ctx_gpus}, work_dir: ${work_dir}"
+echo "config_file: ${config_file}, enable_pdl: ${enable_pdl}, ctx_gpus: ${ctx_gpus}, benchmark_mode: ${benchmark_mode}, concurrency: ${concurrency}, work_dir: ${work_dir}"

Also consider renaming work_dir to nsys_on for clarity since it gates nsys profiling rather than representing a directory.


19-23: Gen-only gating is appropriate; consider quoting and strict-mode.

The env toggles for gen-only are correct. Minor: quote the variable expansion for safety, and consider strict mode for robustness.

Example diff:

-if [ "${benchmark_mode}" = "gen_only" ]; then
+if [ "${benchmark_mode}" = "gen_only" ]; then
     export TRTLLM_DISABLE_KV_CACHE_TRANSFER_OVERLAP=1
-    export TLLM_BENCHMARK_REQ_QUEUES_SIZE=${concurrency}
+    export TLLM_BENCHMARK_REQ_QUEUES_SIZE="${concurrency}"
 fi

Optional at file top:

+set -euo pipefail
examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (2)

16-16: New ctx_gpu_memory_fraction plumbed—consider sanity checks.

Good addition. Add a quick check to ensure 0 < fraction <= 1 to avoid pathological configs.

Example (apply near argument parsing):

awk_prog='BEGIN{exit !('"${ctx_gpu_memory_fraction} > 0 && ${ctx_gpu_memory_fraction} <= 1"' )}'
awk "$awk_prog" </dev/null || { echo "ctx_gpu_memory_fraction must be in (0,1]. Got: ${ctx_gpu_memory_fraction}"; exit 1; }

108-119: Quoting and container options—minor hardening.

Consider quoting container-related vars to handle spaces and adding --container-writable only if needed; also redirect install logs robustly.

Example:

-srun -l --container-image=${container_image} \
-        --container-name=${container_name} \
-        --container-mounts=${mounts} \
+srun -l --container-image="${container_image}" \
+        --container-name="${container_name}" \
+        --container-mounts="${mounts}" \
         --mpi=pmix \
         echo "Container up."

Apply similar quoting in subsequent srun lines.

examples/disaggregated/slurm/benchmark/submit.sh (1)

24-24: Context servers GPU memory fraction (0.75) wired—confirm consistent with profiling needs.

Value seems reasonable; if you plan to sweep this, consider making it a top-level variable like gen fraction.

examples/wide_ep/slurm_scripts/submit.sh (1)

33-33: Add ctx GPU memory fraction (0.85) — verify it aligns with available headroom.

Looks fine; ensure this matches the target GPU SKU (GB200 vs B200) expectations. If you support both, consider parameterizing per-SKU.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 55f4f2d and 2962461.

📒 Files selected for processing (5)
  • examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (3 hunks)
  • examples/disaggregated/slurm/benchmark/start_worker.sh (2 hunks)
  • examples/disaggregated/slurm/benchmark/submit.sh (2 hunks)
  • examples/wide_ep/slurm_scripts/process_gen_iterlog.py (1 hunks)
  • examples/wide_ep/slurm_scripts/submit.sh (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • examples/wide_ep/slurm_scripts/process_gen_iterlog.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • examples/wide_ep/slurm_scripts/process_gen_iterlog.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (2)

121-144: No action needed: gen_yaml.py already supports --ctx_free_gpu_memory_fraction.

The --ctx_free_gpu_memory_fraction flag is defined in the argument parser, mapped to the ctx_free_gpu_memory_fraction field, and correctly emitted into the generated YAML (both at the top level and within kv_cache_config).


155-156: All start_worker.sh invocations are up to date
I searched the entire repo and found only:

  • The updated call in examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (lines 155–156)
  • Documentation references in examples/disaggregated/slurm/benchmark/README.md

There are no other code call sites needing updates.

examples/disaggregated/slurm/benchmark/submit.sh (2)

21-21: benchmark_mode introduced (e2e) — ensure downstream workers honor it.

This will flow into disaggr_torch.slurm and then start_worker.sh. Looks consistent with the updated worker signature.


36-36: Args ordering is consistent with disaggr_torch.slurm’s updated interface.

Good catch appending benchmark_mode before repo_dir.

examples/wide_ep/slurm_scripts/submit.sh (4)

19-22: Switching to single round and disabling streaming is sensible for gen-only benchmarking.

These defaults should reduce noise and stabilize throughput measurements.


45-46: Propagating benchmark_mode to slurm entrypoint is consistent with the worker changes.

Invocation order matches disaggr_torch.slurm. LGTM.


71-71: Same ctx fraction update carried to dep32 block — consistent.

No issues spotted.


83-84: Args ordering in dep32 block remains correct with benchmark_mode addition.

Matches the dep16 block; good consistency.

@qiaoxj07 qiaoxj07 changed the title from "[None] Update wide-ep genonly scripts" to "[None] [chore] Update wide-ep genonly scripts" Aug 18, 2025
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (4)

51-51: Confirm semantics: mapping ctx_gpu_memory_fraction to --ctx_free_gpu_memory_fraction (+ add range validation)

You log ctx_gpu_memory_fraction, then pass the same value to gen_yaml.py as --ctx_free_gpu_memory_fraction. Please confirm this is intentional (i.e., the input is the free fraction, not the used/reserved fraction). Either way, adding a numeric range check will prevent hard-to-debug misconfigurations.

Apply this diff to validate input and fail fast on invalid values (keeps the current mapping intact):

 ctx_max_seq_len=$((isl + 1))
 gen_max_seq_len=$((isl + osl))
-ctx_gpu_frac=${ctx_gpu_memory_fraction}
+# Validate ctx_gpu_memory_fraction in (0,1]
+if ! awk 'BEGIN{v='"${ctx_gpu_memory_fraction}"'; if (v>0 && v<=1) exit 0; else exit 1}'; then
+    echo "Invalid ctx_gpu_memory_fraction '${ctx_gpu_memory_fraction}'. Expected a float in (0,1]."
+    exit 64
+fi
+ctx_gpu_frac=${ctx_gpu_memory_fraction}

If the intended meaning is “fraction of total memory to be used (not free)”, then the mapping should likely be inverted before passing to gen_yaml.py:

# Alternative mapping if the input is a 'used' fraction:
ctx_gpu_frac=$(awk 'BEGIN{v='"${ctx_gpu_memory_fraction}"'; printf "%.6f", (1.0 - v)}')

Follow-up: Please confirm gen_yaml.py indeed expects --ctx_free_gpu_memory_fraction in the target branch to avoid a silent mismatch.

Also applies to: 76-77, 138-139
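
One quick way to verify (assuming the flag string appears verbatim in the parser; the path is inferred from the surrounding scripts):

grep -n "ctx_free_gpu_memory_fraction" examples/disaggregated/slurm/benchmark/gen_yaml.py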


100-105: Normalize benchmark_mode to lowercase before validation

Minor polish: normalize to lowercase so values like “Gen_Only” or “E2E” are accepted without surprising rejections.

-# check benchmark_mode
-if [ "${benchmark_mode}" != "gen_only" ] && [ "${benchmark_mode}" != "e2e" ]; then
+# check benchmark_mode (normalize to lowercase)
+benchmark_mode=$(echo "${benchmark_mode}" | tr '[:upper:]' '[:lower:]')
+if [ "${benchmark_mode}" != "gen_only" ] && [ "${benchmark_mode}" != "e2e" ]; then
     echo "benchmark_mode: ${benchmark_mode} is not supported, change to default value: e2e"
     benchmark_mode="e2e"
 fi

161-161: Quote start_worker.sh arguments to preserve positions when nsys_on is empty and avoid word-splitting

Without quotes, an empty ${nsys_on} drops an argument entirely and any spaces in paths/mounts will split. Quoting is safer and keeps the positional interface stable.

-        bash ${workdir}/start_worker.sh ${full_logdir}/config.yaml "${enable_pdl}" ${ctx_gpus} ${benchmark_mode} ${concurrency} ${nsys_on} &> ${full_logdir}/output_workers.log &
+        bash "${workdir}/start_worker.sh" "${full_logdir}/config.yaml" "${enable_pdl}" "${ctx_gpus}" "${benchmark_mode}" "${concurrency}" "${nsys_on}" &> "${full_logdir}/output_workers.log" &

128-131: Limit YAML generation to a single task to avoid duplicate/conflicting writes

This srun call doesn’t constrain tasks; with ntasks=8 the YAML step likely runs multiple times. Align it with server/worker invocations by pinning to one task.

 srun -l --container-name=${container_name} \
         --container-mounts=${mounts} \
-        --mpi=pmix --overlap \
+        --mpi=pmix --overlap -N 1 -n 1 \
         python3 ${workdir}/gen_yaml.py --config ${full_logdir}/config.yaml \
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2962461 and 0d65cf4.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/benchmark/disaggr_torch.slurm (4 hunks)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (8)
examples/wide_ep/slurm_scripts/submit_e2e.sh (2)

19-21: Parameterize multi_round/streaming/benchmark_mode for flexibility.

Hard-coding these defaults can surprise users. Make them overridable via env vars to keep this script reusable across scenarios.

Apply within this hunk:

-multi_round=1
-streaming=false
-benchmark_mode=e2e
+multi_round=${MULTI_ROUND:-1}
+streaming=${STREAMING:-false}
+benchmark_mode=${BENCHMARK_MODE:-e2e}

33-35: Hoist GPU memory fractions into named variables

Avoid embedding magic fractions directly in the server-launch arguments. At the top of examples/wide_ep/slurm_scripts/submit_e2e.sh, add tunable variables:

# GPU memory fractions (tunable)
ctx_gpu_memory_fraction=${CTX_GPU_MEMORY_FRACTION:-0.85}
gen_gpu_memory_fraction=${GEN_GPU_MEMORY_FRACTION:-0.7}

Then update the two server-argument blocks (lines 33–35 and 71–73) to use these variables:

--- a/examples/wide_ep/slurm_scripts/submit_e2e.sh
+++ b/examples/wide_ep/slurm_scripts/submit_e2e.sh
@@ 33,35c33,35
-           ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
+           ${ctx_num} 4 4 4480 true ${ctx_gpu_memory_fraction}   # Context servers arguments
-           1 16 1024 1024 true "0.7"       # Generation servers arguments
+           1 16 1024 1024 true ${gen_gpu_memory_fraction}       # Generation servers arguments

@@ 71,73c71,73
-        ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
+        ${ctx_num} 4 4 4480 true ${ctx_gpu_memory_fraction}   # Context servers arguments
-        1 32 1024 1024 true "0.7"  # Generation servers arguments
+        1 32 1024 1024 true ${gen_gpu_memory_fraction}       # Generation servers arguments

I’ve confirmed in examples/disaggregated/slurm/benchmark/disaggr_torch.slurm that the 6th positional parameter is bound to ctx_gpu_memory_fraction=${6}, so these changes will be correctly propagated.

examples/wide_ep/slurm_scripts/submit_gen_only.sh (6)

1-3: Enable strict mode early for safer failures.

Add shell strict mode to catch unset vars and abort on errors. This reduces hard-to-debug partial submissions.

 #!/bin/bash
 
+set -euo pipefail

32-47: Quote array elements to avoid word-splitting (mount paths, repo paths, etc.).

Unquoted expansions inside array literals are split on IFS. If any of these contain spaces, they’ll be broken into multiple args. Quote all variable expansions in the args array.

-        args=(
-            ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
-            1 16 1024 1024 true "0.7"       # Generation servers arguments
-            $eplb_num_slots $mtp_size  # Other arguments
-            $concurrency               # Benchmarking arguments
-            $isl
-            $osl
-            $multi_round
-            $streaming
-            $container_image           # User specific arguments
-            $mounts
-            $workdir
-            $model_dir
-            $benchmark_mode
-            $repo_dir
-        )
+        args=(
+            "${ctx_num}" 4 4 4480 true "0.85"   # Context servers arguments
+            1 16 1024 1024 true "0.7"           # Generation servers arguments
+            "${eplb_num_slots}" "${mtp_size}"   # Other arguments
+            "${concurrency}"                    # Benchmarking arguments
+            "${isl}"
+            "${osl}"
+            "${multi_round}"
+            "${streaming}"
+            "${container_image}"                # User specific arguments
+            "${mounts}"
+            "${workdir}"
+            "${model_dir}"
+            "${benchmark_mode}"
+            "${repo_dir}"
+        )

49-57: Quote sbatch arguments and add a unique job-name suffix per config.

Quoting avoids surprises when values contain special chars. Adding a suffix improves traceability across the submission grid.

-        sbatch --nodes=${total_node_num} \
-            --ntasks=${ntasks} \
-            --ntasks-per-node=${ntasks_per_node} \
-            --partition=${partition} \
-            --account=${account} \
-            --job-name=${job_name} \
-            --gres=gpu:${ntasks_per_node} \
-            --segment=${total_node_num} \
-            ${workdir}/disaggr_torch.slurm "${args[@]}"
+        sbatch --nodes="${total_node_num}" \
+            --ntasks="${ntasks}" \
+            --ntasks-per-node="${ntasks_per_node}" \
+            --partition="${partition}" \
+            --account="${account}" \
+            --job-name="${job_name}-b${b}-eplb${eplb_num_slots}-dep16" \
+            --gres="gpu:${ntasks_per_node}" \
+            --segment="${total_node_num}" \
+            "${workdir}/disaggr_torch.slurm" "${args[@]}"

70-85: Mirror quoting fixes in the second args array.

-    args=(
-        ${ctx_num} 4 4 4480 true "0.85"   # Context servers arguments
-        1 32 1024 1024 true "0.7"  # Generation servers arguments
-        $eplb_num_slots $mtp_size  # Other arguments
-        $concurrency               # Benchmarking arguments
-        $isl
-        $osl
-        $multi_round
-        $streaming
-        $container_image           # User specific arguments
-        $mounts
-        $workdir
-        $model_dir
-        $benchmark_mode
-        $repo_dir
-    )
+    args=(
+        "${ctx_num}" 4 4 4480 true "0.85"     # Context servers arguments
+        1 32 1024 1024 true "0.7"             # Generation servers arguments
+        "${eplb_num_slots}" "${mtp_size}"     # Other arguments
+        "${concurrency}"                      # Benchmarking arguments
+        "${isl}"
+        "${osl}"
+        "${multi_round}"
+        "${streaming}"
+        "${container_image}"                  # User specific arguments
+        "${mounts}"
+        "${workdir}"
+        "${model_dir}"
+        "${benchmark_mode}"
+        "${repo_dir}"
+    )

87-95: Quote sbatch params and differentiate job names for dep32.

-    sbatch --nodes=${total_node_num} \
-        --ntasks=${ntasks} \
-        --ntasks-per-node=${ntasks_per_node} \
-        --partition=${partition} \
-        --account=${account} \
-        --job-name=${job_name} \
-        --gres=gpu:${ntasks_per_node} \
-        --segment=${total_node_num} \
-        ${workdir}/disaggr_torch.slurm "${args[@]}"
+    sbatch --nodes="${total_node_num}" \
+        --ntasks="${ntasks}" \
+        --ntasks-per-node="${ntasks_per_node}" \
+        --partition="${partition}" \
+        --account="${account}" \
+        --job-name="${job_name}-b${b}-eplb${eplb_num_slots}-dep32" \
+        --gres="gpu:${ntasks_per_node}" \
+        --segment="${total_node_num}" \
+        "${workdir}/disaggr_torch.slurm" "${args[@]}"

23-31: Optional: clarify dep-specific constants or gate them on GPU type.

Comments hint at B200 vs GB200 adjustments. Consider gating via an env (e.g., GPU_ARCH=B200|GB200) to avoid accidental misconfigurations when switching clusters.
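
A minimal sketch of such gating, where GPU_ARCH and the per-SKU fractions are illustrative placeholders:

# Hypothetical per-SKU gating; values are placeholders, not validated defaults
case "${GPU_ARCH:-GB200}" in
    GB200) ctx_gpu_memory_fraction=0.85 ;;
    B200)  ctx_gpu_memory_fraction=0.75 ;;
    *)     echo "Unsupported GPU_ARCH: ${GPU_ARCH}" >&2; exit 1 ;;
esac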

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0d65cf4 and fa2db82.

📒 Files selected for processing (2)
  • examples/wide_ep/slurm_scripts/submit_e2e.sh (5 hunks)
  • examples/wide_ep/slurm_scripts/submit_gen_only.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
examples/wide_ep/slurm_scripts/submit_gen_only.sh

[error] 5-5: Couldn't parse this variable assignment. Fix to allow more checks.

(SC1073)


[error] 5-5: Fix any mentioned problems and try again.

(SC1072)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
examples/wide_ep/slurm_scripts/submit_e2e.sh (1)

45-46: Arg ordering confirmed: benchmark_mode is correctly captured as $24.

  • In disaggr_torch.slurm, you have benchmark_mode=${24} and a guard ensuring it’s either gen_only or e2e.
  • Downstream uses ${benchmark_mode} in the expected slot when invoking start_worker.sh.

No off-by-one issues detected.

examples/wide_ep/slurm_scripts/submit_gen_only.sh (2)

61-69: LGTM on the dep32 grid setup.

The concurrency and node-count derivations align with the dep32 variant and fixed eplb=288.


32-47: Positional args verified — submit_gen_only.sh matches disaggr_torch.slurm

Brief: I inspected the slurm interface and the submit script. disaggr_torch.slurm consumes the ctx/gen GPU-fraction args and benchmark_mode at the positions you suspected, and submit_gen_only.sh passes the hard-coded "0.85"/"0.7" and benchmark_mode in those same slots — no silent mis-wiring found.

Files / mapping (positional):

  • examples/disaggregated/slurm/benchmark/disaggr_torch.slurm
    • $1 num_ctx_servers
    • $2 ctx_tp_size
    • $3 ctx_batch_size
    • $4 ctx_max_num_tokens
    • $5 ctx_enable_attention_dp
    • $6 ctx_gpu_memory_fraction
    • $7 num_gen_servers
    • $8 gen_tp_size
    • $9 gen_batch_size
    • $10 gen_max_num_tokens
    • $11 gen_enable_attention_dp
    • $12 gen_gpu_memory_fraction
    • ...
    • $24 benchmark_mode
    • $25 trtllm_repo
  • examples/wide_ep/slurm_scripts/submit_gen_only.sh
    • The args array passes the context GPU fraction ("0.85") in the ctx slot and the generation GPU fraction ("0.7") in the gen slot, and benchmark_mode is passed at the user-args position that maps to $24.

Action: No changes required here — the positional order in submit_gen_only.sh matches disaggr_torch.slurm.
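
For reference, the mapping enumerated above corresponds to a parameter block like this at the top of disaggr_torch.slurm (a sketch; the middle slots are elided, as in the list):

num_ctx_servers=${1}
ctx_tp_size=${2}
ctx_batch_size=${3}
ctx_max_num_tokens=${4}
ctx_enable_attention_dp=${5}
ctx_gpu_memory_fraction=${6}
num_gen_servers=${7}
gen_tp_size=${8}
gen_batch_size=${9}
gen_max_num_tokens=${10}
gen_enable_attention_dp=${11}
gen_gpu_memory_fraction=${12}
# ... slots 13-23 elided ...
benchmark_mode=${24}
trtllm_repo=${25}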

qiaoxj07 and others added 5 commits August 19, 2025 19:25
@kaiyux (Member) commented Aug 19, 2025

/bot skip --comment "slurm scripts are not tested in the pipeline"

@kaiyux kaiyux enabled auto-merge (squash) August 19, 2025 11:26
@tensorrt-cicd (Collaborator): PR_Github #15777 [ skip ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #15777 [ skip ] completed with state SUCCESS
Skipping testing for commit 362bf19

@kaiyux kaiyux merged commit 1966730 into NVIDIA:main Aug 19, 2025
4 checks passed
@kaiyux kaiyux deleted the fix-wide-ep branch August 19, 2025 13:47