
Conversation


@Tabrizian Tabrizian commented Jul 17, 2025

Add disagg launcher scripts

Add scripts for launching disaggregated serving jobs on SLURM.

Summary by CodeRabbit

  • New Features

    • Introduced a launcher and profiling toolkit for running disaggregated serving benchmarks on SLURM clusters, including automated and manual parameter sweeps.
    • Added comprehensive documentation and example configuration for setting up and running benchmarks.
    • Provided tools for profiling, parameter sweeping, and multi-configuration testing with flexible resource management.
    • Added job management and parameter sweeping utilities for orchestrating multi-server disaggregated serving workloads on SLURM.
    • Enabled detailed load generation testing and cache statistics collection for performance analysis.
  • Documentation

    • Added a detailed README with setup instructions, usage examples, and configuration schema.
    • Extended main disaggregated serving README with a section pointing to SLURM-specific launch instructions.
  • Chores

    • Removed an obsolete script for populating server URLs in SLURM configurations.


coderabbitai bot commented Jul 17, 2025

Walkthrough

A new disaggregated serving launcher system for TRT-LLM benchmarks on SLURM was introduced, including documentation, configuration templates, a launcher script, and a modular Python package for job management and parameter sweeping. The previous node URL population script was removed, consolidating all orchestration and configuration logic into the new, integrated launcher and profiler suite.

Changes

| File(s) | Change Summary |
|---------|----------------|
| `examples/disaggregated/slurm/README.md`, `examples/disaggregated/slurm/config.yaml` | Added documentation and example configuration for the disaggregated serving launcher on SLURM. |
| `examples/disaggregated/slurm/disagg_profiler/__init__.py` | Created the package initializer, exposing job management and sweeping APIs. |
| `examples/disaggregated/slurm/disagg_profiler/job_manager.py` | Added the `JobManager` class and utilities for orchestrating SLURM jobs, server configs, and load generation. |
| `examples/disaggregated/slurm/disagg_profiler/sweeper.py` | Added parameter sweeping, auto-profiling, and multi-config support for SLURM jobs. |
| `examples/disaggregated/slurm/launcher.py` | Introduced a CLI launcher for SLURM-based disaggregated serving with sweep and profiling modes. |
| `examples/disaggregated/slurm/slurm_populate_urls.py` | Removed the legacy script for populating server URLs from SLURM environment variables. |
| `examples/disaggregated/README.md` | Extended the README with a pointer to SLURM-specific disaggregated serving launch instructions. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Launcher
    participant Sweeper/JobManager
    participant SLURM
    participant TRTLLM Servers

    User->>Launcher: Run launcher.py with config and sweep options
    Launcher->>SLURM: (Optional) Request allocation
    Launcher->>Sweeper/JobManager: Select sweep or job mode
    Sweeper/JobManager->>SLURM: Launch context/gen/IFB servers via srun
    Sweeper/JobManager->>TRTLLM Servers: Start servers with config
    Sweeper/JobManager->>TRTLLM Servers: Run loadgen tests
    Sweeper/JobManager->>Sweeper/JobManager: Collect results, manage jobs
    Sweeper/JobManager->>SLURM: Release allocation (if needed)
    Sweeper/JobManager->>Launcher: Return results/status
    Launcher->>User: Print summary, exit

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Documentation

Suggested reviewers

  • pcastonguay
  • kaiyux

Poem

In the warren of SLURM, a new path appears,
With configs and launchers, it conquers old fears.
The sweeper hops swiftly, the job manager wise,
Together they orchestrate benchmarks that rise.
Farewell to old scripts—now bunnies can cheer,
For streamlined serving is finally here!
🐇✨


@Tabrizian Tabrizian requested a review from pcastonguay July 17, 2025 00:10
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🔭 Outside diff range comments (1)
examples/disaggregated/slurm/config.yaml (1)

1-37: Fix the configuration structure - profile should not be nested under exec

The profile section (lines 30-37) is incorrectly nested under exec. Based on the code in job_manager.py and launcher.py, the configuration expects profile to be at the root level, not under exec.

Apply this diff to fix the structure:

 exec:
   config:
     context:
       tp: 4
       ep: 4
       pp: 1
       max_batch_size: 4
       max_num_tokens: 1024
       max_seq_len: 1024
       config:
         kv_cache_config:
           free_gpu_memory_fraction: 0.75
           enable_block_reuse: false
         print_iter_log: true
       dp: 1
     generation:
       tp: 4
       ep: 4
       pp: 1
       max_batch_size: 1
       max_num_tokens: 4096
       max_seq_len: 2048
       config:
         print_iter_log: true
         kv_cache_config:
           free_gpu_memory_fraction: 0.75
           enable_block_reuse: false
       dp: 1
   model_path: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 profile:
   isl: 1024
   osl: 1024
   use_benchmark_serving: true
   concurrency:
   - 128
   - 256
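Beyond fixing the example file, a cheap load-time guard would catch this mis-nesting early. A minimal sketch, assuming a hypothetical `validate_config` helper that is not part of this PR:

```python
def validate_config(config):
    """Reject configs where `profile` is nested under `exec`."""
    if "profile" in config.get("exec", {}):
        raise ValueError(
            "`profile` must be a top-level key, not nested under `exec`")
    if "profile" not in config:
        raise ValueError("missing required `profile` section")
    return config


# Correct layout: `profile` at the root, next to `exec`.
good = {"exec": {"config": {}}, "profile": {"isl": 1024, "osl": 1024}}
validate_config(good)

# Mis-nested layout from the original example file is rejected.
bad = {"exec": {"config": {}, "profile": {"isl": 1024}}}
try:
    validate_config(bad)
except ValueError as err:
    print(err)
```

Wiring such a check into the config loader would turn this class of mistake into an immediate, descriptive error rather than a confusing failure later in the run.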
🧹 Nitpick comments (9)
examples/disaggregated/slurm/config.yaml (1)

3-15: Improve parameter organization by grouping parallelism parameters together

The dp parameter is placed after the nested config section, which affects readability. Consider grouping all parallelism parameters (tp, ep, pp, dp) together.

Apply this diff for better organization:

   config:
     context:
       tp: 4
       ep: 4
       pp: 1
+      dp: 1
       max_batch_size: 4
       max_num_tokens: 1024
       max_seq_len: 1024
       config:
         kv_cache_config:
           free_gpu_memory_fraction: 0.75
           enable_block_reuse: false
         print_iter_log: true
-      dp: 1
     generation:
       tp: 4
       ep: 4
       pp: 1
+      dp: 1
       max_batch_size: 1
       max_num_tokens: 4096
       max_seq_len: 2048
       config:
         print_iter_log: true
         kv_cache_config:
           free_gpu_memory_fraction: 0.75
           enable_block_reuse: false
-      dp: 1

Also applies to: 16-28

examples/disaggregated/slurm/launcher.py (3)

153-155: Consider extracting config loading to a utility function

Creating a temporary JobManager instance just to load the configuration feels like a workaround. Consider adding a standalone utility function for loading configurations.

Would you like me to create a utility function that loads the configuration without instantiating a JobManager?


90-90: Fix line length violations

These lines exceed the 120 character limit.

For line 90:

-        The goal is to ensure the request processing rates of context and generation servers are balanced.
+        The goal is to ensure the request processing rates of context and generation 
+        servers are balanced.

For line 174:

-            "Choose one of: --auto-sweep, --sweep-config, or define single configurations (not lists) in your config file"
+            "Choose one of: --auto-sweep, --sweep-config, or define single configurations "
+            "(not lists) in your config file"

For line 205:

-                "Multi-config or manual sweep mode without single allocation - allocations will be requested per configuration"
+                "Multi-config or manual sweep mode without single allocation - "
+                "allocations will be requested per configuration"

Also applies to: 174-174, 205-205


247-252: Add clarifying comment for the allocation cleanup condition

The condition for releasing the SLURM allocation is complex. Consider adding a comment to explain the logic.

         # Release the allocation if we requested one and it's not a manual sweep (which handles its own allocations)
+        # We release allocations for:
+        # 1. Single config mode
+        # 2. Auto-sweep mode  
+        # 3. Manual sweep with single allocation
+        # We don't release for manual sweep without single allocation or multi-config (they handle their own)
         if job_id and (not args.sweep_config
                        or args.use_single_allocation) and not is_multi_config:
examples/disaggregated/slurm/disagg_profiler/job_manager.py (2)

27-71: Add missing num_gpus parameter to docstring

The function signature includes num_gpus parameter but it's missing from the docstring.

     Args:
         config (dict): The configuration dictionary
+        num_gpus (int): The number of GPUs per node
 
     Returns:
         int: The number of nodes needed

90-92: Remove or conditionalize debug print statements

Multiple debug print statements are present which should be removed or made conditional.

Consider using Python's logging module instead of print statements for debugging:

import logging

logger = logging.getLogger(__name__)

# Then replace prints with:
logger.debug(f"JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}")

This allows users to control debug output via logging configuration.

Also applies to: 124-127, 140-142

examples/disaggregated/slurm/disagg_profiler/sweeper.py (3)

297-326: Add more robust error handling for file parsing.

The method silently continues if files don't exist or parsing fails. Consider logging warnings for missing expected files.

 benchmark_serving_path = os.path.join(concurrency_path,
                                       "benchmark_serving.log")
 if os.path.exists(benchmark_serving_path):
     with open(benchmark_serving_path, 'r') as f:
         for line in f:
             if "Request throughput (req/s):" in line:
                 rps = float(
                     line.split("Request throughput (req/s):")
                     [1].strip())
                 results[concurrency] = rps
                 print(
                     f"Concurrency {concurrency}: {rps:.3f} Requests/s"
                 )
 else:
     # Look for raw_data.json file
     raw_data_path = os.path.join(concurrency_path,
                                  "raw_data.json")
     if os.path.exists(raw_data_path):
         try:
             with open(raw_data_path, 'r') as f:
                 data = json.load(f)

             if 'results' in data and 'infbench_summary' in data[
                     'results']:
                 rps = data['results']['infbench_summary'][
                     'Requests/s']
                 results[concurrency] = rps
                 print(
                     f"Concurrency {concurrency}: {rps:.3f} Requests/s"
                 )
         except Exception as e:
             print(f"Error parsing {raw_data_path}: {e}")
+    else:
+        print(f"Warning: No results found for concurrency {concurrency} - expected benchmark_serving.log or raw_data.json")

922-922: Line exceeds 120 character limit.

Consider breaking this line for better readability.

-                f"Synchronizing parameter {sync_param}={value} across context and generation servers"
+                f"Synchronizing parameter {sync_param}={value} "
+                f"across context and generation servers"

1226-1226: Line exceeds 120 character limit.

Consider breaking this line for better readability.

-            f"Generating combinations for disaggregated mode: {len(context_configs)} context × {len(generation_configs)} generation"
+            f"Generating combinations for disaggregated mode: "
+            f"{len(context_configs)} context × {len(generation_configs)} generation"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d2b8ba and 290b957.

📒 Files selected for processing (7)
  • examples/disaggregated/slurm/README.md (1 hunks)
  • examples/disaggregated/slurm/config.yaml (1 hunks)
  • examples/disaggregated/slurm/disagg_profiler/__init__.py (1 hunks)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
  • examples/disaggregated/slurm/disagg_profiler/sweeper.py (1 hunks)
  • examples/disaggregated/slurm/launcher.py (1 hunks)
  • examples/disaggregated/slurm/slurm_populate_urls.py (0 hunks)
💤 Files with no reviewable changes (1)
  • examples/disaggregated/slurm/slurm_populate_urls.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
examples/disaggregated/slurm/disagg_profiler/__init__.py (2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
  • JobManager (72-1029)
  • calculate_nodes_needed (27-69)
  • wait_for_server (11-24)
examples/disaggregated/slurm/disagg_profiler/sweeper.py (4)
  • AutoSweeper (150-804)
  • ParameterSweeper (807-1097)
  • get_slurm_allocation (19-80)
  • run_sweep_configuration (83-147)
examples/disaggregated/slurm/disagg_profiler/sweeper.py (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
  • JobManager (72-1029)
  • calculate_nodes_needed (27-69)
  • launch_jobs (503-615)
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/launcher.py

90-90: Line too long (122 > 120)

(E501)


174-174: Line too long (122 > 120)

(E501)


205-205: Line too long (127 > 120)

(E501)

examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


992-992: Do not use bare except

(E722)

examples/disaggregated/slurm/disagg_profiler/sweeper.py

151-159: 1 blank line required between summary line and description

(D205)


922-922: Line too long (130 > 120)

(E501)


1226-1226: Line too long (136 > 120)

(E501)


1278-1279: One-line docstring should fit on one line

Reformat to one line

(D200)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
examples/disaggregated/slurm/config.yaml (1)

1-37: IFB configuration is optional by default

The code only applies InFlight Batching when an ifb section is present under exec.config; if it’s missing, IFB servers default to zero. Since your example doesn’t need IFB, no changes are required—add an ifb: block only when you intend to enable InFlight Batching.

examples/disaggregated/slurm/disagg_profiler/__init__.py (1)

1-17: LGTM!

The package initialization file is well-structured with clear documentation and proper exports.

examples/disaggregated/slurm/launcher.py (1)

11-146: Well-structured argument parser with comprehensive options

The argument parser setup is thorough and provides clear documentation for each option. The detailed help strings for sweep modes are particularly helpful.

"""
# Check if any sweep configuration needs more nodes than the base configuration
base_config = all_configs[0][0] if all_configs else {}
base_nodes = calculate_nodes_needed(base_config)

⚠️ Potential issue

Same calculate_nodes_needed signature issue.

The function is called with only the config parameter.

Also applies to: 934-934

🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/sweeper.py at lines 930 and 934,
the function calculate_nodes_needed is called with only the config parameter,
but its signature requires additional arguments. Update these calls to include
all required parameters as per the function definition to match the expected
signature.

tp=tp)

# Calculate nodes needed for this configuration
nodes_needed = calculate_nodes_needed(config)

⚠️ Potential issue

Same issue with calculate_nodes_needed function calls.

These calls also pass only the config parameter, but the function expects config and num_gpus.

Also applies to: 513-513

🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/sweeper.py at lines 389 and 513,
the function calculate_nodes_needed is called with only the config parameter,
but it requires both config and num_gpus parameters. Update these calls to pass
the appropriate num_gpus argument along with config to match the function
signature.
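A sketch of what the corrected call sites would look like. The function body below is illustrative only (the real implementation lives in `job_manager.py`, lines 27-69); the point is simply that callers must forward `num_gpus` rather than omitting it:

```python
import math


def calculate_nodes_needed(config, num_gpus):
    """Illustrative stand-in: nodes = ceil(total GPUs requested / GPUs per node)."""
    total_gpus = 0
    for role in ("context", "generation"):
        server = config.get(role, {})
        total_gpus += (server.get("tp", 1) * server.get("pp", 1)
                       * server.get("dp", 1))
    return math.ceil(total_gpus / num_gpus)


# Call sites pass num_gpus explicitly instead of dropping the argument:
config = {"context": {"tp": 4}, "generation": {"tp": 4}}
nodes_needed = calculate_nodes_needed(config, num_gpus=4)  # 8 GPUs / 4 per node
```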

Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/disaggLauncherScripts branch from 290b957 to bcd565d Compare July 17, 2025 03:07
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (8)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)

11-25: Replace bare except with specific exception handling


380-475: Add bounds checking for server_id to prevent index errors


808-812: Add safe access for potentially missing configuration key


989-998: Replace bare except with specific exception handling

examples/disaggregated/slurm/disagg_profiler/sweeper.py (4)

105-105: Fix missing num_gpus argument in calculate_nodes_needed calls


389-390: Same issue with calculate_nodes_needed function calls

Also applies to: 514-515


871-878: Security risk: Replace eval() with safer alternative


932-936: Same calculate_nodes_needed signature issue
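For the `eval()` item above (`sweeper.py` lines 871-878), the standard safer alternative is `ast.literal_eval`, which parses Python literals without executing arbitrary code. A hedged sketch (`parse_sweep_value` is a hypothetical name, not the PR's API):

```python
import ast


def parse_sweep_value(raw):
    """Parse a user-supplied sweep value without executing code.

    ast.literal_eval accepts only literals (numbers, strings, lists,
    dicts, tuples, booleans, None), so inputs like "__import__('os')"
    raise instead of running.
    """
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal; treat it as a plain string.
        return raw


values = parse_sweep_value("[1, 2, 4]")       # a real list
attack = parse_sweep_value("__import__('os')")  # stays an inert string
```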

🧹 Nitpick comments (2)
examples/disaggregated/slurm/launcher.py (1)

11-151: Consider breaking long lines for better readability

Some help text lines exceed the 120-character limit. While this doesn't affect functionality, breaking them would improve code readability.

Apply this diff to break the long lines:

         help=
-        """Automatically determine optimal sweep configurations by profiling context and generation servers.
+        """Automatically determine optimal sweep configurations by profiling context and generation
+        servers.

         auto-sweep uses separate SLURM allocations for each configuration, providing better
         resource utilization and fault tolerance compared to a single large allocation.

-        The goal is to ensure the request processing rates of context and generation servers are balanced.
+        The goal is to ensure the request processing rates of context and generation servers
+        are balanced.
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)

76-143: Remove debug print statements from production code

The __init__ method contains several debug print statements that should be removed or converted to proper logging before production deployment.

Consider using Python's logging module instead:

-        print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
-        )
+        import logging
+        logger = logging.getLogger(__name__)
+        logger.debug(
+            f"JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+        )

Apply similar changes to lines 125-127, 140-142.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 290b957 and bcd565d.

📒 Files selected for processing (8)
  • examples/disaggregated/README.md (1 hunks)
  • examples/disaggregated/slurm/README.md (1 hunks)
  • examples/disaggregated/slurm/config.yaml (1 hunks)
  • examples/disaggregated/slurm/disagg_profiler/__init__.py (1 hunks)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
  • examples/disaggregated/slurm/disagg_profiler/sweeper.py (1 hunks)
  • examples/disaggregated/slurm/launcher.py (1 hunks)
  • examples/disaggregated/slurm/slurm_populate_urls.py (0 hunks)
💤 Files with no reviewable changes (1)
  • examples/disaggregated/slurm/slurm_populate_urls.py
✅ Files skipped from review due to trivial changes (2)
  • examples/disaggregated/README.md
  • examples/disaggregated/slurm/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/disaggregated/slurm/disagg_profiler/__init__.py
  • examples/disaggregated/slurm/config.yaml
🧰 Additional context used
🧬 Code Graph Analysis (1)
examples/disaggregated/slurm/disagg_profiler/sweeper.py (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
  • JobManager (72-1029)
  • calculate_nodes_needed (27-69)
  • launch_jobs (503-615)
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/launcher.py

96-96: Line too long (122 > 120)

(E501)


180-180: Line too long (122 > 120)

(E501)


211-211: Line too long (127 > 120)

(E501)

examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


992-992: Do not use bare except

(E722)

examples/disaggregated/slurm/disagg_profiler/sweeper.py

151-159: 1 blank line required between summary line and description

(D205)


924-924: Line too long (130 > 120)

(E501)


1228-1228: Line too long (136 > 120)

(E501)


1280-1281: One-line docstring should fit on one line

Reformat to one line

(D200)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
examples/disaggregated/slurm/launcher.py (1)

154-262: Well-structured main function with proper error handling

The main function effectively orchestrates different sweep modes with appropriate validation, allocation management, and cleanup logic. The mode detection and mutual exclusion check prevent conflicting configurations.

examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)

910-930: Good implementation of dataset preparation

The prepare_dataset method properly checks for existing dataset files before preparation and uses the configurable prepare_dataset_script parameter from args, which addresses the previous concern about hardcoded paths.

examples/disaggregated/slurm/disagg_profiler/sweeper.py (2)

19-82: Robust SLURM allocation request implementation

The function properly handles subprocess communication with real-time output streaming and regex-based job ID extraction. Good error handling ensures failures are caught and reported.
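The job-ID extraction described above typically reduces to a small regex over `salloc`'s output. The sketch below assumes SLURM's standard "Granted job allocation N" message, which may differ across SLURM versions or site configurations:

```python
import re


def extract_job_id(salloc_output):
    """Pull the SLURM job ID out of salloc's stdout/stderr, or None."""
    match = re.search(r"Granted job allocation (\d+)", salloc_output)
    return match.group(1) if match else None


job_id = extract_job_id("salloc: Granted job allocation 123456")
missing = extract_job_id("salloc: error: Unable to allocate resources")
```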


1122-1327: Well-designed multi-configuration sweeper

The MultiConfigSweeper class effectively handles multiple server configurations with proper detection of multi-config format and exhaustive combination generation. The implementation cleanly separates disaggregated mode (context + generation) from IFB mode.
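The exhaustive combination generation for disaggregated mode amounts to a Cartesian product of context and generation configs; the names and shapes here are illustrative, not the PR's exact data model:

```python
import itertools

context_configs = [{"tp": 2}, {"tp": 4}]
generation_configs = [{"tp": 4}, {"tp": 8}]

# Every context config paired with every generation config.
combinations = [
    {"context": ctx, "generation": gen}
    for ctx, gen in itertools.product(context_configs, generation_configs)
]
# len(context_configs) * len(generation_configs) total combinations.
```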

@pcastonguay
Copy link
Collaborator

@Tabrizian with the changes in #6054, I'm concerned that there will be two separate sets of scripts for doing the same thing in examples/disaggregated/slurm. Ideally we would keep only one. Does your launcher.py script cover all the functionality in the existing slurm folder? Can we unify somehow?

Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian requested a review from a team as a code owner August 6, 2025 23:01
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)

20-20: Replace bare except with specific exception handling.

The bare except clause should be more specific to avoid catching unexpected exceptions.


380-475: Add bounds checking for server_id to prevent index errors.

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames.
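A minimal sketch of the bounds check being requested; `next_hostname` is a hypothetical helper, not the PR's actual code:

```python
def next_hostname(node_hostnames, server_id):
    """Return the hostname for server_id, failing fast when the config
    requests more servers than there are allocated nodes."""
    if server_id >= len(node_hostnames):
        raise IndexError(
            f"server_id {server_id} exceeds allocated nodes "
            f"({len(node_hostnames)}); request a larger allocation")
    return node_hostnames[server_id]


host = next_hostname(["node-0", "node-1"], 1)
```

Failing with an explicit message here is friendlier than the raw `IndexError` a bare `self.node_hostnames[server_id]` would produce mid-launch.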


997-997: Replace bare except with specific exception handling.

The bare except clause should catch specific exceptions.
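The pattern being requested, sketched against a `wait_for_server`-style helper. The body is illustrative, not the PR's implementation; the point is naming only the exceptions a health-check poll can actually raise instead of a bare `except:`:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=300, poll_interval=5):
    """Poll url until it returns HTTP 200 or timeout seconds elapse."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_interval) as resp:
                if resp.getcode() == 200:
                    return True
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            pass  # server not up yet; keep polling
        time.sleep(poll_interval)
    return False
```

A bare `except:` would also swallow `KeyboardInterrupt` and `SystemExit`, making the wait loop impossible to cancel cleanly; the narrow tuple above avoids that.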

🧹 Nitpick comments (4)
examples/disaggregated/slurm/launcher.py (3)

100-100: Fix line length violation.

Line exceeds the 120 character limit.

-                        "The goal is to ensure the request processing rates of context and generation servers are balanced."
+                        "The goal is to ensure the request processing rates of context and generation "
+                        "servers are balanced."

184-184: Fix line length violation.

Line exceeds the 120 character limit.

-            "Choose one of: --auto-sweep, --sweep-config, or define single configurations (not lists) in your config file"
+            "Choose one of: --auto-sweep, --sweep-config, or define single configurations "
+            "(not lists) in your config file"

214-216: Fix line length violation.

Line exceeds the 120 character limit.

-            print(
-                "Multi-config or manual sweep mode without single allocation - allocations will be requested per configuration"
-            )
+            print(
+                "Multi-config or manual sweep mode without single allocation - "
+                "allocations will be requested per configuration"
+            )
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)

89-92: Fix line length violation.

Line exceeds the 120 character limit.

         print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, "
+            f"override_job_id={override_job_id}"
         )
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between bcd565d and 90114eb.

📒 Files selected for processing (2)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
  • examples/disaggregated/slurm/launcher.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
  • examples/disaggregated/slurm/launcher.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
  • examples/disaggregated/slurm/launcher.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


993-993: Do not use bare except

(E722)

examples/disaggregated/slurm/launcher.py

100-100: Line too long (122 > 120)

(E501)


184-184: Line too long (122 > 120)

(E501)


215-215: Line too long (127 > 120)

(E501)

🔇 Additional comments (5)
examples/disaggregated/slurm/launcher.py (1)

158-266: Well-structured mode handling and resource management.

The main function effectively handles multiple execution modes (auto-sweep, multi-config, manual sweep, single config) with proper validation to prevent conflicting modes. The SLURM allocation management and cleanup logic is comprehensive and handles different scenarios appropriately.

examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)

76-142: Well-structured initialization with proper parameter handling.

The JobManager initialization properly handles different modes (load_config_only), parameter overrides, and includes appropriate debug logging for troubleshooting.


144-379: Comprehensive configuration management.

The configuration methods handle different server types (context, generation, IFB) consistently and properly save configurations both for execution and reference. The output folder creation logic accounts for various configuration parameters to create meaningful directory structures.


476-849: Robust server launching implementation.

The server launching methods provide comprehensive support for different server types with proper SLURM integration, container execution, and logging. The command building logic correctly handles various configuration parameters and resource allocation.


850-1035: Comprehensive load testing and resource management.

The implementation provides robust load testing capabilities with support for multiple benchmarking modes (trtllm_bench, benchmark_serving). The context manager ensures proper cleanup of resources, and the configuration saving logic maintains good traceability of experimental setups.

Comment on lines +1 to +10
import json
import os
import subprocess
import time
import uuid

import requests
import yaml



🛠️ Refactor suggestion

Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/job_manager.py at the top of the
file (lines 1 to 10), add the required NVIDIA copyright header including the
current year as per the coding guidelines. Insert the standard NVIDIA copyright
notice as the first lines before any imports or code.

Comment on lines +1 to +10
import argparse
import os
import subprocess
import sys

from disagg_profiler.job_manager import JobManager, calculate_nodes_needed
from disagg_profiler.sweeper import (AutoSweeper, MultiConfigSweeper,
ParameterSweeper, get_slurm_allocation)



🛠️ Refactor suggestion

Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import argparse
 import os
 import subprocess
 import sys
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/launcher.py at the top of the file (lines 1 to
10), add the required NVIDIA copyright header including the current year as per
coding guidelines. Insert the standard NVIDIA copyright notice as the first
lines before any imports or code.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)

1044-1046: Add validation for sufficient node allocation

After getting node hostnames, validate that enough nodes are allocated for the configuration requirements.

         # Get hostnames of available nodes
         self.node_hostnames = self.get_node_hostnames()
+        
+        # Validate sufficient nodes are allocated
+        nodes_needed = calculate_nodes_needed(self.config, self.num_gpus)
+        if len(self.node_hostnames) < nodes_needed:
+            raise ValueError(
+                f"Insufficient nodes allocated. Need {nodes_needed} nodes "
+                f"but only {len(self.node_hostnames)} available"
+            )
♻️ Duplicate comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)

1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

11-25: Replace bare except with specific exception handling

The bare except clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

380-475: Add bounds checking for server_id to prevent index errors

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames. This could cause an IndexError.

Add bounds checking before accessing self.node_hostnames[server_id]:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation and IFB server loops.
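One way to centralize that check rather than repeating it in each loop is a small helper like the following (hypothetical; the review's inline checks are equally valid):

```python
def next_hostname(node_hostnames, server_id):
    """Return the hostname at server_id, failing fast if allocation is too small."""
    if server_id >= len(node_hostnames):
        raise ValueError(
            f"Not enough nodes allocated: need at least {server_id + 1}, "
            f"have {len(node_hostnames)}")
    return node_hostnames[server_id]
```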


1001-1010: Replace bare except with specific exception handling

The bare except clause should catch specific exceptions.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
🧹 Nitpick comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)

89-92: Break long debug line for better readability

Line 91 exceeds the 120 character limit.

         print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+            f"DEBUG JobManager: __init__ called with "
+            f"load_config_only={load_config_only}, override_job_id={override_job_id}"
         )

144-201: Consider extracting folder naming logic for better maintainability

The create_output_folder method contains complex folder naming logic that could be extracted into a separate helper method for better readability and maintainability.

Consider refactoring:

def _generate_folder_suffix(self, config):
    """Generate folder suffix based on configuration options."""
    suffix = ""
    for server_type in ['generation', 'ifb']:
        if server_type in config['exec']['config']:
            # ... existing logic for suffix generation
    return suffix

def create_output_folder(self):
    # ... existing model name logic
    # ... existing base folder creation
    suffix = self._generate_folder_suffix(self.config)
    output_folder += suffix
    # ... rest of the method

659-664: Break long nsys_prefix line for better readability

Line 663 exceeds the 120 character limit significantly (222 chars).

-            nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+            nsys_prefix = (
+                f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+                f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+                f"-c cudaProfilerApi --cuda-graph-trace node "
+                f"--capture-range-end=stop --gpu-metrics-devices=none"
+            )

507-566: Consider optimizing server startup with parallel launching or adaptive waiting

The current implementation uses fixed sleep delays (15 seconds per server, 30 seconds for disaggregated server) which can significantly slow down the startup process. Consider:

  1. Parallel launching: Launch all servers of the same type simultaneously
  2. Adaptive waiting: Use wait_for_server immediately after launching instead of fixed delays
  3. Configurable timeouts: Make wait times configurable based on cluster characteristics

Example optimization:

# Launch all servers in parallel
for node in distribution:
    # Launch without intermediate sleep
    if node["is_ifb"]:
        self.ifb_jobs.extend(self.launch_ifb_server(...))
    # ... other server types

# Then wait for all servers with adaptive timing
for node in distribution:
    if not wait_for_server(node["hostnames"][0], node["port"], max_retries=100, delay=2):
        raise RuntimeError(f"Server {node['hostnames'][0]}:{node['port']} failed to start")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90114eb and 8db629e.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


659-659: Line too long (222 > 120)

(E501)


1004-1004: Do not use bare except

(E722)

🔇 Additional comments (5)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (5)

27-71: LGTM!

The calculate_nodes_needed function correctly calculates the total number of nodes required based on the GPU requirements of context, generation, and IFB servers.
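As a rough illustration of the node arithmetic being praised here, a sketch under the assumption that each server requires ceil(tp / gpus_per_node) nodes (the field names are inferred from the review excerpts, not copied from the implementation):

```python
import math


def calculate_nodes_needed_sketch(config, num_gpus_per_node):
    """Illustrative node count: sum per-server ceil(tp / gpus_per_node).

    `config` mirrors the launcher's nested layout as quoted in the review;
    key names and the num_servers default are assumptions.
    """
    total = 0
    servers = config["exec"]["config"]
    for server_type in ("context", "generation", "ifb"):
        if server_type in servers:
            tp = servers[server_type]["tp"]
            num_servers = servers[server_type].get("num_servers", 1)
            total += num_servers * math.ceil(tp / num_gpus_per_node)
    return total
```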


818-824: Good practice: Safe access to configuration key

The code correctly uses .get() method with a default value to safely access the potentially missing 'gen_only' configuration key, preventing KeyError exceptions.
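The same defensive pattern, in isolation (the key name mirrors the one discussed; the wrapper function is hypothetical):

```python
def resolve_gen_only(server_config):
    """Read the optional 'gen_only' flag without risking a KeyError."""
    return server_config.get("gen_only", False)
```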


922-943: LGTM! Dataset preparation with configurable script path

The method correctly:

  1. Checks if the dataset already exists to avoid redundant generation
  2. Uses a configurable script path via self.args.prepare_dataset_script (addressing the previous hardcoded path issue)
  3. Properly constructs the command for dataset generation

348-379: Well-designed fallback mechanism for node hostname resolution

The get_node_hostnames method properly:

  1. Checks for SLURM environment variables
  2. Attempts to retrieve hostnames via scontrol
  3. Falls back gracefully to localhost for non-SLURM environments

This design ensures the code works both in SLURM clusters and local development environments.
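The fallback chain described above can be sketched roughly like this (a simplified stand-in for the real JobManager method; the exact environment variables and error handling may differ):

```python
import os
import subprocess


def get_node_hostnames_sketch():
    """Resolve node hostnames, falling back to localhost outside SLURM."""
    node_list = os.environ.get("SLURM_JOB_NODELIST")
    if not node_list:
        return ["localhost"]  # non-SLURM environment
    try:
        out = subprocess.run(
            ["scontrol", "show", "hostnames", node_list],
            capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return ["localhost"]  # scontrol unavailable or failed
    return out.stdout.split()
```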


944-978: LGTM! Robust configuration handling with proper defaults

The method demonstrates good practices:

  • Safe dictionary access with fallback values for all configuration parameters
  • Proper dataset file path construction
  • Clear command construction with all necessary parameters

Comment on lines +905 to +921
def record_kv_cache_stats(self, generation_server_urls, output_folder):
"""Record the kv cache stats for a specific concurrency level."""
print(
f"DEBUG JobManager: Recording kv cache stats for generation servers"
)
# Get the kv cache stats
# Send a request to "/metrics" and store the output in a file
for url in generation_server_urls:
response = requests.get(f"http://{url}/metrics")
with open(
os.path.join(output_folder,
f"{url.split(':')[0]}_kv_cache_stats.txt"),
"w") as f:
f.write(response.text)
print(
f"DEBUG JobManager: Recorded kv cache stats for generation servers")


🛠️ Refactor suggestion

Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/job_manager.py around lines 905
to 921, the record_kv_cache_stats method lacks error handling for network
requests to generation servers. Add try-except blocks around the requests.get
call to catch exceptions like connection errors or timeouts. Log or print an
error message when a request fails, and ensure the method continues processing
other URLs without crashing.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (5)

1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the beginning of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

11-25: Replace bare except with specific exception handling

The bare except clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

911-927: Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")

1007-1016: Replace bare except with specific exception handling

The bare except clause should catch specific exceptions.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
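The terminate-then-kill escalation above can be exercised end to end with a throwaway child process. This is a hedged, self-contained sketch of the same pattern; the 5-second grace period and the helper name are illustrative defaults, not the module's actual API.

```python
import subprocess
import sys


def terminate_jobs(jobs, grace_seconds=5):
    """Terminate subprocesses gracefully, escalating to kill() on timeout.

    Sketch of the escalation pattern from the suggestion above; the grace
    period is an illustrative default.
    """
    for job in jobs:
        if job and job.poll() is None:
            try:
                job.terminate()
                job.wait(timeout=grace_seconds)
            except (subprocess.TimeoutExpired, OSError):
                job.kill()
                job.wait()


# Example: spawn a long-running child and shut it down.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"])
terminate_jobs([child])
```

After `terminate_jobs` returns, `child.poll()` is non-None, confirming the process was reaped rather than left as a zombie.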

380-481: Add bounds checking for server_id to prevent index errors

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames. This could cause an IndexError.

Add bounds checking before accessing self.node_hostnames[server_id]:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes for context servers.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation server loop (lines 433-436) and IFB server loop (lines 461-464):

                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes for generation servers.")
                     server_ids.append(self.node_hostnames[server_id])
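The bounds-checked assignment loop can be distilled into a small standalone function. The helper name and signature below are hypothetical, introduced only to show how the suggested `ValueError` guard behaves when the node list is too short.

```python
def assign_hosts(num_ranks, gpus_per_node, node_hostnames):
    """Pick one hostname per block of gpus_per_node ranks, with bounds checks.

    Hypothetical helper distilling the reviewed loop; names are
    illustrative, not the module's actual API.
    """
    hosts = [node_hostnames[0]]  # rank 0 always lands on the first node
    server_id = 0
    for i in range(num_ranks):
        if i != 0 and i % gpus_per_node == 0:
            server_id += 1
            if server_id >= len(node_hostnames):
                raise ValueError(
                    f"Not enough nodes allocated: need at least "
                    f"{server_id + 1}, have {len(node_hostnames)}")
            hosts.append(node_hostnames[server_id])
    return hosts
```

With 8 ranks and 4 GPUs per node, two hostnames suffice; asking for 12 ranks with the same two nodes raises the error instead of an opaque `IndexError` deep inside the launch loop.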
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8db629e and 42993fc.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


665-665: Line too long (222 > 120)

(E501)


1010-1010: Do not use bare except

(E722)

🔇 Additional comments (2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (2)

27-71: LGTM!

The node calculation logic is well-implemented with proper rounding and minimum node guarantees.
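The "proper rounding and minimum node guarantees" praised here amount to a ceiling division clamped to at least one node. The following is an illustrative reconstruction of that kind of calculation, not necessarily the exact function in `job_manager.py`.

```python
import math


def nodes_needed(tp, dp, gpus_per_node):
    """Round total GPUs up to whole nodes, guaranteeing at least one node.

    Illustrative sketch of the calculation reviewed above; the actual
    implementation in job_manager.py may differ.
    """
    total_gpus = tp * dp
    return max(1, math.ceil(total_gpus / gpus_per_node))
```

For example, tp=8, dp=2 on 8-GPU nodes needs 2 nodes, while a tp=2 single-replica server still gets a full node via the `max(1, ...)` floor.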


73-1052: No duplicate SLURM logic found – consolidation verified
I inspected launcher.py and sweeper.py; both delegate all SLURM job orchestration to JobManager and merely handle argument parsing and sweep loops. There’s no overlapping or redundant launch logic elsewhere in examples/disaggregated/slurm.

Comment on lines +868 to +910
    def run_loadgen_tests(self, hostname, port, generation_server_urls):
        """Run loadgen tests on the master node."""
        print(
            f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
        )
        output_folder = self.output_folder

        # Run loadgen tests for each concurrency level
        if 'use_trtllm_bench' in self.config['profile'] and self.config[
                'profile']['use_trtllm_bench']:
            self.prepare_dataset(output_folder)

        concurrency_levels = self.config['profile']['concurrency']
        print(
            f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
        )

        for concurrency in concurrency_levels:
            print(
                f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
            )
            concurrency_output_folder = f"{output_folder}/{concurrency}"
            if not os.path.exists(concurrency_output_folder):
                os.makedirs(concurrency_output_folder)

            # Build and run the loadgen command
            if 'use_trtllm_bench' in self.config['profile'] and self.config[
                    'profile']['use_trtllm_bench']:
                self.run_trtllm_bench_for_concurrency(
                    hostname, port, concurrency, concurrency_output_folder)
            elif 'use_benchmark_serving' in self.config[
                    'profile'] and self.config['profile'][
                        'use_benchmark_serving']:
                pid = self.run_benchmark_serving_for_concurrency(
                    hostname, port, concurrency, concurrency_output_folder)
                pid.wait()
            print(
                f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
            )

        print("DEBUG JobManager: run_loadgen_tests completed")
        return pid


⚠️ Potential issue

Fix potential undefined variable error

The variable pid is returned at line 909 but is only assigned conditionally. If neither use_trtllm_bench nor use_benchmark_serving is true, pid will be undefined.

Initialize pid at the beginning of the method or handle the case when neither condition is met:

     def run_loadgen_tests(self, hostname, port, generation_server_urls):
         """Run loadgen tests on the master node."""
         print(
             f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
         )
         output_folder = self.output_folder
+        pid = None
 
         # Run loadgen tests for each concurrency level
         if 'use_trtllm_bench' in self.config['profile'] and self.config[
                 'profile']['use_trtllm_bench']:
             self.prepare_dataset(output_folder)
 
         concurrency_levels = self.config['profile']['concurrency']
         print(
             f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
         )
 
         for concurrency in concurrency_levels:
             print(
                 f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
             )
             concurrency_output_folder = f"{output_folder}/{concurrency}"
             if not os.path.exists(concurrency_output_folder):
                 os.makedirs(concurrency_output_folder)
 
             # Build and run the loadgen command
             if 'use_trtllm_bench' in self.config['profile'] and self.config[
                     'profile']['use_trtllm_bench']:
                 self.run_trtllm_bench_for_concurrency(
                     hostname, port, concurrency, concurrency_output_folder)
             elif 'use_benchmark_serving' in self.config[
                     'profile'] and self.config['profile'][
                         'use_benchmark_serving']:
                 pid = self.run_benchmark_serving_for_concurrency(
                     hostname, port, concurrency, concurrency_output_folder)
                 pid.wait()
+            else:
+                print("Warning: No benchmarking method configured")
             print(
                 f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
             )
 
         print("DEBUG JobManager: run_loadgen_tests completed")
         return pid


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)

1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

11-25: Replace bare except with specific exception handling.

The bare except clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

382-483: Add bounds checking for server_id to prevent index errors.

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames. This could cause an IndexError.

Add bounds checking before accessing self.node_hostnames[server_id]:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation and IFB server loops at lines 436-438 and 464-466.


985-1027: Fix potential undefined variable error.

The variable pid is returned at line 1026 but is only assigned conditionally. If neither use_trtllm_bench nor use_benchmark_serving is true, pid will be undefined.

Initialize pid at the beginning of the method:

     def run_loadgen_tests(self, hostname, port, generation_server_urls):
         """Run loadgen tests on the master node."""
         print(
             f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
         )
         output_folder = self.output_folder
+        pid = None
 
         # Run loadgen tests for each concurrency level

Also consider adding error handling if neither benchmarking method is configured:

             elif 'use_benchmark_serving' in self.config[
                     'profile'] and self.config['profile'][
                         'use_benchmark_serving']:
                 pid = self.run_benchmark_serving_for_concurrency(
                     hostname, port, concurrency, concurrency_output_folder)
                 pid.wait()
+            else:
+                print("Warning: No benchmarking method configured")

1028-1044: Add error handling for metrics collection.

The method should handle potential network failures when collecting metrics from generation servers.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")

1124-1137: Replace bare except with specific exception handling.

The bare except clause should catch specific exceptions.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         # First, stop any nsys sessions for jobs that have nsys enabled
         self.stop_nsys_sessions()
         
         # Then terminate the jobs
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
🧹 Nitpick comments (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)

630-708: Well-implemented nsys session management with minor formatting issue.

The method properly handles nsys session cleanup with good error handling and organization by hostname.

Remove trailing whitespace on line 708:

-        print("Finished stopping nsys sessions.")
-
+        print("Finished stopping nsys sessions.")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42993fc and 9991dce.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


678-678: Line too long (148 > 120)

(E501)


708-708: Blank line contains whitespace

Remove whitespace from blank line

(W293)


754-754: Line too long (247 > 120)

(E501)


1131-1131: Do not use bare except

(E722)

🔇 Additional comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)

27-71: LGTM!

The function correctly calculates the number of nodes needed based on tensor and data parallelism configurations.


943-945: Good use of safe dictionary access.

The code now correctly uses .get() with a default value to safely access the gen_only configuration key, preventing potential KeyError.


1045-1066: Good implementation of dataset preparation.

The method properly uses a configurable script path from command-line arguments and includes optimization to skip preparation if the dataset already exists.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)

1-10: Add required NVIDIA copyright header

All TensorRT-LLM source files must include an NVIDIA copyright header with the current year.

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

11-25: Replace bare except with specific exception handling

Catch specific exceptions to avoid masking unexpected errors.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

382-483: Add bounds checking for server_id to prevent IndexError

The server_id variable is incremented and used to index self.node_hostnames without verifying it's within bounds, which could cause an IndexError if insufficient nodes are allocated.

Add bounds checking in the context, generation, and IFB server loops:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes for context servers.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks at lines 436-437 and 464-465 for generation and IFB servers.


985-1027: Fix potential undefined variable error

The variable pid is returned but only assigned conditionally, leading to an undefined variable error if neither benchmarking method is configured.

     def run_loadgen_tests(self, hostname, port, generation_server_urls):
         """Run loadgen tests on the master node."""
         print(
             f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
         )
         output_folder = self.output_folder
+        pid = None
 
         # Run loadgen tests for each concurrency level
         if 'use_trtllm_bench' in self.config['profile'] and self.config[
                 'profile']['use_trtllm_bench']:
             self.prepare_dataset(output_folder)

1028-1044: Add error handling for metrics collection

The method should handle potential network failures gracefully when collecting metrics.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")

1124-1137: Replace bare except with specific exception handling

Catch specific exceptions to improve error handling clarity.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         # First, stop any nsys sessions for jobs that have nsys enabled
         self.stop_nsys_sessions()
         
         # Then terminate the jobs
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
🧹 Nitpick comments (2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (2)

630-708: Remove trailing whitespace

Line 708 contains trailing whitespace that should be removed.

-        
+
         print("Finished stopping nsys sessions.")

Otherwise, the nsys session management implementation looks comprehensive with proper error handling.


754-764: Break up long nsys command line for better readability

Line 758 exceeds 120 characters and should be split for better readability.

-            nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none --session={session_name}"
+            nsys_prefix = (
+                f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+                f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+                f"-c cudaProfilerApi --cuda-graph-trace node "
+                f"--capture-range-end=stop --gpu-metrics-devices=none "
+                f"--session={session_name}"
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9991dce and 21e57c7.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


678-678: Line too long (148 > 120)

(E501)


708-708: Blank line contains whitespace

Remove whitespace from blank line

(W293)


754-754: Line too long (247 > 120)

(E501)


1131-1131: Do not use bare except

(E722)

🔇 Additional comments (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)

27-71: LGTM!

The node calculation logic correctly handles all server types and ensures proper rounding.
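The rounding being approved is ceiling division of required GPUs by GPUs per node, so a partial node still counts as a whole node. A minimal sketch under that assumption (the function name is hypothetical):

```python
def nodes_needed(num_gpus_required, gpus_per_node):
    """Ceil-divide without floats: any partial node rounds up to a full node."""
    return -(-num_gpus_required // gpus_per_node)


print(nodes_needed(8, 4), nodes_needed(9, 4), nodes_needed(1, 4))  # 2 3 1
```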

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)

1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

 import requests
 import yaml

11-25: Replace bare except with specific exception handling

The bare except clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

382-483: Add bounds checking for server_id to prevent index errors

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames. This could cause an IndexError.

Add bounds checking before accessing self.node_hostnames[server_id]:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks at lines 437 and 465 for generation and IFB server loops.


1044-1086: Fix potential undefined variable error

The variable pid is returned at line 1085 but is only assigned conditionally. If neither use_trtllm_bench nor use_benchmark_serving is true, pid will be undefined.

Initialize pid at the beginning of the method:

     def run_loadgen_tests(self, hostname, port, generation_server_urls):
         """Run loadgen tests on the master node."""
         print(
             f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
         )
         output_folder = self.output_folder
+        pid = None

         # Run loadgen tests for each concurrency level
         if 'use_trtllm_bench' in self.config['profile'] and self.config[
                 'profile']['use_trtllm_bench']:
             self.prepare_dataset(output_folder)

         concurrency_levels = self.config['profile']['concurrency']
         print(
             f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
         )

         for concurrency in concurrency_levels:
             print(
                 f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
             )
             concurrency_output_folder = f"{output_folder}/{concurrency}"
             if not os.path.exists(concurrency_output_folder):
                 os.makedirs(concurrency_output_folder)

             # Build and run the loadgen command
             if 'use_trtllm_bench' in self.config['profile'] and self.config[
                     'profile']['use_trtllm_bench']:
                 self.run_trtllm_bench_for_concurrency(
                     hostname, port, concurrency, concurrency_output_folder)
             elif 'use_benchmark_serving' in self.config[
                     'profile'] and self.config['profile'][
                         'use_benchmark_serving']:
                 pid = self.run_benchmark_serving_for_concurrency(
                     hostname, port, concurrency, concurrency_output_folder)
                 pid.wait()
+            else:
+                print("Warning: No benchmarking method configured")
             print(
                 f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
             )

         print("DEBUG JobManager: run_loadgen_tests completed")
         return pid

1087-1103: Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")

1183-1196: Replace bare except with specific exception handling

The bare except clause should catch specific exceptions.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         # First, stop any nsys sessions for jobs that have nsys enabled
         self.stop_nsys_sessions()
         
         # Then terminate the jobs
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
🧹 Nitpick comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)

89-92: Fix line length issue

Line 89 exceeds the 120 character limit.

Break the debug print statement into multiple lines:

         print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+            f"DEBUG JobManager: __init__ called with "
+            f"load_config_only={load_config_only}, override_job_id={override_job_id}"
         )

802-812: Break up long nsys_prefix line for better readability

Line 806 significantly exceeds the 120 character limit (222 characters).

Split the nsys_prefix string across multiple lines:

-            nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+            nsys_prefix = (
+                f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+                f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+                f"-c cudaProfilerApi --cuda-graph-trace node "
+                f"--capture-range-end=stop --gpu-metrics-devices=none"
+            )

756-760: Remove trailing whitespace from blank line

Line 758 contains whitespace characters on an otherwise blank line.

Remove the whitespace from the blank line to maintain clean formatting.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21e57c7 and 7fd5c5a.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


678-678: Line too long (138 > 120)

(E501)


682-682: Line too long (146 > 120)

(E501)


758-758: Blank line contains whitespace

Remove whitespace from blank line

(W293)


802-802: Line too long (222 > 120)

(E501)


1190-1190: Do not use bare except

(E722)

🔇 Additional comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)

27-71: LGTM!

The node calculation logic correctly handles GPU requirements for context, generation, and IFB servers with proper rounding up.


1104-1125: Good implementation of dataset preparation

The dataset preparation logic correctly checks for existing datasets and uses the configurable script path from command-line arguments, addressing the previous concern about hardcoded paths.


1002-1006: Good use of safe dictionary access

The code correctly uses .get() with a default value to safely access the 'gen_only' configuration key, preventing potential KeyError exceptions.


630-712: Well-implemented nsys session management

The stop_nsys_sessions method is well-structured with:

  • Efficient grouping of jobs by hostname to minimize srun calls
  • Proper error handling with specific exception types
  • Clear logging for debugging
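The hostname-grouping idea in the first bullet can be sketched like this (names are hypothetical; the real method issues one `srun` per group rather than returning a dict):

```python
from collections import defaultdict


def group_jobs_by_hostname(jobs):
    """Collect job handles per node so one command can cover each host."""
    grouped = defaultdict(list)
    for hostname, job_id in jobs:
        grouped[hostname].append(job_id)
    return dict(grouped)


grouped = group_jobs_by_hostname([("node0", 11), ("node1", 12), ("node0", 13)])
print(grouped)  # {'node0': [11, 13], 'node1': [12]}
```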

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)

1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

11-25: Replace bare except with specific exception handling

The bare except clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

380-481: Add bounds checking for server_id to prevent index errors

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames. This could cause an IndexError.

Add bounds checking before accessing self.node_hostnames[server_id]:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation and IFB server loops.


873-915: Fix potential undefined variable error

The variable pid is returned at line 914 but is only assigned conditionally. If neither use_trtllm_bench nor use_benchmark_serving is true, pid will be undefined.

Initialize pid at the beginning of the method:

     def run_loadgen_tests(self, hostname, port, generation_server_urls):
         """Run loadgen tests on the master node."""
         print(
             f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
         )
         output_folder = self.output_folder
+        pid = None
 
         # Run loadgen tests for each concurrency level

916-932: Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")

1012-1023: Replace bare except with specific exception handling

The bare except clause should catch specific exceptions.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         
         # Then terminate the jobs
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
🧹 Nitpick comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)

89-92: Consider splitting long print statement for better readability

Line 91 exceeds the recommended line length. Consider splitting it:

-        print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
-        )
+        print(
+            f"DEBUG JobManager: __init__ called with "
+            f"load_config_only={load_config_only}, override_job_id={override_job_id}"
+        )

144-201: Consider extracting folder name generation into helper methods

The create_output_folder method has complex logic for generating folder names. Consider breaking it down into smaller helper methods for better maintainability:

def _get_model_name(self):
    """Extract model name from path."""
    model_name = self.config['exec']['model_path'].split('/')[-1]
    if model_name == '':
        model_name = self.config['exec']['model_path'].split('/')[-2]
    return model_name

def _build_parallelism_suffix(self):
    """Build suffix based on parallelism configuration."""
    # Extract TP/DP values for each server type
    # Return formatted string
    
def _build_feature_suffix(self):
    """Build suffix based on enabled features."""
    # Check for overlap, cuda_graph, adp, etc.
    # Return formatted string

This would make the code more modular and easier to test.


623-627: Remove trailing whitespace

Line 623 contains trailing whitespace that should be removed:

-        
+
         self.terminate_jobs(self.context_jobs + self.generation_jobs +
                             self.ifb_jobs + self.disagg_jobs)

672-677: Split long nsys_prefix line for better readability

Line 676 is too long (222 characters). Split it for better readability:

-            nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+            nsys_prefix = (
+                f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+                f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+                f"-c cudaProfilerApi --cuda-graph-trace node "
+                f"--capture-range-end=stop --gpu-metrics-devices=none"
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fd5c5a and c878b84.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


627-627: Blank line contains whitespace

Remove whitespace from blank line

(W293)


672-672: Line too long (222 > 120)

(E501)


1016-1016: Do not use bare except

(E722)

🔇 Additional comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)

27-71: LGTM!

The calculate_nodes_needed function correctly calculates the required number of nodes based on GPU requirements for context, generation, and IFB servers, with proper rounding and a minimum-node guarantee.


1024-1044: Well-implemented resource management

The context manager methods and save_config_to_output_folder are properly implemented with appropriate cleanup and error handling.


1045-1059: LGTM!

The prepare_and_run_single_config method properly orchestrates configuration file creation, saves the complete configuration, and manages node hostname retrieval.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)

1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid

11-25: Replace bare except with specific exception handling.

The bare except clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
 
     print(f"Server did not start after {max_retries} attempts")
     return False

380-481: Add bounds checking for server_id to prevent index errors.

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames. This could cause an IndexError.

Add bounds checking before accessing self.node_hostnames[server_id]:

             for i in range(self.config['exec']['config']['context']['tp']):
                 if i != 0 and i % self.num_gpus == 0:
                     server_id += 1
+                    if server_id >= len(self.node_hostnames):
+                        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
                     print(f'Adding {self.node_hostnames[server_id]}')
                     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation server loop (lines 433-436) and IFB server loop (lines 462-464).


873-915: Fix potential undefined variable error.

The variable pid is returned at line 914 but is only assigned conditionally. If neither use_trtllm_bench nor use_benchmark_serving is true, pid will be undefined.

Initialize pid at the beginning of the method:

     def run_loadgen_tests(self, hostname, port, generation_server_urls):
         """Run loadgen tests on the master node."""
         print(
             f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
         )
         output_folder = self.output_folder
+        pid = None
 
         # Run loadgen tests for each concurrency level

916-932: Add error handling for metrics collection.

The method should handle potential network failures when collecting metrics from generation servers.

     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")

1012-1023: Replace bare except with specific exception handling.

The bare except clause should catch specific exceptions.

     def terminate_jobs(self, jobs):
         """Terminate all running jobs."""
         
         # Then terminate the jobs
         for job in jobs:
             if job and job.poll() is None:
                 try:
                     job.terminate()
                     job.wait(timeout=5)
-                except:
+                except (subprocess.TimeoutExpired, OSError):
                     job.kill()
🧹 Nitpick comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)

89-92: Format long line for better readability.

Line 91 exceeds the 120 character limit.

Split the debug message across multiple lines:

         print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+            f"DEBUG JobManager: __init__ called with "
+            f"load_config_only={load_config_only}, override_job_id={override_job_id}"
         )

627-627: Remove whitespace from blank line.

-        
+

675-677: Split long line for better readability.

Line 676 exceeds the 120 character limit significantly.

             nsys_file = os.path.join(self.output_folder, f"{log_file}.$SLURM_PROCID.nsys-rep")
-            nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+            nsys_prefix = (
+                f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+                f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+                f"-c cudaProfilerApi --cuda-graph-trace node "
+                f"--capture-range-end=stop --gpu-metrics-devices=none"
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c878b84 and 1bb5fb8.

📒 Files selected for processing (1)
  • examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py

20-20: Do not use bare except

(E722)


89-89: Line too long (124 > 120)

(E501)


627-627: Blank line contains whitespace

Remove whitespace from blank line

(W293)


672-672: Line too long (222 > 120)

(E501)


1016-1016: Do not use bare except

(E722)

🔇 Additional comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)

27-71: LGTM!

The function correctly calculates the number of nodes needed based on tensor and data parallelism configurations.


1024-1030: Good implementation of context manager protocol.

The context manager methods properly handle resource cleanup by terminating all jobs on exit.
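The cleanup-on-exit pattern the review praises can be summarized in a few lines. This is a minimal sketch of the protocol, not the PR's class (the `jobs` attribute and `FakeJob`-style duck typing are assumptions of the sketch):

```python
class JobManager:
    """Minimal sketch: a context manager that terminates tracked jobs on exit."""

    def __init__(self):
        self.jobs = []  # Popen-like objects with poll() / terminate()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Terminate all still-running jobs, even when an exception escapes.
        for job in self.jobs:
            if job is not None and job.poll() is None:
                job.terminate()
        return False  # do not suppress exceptions
```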


1031-1059: Well-structured configuration management and job orchestration.

The methods properly handle configuration saving and job execution workflow with appropriate error handling.

@pcastonguay
Collaborator

@Tabrizian should this PR be closed?

@Tabrizian
Member Author

Yes

@Tabrizian Tabrizian closed this Oct 27, 2025
