Add disagg launcher scripts #6109
Conversation
Walkthrough

A new disaggregated serving launcher system for TRT-LLM benchmarks on SLURM was introduced, including documentation, configuration templates, a launcher script, and a modular Python package for job management and parameter sweeping. The previous node URL population script was removed, consolidating all orchestration and configuration logic into the new, integrated launcher and profiler suite.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Launcher
    participant Sweeper/JobManager
    participant SLURM
    participant TRTLLM Servers
    User->>Launcher: Run launcher.py with config and sweep options
    Launcher->>SLURM: (Optional) Request allocation
    Launcher->>Sweeper/JobManager: Select sweep or job mode
    Sweeper/JobManager->>SLURM: Launch context/gen/IFB servers via srun
    Sweeper/JobManager->>TRTLLM Servers: Start servers with config
    Sweeper/JobManager->>TRTLLM Servers: Run loadgen tests
    Sweeper/JobManager->>Sweeper/JobManager: Collect results, manage jobs
    Sweeper/JobManager->>SLURM: Release allocation (if needed)
    Sweeper/JobManager->>Launcher: Return results/status
    Launcher->>User: Print summary, exit
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 11
🔭 Outside diff range comments (1)
examples/disaggregated/slurm/config.yaml (1)
1-37: Fix the configuration structure - `profile` should not be nested under `exec`

The `profile` section (lines 30-37) is incorrectly nested under `exec`. Based on the code in `job_manager.py` and `launcher.py`, the configuration expects `profile` to be at the root level, not under `exec`. Apply this diff to fix the structure:

```yaml
exec:
  config:
    context:
      tp: 4
      ep: 4
      pp: 1
      max_batch_size: 4
      max_num_tokens: 1024
      max_seq_len: 1024
      config:
        kv_cache_config:
          free_gpu_memory_fraction: 0.75
          enable_block_reuse: false
        print_iter_log: true
      dp: 1
    generation:
      tp: 4
      ep: 4
      pp: 1
      max_batch_size: 1
      max_num_tokens: 4096
      max_seq_len: 2048
      config:
        print_iter_log: true
        kv_cache_config:
          free_gpu_memory_fraction: 0.75
          enable_block_reuse: false
      dp: 1
  model_path: TinyLlama/TinyLlama-1.1B-Chat-v1.0
profile:
  isl: 1024
  osl: 1024
  use_benchmark_serving: true
  concurrency:
    - 128
    - 256
```
🧹 Nitpick comments (9)
examples/disaggregated/slurm/config.yaml (1)
3-15: Improve parameter organization by grouping parallelism parameters together

The `dp` parameter is placed after the nested `config` section, which affects readability. Consider grouping all parallelism parameters (tp, ep, pp, dp) together. Apply this diff for better organization:

```diff
 config:
   context:
     tp: 4
     ep: 4
     pp: 1
+    dp: 1
     max_batch_size: 4
     max_num_tokens: 1024
     max_seq_len: 1024
     config:
       kv_cache_config:
         free_gpu_memory_fraction: 0.75
         enable_block_reuse: false
       print_iter_log: true
-    dp: 1
   generation:
     tp: 4
     ep: 4
     pp: 1
+    dp: 1
     max_batch_size: 1
     max_num_tokens: 4096
     max_seq_len: 2048
     config:
       print_iter_log: true
       kv_cache_config:
         free_gpu_memory_fraction: 0.75
         enable_block_reuse: false
-    dp: 1
```

Also applies to: 16-28
examples/disaggregated/slurm/launcher.py (3)
153-155: Consider extracting config loading to a utility function

Creating a temporary `JobManager` instance just to load the configuration feels like a workaround. Consider adding a standalone utility function for loading configurations.

Would you like me to create a utility function that loads the configuration without instantiating a JobManager?
90-90: Fix line length violations

These lines exceed the 120 character limit.

For line 90:

```diff
-    The goal is to ensure the request processing rates of context and generation servers are balanced.
+    The goal is to ensure the request processing rates of context and generation
+    servers are balanced.
```

For line 174:

```diff
-    "Choose one of: --auto-sweep, --sweep-config, or define single configurations (not lists) in your config file"
+    "Choose one of: --auto-sweep, --sweep-config, or define single configurations "
+    "(not lists) in your config file"
```

For line 205:

```diff
-    "Multi-config or manual sweep mode without single allocation - allocations will be requested per configuration"
+    "Multi-config or manual sweep mode without single allocation - "
+    "allocations will be requested per configuration"
```

Also applies to: 174-174, 205-205
247-252: Add clarifying comment for the allocation cleanup condition

The condition for releasing the SLURM allocation is complex. Consider adding a comment to explain the logic.

```diff
 # Release the allocation if we requested one and it's not a manual sweep (which handles its own allocations)
+# We release allocations for:
+#   1. Single config mode
+#   2. Auto-sweep mode
+#   3. Manual sweep with single allocation
+# We don't release for manual sweep without single allocation or multi-config (they handle their own)
 if job_id and (not args.sweep_config or args.use_single_allocation) and not is_multi_config:
```

examples/disaggregated/slurm/disagg_profiler/job_manager.py (2)
27-71: Add missing `num_gpus` parameter to docstring

The function signature includes a `num_gpus` parameter but it's missing from the docstring.

```diff
 Args:
     config (dict): The configuration dictionary
+    num_gpus (int): The number of GPUs per node

 Returns:
     int: The number of nodes needed
```
90-92: Remove or conditionalize debug print statements

Multiple debug print statements are present which should be removed or made conditional. Consider using Python's logging module instead of print statements for debugging:

```python
import logging

logger = logging.getLogger(__name__)

# Then replace prints with:
logger.debug(f"JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}")
```

This allows users to control debug output via logging configuration.

Also applies to: 124-127, 140-142
examples/disaggregated/slurm/disagg_profiler/sweeper.py (3)
297-326: Add more robust error handling for file parsing.

The method silently continues if files don't exist or parsing fails. Consider logging warnings for missing expected files.

```diff
 benchmark_serving_path = os.path.join(concurrency_path, "benchmark_serving.log")
 if os.path.exists(benchmark_serving_path):
     with open(benchmark_serving_path, 'r') as f:
         for line in f:
             if "Request throughput (req/s):" in line:
                 rps = float(
                     line.split("Request throughput (req/s):")[1].strip())
                 results[concurrency] = rps
                 print(f"Concurrency {concurrency}: {rps:.3f} Requests/s")
 else:
     # Look for raw_data.json file
     raw_data_path = os.path.join(concurrency_path, "raw_data.json")
     if os.path.exists(raw_data_path):
         try:
             with open(raw_data_path, 'r') as f:
                 data = json.load(f)
             if 'results' in data and 'infbench_summary' in data['results']:
                 rps = data['results']['infbench_summary']['Requests/s']
                 results[concurrency] = rps
                 print(f"Concurrency {concurrency}: {rps:.3f} Requests/s")
         except Exception as e:
             print(f"Error parsing {raw_data_path}: {e}")
+    else:
+        print(f"Warning: No results found for concurrency {concurrency} - "
+              f"expected benchmark_serving.log or raw_data.json")
```
922-922: Line exceeds 120 character limit.

Consider breaking this line for better readability.

```diff
-    f"Synchronizing parameter {sync_param}={value} across context and generation servers"
+    f"Synchronizing parameter {sync_param}={value} "
+    f"across context and generation servers"
```
1226-1226: Line exceeds 120 character limit.

Consider breaking this line for better readability.

```diff
-    f"Generating combinations for disaggregated mode: {len(context_configs)} context × {len(generation_configs)} generation"
+    f"Generating combinations for disaggregated mode: "
+    f"{len(context_configs)} context × {len(generation_configs)} generation"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- examples/disaggregated/slurm/README.md (1 hunks)
- examples/disaggregated/slurm/config.yaml (1 hunks)
- examples/disaggregated/slurm/disagg_profiler/__init__.py (1 hunks)
- examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
- examples/disaggregated/slurm/disagg_profiler/sweeper.py (1 hunks)
- examples/disaggregated/slurm/launcher.py (1 hunks)
- examples/disaggregated/slurm/slurm_populate_urls.py (0 hunks)
💤 Files with no reviewable changes (1)
- examples/disaggregated/slurm/slurm_populate_urls.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
examples/disaggregated/slurm/disagg_profiler/__init__.py (2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
- JobManager (72-1029)
- calculate_nodes_needed (27-69)
- wait_for_server (11-24)

examples/disaggregated/slurm/disagg_profiler/sweeper.py (4)
- AutoSweeper (150-804)
- ParameterSweeper (807-1097)
- get_slurm_allocation (19-80)
- run_sweep_configuration (83-147)
examples/disaggregated/slurm/disagg_profiler/sweeper.py (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
- JobManager (72-1029)
- calculate_nodes_needed (27-69)
- launch_jobs (503-615)
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/launcher.py
90-90: Line too long (122 > 120)
(E501)
174-174: Line too long (122 > 120)
(E501)
205-205: Line too long (127 > 120)
(E501)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
992-992: Do not use bare except
(E722)
examples/disaggregated/slurm/disagg_profiler/sweeper.py
151-159: 1 blank line required between summary line and description
(D205)
922-922: Line too long (130 > 120)
(E501)
1226-1226: Line too long (136 > 120)
(E501)
1278-1279: One-line docstring should fit on one line
Reformat to one line
(D200)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
examples/disaggregated/slurm/config.yaml (1)
1-37: IFB configuration is optional by default

The code only applies InFlight Batching when an `ifb` section is present under `exec.config`; if it's missing, IFB servers default to zero. Since your example doesn't need IFB, no changes are required - add an `ifb:` block only when you intend to enable InFlight Batching.
examples/disaggregated/slurm/disagg_profiler/__init__.py (1)
1-17: LGTM!

The package initialization file is well-structured with clear documentation and proper exports.
examples/disaggregated/slurm/launcher.py (1)
11-146: Well-structured argument parser with comprehensive options

The argument parser setup is thorough and provides clear documentation for each option. The detailed help strings for sweep modes are particularly helpful.
| """ | ||
| # Check if any sweep configuration needs more nodes than the base configuration | ||
| base_config = all_configs[0][0] if all_configs else {} | ||
| base_nodes = calculate_nodes_needed(base_config) |
Same calculate_nodes_needed signature issue.
The function is called with only the config parameter.
Also applies to: 934-934
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/sweeper.py at lines 930 and 934,
the function calculate_nodes_needed is called with only the config parameter,
but its signature requires additional arguments. Update these calls to include
all required parameters as per the function definition to match the expected
signature.
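The mismatch can be sketched with a stand-in. Only the two-argument `(config, num_gpus)` signature comes from the review; the node-count formula, config keys, and values below are illustrative assumptions, not the PR's actual implementation in `job_manager.py`:

```python
# Stand-in for calculate_nodes_needed; the formula and config keys are
# illustrative assumptions - only the (config, num_gpus) signature
# reflects the review's point.
def calculate_nodes_needed(config: dict, num_gpus: int) -> int:
    """Return how many nodes a configuration needs, given GPUs per node."""
    ctx = config.get("context", {})
    gen = config.get("generation", {})
    # Illustrative GPU count: tp * pp per server type.
    total_gpus = (ctx.get("tp", 1) * ctx.get("pp", 1)
                  + gen.get("tp", 1) * gen.get("pp", 1))
    return -(-total_gpus // num_gpus)  # ceiling division


config = {"context": {"tp": 4, "pp": 1}, "generation": {"tp": 4, "pp": 1}}

# calculate_nodes_needed(config) alone would raise TypeError because the
# second positional argument is required; the fixed call passes both:
nodes_needed = calculate_nodes_needed(config, num_gpus=4)
print(nodes_needed)  # 2
```

This is why the one-argument calls flagged above fail at runtime rather than silently computing a wrong node count.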
```python
                tp=tp)

    # Calculate nodes needed for this configuration
    nodes_needed = calculate_nodes_needed(config)
```
Same issue with calculate_nodes_needed function calls.
These calls also pass only the config parameter, but the function expects config and num_gpus.
Also applies to: 513-513
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/sweeper.py at lines 389 and 513,
the function calculate_nodes_needed is called with only the config parameter,
but it requires both config and num_gpus parameters. Update these calls to pass
the appropriate num_gpus argument along with config to match the function
signature.
Signed-off-by: Iman Tabrizian <[email protected]>
Force-pushed: 290b957 to bcd565d
Actionable comments posted: 0
♻️ Duplicate comments (8)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)
11-25: Replace bare except with specific exception handling
380-475: Add bounds checking for server_id to prevent index errors
808-812: Add safe access for potentially missing configuration key
989-998: Replace bare except with specific exception handling

examples/disaggregated/slurm/disagg_profiler/sweeper.py (4)

105-105: Fix missing `num_gpus` argument in calculate_nodes_needed calls
389-390: Same issue with calculate_nodes_needed function calls

Also applies to: 514-515
871-878: Security risk: Replace eval() with safer alternative
932-936: Same calculate_nodes_needed signature issue
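For the `eval()` security concern above, a common replacement is `ast.literal_eval`, which only accepts Python literals. The helper name and sample values below are hypothetical, not taken from `sweeper.py`:

```python
import ast


# Hypothetical helper illustrating the eval() replacement; sweeper.py's
# actual parsing logic may differ.
def parse_sweep_value(raw: str):
    """Parse a sweep value such as "[128, 256]" or "4" without eval()."""
    try:
        # literal_eval rejects arbitrary expressions (function calls,
        # attribute access, ...), so config values cannot execute code.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Non-literal values fall back to the raw string.
        return raw


print(parse_sweep_value("[128, 256]"))  # [128, 256]
print(parse_sweep_value("4"))           # 4
print(parse_sweep_value("tp4"))         # 'tp4'
```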
🧹 Nitpick comments (2)
examples/disaggregated/slurm/launcher.py (1)
11-151: Consider breaking long lines for better readability

Some help text lines exceed the 120-character limit. While this doesn't affect functionality, breaking them would improve code readability. Apply this diff to break the long lines:

```diff
     help=
-    """Automatically determine optimal sweep configurations by profiling context and generation servers.
+    """Automatically determine optimal sweep configurations by profiling context and generation
+    servers.

     auto-sweep uses separate SLURM allocations for each configuration, providing better
     resource utilization and fault tolerance compared to a single large allocation.

-    The goal is to ensure the request processing rates of context and generation servers are balanced.
+    The goal is to ensure the request processing rates of context and generation servers
+    are balanced.
```

examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)
76-143: Remove debug print statements from production code

The `__init__` method contains several debug print statements that should be removed or converted to proper logging before production deployment. Consider using Python's logging module instead:

```diff
-        print(
-            f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
-        )
+        import logging
+        logger = logging.getLogger(__name__)
+        logger.debug(
+            f"JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+        )
```

Apply similar changes to lines 125-127, 140-142.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- examples/disaggregated/README.md (1 hunks)
- examples/disaggregated/slurm/README.md (1 hunks)
- examples/disaggregated/slurm/config.yaml (1 hunks)
- examples/disaggregated/slurm/disagg_profiler/__init__.py (1 hunks)
- examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
- examples/disaggregated/slurm/disagg_profiler/sweeper.py (1 hunks)
- examples/disaggregated/slurm/launcher.py (1 hunks)
- examples/disaggregated/slurm/slurm_populate_urls.py (0 hunks)
💤 Files with no reviewable changes (1)
- examples/disaggregated/slurm/slurm_populate_urls.py
✅ Files skipped from review due to trivial changes (2)
- examples/disaggregated/README.md
- examples/disaggregated/slurm/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/disaggregated/slurm/disagg_profiler/__init__.py
- examples/disaggregated/slurm/config.yaml
🧰 Additional context used
🧬 Code Graph Analysis (1)
examples/disaggregated/slurm/disagg_profiler/sweeper.py (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
- JobManager (72-1029)
- calculate_nodes_needed (27-69)
- launch_jobs (503-615)
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/launcher.py
96-96: Line too long (122 > 120)
(E501)
180-180: Line too long (122 > 120)
(E501)
211-211: Line too long (127 > 120)
(E501)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
992-992: Do not use bare except
(E722)
examples/disaggregated/slurm/disagg_profiler/sweeper.py
151-159: 1 blank line required between summary line and description
(D205)
924-924: Line too long (130 > 120)
(E501)
1228-1228: Line too long (136 > 120)
(E501)
1280-1281: One-line docstring should fit on one line
Reformat to one line
(D200)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
examples/disaggregated/slurm/launcher.py (1)
154-262: Well-structured main function with proper error handling

The main function effectively orchestrates different sweep modes with appropriate validation, allocation management, and cleanup logic. The mode detection and mutual exclusion check prevent conflicting configurations.
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)
910-930: Good implementation of dataset preparation

The prepare_dataset method properly checks for existing dataset files before preparation and uses the configurable `prepare_dataset_script` parameter from args, which addresses the previous concern about hardcoded paths.
examples/disaggregated/slurm/disagg_profiler/sweeper.py (2)
19-82: Robust SLURM allocation request implementation

The function properly handles subprocess communication with real-time output streaming and regex-based job ID extraction. Good error handling ensures failures are caught and reported.

1122-1327: Well-designed multi-configuration sweeper

The MultiConfigSweeper class effectively handles multiple server configurations with proper detection of multi-config format and exhaustive combination generation. The implementation cleanly separates disaggregated mode (context + generation) from IFB mode.
@Tabrizian with the changes in #6054, I'm concerned that there will be two different separate sets of scripts for doing the same thing in …
Signed-off-by: Iman Tabrizian <[email protected]>
Actionable comments posted: 2
♻️ Duplicate comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
20-20: Replace bare except with specific exception handling.

The bare except clause should be more specific to avoid catching unexpected exceptions.

380-475: Add bounds checking for server_id to prevent index errors.

The server_id is incremented in multiple places without checking if it exceeds the length of self.node_hostnames.

997-997: Replace bare except with specific exception handling.

The bare except clause should catch specific exceptions.
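The bare-except narrowing can be sketched as below. This stand-in uses stdlib `urllib` so it is self-contained; the PR's `wait_for_server` polls with `requests`, and the helper name and health URL here are assumptions for illustration:

```python
import urllib.error
import urllib.request


# Illustrative narrowing of a bare except in a server health probe;
# the function name and endpoint are hypothetical.
def server_is_ready(url: str, timeout: float = 5.0) -> bool:
    """Probe a health endpoint, catching only expected network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except (urllib.error.URLError, TimeoutError, ConnectionError):
        # Server not up yet or unreachable: report "not ready" instead of
        # swallowing every exception (KeyboardInterrupt, real bugs, ...).
        return False
    return status == 200


# An unreachable address is reported as "not ready" rather than crashing:
print(server_is_ready("http://127.0.0.1:9/", timeout=0.5))  # False
```

Catching only network-related errors lets genuine programming mistakes (AttributeError, KeyError) surface instead of being hidden by the retry loop.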
🧹 Nitpick comments (4)
examples/disaggregated/slurm/launcher.py (3)
100-100: Fix line length violation.

Line exceeds the 120 character limit.

```diff
-    "The goal is to ensure the request processing rates of context and generation servers are balanced."
+    "The goal is to ensure the request processing rates of context and generation "
+    "servers are balanced."
```
184-184: Fix line length violation.

Line exceeds the 120 character limit.

```diff
-    "Choose one of: --auto-sweep, --sweep-config, or define single configurations (not lists) in your config file"
+    "Choose one of: --auto-sweep, --sweep-config, or define single configurations "
+    "(not lists) in your config file"
```
214-216: Fix line length violation.

Line exceeds the 120 character limit.

```diff
-    print(
-        "Multi-config or manual sweep mode without single allocation - allocations will be requested per configuration"
-    )
+    print(
+        "Multi-config or manual sweep mode without single allocation - "
+        "allocations will be requested per configuration"
+    )
```

examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)
89-92: Fix line length violation.

Line exceeds the 120 character limit.

```diff
 print(
-    f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+    f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, "
+    f"override_job_id={override_job_id}"
 )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
- examples/disaggregated/slurm/launcher.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
- examples/disaggregated/slurm/disagg_profiler/job_manager.py
- examples/disaggregated/slurm/launcher.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
- examples/disaggregated/slurm/disagg_profiler/job_manager.py
- examples/disaggregated/slurm/launcher.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
993-993: Do not use bare except
(E722)
examples/disaggregated/slurm/launcher.py
100-100: Line too long (122 > 120)
(E501)
184-184: Line too long (122 > 120)
(E501)
215-215: Line too long (127 > 120)
(E501)
🔇 Additional comments (5)
examples/disaggregated/slurm/launcher.py (1)
158-266: Well-structured mode handling and resource management.

The main function effectively handles multiple execution modes (auto-sweep, multi-config, manual sweep, single config) with proper validation to prevent conflicting modes. The SLURM allocation management and cleanup logic is comprehensive and handles different scenarios appropriately.
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)
76-142: Well-structured initialization with proper parameter handling.

The JobManager initialization properly handles different modes (load_config_only), parameter overrides, and includes appropriate debug logging for troubleshooting.

144-379: Comprehensive configuration management.

The configuration methods handle different server types (context, generation, IFB) consistently and properly save configurations both for execution and reference. The output folder creation logic accounts for various configuration parameters to create meaningful directory structures.

476-849: Robust server launching implementation.

The server launching methods provide comprehensive support for different server types with proper SLURM integration, container execution, and logging. The command building logic correctly handles various configuration parameters and resource allocation.

850-1035: Comprehensive load testing and resource management.

The implementation provides robust load testing capabilities with support for multiple benchmarking modes (trtllm_bench, benchmark_serving). The context manager ensures proper cleanup of resources, and the configuration saving logic maintains good traceability of experimental setups.
```python
import json
import os
import subprocess
import time
import uuid

import requests
import yaml
```
🛠️ Refactor suggestion
Add required NVIDIA copyright header.
According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.
Add the copyright header at the top of the file:
```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
import subprocess
import time
import uuid

import requests
import yaml
```
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/job_manager.py at the top of the
file (lines 1 to 10), add the required NVIDIA copyright header including the
current year as per the coding guidelines. Insert the standard NVIDIA copyright
notice as the first lines before any imports or code.
```python
import argparse
import os
import subprocess
import sys

from disagg_profiler.job_manager import JobManager, calculate_nodes_needed
from disagg_profiler.sweeper import (AutoSweeper, MultiConfigSweeper,
                                     ParameterSweeper, get_slurm_allocation)
```
🛠️ Refactor suggestion
Add required NVIDIA copyright header.
According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.
Add the copyright header at the top of the file:
```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import argparse
 import os
 import subprocess
 import sys
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import subprocess
import sys

from disagg_profiler.job_manager import JobManager, calculate_nodes_needed
from disagg_profiler.sweeper import (AutoSweeper, MultiConfigSweeper,
                                     ParameterSweeper, get_slurm_allocation)
```
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/launcher.py at the top of the file (lines 1 to
10), add the required NVIDIA copyright header including the current year as per
coding guidelines. Insert the standard NVIDIA copyright notice as the first
lines before any imports or code.
Signed-off-by: Iman Tabrizian <[email protected]>
Actionable comments posted: 1
🔭 Outside diff range comments (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)
1044-1046: Add validation for sufficient node allocation

After getting node hostnames, validate that enough nodes are allocated for the configuration requirements.

  # Get hostnames of available nodes
  self.node_hostnames = self.get_node_hostnames()
+
+ # Validate sufficient nodes are allocated
+ nodes_needed = calculate_nodes_needed(self.config, self.num_gpus)
+ if len(self.node_hostnames) < nodes_needed:
+     raise ValueError(
+         f"Insufficient nodes allocated. Need {nodes_needed} nodes "
+         f"but only {len(self.node_hostnames)} available"
+     )
♻️ Duplicate comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)
1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.
Add the copyright header at the top of the file:
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
11-25: Replace bare except with specific exception handling

The bare `except` clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
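As a standalone illustration of the narrowed exception handling, here is a minimal sketch of the same health-polling pattern using only the standard library (`urllib` stands in for `requests`, so the helper name and defaults are illustrative, not the reviewed code):

```python
import time
import urllib.request


def wait_for_health(host, port, max_retries=5, delay=1.0):
    """Poll an HTTP /health endpoint until it answers 200.

    Only network-level failures are caught (URLError derives from
    OSError), so programming errors inside the loop still surface
    instead of being swallowed by a bare except.
    """
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(
                    f"http://{host}:{port}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # covers URLError, ConnectionError, timeouts
            time.sleep(delay)
    return False
```

The key point is that a `TypeError` or `KeyError` in the loop body would still propagate, which a bare `except` would hide.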
380-475: Add bounds checking for server_id to prevent index errors

The `server_id` is incremented in multiple places without checking if it exceeds the length of `self.node_hostnames`. This could cause an IndexError.

Add bounds checking before accessing `self.node_hostnames[server_id]`:

 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation and IFB server loops.
1001-1010: Replace bare except with specific exception handling

The bare `except` clause should catch specific exceptions.

 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
🧹 Nitpick comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)
89-92: Break long debug line for better readability

Line 91 exceeds the 120 character limit.

 print(
-    f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+    f"DEBUG JobManager: __init__ called with "
+    f"load_config_only={load_config_only}, override_job_id={override_job_id}"
 )
144-201: Consider extracting folder naming logic for better maintainability

The `create_output_folder` method contains complex folder naming logic that could be extracted into a separate helper method for better readability and maintainability.

Consider refactoring:

 def _generate_folder_suffix(self, config):
     """Generate folder suffix based on configuration options."""
     suffix = ""
     for server_type in ['generation', 'ifb']:
         if server_type in config['exec']['config']:
             # ... existing logic for suffix generation
     return suffix

 def create_output_folder(self):
     # ... existing model name logic
     # ... existing base folder creation
     suffix = self._generate_folder_suffix(self.config)
     output_folder += suffix
     # ... rest of the method
659-664: Break long nsys_prefix line for better readability

Line 663 exceeds the 120 character limit significantly (222 chars).

-nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+nsys_prefix = (
+    f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+    f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+    f"-c cudaProfilerApi --cuda-graph-trace node "
+    f"--capture-range-end=stop --gpu-metrics-devices=none"
+)
507-566: Consider optimizing server startup with parallel launching or adaptive waiting

The current implementation uses fixed sleep delays (15 seconds per server, 30 seconds for the disaggregated server), which can significantly slow down the startup process. Consider:

- Parallel launching: Launch all servers of the same type simultaneously
- Adaptive waiting: Use `wait_for_server` immediately after launching instead of fixed delays
- Configurable timeouts: Make wait times configurable based on cluster characteristics

Example optimization:

 # Launch all servers in parallel
 for node in distribution:
     # Launch without intermediate sleep
     if node["is_ifb"]:
         self.ifb_jobs.extend(self.launch_ifb_server(...))
     # ... other server types

 # Then wait for all servers with adaptive timing
 for node in distribution:
     if not wait_for_server(node["hostnames"][0], node["port"],
                            max_retries=100, delay=2):
         raise RuntimeError(f"Server {node['hostnames'][0]}:{node['port']} failed to start")
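The launch-then-poll pattern described in this nitpick can be sketched independently of the JobManager. In the snippet below, `launch` and `check_health` are hypothetical caller-supplied stand-ins for `launch_ifb_server` / `wait_for_server`:

```python
import concurrent.futures


def start_all(nodes, launch, check_health):
    """Launch every server first, then poll all health endpoints in
    parallel, replacing fixed per-server sleeps with adaptive waiting.

    `launch` and `check_health` are stand-ins for the real JobManager
    methods; each node is a dict with "host" and "port" keys.
    """
    jobs = [launch(node) for node in nodes]  # no sleep between launches
    with concurrent.futures.ThreadPoolExecutor() as pool:
        ready = list(
            pool.map(lambda node: check_health(node["host"], node["port"]),
                     nodes))
    failed = [n for n, ok in zip(nodes, ready) if not ok]
    if failed:
        names = ", ".join("{}:{}".format(n["host"], n["port"]) for n in failed)
        raise RuntimeError("servers failed to start: " + names)
    return jobs
```

Because all health checks run concurrently, total startup cost is bounded by the slowest server rather than the sum of per-server waits.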
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
659-659: Line too long (222 > 120)
(E501)
1004-1004: Do not use bare except
(E722)
🔇 Additional comments (5)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (5)
27-71: LGTM!

The `calculate_nodes_needed` function correctly calculates the total number of nodes required based on the GPU requirements of context, generation, and IFB servers.
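The per-server-type arithmetic being praised here reduces to a ceiling division with a one-node floor per group. A simplified, hypothetical sketch (the real `calculate_nodes_needed` derives the TP sizes from the nested context/generation/IFB config dict):

```python
import math


def nodes_for_groups(tp_sizes, gpus_per_node):
    """Each server group needs ceil(tp / gpus_per_node) nodes, with a
    minimum of one node per group; the total is the sum over groups.

    Simplified stand-in for calculate_nodes_needed: tp_sizes would be
    read from the context/generation/IFB sections of the config.
    """
    return sum(max(1, math.ceil(tp / gpus_per_node)) for tp in tp_sizes)


# e.g. context tp=8 plus two generation servers with tp=4, on 4-GPU nodes
print(nodes_for_groups([8, 4, 4], 4))  # → 4
```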
818-824: Good practice: Safe access to configuration key

The code correctly uses the `.get()` method with a default value to safely access the potentially missing 'gen_only' configuration key, preventing KeyError exceptions.
922-943: LGTM! Dataset preparation with configurable script path

The method correctly:

- Checks if the dataset already exists to avoid redundant generation
- Uses a configurable script path via `self.args.prepare_dataset_script` (addressing the previous hardcoded path issue)
- Properly constructs the command for dataset generation
348-379: Well-designed fallback mechanism for node hostname resolution

The `get_node_hostnames` method properly:
- Checks for SLURM environment variables
- Attempts to retrieve hostnames via scontrol
- Falls back gracefully to localhost for non-SLURM environments
This design ensures the code works both in SLURM clusters and local development environments.
944-978: LGTM! Robust configuration handling with proper defaults

The method demonstrates good practices:
- Safe dictionary access with fallback values for all configuration parameters
- Proper dataset file path construction
- Clear command construction with all necessary parameters
    def record_kv_cache_stats(self, generation_server_urls, output_folder):
        """Record the kv cache stats for a specific concurrency level."""
        print(
            f"DEBUG JobManager: Recording kv cache stats for generation servers"
        )
        # Get the kv cache stats
        # Send a request to "/metrics" and store the output in a file
        for url in generation_server_urls:
            response = requests.get(f"http://{url}/metrics")
            with open(
                    os.path.join(output_folder,
                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
                    "w") as f:
                f.write(response.text)
        print(
            f"DEBUG JobManager: Recorded kv cache stats for generation servers")
🛠️ Refactor suggestion
Add error handling for metrics collection
The method should handle potential network failures when collecting metrics from generation servers.
     def record_kv_cache_stats(self, generation_server_urls, output_folder):
         """Record the kv cache stats for a specific concurrency level."""
         print(
             f"DEBUG JobManager: Recording kv cache stats for generation servers"
         )
         # Get the kv cache stats
         # Send a request to "/metrics" and store the output in a file
         for url in generation_server_urls:
-            response = requests.get(f"http://{url}/metrics")
-            with open(
-                    os.path.join(output_folder,
-                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                    "w") as f:
-                f.write(response.text)
+            try:
+                response = requests.get(f"http://{url}/metrics", timeout=30)
+                response.raise_for_status()
+                with open(
+                        os.path.join(output_folder,
+                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                        "w") as f:
+                    f.write(response.text)
+            except (requests.exceptions.RequestException, IOError) as e:
+                print(f"Warning: Failed to collect metrics from {url}: {e}")
         print(
             f"DEBUG JobManager: Recorded kv cache stats for generation servers")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    def record_kv_cache_stats(self, generation_server_urls, output_folder):
        """Record the kv cache stats for a specific concurrency level."""
        print(
            f"DEBUG JobManager: Recording kv cache stats for generation servers"
        )
        # Get the kv cache stats
        # Send a request to "/metrics" and store the output in a file
        for url in generation_server_urls:
            try:
                response = requests.get(f"http://{url}/metrics", timeout=30)
                response.raise_for_status()
                with open(
                        os.path.join(output_folder,
                                     f"{url.split(':')[0]}_kv_cache_stats.txt"),
                        "w") as f:
                    f.write(response.text)
            except (requests.exceptions.RequestException, IOError) as e:
                print(f"Warning: Failed to collect metrics from {url}: {e}")
        print(
            f"DEBUG JobManager: Recorded kv cache stats for generation servers"
        )
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/job_manager.py around lines 905
to 921, the record_kv_cache_stats method lacks error handling for network
requests to generation servers. Add try-except blocks around the requests.get
call to catch exceptions like connection errors or timeouts. Log or print an
error message when a request fails, and ensure the method continues processing
other URLs without crashing.
Signed-off-by: Iman Tabrizian <[email protected]>
Actionable comments posted: 1
♻️ Duplicate comments (5)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (5)
1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the beginning of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
11-25: Replace bare except with specific exception handling

The bare `except` clause should be more specific to avoid catching unexpected exceptions.

 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
911-927: Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

 def record_kv_cache_stats(self, generation_server_urls, output_folder):
     """Record the kv cache stats for a specific concurrency level."""
     print(
         f"DEBUG JobManager: Recording kv cache stats for generation servers"
     )
     # Get the kv cache stats
     # Send a request to "/metrics" and store the output in a file
     for url in generation_server_urls:
-        response = requests.get(f"http://{url}/metrics")
-        with open(
-                os.path.join(output_folder,
-                             f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                "w") as f:
-            f.write(response.text)
+        try:
+            response = requests.get(f"http://{url}/metrics", timeout=30)
+            response.raise_for_status()
+            with open(
+                    os.path.join(output_folder,
+                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                    "w") as f:
+                f.write(response.text)
+        except (requests.exceptions.RequestException, IOError) as e:
+            print(f"Warning: Failed to collect metrics from {url}: {e}")
     print(
         f"DEBUG JobManager: Recorded kv cache stats for generation servers")
1007-1016: Replace bare except with specific exception handling

The bare `except` clause should catch specific exceptions.

 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
380-481: Add bounds checking for server_id to prevent index errors

The `server_id` is incremented in multiple places without checking if it exceeds the length of `self.node_hostnames`. This could cause an IndexError.

Add bounds checking before accessing `self.node_hostnames[server_id]`:

 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes for context servers.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])

Apply similar checks in the generation server loop (lines 433-436) and IFB server loop (lines 461-464):

 if i != 0 and i % self.num_gpus == 0:
     server_id += 1
+    if server_id >= len(self.node_hostnames):
+        raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes for generation servers.")
 server_ids.append(self.node_hostnames[server_id])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
665-665: Line too long (222 > 120)
(E501)
1010-1010: Do not use bare except
(E722)
🔇 Additional comments (2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (2)
27-71: LGTM!

The node calculation logic is well-implemented with proper rounding and minimum node guarantees.
73-1052: No duplicate SLURM logic found – consolidation verified
I inspected launcher.py and sweeper.py; both delegate all SLURM job orchestration to JobManager and merely handle argument parsing and sweep loops. There’s no overlapping or redundant launch logic elsewhere in examples/disaggregated/slurm.
    def run_loadgen_tests(self, hostname, port, generation_server_urls):
        """Run loadgen tests on the master node."""
        print(
            f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
        )
        output_folder = self.output_folder

        # Run loadgen tests for each concurrency level
        if 'use_trtllm_bench' in self.config['profile'] and self.config[
                'profile']['use_trtllm_bench']:
            self.prepare_dataset(output_folder)

        concurrency_levels = self.config['profile']['concurrency']
        print(
            f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
        )

        for concurrency in concurrency_levels:
            print(
                f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
            )
            concurrency_output_folder = f"{output_folder}/{concurrency}"
            if not os.path.exists(concurrency_output_folder):
                os.makedirs(concurrency_output_folder)

            # Build and run the loadgen command
            if 'use_trtllm_bench' in self.config['profile'] and self.config[
                    'profile']['use_trtllm_bench']:
                self.run_trtllm_bench_for_concurrency(
                    hostname, port, concurrency, concurrency_output_folder)
            elif 'use_benchmark_serving' in self.config[
                    'profile'] and self.config['profile'][
                        'use_benchmark_serving']:
                pid = self.run_benchmark_serving_for_concurrency(
                    hostname, port, concurrency, concurrency_output_folder)
                pid.wait()
            print(
                f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
            )

        print("DEBUG JobManager: run_loadgen_tests completed")
        return pid
Fix potential undefined variable error
The variable pid is returned at line 909 but is only assigned conditionally. If neither use_trtllm_bench nor use_benchmark_serving is true, pid will be undefined.
Initialize pid at the beginning of the method or handle the case when neither condition is met:
```diff
 def run_loadgen_tests(self, hostname, port, generation_server_urls):
     """Run loadgen tests on the master node."""
     print(
         f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
     )
     output_folder = self.output_folder
+    pid = None
     # Run loadgen tests for each concurrency level
     if 'use_trtllm_bench' in self.config['profile'] and self.config[
             'profile']['use_trtllm_bench']:
         self.prepare_dataset(output_folder)
     concurrency_levels = self.config['profile']['concurrency']
     print(
         f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
     )
     for concurrency in concurrency_levels:
         print(
             f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
         )
         concurrency_output_folder = f"{output_folder}/{concurrency}"
         if not os.path.exists(concurrency_output_folder):
             os.makedirs(concurrency_output_folder)
         # Build and run the loadgen command
         if 'use_trtllm_bench' in self.config['profile'] and self.config[
                 'profile']['use_trtllm_bench']:
             self.run_trtllm_bench_for_concurrency(
                 hostname, port, concurrency, concurrency_output_folder)
         elif 'use_benchmark_serving' in self.config[
                 'profile'] and self.config['profile'][
                     'use_benchmark_serving']:
             pid = self.run_benchmark_serving_for_concurrency(
                 hostname, port, concurrency, concurrency_output_folder)
             pid.wait()
+        else:
+            print("Warning: No benchmarking method configured")
         print(
             f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
         )
     print("DEBUG JobManager: run_loadgen_tests completed")
     return pid
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def run_loadgen_tests(self, hostname, port, generation_server_urls):
    """Run loadgen tests on the master node."""
    print(
        f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
    )
    output_folder = self.output_folder
    pid = None
    # Run loadgen tests for each concurrency level
    if 'use_trtllm_bench' in self.config['profile'] and self.config[
            'profile']['use_trtllm_bench']:
        self.prepare_dataset(output_folder)
    concurrency_levels = self.config['profile']['concurrency']
    print(
        f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
    )
    for concurrency in concurrency_levels:
        print(
            f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
        )
        concurrency_output_folder = f"{output_folder}/{concurrency}"
        if not os.path.exists(concurrency_output_folder):
            os.makedirs(concurrency_output_folder)
        # Build and run the loadgen command
        if 'use_trtllm_bench' in self.config['profile'] and self.config[
                'profile']['use_trtllm_bench']:
            self.run_trtllm_bench_for_concurrency(
                hostname, port, concurrency, concurrency_output_folder)
        elif 'use_benchmark_serving' in self.config['profile'] and self.config[
                'profile']['use_benchmark_serving']:
            pid = self.run_benchmark_serving_for_concurrency(
                hostname, port, concurrency, concurrency_output_folder)
            pid.wait()
        else:
            print("Warning: No benchmarking method configured")
        print(
            f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
        )
    print("DEBUG JobManager: run_loadgen_tests completed")
    return pid
```
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/disagg_profiler/job_manager.py around lines 868
to 910, the variable pid is returned but only assigned inside a conditional
block, which can lead to an undefined variable error if neither condition is
met. To fix this, initialize pid to None at the start of the method and ensure
it is assigned appropriately within the conditionals, or add a fallback
assignment or error handling for the case when neither condition is true before
returning pid.
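The pitfall described above can be reproduced in isolation. The sketch below uses hypothetical names (not the actual `JobManager` code) to show why initializing the result up front makes the function total:

```python
def pick_benchmark(profile):
    """Return which benchmark backend a profile selects, or None.

    Initializing `pid` up front mirrors the suggested fix: without it,
    a profile that enables neither backend would raise UnboundLocalError
    at the return statement.
    """
    pid = None
    if profile.get("use_trtllm_bench"):
        pid = "trtllm_bench"
    elif profile.get("use_benchmark_serving"):
        pid = "benchmark_serving"
    return pid
```

With the initialization in place, `pick_benchmark({})` returns `None` instead of raising.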
Actionable comments posted: 0
♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)
1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
```
11-25: Replace bare except with specific exception handling.

The bare `except` clause should be more specific to avoid catching unexpected exceptions.

```diff
 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
```
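The same narrow-except pattern can be sketched without `requests`. This stdlib-only example (names hypothetical) retries only on the listed exceptions and lets anything else propagate:

```python
import time

def wait_for(check, max_retries=5, delay=0.01,
             exceptions=(ConnectionError, TimeoutError)):
    """Poll `check` until it returns True, retrying only on `exceptions`."""
    for _ in range(max_retries):
        try:
            if check():
                return True
        except exceptions:  # narrow catch; unrelated errors still propagate
            pass
        time.sleep(delay)
    return False

# Simulate a server that only answers on the third probe.
attempts = {"count": 0}

def probe():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("not up yet")
    return True
```

A bug such as a `TypeError` in the probe would surface immediately instead of being silently swallowed by a bare `except`.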
382-483: Add bounds checking for server_id to prevent index errors.

The `server_id` is incremented in multiple places without checking if it exceeds the length of `self.node_hostnames`. This could cause an IndexError.

Add bounds checking before accessing `self.node_hostnames[server_id]`:

```diff
 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])
```

Apply similar checks in the generation and IFB server loops at lines 436-438 and 464-466.
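The fail-fast check can also be factored into a small helper. This sketch (hypothetical signature, simplified from the loop above) validates the allocation before indexing:

```python
def assign_nodes(tp, gpus_per_node, hostnames):
    """Map `tp` tensor-parallel ranks onto node hostnames.

    Raises ValueError up front when the allocation is too small,
    instead of an IndexError deep inside the launch loop.
    """
    server_ids = []
    server_id = 0
    for i in range(tp):
        if i != 0 and i % gpus_per_node == 0:
            server_id += 1
        if server_id >= len(hostnames):
            raise ValueError(
                f"Not enough nodes allocated: need {server_id + 1}, "
                f"have {len(hostnames)}")
        server_ids.append(hostnames[server_id])
    return server_ids
```

For example, `assign_nodes(4, 2, ["node0", "node1"])` spreads four ranks over two nodes, while a single-node allocation fails with a clear message.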
985-1027: Fix potential undefined variable error.

The variable `pid` is returned at line 1026 but is only assigned conditionally. If neither `use_trtllm_bench` nor `use_benchmark_serving` is true, `pid` will be undefined.

Initialize `pid` at the beginning of the method:

```diff
 def run_loadgen_tests(self, hostname, port, generation_server_urls):
     """Run loadgen tests on the master node."""
     print(
         f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
     )
     output_folder = self.output_folder
+    pid = None
     # Run loadgen tests for each concurrency level
```

Also consider adding error handling if neither benchmarking method is configured:

```diff
     elif 'use_benchmark_serving' in self.config[
             'profile'] and self.config['profile'][
                 'use_benchmark_serving']:
         pid = self.run_benchmark_serving_for_concurrency(
             hostname, port, concurrency, concurrency_output_folder)
         pid.wait()
+    else:
+        print("Warning: No benchmarking method configured")
```
1028-1044: Add error handling for metrics collection.

The method should handle potential network failures when collecting metrics from generation servers.

```diff
 def record_kv_cache_stats(self, generation_server_urls, output_folder):
     """Record the kv cache stats for a specific concurrency level."""
     print(
         f"DEBUG JobManager: Recording kv cache stats for generation servers"
     )
     # Get the kv cache stats
     # Send a request to "/metrics" and store the output in a file
     for url in generation_server_urls:
-        response = requests.get(f"http://{url}/metrics")
-        with open(
-                os.path.join(output_folder,
-                             f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                "w") as f:
-            f.write(response.text)
+        try:
+            response = requests.get(f"http://{url}/metrics", timeout=30)
+            response.raise_for_status()
+            with open(
+                    os.path.join(output_folder,
+                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                    "w") as f:
+                f.write(response.text)
+        except (requests.exceptions.RequestException, IOError) as e:
+            print(f"Warning: Failed to collect metrics from {url}: {e}")
     print(
         f"DEBUG JobManager: Recorded kv cache stats for generation servers")
```
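The hardened scrape loop becomes easy to test if the HTTP call is injected. In this sketch (hypothetical helper, not the actual method) `fetch` stands in for `requests.get(...).text`:

```python
import os
import tempfile

def record_metrics(urls, output_folder, fetch):
    """Scrape /metrics from each server, skipping hosts that fail."""
    saved, failed = [], []
    for url in urls:
        try:
            text = fetch(f"http://{url}/metrics")
        except OSError as exc:  # network/IO failures only; bugs propagate
            print(f"Warning: failed to collect metrics from {url}: {exc}")
            failed.append(url)
            continue
        path = os.path.join(output_folder,
                            f"{url.split(':')[0]}_kv_cache_stats.txt")
        with open(path, "w") as f:
            f.write(text)
        saved.append(path)
    return saved, failed

# Fake fetch: one healthy server, one that refuses connections.
def fake_fetch(url):
    if "bad" in url:
        raise ConnectionError("connection refused")
    return "kv_cache_blocks_free 42\n"

tmpdir = tempfile.mkdtemp()
saved, failed = record_metrics(["good:8000", "bad:8000"], tmpdir, fake_fetch)
```

`ConnectionError` is a subclass of `OSError`, so the narrow catch covers connection failures while letting programming errors surface.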
1124-1137: Replace bare except with specific exception handling.

The bare `except` clause should catch specific exceptions.

```diff
 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     # First, stop any nsys sessions for jobs that have nsys enabled
     self.stop_nsys_sessions()
     # Then terminate the jobs
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
```
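The terminate-then-kill escalation can be exercised standalone. This sketch (a free function, unlike the method above) reaps a real child process:

```python
import subprocess
import sys

def terminate_jobs(jobs, timeout=5):
    """Terminate running processes, escalating to kill() on timeout."""
    for job in jobs:
        if job and job.poll() is None:
            try:
                job.terminate()
                job.wait(timeout=timeout)
            except (subprocess.TimeoutExpired, OSError):
                job.kill()

# Demo: a child that would otherwise run for a minute.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"])
terminate_jobs([child, None])  # None entries are skipped safely
child.wait(timeout=10)
```

Catching only `TimeoutExpired` and `OSError` keeps the escalation path intact while letting unrelated bugs propagate.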
🧹 Nitpick comments (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)
630-708: Well-implemented nsys session management with minor formatting issue.

The method properly handles nsys session cleanup with good error handling and organization by hostname.

Remove trailing whitespace on line 708:

```diff
     print("Finished stopping nsys sessions.")
-    
+
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
678-678: Line too long (148 > 120)
(E501)
708-708: Blank line contains whitespace
Remove whitespace from blank line
(W293)
754-754: Line too long (247 > 120)
(E501)
1131-1131: Do not use bare except
(E722)
🔇 Additional comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
27-71: LGTM! The function correctly calculates the number of nodes needed based on tensor and data parallelism configurations.

943-945: Good use of safe dictionary access. The code now correctly uses `.get()` with a default value to safely access the `gen_only` configuration key, preventing potential KeyError.

1045-1066: Good implementation of dataset preparation. The method properly uses a configurable script path from command-line arguments and includes optimization to skip preparation if the dataset already exists.
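The safe-access pattern being praised is simply `dict.get` with a default, e.g. (illustrative config, not the real one):

```python
config = {"exec": {"config": {"context": {"tp": 8}}}}

# Indexing config["exec"]["config"]["gen_only"] would raise KeyError;
# .get() with a default makes the optional key safe to read.
gen_only = config["exec"]["config"].get("gen_only", False)
```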
Actionable comments posted: 0
♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)
1-10: Add required NVIDIA copyright header.

All TensorRT-LLM source files must include an NVIDIA copyright header with the current year.

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
```
11-25: Replace bare except with specific exception handling.

Catch specific exceptions to avoid masking unexpected errors.

```diff
 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
```
382-483: Add bounds checking for server_id to prevent IndexError.

The `server_id` variable is incremented and used to index `self.node_hostnames` without verifying it's within bounds, which could cause an IndexError if insufficient nodes are allocated.

Add bounds checking in the context, generation, and IFB server loops:

```diff
 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes for context servers.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])
```

Apply similar checks at lines 436-437 and 464-465 for generation and IFB servers.
985-1027: Fix potential undefined variable error.

The variable `pid` is returned but only assigned conditionally, leading to an undefined variable error if neither benchmarking method is configured.

```diff
 def run_loadgen_tests(self, hostname, port, generation_server_urls):
     """Run loadgen tests on the master node."""
     print(
         f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
     )
     output_folder = self.output_folder
+    pid = None
     # Run loadgen tests for each concurrency level
     if 'use_trtllm_bench' in self.config['profile'] and self.config[
             'profile']['use_trtllm_bench']:
         self.prepare_dataset(output_folder)
```
1028-1044: Add error handling for metrics collection.

The method should handle potential network failures gracefully when collecting metrics.

```diff
 def record_kv_cache_stats(self, generation_server_urls, output_folder):
     """Record the kv cache stats for a specific concurrency level."""
     print(
         f"DEBUG JobManager: Recording kv cache stats for generation servers"
     )
     # Get the kv cache stats
     # Send a request to "/metrics" and store the output in a file
     for url in generation_server_urls:
-        response = requests.get(f"http://{url}/metrics")
-        with open(
-                os.path.join(output_folder,
-                             f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                "w") as f:
-            f.write(response.text)
+        try:
+            response = requests.get(f"http://{url}/metrics", timeout=30)
+            response.raise_for_status()
+            with open(
+                    os.path.join(output_folder,
+                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                    "w") as f:
+                f.write(response.text)
+        except (requests.exceptions.RequestException, IOError) as e:
+            print(f"Warning: Failed to collect metrics from {url}: {e}")
     print(
         f"DEBUG JobManager: Recorded kv cache stats for generation servers")
```
1124-1137: Replace bare except with specific exception handling.

Catch specific exceptions to improve error handling clarity.

```diff
 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     # First, stop any nsys sessions for jobs that have nsys enabled
     self.stop_nsys_sessions()
     # Then terminate the jobs
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
```
🧹 Nitpick comments (2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (2)
630-708: Remove trailing whitespace.

Line 708 contains trailing whitespace that should be removed.

```diff
-    
+
 print("Finished stopping nsys sessions.")
```

Otherwise, the nsys session management implementation looks comprehensive with proper error handling.

754-764: Break up long nsys command line for better readability.

Line 758 exceeds 120 characters and should be split for better readability.

```diff
-            nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none --session={session_name}"
+            nsys_prefix = (
+                f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+                f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+                f"-c cudaProfilerApi --cuda-graph-trace node "
+                f"--capture-range-end=stop --gpu-metrics-devices=none "
+                f"--session={session_name}"
+            )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🔇 Additional comments (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1)
27-71: LGTM! The node calculation logic correctly handles all server types and ensures proper rounding.
Actionable comments posted: 0
♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)
1-10: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
 import requests
 import yaml
```
11-25: Replace bare except with specific exception handling

The bare `except` clause should be more specific to avoid catching unexpected exceptions.

```diff
 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
```
382-483: Add bounds checking for server_id to prevent index errors

The `server_id` is incremented in multiple places without checking if it exceeds the length of `self.node_hostnames`. This could cause an IndexError.

Add bounds checking before accessing `self.node_hostnames[server_id]`:

```diff
 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])
```

Apply similar checks at lines 437 and 465 for the generation and IFB server loops.
1044-1086: Fix potential undefined variable error

The variable `pid` is returned at line 1085 but is only assigned conditionally. If neither `use_trtllm_bench` nor `use_benchmark_serving` is true, `pid` will be undefined.

Initialize `pid` at the beginning of the method:

```diff
 def run_loadgen_tests(self, hostname, port, generation_server_urls):
     """Run loadgen tests on the master node."""
     print(
         f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
     )
     output_folder = self.output_folder
+    pid = None

     # Run loadgen tests for each concurrency level
     if 'use_trtllm_bench' in self.config['profile'] and self.config[
             'profile']['use_trtllm_bench']:
         self.prepare_dataset(output_folder)

     concurrency_levels = self.config['profile']['concurrency']
     print(
         f"DEBUG JobManager: Will run loadgen for concurrency levels: {concurrency_levels}"
     )
     for concurrency in concurrency_levels:
         print(
             f"DEBUG JobManager: Starting loadgen for concurrency {concurrency}"
         )
         concurrency_output_folder = f"{output_folder}/{concurrency}"
         if not os.path.exists(concurrency_output_folder):
             os.makedirs(concurrency_output_folder)

         # Build and run the loadgen command
         if 'use_trtllm_bench' in self.config['profile'] and self.config[
                 'profile']['use_trtllm_bench']:
             self.run_trtllm_bench_for_concurrency(
                 hostname, port, concurrency, concurrency_output_folder)
         elif 'use_benchmark_serving' in self.config[
                 'profile'] and self.config['profile'][
                     'use_benchmark_serving']:
             pid = self.run_benchmark_serving_for_concurrency(
                 hostname, port, concurrency, concurrency_output_folder)
             pid.wait()
+        else:
+            print("Warning: No benchmarking method configured")
         print(
             f"DEBUG JobManager: Completed loadgen for concurrency {concurrency}"
         )

     print("DEBUG JobManager: run_loadgen_tests completed")
     return pid
```
1087-1103: Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

```diff
 def record_kv_cache_stats(self, generation_server_urls, output_folder):
     """Record the kv cache stats for a specific concurrency level."""
     print(
         f"DEBUG JobManager: Recording kv cache stats for generation servers"
     )
     # Get the kv cache stats
     # Send a request to "/metrics" and store the output in a file
     for url in generation_server_urls:
-        response = requests.get(f"http://{url}/metrics")
-        with open(
-                os.path.join(output_folder,
-                             f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                "w") as f:
-            f.write(response.text)
+        try:
+            response = requests.get(f"http://{url}/metrics", timeout=30)
+            response.raise_for_status()
+            with open(
+                    os.path.join(output_folder,
+                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                    "w") as f:
+                f.write(response.text)
+        except (requests.exceptions.RequestException, IOError) as e:
+            print(f"Warning: Failed to collect metrics from {url}: {e}")
     print(
         f"DEBUG JobManager: Recorded kv cache stats for generation servers")
```
1183-1196: Replace bare except with specific exception handling

The bare `except` clause should catch specific exceptions.

```diff
 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     # First, stop any nsys sessions for jobs that have nsys enabled
     self.stop_nsys_sessions()

     # Then terminate the jobs
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
```
🧹 Nitpick comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
89-92: Fix line length issue

Line 89 exceeds the 120 character limit.

Break the debug print statement into multiple lines:

```diff
 print(
-    f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+    f"DEBUG JobManager: __init__ called with "
+    f"load_config_only={load_config_only}, override_job_id={override_job_id}"
 )
```
802-812: Break up long nsys_prefix line for better readability

Line 806 significantly exceeds the 120 character limit (222 characters).

Split the nsys_prefix string across multiple lines:

```diff
-nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+nsys_prefix = (
+    f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+    f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+    f"-c cudaProfilerApi --cuda-graph-trace node "
+    f"--capture-range-end=stop --gpu-metrics-devices=none"
+)
```
756-760: Remove trailing whitespace from blank line

Line 758 contains whitespace characters on an otherwise blank line. Remove the whitespace from the blank line to maintain clean formatting.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: applies to **/*.py : when using try-except blocks in python, limit the except to the smallest set of...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, test files (files under tests/ directories) do not require nvidia copyright headers...
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
678-678: Line too long (138 > 120)
(E501)
682-682: Line too long (146 > 120)
(E501)
758-758: Blank line contains whitespace
Remove whitespace from blank line
(W293)
802-802: Line too long (222 > 120)
(E501)
1190-1190: Do not use bare except
(E722)
🔇 Additional comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)
27-71: LGTM! The node calculation logic correctly handles GPU requirements for context, generation, and IFB servers with proper rounding up.
1104-1125: Good implementation of dataset preparation. The dataset preparation logic correctly checks for existing datasets and uses the configurable script path from command-line arguments, addressing the previous concern about hardcoded paths.
1002-1006: Good use of safe dictionary access. The code correctly uses `.get()` with a default value to safely access the 'gen_only' configuration key, preventing potential KeyError exceptions.
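The pattern being praised is `dict.get` with a default, which returns a fallback instead of raising `KeyError` when an optional key such as `'gen_only'` is absent. A minimal illustration (the config contents are made up for the example):

```python
# Hypothetical config fragment; 'gen_only' is deliberately absent
config = {'exec': {'model_path': '/models/example'}}

# .get() returns the supplied default instead of raising KeyError
gen_only = config['exec'].get('gen_only', False)
print(gen_only)  # False
```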
630-712: Well-implemented nsys session management. The `stop_nsys_sessions` method is well-structured with:
- Efficient grouping of jobs by hostname to minimize srun calls
- Proper error handling with specific exception types
- Clear logging for debugging
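The first bullet, grouping jobs by hostname so each host is contacted with a single srun call, can be sketched generically. The job tuples and the printed command are hypothetical stand-ins for the real job objects and srun invocation:

```python
from collections import defaultdict

# Hypothetical (hostname, job_handle) pairs for running servers
jobs = [("node01", "job_a"), ("node02", "job_b"), ("node01", "job_c")]

# Group job handles by hostname so each host needs only one srun call
by_host = defaultdict(list)
for hostname, handle in jobs:
    by_host[hostname].append(handle)

for hostname, handles in sorted(by_host.items()):
    # One command per host stops every nsys session on that host
    print(f"srun --nodelist={hostname} stop-nsys {' '.join(handles)}")
```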
Actionable comments posted: 0
♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)
1-10: Add required NVIDIA copyright header. According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
```
11-25: Replace bare except with specific exception handling

The bare `except` clause should be more specific to avoid catching unexpected exceptions.

```diff
 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
```
380-481: Add bounds checking for server_id to prevent index errors

The `server_id` is incremented in multiple places without checking if it exceeds the length of `self.node_hostnames`. This could cause an IndexError.

Add bounds checking before accessing `self.node_hostnames[server_id]`:

```diff
 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])
```

Apply similar checks in the generation and IFB server loops.
873-915: Fix potential undefined variable error

The variable `pid` is returned at line 914 but is only assigned conditionally. If neither `use_trtllm_bench` nor `use_benchmark_serving` is true, `pid` will be undefined.

Initialize `pid` at the beginning of the method:

```diff
 def run_loadgen_tests(self, hostname, port, generation_server_urls):
     """Run loadgen tests on the master node."""
     print(
         f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
     )
     output_folder = self.output_folder
+    pid = None

     # Run loadgen tests for each concurrency level
```
916-932: Add error handling for metrics collection

The method should handle potential network failures when collecting metrics from generation servers.

```diff
 def record_kv_cache_stats(self, generation_server_urls, output_folder):
     """Record the kv cache stats for a specific concurrency level."""
     print(
         f"DEBUG JobManager: Recording kv cache stats for generation servers"
     )
     # Get the kv cache stats
     # Send a request to "/metrics" and store the output in a file
     for url in generation_server_urls:
-        response = requests.get(f"http://{url}/metrics")
-        with open(
-                os.path.join(output_folder,
-                             f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                "w") as f:
-            f.write(response.text)
+        try:
+            response = requests.get(f"http://{url}/metrics", timeout=30)
+            response.raise_for_status()
+            with open(
+                    os.path.join(output_folder,
+                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                    "w") as f:
+                f.write(response.text)
+        except (requests.exceptions.RequestException, IOError) as e:
+            print(f"Warning: Failed to collect metrics from {url}: {e}")
```
1012-1023: Replace bare except with specific exception handling

The bare `except` clause should catch specific exceptions.

```diff
 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     # Then terminate the jobs
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
```
🧹 Nitpick comments (4)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (4)
89-92: Consider splitting long print statement for better readability

Line 91 exceeds the recommended line length. Consider splitting it:

```diff
-print(
-    f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
-)
+print(
+    f"DEBUG JobManager: __init__ called with "
+    f"load_config_only={load_config_only}, override_job_id={override_job_id}"
+)
```
144-201: Consider extracting folder name generation into helper methods

The `create_output_folder` method has complex logic for generating folder names. Consider breaking it down into smaller helper methods for better maintainability:

```python
def _get_model_name(self):
    """Extract model name from path."""
    model_name = self.config['exec']['model_path'].split('/')[-1]
    if model_name == '':
        model_name = self.config['exec']['model_path'].split('/')[-2]
    return model_name

def _build_parallelism_suffix(self):
    """Build suffix based on parallelism configuration."""
    # Extract TP/DP values for each server type
    # Return formatted string

def _build_feature_suffix(self):
    """Build suffix based on enabled features."""
    # Check for overlap, cuda_graph, adp, etc.
    # Return formatted string
```

This would make the code more modular and easier to test.
623-627: Remove trailing whitespace

Line 623 contains trailing whitespace that should be removed:

```diff
-
+
 self.terminate_jobs(self.context_jobs + self.generation_jobs + self.ifb_jobs + self.disagg_jobs)
```
672-677: Split long nsys_prefix line for better readability

Line 676 is too long (222 characters). Split it for better readability:

```diff
-nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+nsys_prefix = (
+    f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+    f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+    f"-c cudaProfilerApi --cuda-graph-trace node "
+    f"--capture-range-end=stop --gpu-metrics-devices=none"
+)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
627-627: Blank line contains whitespace
Remove whitespace from blank line
(W293)
672-672: Line too long (222 > 120)
(E501)
1016-1016: Do not use bare except
(E722)
🔇 Additional comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
27-71: LGTM! The `calculate_nodes_needed` function correctly calculates the required number of nodes based on GPU requirements for context, generation, and IFB servers, with proper rounding and minimum node guarantee.
1024-1044: Well-implemented resource management. The context manager methods and `save_config_to_output_folder` are properly implemented with appropriate cleanup and error handling.
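Context-manager-based cleanup of this kind typically pairs `__enter__`/`__exit__` so tracked jobs are terminated even when an exception escapes the block. A generic sketch of the pattern, not the file's actual implementation (the class and `FakeJob` are invented for illustration):

```python
class JobScope:
    """Terminate tracked jobs on scope exit, even on error."""

    def __init__(self):
        self.jobs = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        for job in self.jobs:
            job.terminate()
        return False  # never swallow exceptions


class FakeJob:
    """Stand-in for a subprocess handle."""

    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True


job = FakeJob()
try:
    with JobScope() as scope:
        scope.jobs.append(job)
        raise RuntimeError("server crashed")
except RuntimeError:
    pass
print(job.terminated)  # True: cleanup ran despite the exception
```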
1045-1059: LGTM! The `prepare_and_run_single_config` method properly orchestrates configuration file creation, saves the complete configuration, and manages node hostname retrieval.
Actionable comments posted: 0
♻️ Duplicate comments (6)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (6)
1-10: Add required NVIDIA copyright header. According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
 import os
 import subprocess
 import time
 import uuid
```
11-25: Replace bare except with specific exception handling.

The bare `except` clause should be more specific to avoid catching unexpected exceptions.

```diff
 def wait_for_server(host, port, max_retries=1000, delay=5):
     """Wait for a server to become available."""
     for _ in range(max_retries):
         try:
             print(f"Checking server status at {host}:{port}")
             state = requests.get(f"http://{host}:{port}/health")
             if state.status_code == 200:
                 print("Server is running")
                 return True
-        except:
+        except (requests.exceptions.RequestException, ConnectionError):
             time.sleep(delay)
     print(f"Server did not start after {max_retries} attempts")
     return False
```
380-481: Add bounds checking for server_id to prevent index errors.

The `server_id` is incremented in multiple places without checking if it exceeds the length of `self.node_hostnames`. This could cause an IndexError.

Add bounds checking before accessing `self.node_hostnames[server_id]`:

```diff
 for i in range(self.config['exec']['config']['context']['tp']):
     if i != 0 and i % self.num_gpus == 0:
         server_id += 1
+        if server_id >= len(self.node_hostnames):
+            raise ValueError(f"Not enough nodes allocated. Need at least {server_id + 1} nodes.")
     print(f'Adding {self.node_hostnames[server_id]}')
     server_ids.append(self.node_hostnames[server_id])
```

Apply similar checks in the generation server loop (lines 433-436) and IFB server loop (lines 462-464).
873-915: Fix potential undefined variable error.

The variable `pid` is returned at line 914 but is only assigned conditionally. If neither `use_trtllm_bench` nor `use_benchmark_serving` is true, `pid` will be undefined.

Initialize `pid` at the beginning of the method:

```diff
 def run_loadgen_tests(self, hostname, port, generation_server_urls):
     """Run loadgen tests on the master node."""
     print(
         f"DEBUG JobManager: run_loadgen_tests called with hostname={hostname}, port={port}"
     )
     output_folder = self.output_folder
+    pid = None

     # Run loadgen tests for each concurrency level
```
916-932: Add error handling for metrics collection.

The method should handle potential network failures when collecting metrics from generation servers.

```diff
 def record_kv_cache_stats(self, generation_server_urls, output_folder):
     """Record the kv cache stats for a specific concurrency level."""
     print(
         f"DEBUG JobManager: Recording kv cache stats for generation servers"
     )
     # Get the kv cache stats
     # Send a request to "/metrics" and store the output in a file
     for url in generation_server_urls:
-        response = requests.get(f"http://{url}/metrics")
-        with open(
-                os.path.join(output_folder,
-                             f"{url.split(':')[0]}_kv_cache_stats.txt"),
-                "w") as f:
-            f.write(response.text)
+        try:
+            response = requests.get(f"http://{url}/metrics", timeout=30)
+            response.raise_for_status()
+            with open(
+                    os.path.join(output_folder,
+                                 f"{url.split(':')[0]}_kv_cache_stats.txt"),
+                    "w") as f:
+                f.write(response.text)
+        except (requests.exceptions.RequestException, IOError) as e:
+            print(f"Warning: Failed to collect metrics from {url}: {e}")
     print(
         f"DEBUG JobManager: Recorded kv cache stats for generation servers")
```
1012-1023: Replace bare except with specific exception handling.

The bare `except` clause should catch specific exceptions.

```diff
 def terminate_jobs(self, jobs):
     """Terminate all running jobs."""
     # Then terminate the jobs
     for job in jobs:
         if job and job.poll() is None:
             try:
                 job.terminate()
                 job.wait(timeout=5)
-            except:
+            except (subprocess.TimeoutExpired, OSError):
                 job.kill()
```
🧹 Nitpick comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
89-92: Format long line for better readability.

Line 91 exceeds the 120 character limit.

Split the debug message across multiple lines:

```diff
 print(
-    f"DEBUG JobManager: __init__ called with load_config_only={load_config_only}, override_job_id={override_job_id}"
+    f"DEBUG JobManager: __init__ called with "
+    f"load_config_only={load_config_only}, override_job_id={override_job_id}"
 )
```
627-627: Remove whitespace from blank line.
675-677: Split long line for better readability.Line 676 exceeds the 120 character limit significantly.
```diff
 nsys_file = os.path.join(self.output_folder,
                          f"{log_file}.$SLURM_PROCID.nsys-rep")
-nsys_prefix = f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" -o {nsys_file} -f true -t cuda,nvtx,python-gil -c cudaProfilerApi --cuda-graph-trace node --capture-range-end=stop --gpu-metrics-devices=none"
+nsys_prefix = (
+    f"nsys profile -e \"NSYS_MPI_STORE_TEAMS_PER_RANK=1\" "
+    f"-o {nsys_file} -f true -t cuda,nvtx,python-gil "
+    f"-c cudaProfilerApi --cuda-graph-trace node "
+    f"--capture-range-end=stop --gpu-metrics-devices=none"
+)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
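The try-except guideline above — smallest possible except set, minimal try body, logic in the else block — can be illustrated with a short sketch (the config dict and key are made up for illustration):

```python
# Keep the try body minimal: only the statement that can raise.
config = {"max_batch_size": "64"}

try:
    raw = config["max_batch_size"]  # only the lookup can raise KeyError
except KeyError:
    batch_size = 1  # fall back to a default
else:
    batch_size = int(raw)  # follow-up logic lives in the else block

print(batch_size)  # → 64
```

Putting `int(raw)` in the else block rather than the try body keeps a `ValueError` from a malformed value from being silently swallowed by the `KeyError` handler.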
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-06T21:22:55.018Z
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T21:22:55.018Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
examples/disaggregated/slurm/disagg_profiler/job_manager.py
🪛 Ruff (0.12.2)
examples/disaggregated/slurm/disagg_profiler/job_manager.py
20-20: Do not use bare except
(E722)
89-89: Line too long (124 > 120)
(E501)
627-627: Blank line contains whitespace
Remove whitespace from blank line
(W293)
672-672: Line too long (222 > 120)
(E501)
1016-1016: Do not use bare except
(E722)
🔇 Additional comments (3)
examples/disaggregated/slurm/disagg_profiler/job_manager.py (3)
27-71: LGTM! The function correctly calculates the number of nodes needed based on tensor and data parallelism configurations.
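The node-count arithmetic that comment approves can be sketched roughly as follows — the function and parameter names here are assumptions for illustration, not the actual signature in `job_manager.py`:

```python
import math


def compute_num_nodes(tp_size, pp_size, dp_size, gpus_per_node=8):
    """Estimate the SLURM nodes needed for a TP x PP x DP GPU layout.

    Assumes one GPU per model rank and rounds up to whole nodes.
    """
    total_gpus = tp_size * pp_size * dp_size
    return math.ceil(total_gpus / gpus_per_node)


# e.g. TP=8, PP=1, DP=4 on 8-GPU nodes -> 32 GPUs -> 4 nodes
print(compute_num_nodes(8, 1, 4))  # → 4
```

Rounding up matters: a layout that needs 9 GPUs on 8-GPU nodes must still request 2 full nodes from SLURM.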
1024-1030: Good implementation of context manager protocol. The context manager methods properly handle resource cleanup by terminating all jobs on exit.
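The context-manager pattern that comment endorses looks roughly like this — the class and method names below are assumptions, not the actual `JobManager` API, but the terminate/wait/kill escalation mirrors the reviewed code:

```python
import subprocess


class ManagedJobs:
    """Minimal sketch: terminate all tracked subprocesses on exit."""

    def __init__(self):
        self.jobs = []

    def launch(self, cmd):
        proc = subprocess.Popen(cmd)
        self.jobs.append(proc)
        return proc

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for job in self.jobs:
            if job.poll() is None:
                try:
                    job.terminate()  # polite SIGTERM first
                    job.wait(timeout=5)
                except (subprocess.TimeoutExpired, OSError):
                    job.kill()  # SIGKILL as a last resort
        return False  # do not suppress exceptions


# usage: jobs are cleaned up even if the with-body raises
with ManagedJobs() as mgr:
    p = mgr.launch(["sleep", "60"])
print(p.poll() is not None)  # True: the job was terminated on exit
```

Tying cleanup to `__exit__` means a crash mid-sweep cannot leave orphaned `srun` processes holding the allocation.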
1031-1059: Well-structured configuration management and job orchestration. The methods properly handle configuration saving and job execution workflow with appropriate error handling.
@Tabrizian should this PR be closed? |
Yes |
Add disagg launcher scripts
Add scripts for launching disaggregated serving jobs on SLURM.
Summary by CodeRabbit
New Features
Documentation
Chores