-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[https://nvbugs/5410279][test] resubmit timeout refactor #6337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[https://nvbugs/5410279][test] resubmit timeout refactor #6337
Conversation
📝 WalkthroughWalkthroughTimeout management was integrated into the integration test framework. A Changes
Sequence Diagram(s)sequenceDiagram
participant pytest
participant Fixture as timeout_manager fixture
participant Test as Test Function
participant Timeout as TimeoutManager
participant Subprocess as Subprocess Call
pytest->>Fixture: Request timeout_manager
Fixture->>Timeout: Instantiate TimeoutManager (from marker/CLI)
pytest->>Test: Inject timeout_manager
loop For each major test phase
Test->>Timeout: Enter timed_operation("phase")
Timeout->>Subprocess: Call with timeout=remaining_timeout
Subprocess-->>Timeout: Return result or raise on timeout
Timeout->>Test: Exit timed_operation (update remaining timeout)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
|
/bot run --stage-list "A10-PyTorch-1, H100_PCIe-PyTorch-2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tests/integration/defs/conftest.py (1)
371-384: Fix the priority comment to match the actual implementation.The comment states "command line > marker > config" but the implementation prioritizes marker over command line (lines 379-382). This inconsistency could confuse future maintainers.
Apply this diff to correct the comment:
- """Create a TimeoutManager instance with priority: command line > marker > config.""" + """Create a TimeoutManager instance with priority: marker > command line."""Alternatively, if the intended priority is command line over marker, update the implementation:
- if timeout_from_marker is not None: + if timeout_from_command_line is not None: + timeout_value = timeout_from_command_line + elif timeout_from_marker is not None: timeout_value = timeout_from_marker - elif timeout_from_command_line is not None: - timeout_value = timeout_from_command_line
🧹 Nitpick comments (2)
tests/integration/defs/examples/test_gpt.py (1)
640-641: Good integration of timeout management parameter.The addition of
timeout_managerparameter aligns well with the PR's timeout management objectives.Note: The docstring on line 642 should end with a period per static analysis hint D415.
- "Build & Run GPT-3 175B: 96 layer w/ plugins" + "Build & Run GPT-3 175B: 96 layer w/ plugins."tests/integration/defs/accuracy/accuracy_core.py (1)
704-756: Excellent timeout management integration with backward compatibility.The implementation effectively integrates timeout management while maintaining backward compatibility. The conditional approach cleanly separates timeout-managed execution from the original behavior, and each phase is properly wrapped with descriptive operation names.
The parameter forwarding is consistent across both execution paths, ensuring no functionality is lost when timeout management is disabled.
Fix the docstring formatting per static analysis hint D205:
def run(self, tasks: Optional[List[AccuracyTask]] = None, dtype: str = 'auto', quant_algo: Optional[str] = None, kv_cache_quant_algo: Optional[str] = None, spec_dec_algo: Optional[str] = None, extra_acc_spec: Optional[str] = None, tp_size: int = 1, pp_size: int = 1, cp_size: int = 1, extra_convert_args: Optional[list] = None, extra_build_args: Optional[list] = None, extra_summarize_args: Optional[list] = None, extra_eval_long_context_args: Optional[list] = None, env: Optional[Dict[str, str]] = None, timeout_manager=None): """ Run all accuracy test phases with timeout management. + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/utils/__init__.pytests/integration/defs/common.pytests/integration/defs/examples/test_commandr.pytests/integration/defs/examples/test_exaone.pytests/integration/defs/accuracy/test_cli_flow.pytests/integration/defs/conftest.pytests/integration/defs/examples/test_llama.pytests/integration/defs/examples/test_gpt.pytests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/utils/timeout_manager.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. The block should be prepended to the top of all files, including .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/utils/__init__.pytests/integration/defs/common.pytests/integration/defs/examples/test_commandr.pytests/integration/defs/examples/test_exaone.pytests/integration/defs/accuracy/test_cli_flow.pytests/integration/defs/conftest.pytests/integration/defs/examples/test_llama.pytests/integration/defs/examples/test_gpt.pytests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/utils/timeout_manager.py
🪛 Ruff (0.12.2)
tests/integration/defs/examples/test_gpt.py
640-640: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
🔇 Additional comments (38)
tests/integration/defs/utils/__init__.py (1)
1-28: LGTM! Clean package initialization following Python best practices.The package initialization is well-structured with proper copyright header, clear documentation, and explicit API definition via
__all__. The imports from the timeout_manager submodule are clean and the module serves its intended purpose as a utility package for the timeout management system.tests/integration/test_lists/qa/examples_test_list.txt (2)
18-18: Timeout annotation added for commandr test.The 120-second timeout annotation aligns with the broader timeout management system being introduced to prevent test hangs during intensive workloads.
22-31: Consistent timeout annotations for enc_dec tests.All enc_dec tests now have uniform 90-second timeout annotations, providing structured timeout handling that aligns with the new timeout management framework. The timeout duration appears appropriate for these test types.
tests/integration/defs/accuracy/test_cli_flow.py (2)
1158-1166: Timeout management integration for fp8_tp2pp2 test.The addition of the
timeout_managerparameter and its forwarding toself.run()properly integrates structured timeout handling without affecting the core test logic. This aligns with the broader timeout management framework being introduced.
1176-1189: Timeout management integration for int8_plugin_tp8 test.The
timeout_managerparameter addition follows the same pattern as other tests, enabling structured timeout handling for this resource-intensive test method. The implementation is clean and maintains the existing test functionality.tests/integration/defs/examples/test_exaone.py (8)
36-36: LGTM! Consistent timeout management integration.The
timeout_managerparameter has been properly added to both test functions, enabling structured timeout handling for TensorRT flow test cases as intended by the PR objectives.
42-53: Effective timeout management for weight conversion.The weight conversion operation is properly wrapped with
timeout_manager.timed_operation("convert")and the remaining timeout is correctly passed to theconvert_weightsfunction. This ensures proper timeout tracking and enforcement during the conversion phase.
56-66: Consistent timeout handling for engine building.The engine building phase is correctly wrapped with the timeout manager using the "build" label, and
timeout_manager.remaining_timeoutis properly passed to thecheck_callfunction. This maintains consistency with the overall timeout management pattern.
88-91: Proper timeout management for summary execution.The summary execution is appropriately wrapped with
timeout_manager.timed_operation("summary")and the timeout is correctly forwarded tovenv_check_call. This completes the timeout management for all three critical phases of the test.
103-104: Consistent parameter handling across test functions.The second test function
test_llm_exaone_2gpucorrectly includes thetimeout_managerparameter, maintaining consistency with the first function. The parameter positioning is appropriate.
111-123: Proper timeout management implementation for 2GPU test.The weight conversion in the 2GPU test follows the same timeout management pattern as the 1GPU test, with proper context manager usage and timeout parameter forwarding. The implementation is consistent and correct.
126-134: Consistent engine building timeout handling.The engine building phase in the 2GPU test properly implements the same timeout management pattern, using the "build" label and forwarding the remaining timeout to the subprocess call.
150-154: Appropriate timeout management for MPI execution.The summary execution for the 2GPU test correctly uses
venv_mpi_check_callwith proper timeout forwarding, maintaining consistency with the timeout management pattern while using the appropriate MPI-enabled function for multi-GPU execution.tests/integration/defs/common.py (8)
46-46: **Proper kwargs forwarding for MPI functions.The addition of
**kwargsparameter tovenv_mpi_check_callenables passing additional parameters (including timeout) through the MPI wrapper function. This change aligns with the timeout management requirements.
63-63: Consistent kwargs forwarding implementation.The
venv.run_cmdcall now properly forwards the**kwargs, ensuring that timeout and other parameters can be passed through to the underlying command execution.
66-66: Matching kwargs support for MPI output function.The
venv_mpi_check_outputfunction has been updated with**kwargsparameter, maintaining consistency with thevenv_mpi_check_callfunction and enabling timeout support for MPI output operations.
83-83: Proper kwargs forwarding in MPI output function.The
venv.run_cmdcall invenv_mpi_check_outputcorrectly forwards the**kwargs, ensuring consistent parameter handling across both MPI wrapper functions.
508-508: Clean timeout parameter extraction.The timeout parameter is properly extracted from kwargs using
pop(), which removes it from the kwargs dictionary before processing other parameters. This prevents the timeout from being passed as a command-line argument.
518-518: Correct timeout parameter forwarding.The timeout parameter is properly passed to
venv_check_callas a keyword argument, enabling timeout enforcement during weight conversion operations. The implementation correctly handles the case where timeout might be None.
610-610: Consistent timeout handling in quantize_data.The
quantize_datafunction follows the same timeout extraction pattern asconvert_weights, maintaining consistency across the codebase for timeout parameter handling.
621-621: Proper timeout forwarding in quantization.The timeout parameter is correctly passed to
venv_check_callduring quantization operations, ensuring that timeout constraints are enforced consistently across different operations.tests/integration/defs/conftest.py (2)
352-358: LGTM! Clean implementation of timeout marker extraction.The fixture correctly extracts timeout values from pytest markers with proper error handling.
361-368: LGTM! Proper command line timeout extraction.The fixture correctly retrieves and converts timeout values from command line arguments.
tests/integration/defs/examples/test_commandr.py (5)
88-88: LGTM! Proper timeout manager integration.The function signature correctly adds the
timeout_managerfixture parameter for structured timeout handling.
94-106: LGTM! Proper timeout management for checkpoint conversion.The timeout context manager is correctly implemented with appropriate timeout passing to the subprocess call.
129-133: LGTM! Consistent timeout management for engine building.The timeout context manager is properly applied to the engine build operation with correct timeout parameter passing.
137-142: LGTM! Proper timeout management for engine execution.The timeout context manager correctly wraps the MPI execution call with appropriate timeout handling.
154-159: LGTM! Complete timeout management for summary execution.The final timeout context manager properly handles the summary operation, completing the comprehensive timeout management for all test phases.
tests/integration/defs/examples/test_gpt.py (2)
645-656: Excellent timeout management implementation for checkpoint conversion.The code correctly wraps the conversion phase with
timeout_manager.timed_operationand passes the remaining timeout to the subprocess call. This ensures proper resource cleanup and timeout enforcement during the conversion phase.
658-702: Well-structured timeout management for build and inference phases.The implementation properly wraps both the build and inference phases with timeout management. The explicit handling of
context_fmhalogic (lines 670-675) is a good improvement that makes the conditional behavior clearer whenuse_attention_pluginis true butcontext_fmhais false.All subprocess calls correctly receive the
remaining_timeoutparameter, ensuring consistent timeout enforcement across the test workflow.tests/integration/defs/examples/test_llama.py (2)
3028-3100: LGTM! Excellent timeout management implementation.The timeout management integration is well-structured and comprehensive:
- Function signature properly updated to accept
timeout_managerparameter- All major operations (dataset generation, checkpoint conversion, engine building, evaluation) are wrapped in
timeout_manager.timed_operation()context managers- Subprocess calls consistently receive
timeout=timeout_manager.remaining_timeoutto enforce time limits- The structured approach ensures proper resource cleanup and prevents test hangs during intensive workloads
The implementation follows the established pattern and aligns with the PR objectives for improving test stability.
3400-3506: LGTM! Consistent and thorough timeout management implementation.The timeout management integration is well-executed:
- Function signature correctly updated to include
timeout_managerparameter- All subprocess operations (convert weights, build engines, dataset generation, evaluation, MMLU testing) are properly wrapped in
timeout_manager.timed_operation()contexts- Consistent application of
timeout=timeout_manager.remaining_timeoutacross all subprocess calls- The implementation maintains the same high-quality pattern established in the codebase
The changes align perfectly with the PR objectives to prevent test hangs and improve overall test completion reliability.
tests/integration/defs/utils/timeout_manager.py (6)
1-19: LGTM! Proper copyright header and imports.The copyright header follows the required NVIDIA format and the imports are appropriate for the timeout management functionality.
21-40: Well-designed class with proper initialization.The
TimeoutManagerclass follows Python naming conventions and properly initializes all attributes. The constructor design with optional timeout is flexible for different use cases.
41-68: Core timeout management methods are well-implemented.The property getter, reset method, and timeout checking logic are correctly implemented with appropriate error handling and clear documentation.
69-95: Excellent context manager implementation.The
timed_operationcontext manager is well-designed with proper error handling using try/finally. It correctly handles both timeout and no-timeout scenarios, and ensures accurate time tracking.
96-139: Useful convenience methods for timeout execution.Both
execute_with_timeoutandcall_with_timeoutprovide clean interfaces for wrapping function calls with timeout tracking. The implementation correctly delegates to thetimed_operationcontext manager.
141-152: Simple and effective factory function.The
create_timeout_managerfactory function provides a clean interface for creating TimeoutManager instances from pytest marker values.
|
PR_Github #12889 [ run ] triggered by Bot |
|
PR_Github #12889 [ run ] completed with state |
f9cfd2f to
6b525d6
Compare
|
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/accuracy_core.py (1)
706-709: Fix docstring formatting.The docstring needs a blank line between the summary and description according to Python docstring conventions.
- """ - Run all accuracy test phases with timeout management. - If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. - """ + """ + Run all accuracy test phases with timeout management. + + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. + """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/accuracy/test_cli_flow.py
- tests/integration/defs/common.py
- tests/integration/defs/conftest.py
- tests/integration/defs/utils/init.py
- tests/integration/defs/examples/test_exaone.py
- tests/integration/defs/examples/test_llama.py
- tests/integration/defs/utils/timeout_manager.py
- tests/integration/defs/examples/test_commandr.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. The block should be prepended to the top of all files, including .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
640-640: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/integration/defs/examples/test_gpt.py (4)
640-641: LGTM: Clean timeout management integration.The function signature appropriately adds the
timeout_managerparameter, enabling structured timeout handling for this test case.
645-656: LGTM: Proper timeout management for checkpoint conversion.The convert phase is correctly wrapped with timeout management, using a descriptive operation name and properly passing the remaining timeout to the subprocess call.
660-691: LGTM: Enhanced build phase with timeout management and improved logic.The build phase correctly integrates timeout management and improves the
context_fmhaflag handling logic. The explicit conditional handling ensures proper flag values are passed based on the parameter combinations.
695-702: LGTM: Proper timeout management for inference execution.The inference run phase is correctly wrapped with timeout management, using a descriptive operation name and properly passing the remaining timeout to the MPI subprocess call.
tests/integration/defs/accuracy/accuracy_core.py (2)
704-705: LGTM: Clean method signature extension.The addition of the optional
timeout_managerparameter with a default value maintains backward compatibility while enabling timeout management functionality.
710-756: LGTM: Correct timeout management implementation.The conditional timeout management logic correctly wraps each phase when a timeout manager is provided, while maintaining the original behavior as a fallback. The phase names are descriptive and match the actual operations.
Note: There's code duplication between the two branches, but this explicit approach provides clarity about the timeout management behavior.
|
PR_Github #12905 [ run ] triggered by Bot |
|
PR_Github #12905 [ run ] completed with state |
6b525d6 to
11d7b3d
Compare
|
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/accuracy_core.py (1)
706-708: Fix docstring formatting to comply with PEP 257.The docstring needs a blank line between the summary line and the description.
- """ - Run all accuracy test phases with timeout management. - If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. - """ + """ + Run all accuracy test phases with timeout management. + + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. + """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/utils/init.py
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/integration/defs/accuracy/test_cli_flow.py
- tests/integration/defs/utils/timeout_manager.py
- tests/integration/defs/examples/test_exaone.py
- tests/integration/defs/common.py
- tests/integration/defs/examples/test_commandr.py
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/examples/test_llama.py
- tests/integration/defs/conftest.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
640-640: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/integration/defs/examples/test_gpt.py (4)
640-641: LGTM! Function signature correctly updated for timeout management.The addition of the
timeout_managerparameter enables structured timeout handling for this comprehensive GPT-3 175B test.
645-657: Excellent timeout management implementation for checkpoint conversion.The conversion phase is properly wrapped with timeout tracking, and the remaining timeout is correctly passed to the subprocess call.
660-691: Well-structured timeout management and build logic.The build phase correctly:
- Wraps the operation with timeout tracking
- Explicitly handles
context_fmhaconfiguration based on theuse_attention_pluginflag- Passes the remaining timeout to the subprocess call
The conditional logic properly disables
context_fmhawhenuse_attention_pluginis false, and explicitly sets it when the plugin is enabled.
695-702: Proper timeout management for MPI-based inference execution.The inference phase correctly wraps the MPI execution with timeout tracking and passes the remaining timeout to the multi-process subprocess call.
tests/integration/defs/accuracy/accuracy_core.py (1)
704-757: Excellent timeout management implementation with proper fallback.The conditional timeout management design is well-structured:
- Maintains backward compatibility with optional
timeout_managerparameter- Consistently wraps all major phases (
install_requirements,initialize_case,convert,build,evaluate) with timeout tracking- Provides clear fallback behavior when timeout management isn't needed
- Uses descriptive operation names for better debugging
This implementation enables granular timeout control across the entire accuracy test workflow.
|
PR_Github #13129 [ run ] triggered by Bot |
|
PR_Github #13129 [ run ] completed with state |
11d7b3d to
2e8ee03
Compare
|
/bot reuse-pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/accuracy_core.py (1)
704-709: Good timeout management integration with minor docstring formatting issue.The method signature correctly adds optional timeout management support while maintaining backward compatibility. The implementation approach is sound.
Fix the docstring formatting to comply with PEP 257:
- """ - Run all accuracy test phases with timeout management. - If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. - """ + """Run all accuracy test phases with timeout management. + + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. + """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/utils/init.py
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/integration/defs/accuracy/test_cli_flow.py
- tests/integration/defs/common.py
- tests/integration/defs/conftest.py
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/examples/test_commandr.py
- tests/integration/defs/examples/test_exaone.py
- tests/integration/defs/utils/timeout_manager.py
- tests/integration/defs/examples/test_llama.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
640-640: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
🔇 Additional comments (5)
tests/integration/defs/examples/test_gpt.py (4)
640-641: LGTM: Function signature updated correctly.The
timeout_managerparameter has been properly added to integrate structured timeout management into this test function.
645-657: Well-implemented timeout management for checkpoint conversion.The convert phase is properly wrapped with
timeout_manager.timed_operation("convert")context manager, and the subprocess call correctly receives the remaining timeout parameter.
660-692: Excellent timeout management and logic improvement.The build phase is properly wrapped with timeout management, and the
--context_fmhalogic has been improved for better clarity. The explicit handling of the case whereuse_attention_pluginis true butcontext_fmhais false makes the build argument logic more readable and maintainable.
695-703: Proper timeout management for inference run phase.The run phase is correctly wrapped with
timeout_manager.timed_operation("run"), and the MPI subprocess call properly receives the remaining timeout parameter while preserving the original execution structure.tests/integration/defs/accuracy/accuracy_core.py (1)
710-757: Excellent backward-compatible timeout management implementation.The conditional approach maintains backward compatibility while enabling structured timeout management. Each phase is properly wrapped with descriptive operation names, and the fallback path preserves existing behavior. The code duplication between the two paths is justified for clarity and maintainability.
2e8ee03 to
65243ea
Compare
|
/bot reuse-pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/integration/defs/utils/timeout_manager.py (1)
155-184: The previous review suggestions still apply.The decorator implementation still has the same issues identified in the previous review: functools import should be at module level, missing error handling for
timeout_from_marker, and the decorator assumes the decorated function accepts atimeout_managerparameter without verification.
🧹 Nitpick comments (3)
tests/integration/defs/examples/test_gpt.py (1)
638-641: Fix docstring punctuation and approve timeout parameter addition.The addition of the
timeout_managerparameter is well-integrated with the timeout management system. However, the docstring should end with proper punctuation.- "Build & Run GPT-3 175B: 96 layer w/ plugins" + "Build & Run GPT-3 175B: 96 layer w/ plugins."tests/integration/defs/accuracy/accuracy_core.py (1)
690-709: Fix docstring formatting and approve timeout parameter addition.The addition of the
timeout_managerparameter with proper default value maintains backward compatibility while enabling timeout management. However, the docstring formatting needs improvement.def run(self, tasks: Optional[List[AccuracyTask]] = None, dtype: str = 'auto', quant_algo: Optional[str] = None, kv_cache_quant_algo: Optional[str] = None, spec_dec_algo: Optional[str] = None, extra_acc_spec: Optional[str] = None, tp_size: int = 1, pp_size: int = 1, cp_size: int = 1, extra_convert_args: Optional[list] = None, extra_build_args: Optional[list] = None, extra_summarize_args: Optional[list] = None, extra_eval_long_context_args: Optional[list] = None, env: Optional[Dict[str, str]] = None, timeout_manager=None): """ Run all accuracy test phases with timeout management. + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. """tests/integration/defs/utils/timeout_manager.py (1)
56-67: Improve timeout validation logic.The current check only validates
<= 0, but this could be confusing if the remaining timeout becomes significantly negative due to long operations. Consider checking for negative values and providing more informative error messages.def check_timeout(self, phase_name: str = "operation") -> None: """ Check if timeout has been exceeded and raise TimeoutError if so. Args: phase_name: Name of the current phase for error message. Raises: TimeoutError: If timeout has been exceeded. """ - if self._remaining_timeout is not None and self._remaining_timeout <= 0: - raise TimeoutError(f"Timeout exceeded after {phase_name} phase!") + if self._remaining_timeout is not None and self._remaining_timeout <= 0: + exceeded_by = abs(self._remaining_timeout) + raise TimeoutError(f"Timeout exceeded by {exceeded_by:.2f}s after {phase_name} phase!")
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/utils/init.py
🚧 Files skipped from review as they are similar to previous changes (7)
- tests/integration/defs/accuracy/test_cli_flow.py
- tests/integration/defs/examples/test_commandr.py
- tests/integration/defs/examples/test_exaone.py
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/examples/test_llama.py
- tests/integration/defs/conftest.py
- tests/integration/defs/common.py
🧰 Additional context used
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
640-640: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (7)
tests/integration/defs/examples/test_gpt.py (3)
645-657: LGTM: Proper timeout management for checkpoint conversion.The checkpoint conversion phase is correctly wrapped with timeout management, ensuring that the operation respects the remaining timeout and updates the manager's state accordingly.
659-691: LGTM: Improved build phase with timeout management and better context_fmha logic.The build phase correctly implements timeout management, and the explicit handling of
context_fmha=disablewhenuse_attention_plugin=Truebutcontext_fmha=Falseimproves the logic clarity.
693-702: LGTM: Proper timeout management for inference run phase.The inference run phase is correctly wrapped with timeout management, ensuring that the MPI-based execution respects the remaining timeout while maintaining the original command structure.
tests/integration/defs/accuracy/accuracy_core.py (2)
710-736: LGTM: Well-structured timeout management implementation.The timeout-managed execution path properly wraps each phase in
timeout_manager.timed_operationcontexts, enabling precise timeout tracking and enforcement across all accuracy test phases. The parameter passing is complete and correct.
737-756: LGTM: Proper fallback implementation for backward compatibility.The fallback execution path maintains the original sequential behavior when no timeout manager is provided, ensuring backward compatibility while keeping the implementation clean and straightforward.
tests/integration/defs/utils/timeout_manager.py (2)
69-94: LGTM! Well-designed context manager.The
timed_operationcontext manager properly handles both timeout and no-timeout scenarios, correctly updates the remaining timeout in thefinallyblock, and provides clear error reporting. The design ensures that timeout tracking occurs regardless of whether the operation succeeds or fails.
141-152: LGTM! Clean factory function.The factory function provides a clean abstraction for creating TimeoutManager instances and follows the expected pattern for pytest marker integration.
|
PR_Github #13172 [ reuse-pipeline ] triggered by Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/integration/defs/utils/timeout_manager.py (2)
30-39: Add validation for non-negative timeout values.The constructor should validate that
initial_timeoutis non-negative when provided to prevent unexpected behavior.def __init__(self, initial_timeout: Optional[float] = None): """ Initialize the timeout manager. Args: initial_timeout: Initial timeout value in seconds. If None, no timeout is enforced. """ + if initial_timeout is not None and initial_timeout < 0: + raise ValueError("initial_timeout must be non-negative") self._initial_timeout = initial_timeout self._remaining_timeout = initial_timeout self._start_time = None
155-184: Improve the decorator implementation for better robustness.The decorator has several areas for improvement:
- The
functoolsimport should be at module level, not inside the function- Missing error handling when
timeout_from_markeris not in kwargs- The decorator assumes the decorated function accepts a
timeout_managerparameterConsider this improved implementation:
+import functools + # Convenience decorator for test functions def with_timeout_management(func: Callable) -> Callable: """ Decorator to automatically inject timeout management into test functions. - This decorator expects the test function to have a 'timeout_from_marker' parameter + This decorator expects the test function to have 'timeout_from_marker' and 'timeout_manager' parameters and automatically creates a TimeoutManager instance. Args: func: The test function to decorate. Returns: The decorated function. """ - import functools @functools.wraps(func) def wrapper(*args, **kwargs): # Extract timeout_from_marker from kwargs - timeout_from_marker = kwargs.get('timeout_from_marker') + timeout_from_marker = kwargs.get('timeout_from_marker') + + # Only inject timeout_manager if the function expects it + import inspect + sig = inspect.signature(func) + if 'timeout_manager' not in sig.parameters: + return func(*args, **kwargs) # Create timeout manager timeout_manager = create_timeout_manager(timeout_from_marker) # Add timeout_manager to kwargs kwargs['timeout_manager'] = timeout_manager return func(*args, **kwargs) return wrapper
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/accuracy_core.py (1)
704-756: LGTM! Well-structured timeout management implementation with backward compatibility.The method correctly implements conditional timeout management:
- Each phase (
install_requirements,initialize_case,convert,build,evaluate) is wrapped intimed_operationcontexts whentimeout_manageris provided- Maintains backward compatibility with a fallback to original sequential execution
- Clean separation between timeout-managed and non-timeout-managed execution paths
Consider adding a blank line between the docstring summary and description to address the static analysis hint:
- """ - Run all accuracy test phases with timeout management. - If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. - """ + """ + Run all accuracy test phases with timeout management. + + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. + """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/integration/defs/examples/test_commandr.py
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/examples/test_exaone.py
- tests/integration/defs/conftest.py
- tests/integration/defs/utils/init.py
- tests/integration/defs/examples/test_llama.py
- tests/integration/defs/common.py
- tests/integration/defs/accuracy/test_cli_flow.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/utils/timeout_manager.pytests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/utils/timeout_manager.pytests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/defs/examples/test_gpt.py
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
647-647: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tests/integration/defs/examples/test_gpt.py (4)
647-648: LGTM! Function signature updated for timeout management integration.The addition of the
timeout_managerparameter is consistent with the broader timeout management implementation across the test suite.
652-663: LGTM! Proper timeout management for checkpoint conversion.The conversion phase is correctly wrapped with
timed_operation("convert")and the subprocess call receives the appropriate timeout value fromtimeout_manager.remaining_timeout.
665-698: LGTM! Enhanced build phase with timeout management and improved logic.The build phase correctly implements timeout management with
timed_operation("build"). The improvedcontext_fmhaflag handling explicitly disables it whenuse_attention_pluginis true butcontext_fmhais false, making the command construction more explicit and maintainable.
700-709: LGTM! Complete timeout management implementation for inference phase.The inference phase is properly wrapped with
timed_operation("run")and the MPI subprocess call receives the appropriate timeout value. This completes the comprehensive timeout management for all major test phases.tests/integration/defs/utils/timeout_manager.py (4)
1-19: LGTM - Copyright header and imports are correct.The NVIDIA copyright header includes the current year range (2022-2024) as required, and the imports are appropriate for the functionality provided.
41-94: LGTM - Core timeout management methods are well implemented.The property, reset method, check_timeout method, and timed_operation context manager provide a clean and robust interface for timeout management. The logic correctly handles None timeout values for no enforcement scenarios.
96-138: LGTM - Execution wrapper methods provide clean abstractions.Both
execute_with_timeoutandcall_with_timeoutmethods provide convenient wrappers around the coretimed_operationcontext manager, with appropriate parameter handling and documentation.
141-152: LGTM - Factory function is simple and appropriate.The
create_timeout_managerfactory function provides a clean interface for creating TimeoutManager instances from pytest marker values.
Head branch was pushed to by a user without write access
0c8707f to
336439a
Compare
|
/bot reuse-pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/accuracy_core.py (1)
706-709: Fix docstring formattingThe docstring needs a blank line between the summary and description to comply with PEP 257.
- """ - Run all accuracy test phases with timeout management. - If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. - """ + """ + Run all accuracy test phases with timeout management. + + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. + """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/examples/test_exaone.py
- tests/integration/defs/accuracy/test_cli_flow.py
- tests/integration/defs/utils/init.py
- tests/integration/defs/examples/test_commandr.py
- tests/integration/defs/common.py
- tests/integration/defs/conftest.py
- tests/integration/defs/examples/test_llama.py
- tests/integration/defs/utils/timeout_manager.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/examples/test_gpt.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/defs/examples/test_gpt.py
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
647-647: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/integration/defs/examples/test_gpt.py (4)
647-648: LGTM: Clean timeout management integrationThe addition of the
timeout_managerparameter follows the established pattern for integrating timeout management across the test suite.
652-663: LGTM: Proper timeout management for conversion phaseThe checkpoint conversion is correctly wrapped with timeout management, ensuring proper resource cleanup and timeout enforcement during the conversion process.
665-698: LGTM: Enhanced build phase with improved context_fmha logicThe build phase correctly integrates timeout management while also improving the context_fmha logic clarity. The explicit conditional handling (lines 679-682) makes it clear when context_fmha should be enabled versus disabled when
use_attention_pluginis true, which is an improvement over the previous implicit behavior.
700-709: LGTM: Consistent timeout management for inference phaseThe inference run phase correctly implements timeout management consistent with the convert and build phases, ensuring comprehensive timeout coverage across all test phases.
tests/integration/defs/accuracy/accuracy_core.py (1)
704-756: LGTM: Excellent timeout management integration with backward compatibilityThe implementation excellently integrates timeout management while maintaining backward compatibility. The dual-path approach (lines 711-735 for timeout management, lines 737-756 for fallback) ensures existing functionality remains intact while providing comprehensive timeout coverage for all test phases when the timeout manager is available.
|
PR_Github #13964 [ reuse-pipeline ] triggered by Bot |
|
PR_Github #13964 [ reuse-pipeline ] completed with state |
|
/bot run |
|
PR_Github #13967 [ run ] triggered by Bot |
|
PR_Github #13967 [ run ] completed with state |
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
336439a to
be850fb
Compare
|
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/integration/defs/utils/timeout_manager.py (2)
30-39: Add validation for non-negative timeout values.The constructor should validate that
initial_timeoutis non-negative to prevent unexpected behavior.
155-184: Improve the decorator implementation for better robustness.The decorator needs several improvements including moving the functools import to module level, adding error handling, and parameter inspection to avoid unexpected argument errors.
🧹 Nitpick comments (4)
tests/integration/defs/examples/test_gpt.py (1)
647-648: Good timeout management integration with minor docstring fix needed.The addition of the
timeout_managerparameter correctly integrates with the new timeout management framework. However, the docstring should end with a period for proper formatting.Apply this diff to fix the docstring formatting:
- "Build & Run GPT-3 175B: 96 layer w/ plugins" + "Build & Run GPT-3 175B: 96 layer w/ plugins."tests/integration/defs/accuracy/accuracy_core.py (1)
704-709: Good timeout management integration with docstring formatting fix needed.The addition of the optional
timeout_managerparameter properly integrates with the timeout management framework. The docstring clearly explains the new functionality.Apply this diff to fix the docstring formatting per PEP 257:
def run(self, tasks: Optional[List[AccuracyTask]] = None, dtype: str = 'auto', quant_algo: Optional[str] = None, kv_cache_quant_algo: Optional[str] = None, spec_dec_algo: Optional[str] = None, extra_acc_spec: Optional[str] = None, tp_size: int = 1, pp_size: int = 1, cp_size: int = 1, extra_convert_args: Optional[list] = None, extra_build_args: Optional[list] = None, extra_summarize_args: Optional[list] = None, extra_eval_long_context_args: Optional[list] = None, env: Optional[Dict[str, str]] = None, timeout_manager=None): """ Run all accuracy test phases with timeout management. + If timeout_manager is provided, each phase will be wrapped to track and deduct remaining timeout. """tests/integration/defs/utils/timeout_manager.py (2)
37-39: Remove unused_start_timeattribute.The
_start_timeattribute is initialized and reset but never used elsewhere in the class. Consider removing it to simplify the implementation.def __init__(self, initial_timeout: Optional[float] = None): """ Initialize the timeout manager. Args: initial_timeout: Initial timeout value in seconds. If None, no timeout is enforced. """ self._initial_timeout = initial_timeout self._remaining_timeout = initial_timeout - self._start_time = None def reset(self, timeout: Optional[float] = None) -> None: """ Reset the timeout manager with a new timeout value. Args: timeout: New timeout value. If None, uses the initial timeout. """ self._remaining_timeout = timeout if timeout is not None else self._initial_timeout - self._start_time = NoneAlso applies to: 53-54
96-115: Consider consolidating similar methods.
execute_with_timeoutandcall_with_timeoutprovide similar functionality. The more flexiblecall_with_timeoutcould potentially replaceexecute_with_timeoutto reduce code duplication.If you decide to consolidate, you could remove
execute_with_timeoutand update callers to usecall_with_timeout:-def execute_with_timeout(self, - operation: Callable[[], Any], - phase_name: str = "operation", - **kwargs) -> Any: - """ - Execute an operation with timeout tracking. - - Args: - operation: The operation to execute. - phase_name: Name of the phase for timeout checking. - **kwargs: Additional arguments to pass to the operation. - - Returns: - The result of the operation. - - Raises: - TimeoutError: If timeout is exceeded after the operation. - """ - with self.timed_operation(phase_name): - return operation(**kwargs)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
tests/integration/defs/accuracy/accuracy_core.py(1 hunks)tests/integration/defs/accuracy/test_cli_flow.py(3 hunks)tests/integration/defs/common.py(7 hunks)tests/integration/defs/conftest.py(1 hunks)tests/integration/defs/examples/test_commandr.py(3 hunks)tests/integration/defs/examples/test_exaone.py(4 hunks)tests/integration/defs/examples/test_gpt.py(1 hunks)tests/integration/defs/examples/test_llama.py(4 hunks)tests/integration/defs/utils/__init__.py(1 hunks)tests/integration/defs/utils/timeout_manager.py(1 hunks)tests/integration/test_lists/qa/examples_test_list.txt(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/integration/defs/conftest.py
- tests/integration/defs/common.py
- tests/integration/defs/examples/test_exaone.py
- tests/integration/defs/utils/init.py
- tests/integration/test_lists/qa/examples_test_list.txt
- tests/integration/defs/examples/test_commandr.py
- tests/integration/defs/accuracy/test_cli_flow.py
- tests/integration/defs/examples/test_llama.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/defs/examples/test_gpt.py
🪛 Ruff (0.12.2)
tests/integration/defs/accuracy/accuracy_core.py
706-708: 1 blank line required between summary line and description
(D205)
tests/integration/defs/examples/test_gpt.py
647-647: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tests/integration/defs/examples/test_gpt.py (3)
652-663: LGTM! Proper timeout management for convert phase.The convert checkpoint phase is correctly wrapped with
timeout_manager.timed_operation("convert")and passes the remaining timeout to the subprocess call. This ensures proper timeout tracking and enforcement during the conversion phase.
665-698: Excellent timeout management and improved context_fmha logic.The build phase is properly wrapped with timeout management and the subprocess call receives the remaining timeout. Additionally, the context_fmha logic has been improved to be more explicit - when
use_attention_pluginis true, it now explicitly enables or disablescontext_fmhabased on the parameter value, making the behavior clearer.
700-709: LGTM! Complete timeout management implementation.The run inference phase properly completes the timeout management integration by wrapping the operation and passing the remaining timeout to the MPI subprocess call. All three major phases (convert, build, run) now have consistent timeout handling.
tests/integration/defs/accuracy/accuracy_core.py (1)
710-756: Excellent timeout management implementation with backward compatibility.The implementation demonstrates good design principles:
- Backward compatibility: The fallback path ensures existing code continues to work when
timeout_manageris None- Logical phase separation: Each major phase (install_requirements, initialize_case, convert, build, evaluate) is properly wrapped with
timeout_manager.timed_operation- Consistent pattern: The timeout management follows the same pattern established in other test files
This approach allows for gradual adoption of timeout management across the test suite while maintaining existing functionality.
tests/integration/defs/utils/timeout_manager.py (1)
141-152: LGTM!The factory function provides a clean interface for creating TimeoutManager instances from pytest markers.
|
PR_Github #14086 [ run ] triggered by Bot |
|
PR_Github #14086 [ run ] completed with state |
Signed-off-by: Ivy Zhang <[email protected]> Signed-off-by: 皓聪 <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]> Signed-off-by: Lanyu Liao <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Improvements
Chores
Description
This PR adds timeout management system to handle TensorRT flow test cases with proper resource cleanup.
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]to print this help message.See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.--disable-reuse-test(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.--disable-fail-fast(OPTIONAL) : Disable fail fast on build/tests/infra failures.--skip-test(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.--stage-list "A10-PyTorch-1, xxx"(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.--gpu-type "A30, H100_PCIe"(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.--test-backend "pytorch, cpp"(OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.--only-multi-gpu-test(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.--disable-multi-gpu-test(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.--add-multi-gpu-test(OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.--post-merge(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".--detailed-log(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.--debug(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in thestage-listparameter to access the appropriate container environment. Note: Does NOT update GitHub check status.For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.mdand the
scripts/test_to_stage_mapping.pyhelper.kill
killKill all running builds associated with pull request.
skip
skip --comment COMMENTSkip testing for latest commit on pull request.
--comment "Reason for skipping build/test"is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipelineReuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.