[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options #6831
Conversation
📝 Walkthrough
Replaces the boolean use_torch_sampler option with an enum-based sampler_type (TorchSampler, TRTLLMSampler, or auto); the sampler is then selected from this setting together with the detected decoding mode.
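As a quick orientation, the new option is passed straight through the LLM constructor. A minimal sketch (the model path is a placeholder, and the import path is assumed to be the usual LLM API entry point rather than taken from this diff):

```python
from tensorrt_llm import LLM  # assumed entry point; adjust to your install

# sampler_type accepts "TorchSampler", "TRTLLMSampler", or "auto" (the default).
# "auto" resolves to TorchSampler unless beam search is requested.
llm = LLM(model="/path/to/model", sampler_type="auto")
```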
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as CLI / Tests
    participant LLM
    participant Args as LLM Args
    participant PyConfig as PyTorchConfig
    participant SamplerFactory as instantiate_sampler
    participant TorchSampler
    participant TRTLLMSampler
    Caller->>LLM: __init__(..., sampler_type)
    LLM->>Args: get_pytorch_backend_config()
    Args->>PyConfig: PyTorchConfig(..., sampler_type)
    LLM->>SamplerFactory: instantiate_sampler(executor_config)
    SamplerFactory->>SamplerFactory: decoding_mode = get_decoding_mode(executor_config)
    alt sampler_type == "TRTLLMSampler" or (sampler_type == "auto" and decoding_mode == BeamSearch)
        SamplerFactory->>TRTLLMSampler: construct(decoding_mode)
        SamplerFactory-->>LLM: TRTLLMSampler
    else
        SamplerFactory->>TorchSampler: construct()
        SamplerFactory-->>LLM: TorchSampler
    end
```
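A self-contained sketch of the branch in the diagram, using toy stand-ins for the real sampler classes; only the selection rule itself is taken from this PR:

```python
from enum import Enum


class SamplerType(str, Enum):
    """Mirrors the tri-state option introduced by this PR."""
    TRTLLMSampler = "TRTLLMSampler"
    TorchSampler = "TorchSampler"
    auto = "auto"


def select_sampler(sampler_type: SamplerType, is_beam_search: bool) -> str:
    # Explicit TRTLLMSampler, or auto combined with beam search, picks the
    # TRTLLM sampler; every other combination falls back to TorchSampler.
    if sampler_type == SamplerType.TRTLLMSampler or (
            sampler_type == SamplerType.auto and is_beam_search):
        return "TRTLLMSampler"
    return "TorchSampler"


assert select_sampler(SamplerType.auto, is_beam_search=True) == "TRTLLMSampler"
assert select_sampler(SamplerType.auto, is_beam_search=False) == "TorchSampler"
assert select_sampler(SamplerType.TorchSampler, is_beam_search=True) == "TorchSampler"
```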
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
/bot run --disable-fail-fast
PR_Github #14987 [ run ] triggered by Bot
Actionable comments posted: 2
🧹 Nitpick comments (6)
tests/unittest/llmapi/apps/utils.py (1)
154-165: Fail fast on invalid sampler_type to catch typos early
Consider validating sampler_type against the allowed set to avoid silently passing an unsupported value through the YAML.
Apply this diff:

```diff
 sampler_type = getattr(request, 'param', {}).get('sampler_type', "auto")
+allowed_sampler_types = {"TorchSampler", "TRTLLMSampler", "auto"}
+if sampler_type not in allowed_sampler_types:
+    pytest.fail(f"Unsupported sampler_type: {sampler_type}")
```

tests/unittest/llmapi/apps/_test_openai_chat.py (1)
536-540: Optionally add an 'auto' case to exercise detection path
To cover sampler auto-detection logic end-to-end, consider adding a third case with {'sampler_type': "auto"} and an accompanying id, as sketched below. This ensures the server path that resolves the sampler based on decoding mode/options is also validated.
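A sketch of the suggested third case (the decorator shape, ids, and test name are illustrative; indirect parametrization is assumed because the fixture reads request.param):

```python
import pytest


@pytest.mark.parametrize("server", [
    {'sampler_type': "TorchSampler"},
    {'sampler_type': "TRTLLMSampler"},
    {'sampler_type': "auto"},  # new case: exercises the auto-detection path
], ids=["torch_sampler", "trtllm_sampler", "auto_sampler"], indirect=True)
def test_chat_with_sampler(server):  # hypothetical test name
    ...
```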
tests/unittest/llmapi/apps/_test_openai_completions.py (1)
386-390: Optionally add an 'auto' case to validate sampler detection
Similarly to the chat test, consider adding {'sampler_type': "auto"} (with a matching id) to validate the auto-detection behavior for the completions endpoint as well.
tensorrt_llm/llmapi/llm_args.py (1)
30-30: Import statement formatting issue
The import is correctly placed but should follow the module namespace convention per the coding guidelines.
```diff
-from tensorrt_llm._torch.pyexecutor.config import SamplerType
+from tensorrt_llm._torch.pyexecutor import config as pyexecutor_config
```

Then update lines 2053-2054 to use:
```python
sampler_type: pyexecutor_config.SamplerType = Field(
    default=pyexecutor_config.SamplerType.auto,
```

tests/unittest/_torch/test_return_logits.py (1)
30-31: Consider adding a comment explaining the limitation
The skip condition is correct, but it would be helpful to document why TorchSampler doesn't support gather_context_logits.
```diff
 if sampler_type == "TorchSampler" and gather_context_logits:
+    # TorchSampler doesn't support gather_context_logits due to architectural limitations
     pytest.skip("TorchSampler does not support gather_context_logits")
```

tensorrt_llm/_torch/pyexecutor/_util.py (1)
633-640: Consider making the function more comprehensive
The docstring correctly notes this function is not exhaustive. Consider tracking the missing expert attribute names in a TODO or creating a registry pattern for extensibility.
```python
def _try_infer_num_experts(model_config: ModelConfig) -> int:
    """
    Attempt to infer the number of experts from the model configuration.

    Different MoE models use different attribute names for storing the number
    of experts, so this function checks for various possible names and returns
    a default of 1 if none are found.

    TODO: Add support for additional MoE model attribute names as they are discovered.
    Known missing patterns:
    - Add specific model types here as they are identified
    """
```
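For illustration, the probing pattern that docstring describes could look like this (the attribute names below are examples, not a confirmed list from the source):

```python
def infer_num_experts_sketch(model_config) -> int:
    """Probe common MoE attribute names; default to 1 (dense) if none match."""
    for attr in ("num_experts", "num_local_experts", "moe_num_experts"):
        value = getattr(model_config, attr, None)
        if value is not None:
            return int(value)
    return 1
```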
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (3 hunks)
- tensorrt_llm/llmapi/llm_args.py (3 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (3 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (0 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_disaggregated.py (5 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (0 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (0 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (0 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (1 hunks)
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py (0 hunks)
- tests/unittest/llmapi/apps/utils.py (2 hunks)
- tests/unittest/llmapi/test_llm_pytorch.py (1 hunks)
💤 Files with no reviewable changes (5)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py
- tests/unittest/_torch/speculative/test_draft_target.py
- tests/unittest/_torch/speculative/test_eagle3.py
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- tests/unittest/llmapi/apps/utils.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tensorrt_llm/_torch/pyexecutor/config.py
- tests/unittest/_torch/test_overlap_scheduler.py
- tests/unittest/_torch/test_return_logits.py
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tensorrt_llm/llmapi/llm_args.py
- examples/llm-api/quickstart_advanced.py
- tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- tests/unittest/llmapi/apps/utils.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tensorrt_llm/_torch/pyexecutor/config.py
- tests/unittest/_torch/test_overlap_scheduler.py
- tests/unittest/_torch/test_return_logits.py
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tensorrt_llm/llmapi/llm_args.py
- examples/llm-api/quickstart_advanced.py
- tensorrt_llm/_torch/pyexecutor/_util.py
🧬 Code Graph Analysis (4)
tests/unittest/llmapi/test_llm_pytorch.py (1)
tests/unittest/llmapi/test_llm.py (1)
llm_test_harness(109-136)
tests/unittest/_torch/test_overlap_scheduler.py (3)
tests/unittest/llmapi/test_llm_kv_cache_events.py (1)
create_llm (50-54)
tensorrt_llm/llmapi/llm_args.py (2)
model_dir (1081-1083), model_dir (1086-1090)
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
model_path (39-44)
tests/integration/defs/disaggregated/test_disaggregated.py (1)
tests/integration/defs/conftest.py (4)
disaggregated_test_root (2339-2344), llm_venv (707-723), disaggregated_example_root (270-275), llama_model_root (964-1039)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)
tensorrt_llm/llmapi/llm.py (1)
LLM (1079-1095)
tensorrt_llm/_torch/llm.py (1)
LLM (4-9)
tensorrt_llm/llmapi/llm_args.py (1)
CudaGraphConfig (105-161)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
23-23: Redefinition of unused PyTorchConfig from line 11
(F811)
594-594: Line too long (158 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (27)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2)
14-14: SamplerType replacement LGTM
Using sampler_type: "TRTLLMSampler" aligns with the enum-based migration and keeps the disaggregated config explicit for this test.
Also applies to: 30-30
14-14: Confirm enum values and config loader coverage
Please double-check that:
- The string "TRTLLMSampler" exactly matches the SamplerType value expected by the runtime (case-sensitive).
- The disaggregated config loader reads sampler_type from both context_servers and generation_servers sections.
This helps avoid silent fallbacks if unknown keys are ignored.
Also applies to: 30-30
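For reference, the shape being checked looks roughly like this (nesting simplified; only the relevant key is shown, so treat this as an assumption about the config layout rather than a verbatim excerpt):

```yaml
context_servers:
  sampler_type: "TRTLLMSampler"   # line 14 of the test config
generation_servers:
  sampler_type: "TRTLLMSampler"   # line 30 of the test config
```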
tests/unittest/llmapi/apps/utils.py (1)
154-165: Fixture migration to sampler_type looks correct
Reads sampler_type from request.param and writes it to extra_llm_api_options. This aligns with the new enum-based API.
tests/unittest/llmapi/apps/_test_openai_chat.py (1)
536-540: Param update to sampler_type is correct
Switching to {'sampler_type': "TorchSampler"} and {'sampler_type': "TRTLLMSampler"} matches the new configuration contract and keeps the intended coverage for both sampler backends.
tests/unittest/llmapi/apps/_test_openai_completions.py (1)
386-390: Param update to sampler_type is correct
Using {'sampler_type': "TorchSampler"} and {'sampler_type': "TRTLLMSampler"} aligns with the SamplerType-based API and maintains parity with the chat tests.
tests/unittest/api_stability/references/llm.yaml (1)
114-117: LGTM! API stability test updated for new sampler configuration.
The change correctly reflects the transition from the boolean use_torch_sampler to the enum-based sampler_type parameter with the new annotation and default value.
tensorrt_llm/_torch/pyexecutor/config.py (2)
17-22: LGTM! Well-designed enum with clear naming.
The SamplerType enum provides a clean tri-state configuration for sampler selection with intuitive value names.
69-73: LGTM! Field transition properly implemented with good defaults.
The replacement of the boolean use_torch_sampler with the enum-based sampler_type is well-executed with clear documentation about the auto selection behavior.
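The reviewed slice of the config, sketched as a standalone dataclass (the real PyTorchConfig carries many more fields; the enum here repeats the shape sketched earlier):

```python
from dataclasses import dataclass
from enum import Enum


class SamplerType(str, Enum):  # same shape as the enum sketched earlier
    TRTLLMSampler = "TRTLLMSampler"
    TorchSampler = "TorchSampler"
    auto = "auto"


@dataclass
class PyTorchConfigSketch:
    """Sketch of the reviewed slice; not the full PyTorchConfig."""
    # "auto" defers the choice: TorchSampler unless beam search is requested.
    sampler_type: SamplerType = SamplerType.auto
```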
tests/integration/defs/disaggregated/test_disaggregated.py (4)
55-56: LGTM! Test configuration updated for new sampler type.
The config map entry correctly reflects the transition from the "torch_sampler" to the "trtllm_sampler" test configuration.
211-211: LGTM! Chat endpoint condition updated consistently.
The conditional logic correctly includes the new "trtllm_sampler" test descriptor alongside the existing "overlap" condition.
234-234: LGTM! Output verification condition updated consistently.
The output file filtering logic is consistently updated to include the new "trtllm_sampler" test descriptor.
488-503: LGTM! Test function renamed and configured properly.
The test function name change from test_disaggregated_torch_sampler to test_disaggregated_trtllm_sampler correctly reflects the new sampler type being tested.
examples/llm-api/quickstart_advanced.py (2)
68-70: LGTM! CLI argument properly defined for new sampler type.
The argument definition correctly provides the tri-state options with appropriate default and choices matching the SamplerType enum values.
230-230: LGTM! LLM constructor updated correctly.
The parameter change from use_torch_sampler to sampler_type in the LLM constructor correctly uses the new enum-based configuration.
tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)
199-199: LGTM! Explicit sampler parameter removed correctly.
The removal of the explicit use_torch_sampler=True parameter allows the LLM to use the default sampler selection logic based on the new enum configuration.
1989-1989: LGTM! pytorch_config dictionary updated correctly.
The removal of the explicit use_torch_sampler parameter from the configuration dictionary allows the system to use the default sampler type selection.
2011-2011: LGTM! Another pytorch_config dictionary updated correctly.
Consistent removal of the explicit use_torch_sampler parameter maintains the pattern across test configurations.
tensorrt_llm/llmapi/llm_args.py (2)
2053-2057: LGTM! Well-designed API change from boolean to enum
The migration from use_torch_sampler: bool to sampler_type: SamplerType with the auto-detection mode is a clean API improvement that provides more flexibility and better defaults.
2342-2342: LGTM! Proper parameter propagation to PyTorchConfig
The parameter passing correctly propagates the sampler_type value to the backend configuration.
tests/unittest/_torch/test_return_logits.py (2)
19-19: LGTM! Test parameterization updated correctly
The parameterization properly tests both sampler types using string values that align with the enum options.
44-44: LGTM! Consistent parameter usage across test functions
Both test functions correctly pass the sampler_type parameter to the LLM constructor.
Also applies to: 111-111
tensorrt_llm/_torch/pyexecutor/_util.py (2)
592-609: LGTM! Clean implementation of sampler selection logic
The new sampler instantiation logic is well-structured:
- Properly checks for TRTLLMSampler or auto mode with beam search
- Includes appropriate debug logging for TRTLLMSampler selection
- Falls back to TorchSampler as the default
The removal of the complex auto-override logic in favor of this simpler approach improves code maintainability.
612-630: LGTM! Simplified decoding mode logic
The streamlined get_decoding_mode function is much cleaner, removing unnecessary complexity while maintaining the essential functionality of determining the appropriate decoding mode based on configuration and beam width.
tests/unittest/_torch/test_overlap_scheduler.py (4)
24-27: LGTM! Function signature properly updated
The function signature and implementation correctly use the new sampler_type parameter.
44-44: LGTM! Test parameterization follows the new enum pattern
The parameterization correctly tests both sampler types using their string representations.
52-52: LGTM! Simplified stop_words handling
The unconditional assignment of stop_words is appropriate since the test case structure is consistent regardless of the sampler type.
63-64: LGTM! Consistent sampler_type parameter usage
Both LLM creation calls correctly pass the sampler_type parameter.
Also applies to: 75-76
PR_Github #14987 [ run ] completed with state
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
22-22: Import consolidation handled correctly.
The duplicate import issue has been resolved as mentioned in the previous review comments.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
593-593: Fix line length formatting issue.
The line exceeds the 120-character limit as flagged by the linter.
```diff
-assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", (
+    "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/llmapi/llm_args.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/_torch/pyexecutor/_util.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/_torch/pyexecutor/_util.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
593-593: Line too long (158 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/pyexecutor/_util.py (4)
591-591: LGTM! Decoding mode extraction improves code organization.
The extraction of the decoding mode calculation to a separate variable enhances readability and prepares for the new sampler selection logic.
598-608: BREAKING CHANGE: Sampler selection logic updated to use SamplerType enum.
The implementation correctly replaces the boolean use_torch_sampler with a tri-state SamplerType enum. The logic properly handles:
- Explicit SamplerType.TRTLLMSampler selection
- Auto mode that chooses TRTLLMSampler for beam search scenarios
- Fallback to TorchSampler for other cases
The debug logging for decoding mode provides useful diagnostics.
614-629: Streamlined decoding mode inference logic.
The refactored get_decoding_mode function is much cleaner and more readable than the previous implementation. The logic correctly:
- Respects explicitly set decoding modes (when not Auto)
- Uses TopKTopP for beam width 1, BeamSearch otherwise
- Provides appropriate warning and override when beam width is 1 but BeamSearch is selected
This aligns well with the AI summary's description of simplifying decoding-mode inference.
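A self-contained sketch of the rule those bullets describe (toy enum; the real code uses the executor's DecodingMode bindings):

```python
from enum import Enum
import warnings


class DecodingMode(Enum):
    AUTO = "auto"
    TOP_K_TOP_P = "TopKTopP"
    BEAM_SEARCH = "BeamSearch"


def get_decoding_mode_sketch(configured: DecodingMode, beam_width: int) -> DecodingMode:
    """Respect an explicit mode; otherwise infer from beam_width."""
    if configured != DecodingMode.AUTO:
        mode = configured
    else:
        mode = DecodingMode.TOP_K_TOP_P if beam_width == 1 else DecodingMode.BEAM_SEARCH
    # BeamSearch with beam_width == 1 is downgraded, with a warning.
    if mode == DecodingMode.BEAM_SEARCH and beam_width == 1:
        warnings.warn("BeamSearch requested with beam_width == 1; overriding to TopKTopP")
        mode = DecodingMode.TOP_K_TOP_P
    return mode


assert get_decoding_mode_sketch(DecodingMode.AUTO, beam_width=4) == DecodingMode.BEAM_SEARCH
assert get_decoding_mode_sketch(DecodingMode.BEAM_SEARCH, beam_width=1) == DecodingMode.TOP_K_TOP_P
```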
632-656: Documentation improvement for _try_infer_num_experts.
The enhanced docstring clearly communicates the method's limitations and the need for future revision, which is helpful for maintainability.
/bot run --disable-fail-fast
PR_Github #15084 [ run ] triggered by Bot
Actionable comments posted: 1
🧹 Nitpick comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
1965-1970: Enum introduction looks good; consider casing consistency and header nit
The SamplerType definition itself is fine. Two small points:
- Value casing is inconsistent with other StrEnums in this module (e.g., STATIC/INFLIGHT are UPPERCASE). If you keep mixed casing for user-facing readability, add normalization (see next comment) to accept common variants.
- Coding guideline: prepend NVIDIA copyright header to source files.
Example header to add at the top of the file:

```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/pyexecutor/config.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (3 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unittest/api_stability/references/llm.yaml
- tensorrt_llm/_torch/pyexecutor/config.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/llmapi/llm_args.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/llmapi/llm_args.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
2348-2349: Verified sampler_type support and no legacy knobs remain
The PyTorchConfig constructor in tensorrt_llm/_torch/pyexecutor/config.py declares sampler_type: SamplerType = SamplerType.auto, and the instantiate_sampler logic in _util.py correctly branches on pytorch_backend_config.sampler_type. A search confirms no use_torch_sampler remnants exist. Everything is wired up as expected.
/bot run --disable-fail-fast
PR_Github #15092 [ run ] triggered by Bot
PR_Github #15084 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #15140 [ run ] triggered by Bot
PR_Github #15092 [ run ] completed with state
Force-pushed from f96ad39 to 461f7f7
/bot run --disable-fail-fast
PR_Github #15159 [ run ] triggered by Bot
PR_Github #15140 [ run ] completed with state
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
10-21: Remove duplicate ModelConfig import
ModelConfig is imported twice (absolute and relative). Keep only the relative import for consistency and to avoid shadowing.
Apply this diff:

```diff
-from tensorrt_llm._torch.model_config import ModelConfig
@@
-from ..model_config import ModelConfig
+from ..model_config import ModelConfig
```
♻️ Duplicate comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
2064-2068: Make sampler_type robust (aliases, case-insensitive) and add deprecation shim for use_torch_sampler
To minimize breakage and improve UX, please:
- Accept common aliases and case-insensitive strings (e.g., "trtllm", "torch", "auto").
- Gracefully handle legacy use_torch_sampler by mapping to the new enum and logging a deprecation warning (pre-model validator). This avoids failures when older configs are passed in StrictBaseModel contexts.
Also, enrich the JSON schema metadata for clearer API docs.
Apply this small diff within the field to enrich schema metadata:

```diff
 sampler_type: SamplerType = Field(
     default=SamplerType.auto,
     description=
-    "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless BeamSearch is requested.",
+    "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless beam search is requested.",
+    json_schema_extra={"type": "Literal['TRTLLMSampler','TorchSampler','auto']"},
     status="prototype")
```

Add the following validators inside class TorchLlmArgs to normalize inputs and provide a backwards-compatibility shim:
```python
# Place inside class TorchLlmArgs
@model_validator(mode="before")
@classmethod
def _compat_use_torch_sampler(cls, data):
    # Backwards-compat for legacy configs
    if isinstance(data, dict) and "use_torch_sampler" in data:
        uts = data.pop("use_torch_sampler")
        # Only set sampler_type if the new field is not already provided
        if "sampler_type" not in data:
            data["sampler_type"] = (
                SamplerType.TorchSampler if uts else SamplerType.TRTLLMSampler
            )
        logger.warning(
            "use_torch_sampler is deprecated; use sampler_type "
            "(TRTLLMSampler | TorchSampler | auto) instead."
        )
    return data


@field_validator("sampler_type", mode="before")
@classmethod
def _normalize_sampler_type(cls, v):
    if isinstance(v, SamplerType):
        return v
    if isinstance(v, str):
        s = v.strip().lower()
        if s in {"trtllm", "trtllmsampler", "tensorrt-llm", "tensorrtllm"}:
            return SamplerType.TRTLLMSampler
        if s in {"torch", "torchsampler", "pytorch"}:
            return SamplerType.TorchSampler
        if s in {"auto", ""}:
            return SamplerType.auto
    raise ValueError(
        f"Invalid sampler_type: {v}. Allowed values: "
        f"{', '.join(m.value for m in SamplerType)}"
    )
```

Run this script to find lingering uses of the legacy flag that would benefit from the shim:
```bash
#!/bin/bash
# Search for legacy uses that could break StrictBaseModel without the shim
rg -n --hidden --no-ignore -S '\buse_torch_sampler\b' | sed -E 's/^/legacy_usage: /'
```
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
592-594: Wrap long assert to satisfy linters and readability (E501)
Line exceeds the typical 120-character limit; wrap the assertion message.
Apply this diff:

```diff
-    if mapping.cp_config.get('cp_type') == 'star_attention':
-        assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+    if mapping.cp_config.get('cp_type') == 'star_attention':
+        assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", (
+            "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+        )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (4 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (3 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (0 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_disaggregated.py (5 hunks)
- tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
- tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (0 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (0 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/api_stability/api_stability_core.py (1 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (0 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (1 hunks)
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py (0 hunks)
- tests/unittest/llmapi/apps/utils.py (2 hunks)
- tests/unittest/llmapi/test_llm_pytorch.py (1 hunks)
💤 Files with no reviewable changes (5)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
- tests/unittest/_torch/speculative/test_eagle3.py
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py
- tests/unittest/_torch/speculative/test_draft_target.py
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
🚧 Files skipped from review as they are similar to previous changes (14)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml
- tests/unittest/_torch/test_overlap_scheduler.py
- tests/unittest/llmapi/apps/utils.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- examples/llm-api/quickstart_advanced.py
- tests/unittest/api_stability/references/llm.yaml
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/unittest/_torch/test_return_logits.py
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tensorrt_llm/_torch/pyexecutor/config.py
- tests/integration/test_lists/qa/llm_function_sanity.txt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tests/unittest/api_stability/api_stability_core.py
- tensorrt_llm/llmapi/llm_args.py
- tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tests/unittest/api_stability/api_stability_core.py
- tensorrt_llm/llmapi/llm_args.py
- tensorrt_llm/_torch/pyexecutor/_util.py
🧠 Learnings (1)
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
tensorrt_llm/_torch/pyexecutor/_util.py
🧬 Code Graph Analysis (1)
tensorrt_llm/_torch/pyexecutor/_util.py (2)
cpp/include/tensorrt_llm/executor/types.h (2)
tensorrt_llm (36-39), tensorrt_llm (41-663)
tensorrt_llm/llmapi/llm_args.py (1)
SamplerType (1950-1954)
🪛 Ruff (0.12.2)
tests/unittest/api_stability/api_stability_core.py
30-30: Multiple statements on one line (colon)
(E701)
tensorrt_llm/_torch/pyexecutor/_util.py
593-593: Line too long (158 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/llmapi/llm_args.py (2)
1970-1975: LGTM: Public SamplerType enum is clear and scoped correctly
Enum names/values are self-explanatory and align with downstream usage.
2353-2354: LGTM: Propagating sampler_type to PyTorchConfig
Correctly threads the new enum through to backend config.
tests/unittest/api_stability/api_stability_core.py (1)
30-30: LGTM: Importing SamplerType into test namespace
This ensures eval/annotation in references YAML can resolve SamplerType during API stability checks.
tensorrt_llm/_torch/pyexecutor/_util.py (2)
591-605: Sampler selection logic aligns with new sampler_type and decoding mode
- Honors explicit TRTLLMSampler.
- Honors auto by selecting TRTLLM for beam search and Torch for others.
- Preserves spec-decoder path and star-attention guard.
Looks correct and keeps behavior predictable.
611-629: LGTM: Simplified, consistent get_decoding_mode
- Respects explicit non-Auto decoding_mode.
- Uses TopKTopP for beam_width==1; BeamSearch otherwise.
- Downgrades BeamSearch to TopKTopP when beam_width==1 with a clear warning.
Matches stated behavior and avoids confusing combinations.
/bot run --disable-fail-fast
PR_Github #15290 [ run ] triggered by Bot
PR_Github #15271 [ run ] completed with state
PR_Github #15290 [ run ] completed with state
Signed-off-by: Daniel Campora <[email protected]>
Co-authored-by: Enwei Zhu <[email protected]> Signed-off-by: Daniel Cámpora <[email protected]>
Force-pushed from cc5d0ce to 6ce8051
/bot run --disable-fail-fast
PR_Github #15444 [ run ] triggered by Bot
PR_Github #15444 [ run ] completed with state
…mpler according to options (NVIDIA#6831) Signed-off-by: Daniel Campora <[email protected]>
…mpler according to options (NVIDIA#6831) Signed-off-by: Daniel Campora <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
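For example, the flags documented above can be combined in a single PR comment (the stage name reuses the example given for --stage-list):

```
/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"
```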
kill
Kill all running builds associated with the pull request.
skip
skip --comment COMMENT
Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.