[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options #6831
Conversation
📝 Walkthrough
Replaces the boolean use_torch_sampler option with an enum-based sampler_type (TorchSampler, TRTLLMSampler, or auto); the sampler is then selected from this setting together with the detected decoding mode.
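As a quick orientation, the new option is passed straight through the LLM constructor. A minimal sketch (the model path is a placeholder, and the import path is assumed to be the usual LLM API entry point rather than taken from this diff):

```python
from tensorrt_llm import LLM  # assumed entry point; adjust to your install

# sampler_type accepts "TorchSampler", "TRTLLMSampler", or "auto" (the default).
# "auto" resolves to TorchSampler unless beam search is requested.
llm = LLM(model="/path/to/model", sampler_type="auto")
```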
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller as CLI / Tests
    participant LLM
    participant Args as LLM Args
    participant PyConfig as PyTorchConfig
    participant SamplerFactory as instantiate_sampler
    participant TorchSampler
    participant TRTLLMSampler
    Caller->>LLM: __init__(..., sampler_type)
    LLM->>Args: get_pytorch_backend_config()
    Args->>PyConfig: PyTorchConfig(..., sampler_type)
    LLM->>SamplerFactory: instantiate_sampler(executor_config)
    SamplerFactory->>SamplerFactory: decoding_mode = get_decoding_mode(executor_config)
    alt sampler_type == "TRTLLMSampler" or (sampler_type == "auto" and decoding_mode == BeamSearch)
        SamplerFactory->>TRTLLMSampler: construct(decoding_mode)
        SamplerFactory-->>LLM: TRTLLMSampler
    else
        SamplerFactory->>TorchSampler: construct()
        SamplerFactory-->>LLM: TorchSampler
    end
```
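A self-contained sketch of the branch in the diagram, using toy stand-ins for the real sampler classes; only the selection rule itself is taken from this PR:

```python
from enum import Enum


class SamplerType(str, Enum):
    """Mirrors the tri-state option introduced by this PR."""
    TRTLLMSampler = "TRTLLMSampler"
    TorchSampler = "TorchSampler"
    auto = "auto"


def select_sampler(sampler_type: SamplerType, is_beam_search: bool) -> str:
    # Explicit TRTLLMSampler, or auto combined with beam search, picks the
    # TRTLLM sampler; every other combination falls back to TorchSampler.
    if sampler_type == SamplerType.TRTLLMSampler or (
            sampler_type == SamplerType.auto and is_beam_search):
        return "TRTLLMSampler"
    return "TorchSampler"


assert select_sampler(SamplerType.auto, is_beam_search=True) == "TRTLLMSampler"
assert select_sampler(SamplerType.auto, is_beam_search=False) == "TorchSampler"
assert select_sampler(SamplerType.TorchSampler, is_beam_search=True) == "TorchSampler"
```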
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
/bot run --disable-fail-fast
PR_Github #14987 [ run ] triggered by Bot
Actionable comments posted: 2
🧹 Nitpick comments (6)
tests/unittest/llmapi/apps/utils.py (1)
154-165: Fail fast on invalid sampler_type to catch typos early
Consider validating sampler_type against the allowed set to avoid silently passing an unsupported value through the YAML.
Apply this diff:

```diff
 sampler_type = getattr(request, 'param', {}).get('sampler_type', "auto")
+allowed_sampler_types = {"TorchSampler", "TRTLLMSampler", "auto"}
+if sampler_type not in allowed_sampler_types:
+    pytest.fail(f"Unsupported sampler_type: {sampler_type}")
```

tests/unittest/llmapi/apps/_test_openai_chat.py (1)
536-540: Optionally add an 'auto' case to exercise detection path
To cover sampler auto-detection logic end-to-end, consider adding a third case with {'sampler_type': "auto"} and an accompanying id, as sketched below. This ensures the server path that resolves the sampler based on decoding mode/options is also validated.
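A sketch of the suggested third case (the decorator shape, ids, and test name are illustrative; indirect parametrization is assumed because the fixture reads request.param):

```python
import pytest


@pytest.mark.parametrize("server", [
    {'sampler_type': "TorchSampler"},
    {'sampler_type': "TRTLLMSampler"},
    {'sampler_type': "auto"},  # new case: exercises the auto-detection path
], ids=["torch_sampler", "trtllm_sampler", "auto_sampler"], indirect=True)
def test_chat_with_sampler(server):  # hypothetical test name
    ...
```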
tests/unittest/llmapi/apps/_test_openai_completions.py (1)
386-390: Optionally add an 'auto' case to validate sampler detection
Similarly to the chat test, consider adding {'sampler_type': "auto"} (with a matching id) to validate the auto-detection behavior for the completions endpoint as well.
tensorrt_llm/llmapi/llm_args.py (1)
30-30: Import statement formatting issue
The import is correctly placed but should follow the module namespace convention per the coding guidelines.
```diff
-from tensorrt_llm._torch.pyexecutor.config import SamplerType
+from tensorrt_llm._torch.pyexecutor import config as pyexecutor_config
```

Then update lines 2053-2054 to use:
```python
sampler_type: pyexecutor_config.SamplerType = Field(
    default=pyexecutor_config.SamplerType.auto,
```

tests/unittest/_torch/test_return_logits.py (1)
30-31: Consider adding a comment explaining the limitation
The skip condition is correct, but it would be helpful to document why TorchSampler doesn't support gather_context_logits.
```diff
 if sampler_type == "TorchSampler" and gather_context_logits:
+    # TorchSampler doesn't support gather_context_logits due to architectural limitations
     pytest.skip("TorchSampler does not support gather_context_logits")
```

tensorrt_llm/_torch/pyexecutor/_util.py (1)
633-640: Consider making the function more comprehensive
The docstring correctly notes this function is not exhaustive. Consider tracking the missing expert attribute names in a TODO or creating a registry pattern for extensibility.
```python
def _try_infer_num_experts(model_config: ModelConfig) -> int:
    """
    Attempt to infer the number of experts from the model configuration.

    Different MoE models use different attribute names for storing the number
    of experts, so this function checks for various possible names and returns
    a default of 1 if none are found.

    TODO: Add support for additional MoE model attribute names as they are discovered.
    Known missing patterns:
    - Add specific model types here as they are identified
    """
```
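For illustration, the probing pattern that docstring describes could look like this (the attribute names below are examples, not a confirmed list from the source):

```python
def infer_num_experts_sketch(model_config) -> int:
    """Probe common MoE attribute names; default to 1 (dense) if none match."""
    for attr in ("num_experts", "num_local_experts", "moe_num_experts"):
        value = getattr(model_config, attr, None)
        if value is not None:
            return int(value)
    return 1
```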
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (3 hunks)
- tensorrt_llm/llmapi/llm_args.py (3 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (3 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (0 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_disaggregated.py (5 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (0 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (0 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (0 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (1 hunks)
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py (0 hunks)
- tests/unittest/llmapi/apps/utils.py (2 hunks)
- tests/unittest/llmapi/test_llm_pytorch.py (1 hunks)
💤 Files with no reviewable changes (5)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py
- tests/unittest/_torch/speculative/test_draft_target.py
- tests/unittest/_torch/speculative/test_eagle3.py
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- tests/unittest/llmapi/apps/utils.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tensorrt_llm/_torch/pyexecutor/config.py
- tests/unittest/_torch/test_overlap_scheduler.py
- tests/unittest/_torch/test_return_logits.py
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tensorrt_llm/llmapi/llm_args.py
- examples/llm-api/quickstart_advanced.py
- tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- tests/unittest/llmapi/apps/utils.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tensorrt_llm/_torch/pyexecutor/config.py
- tests/unittest/_torch/test_overlap_scheduler.py
- tests/unittest/_torch/test_return_logits.py
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tensorrt_llm/llmapi/llm_args.py
- examples/llm-api/quickstart_advanced.py
- tensorrt_llm/_torch/pyexecutor/_util.py
🧬 Code Graph Analysis (4)
tests/unittest/llmapi/test_llm_pytorch.py (1)
tests/unittest/llmapi/test_llm.py (1)
llm_test_harness(109-136)
tests/unittest/_torch/test_overlap_scheduler.py (3)
tests/unittest/llmapi/test_llm_kv_cache_events.py (1)
create_llm (50-54)
tensorrt_llm/llmapi/llm_args.py (2)
model_dir (1081-1083), model_dir (1086-1090)
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
model_path (39-44)
tests/integration/defs/disaggregated/test_disaggregated.py (1)
tests/integration/defs/conftest.py (4)
disaggregated_test_root (2339-2344), llm_venv (707-723), disaggregated_example_root (270-275), llama_model_root (964-1039)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)
tensorrt_llm/llmapi/llm.py (1)
LLM (1079-1095)
tensorrt_llm/_torch/llm.py (1)
LLM (4-9)
tensorrt_llm/llmapi/llm_args.py (1)
CudaGraphConfig (105-161)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
23-23: Redefinition of unused PyTorchConfig from line 11
(F811)
594-594: Line too long (158 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (27)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2)
14-14: SamplerType replacement LGTM
Using sampler_type: "TRTLLMSampler" aligns with the enum-based migration and keeps the disaggregated config explicit for this test.
Also applies to: 30-30
14-14: Confirm enum values and config loader coverage
Please double-check that:
- The string "TRTLLMSampler" exactly matches the SamplerType value expected by the runtime (case-sensitive).
- The disaggregated config loader reads sampler_type from both context_servers and generation_servers sections.
This helps avoid silent fallbacks if unknown keys are ignored.
Also applies to: 30-30
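For reference, the shape being checked looks roughly like this (nesting simplified; only the relevant key is shown, so treat this as an assumption about the config layout rather than a verbatim excerpt):

```yaml
context_servers:
  sampler_type: "TRTLLMSampler"   # line 14 of the test config
generation_servers:
  sampler_type: "TRTLLMSampler"   # line 30 of the test config
```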
tests/unittest/llmapi/apps/utils.py (1)
154-165: Fixture migration to sampler_type looks correct
Reads sampler_type from request.param and writes it to extra_llm_api_options. This aligns with the new enum-based API.
tests/unittest/llmapi/apps/_test_openai_chat.py (1)
536-540: Param update to sampler_type is correct
Switching to {'sampler_type': "TorchSampler"} and {'sampler_type': "TRTLLMSampler"} matches the new configuration contract and keeps the intended coverage for both sampler backends.
tests/unittest/llmapi/apps/_test_openai_completions.py (1)
386-390: Param update to sampler_type is correct
Using {'sampler_type': "TorchSampler"} and {'sampler_type': "TRTLLMSampler"} aligns with the SamplerType-based API and maintains parity with the chat tests.
tests/unittest/api_stability/references/llm.yaml (1)
114-117: LGTM! API stability test updated for new sampler configuration.
The change correctly reflects the transition from the boolean use_torch_sampler to the enum-based sampler_type parameter with the new annotation and default value.
tensorrt_llm/_torch/pyexecutor/config.py (2)
17-22: LGTM! Well-designed enum with clear naming.
The SamplerType enum provides a clean tri-state configuration for sampler selection with intuitive value names.
69-73: LGTM! Field transition properly implemented with good defaults.
The replacement of the boolean use_torch_sampler with the enum-based sampler_type is well-executed with clear documentation about the auto selection behavior.
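The reviewed slice of the config, sketched as a standalone dataclass (the real PyTorchConfig carries many more fields; the enum here repeats the shape sketched earlier):

```python
from dataclasses import dataclass
from enum import Enum


class SamplerType(str, Enum):  # same shape as the enum sketched earlier
    TRTLLMSampler = "TRTLLMSampler"
    TorchSampler = "TorchSampler"
    auto = "auto"


@dataclass
class PyTorchConfigSketch:
    """Sketch of the reviewed slice; not the full PyTorchConfig."""
    # "auto" defers the choice: TorchSampler unless beam search is requested.
    sampler_type: SamplerType = SamplerType.auto
```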
tests/integration/defs/disaggregated/test_disaggregated.py (4)
55-56: LGTM! Test configuration updated for new sampler type.
The config map entry correctly reflects the transition from the "torch_sampler" to the "trtllm_sampler" test configuration.
211-211: LGTM! Chat endpoint condition updated consistently.
The conditional logic correctly includes the new "trtllm_sampler" test descriptor alongside the existing "overlap" condition.
234-234: LGTM! Output verification condition updated consistently.
The output file filtering logic is consistently updated to include the new "trtllm_sampler" test descriptor.
488-503: LGTM! Test function renamed and configured properly.
The test function name change from test_disaggregated_torch_sampler to test_disaggregated_trtllm_sampler correctly reflects the new sampler type being tested.
examples/llm-api/quickstart_advanced.py (2)
68-70: LGTM! CLI argument properly defined for new sampler type.
The argument definition correctly provides the tri-state options with appropriate default and choices matching the SamplerType enum values.
230-230: LGTM! LLM constructor updated correctly.
The parameter change from use_torch_sampler to sampler_type in the LLM constructor correctly uses the new enum-based configuration.
tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)
199-199: LGTM! Explicit sampler parameter removed correctly.
The removal of the explicit use_torch_sampler=True parameter allows the LLM to use the default sampler selection logic based on the new enum configuration.
1989-1989: LGTM! pytorch_config dictionary updated correctly.
The removal of the explicit use_torch_sampler parameter from the configuration dictionary allows the system to use the default sampler type selection.
2011-2011: LGTM! Another pytorch_config dictionary updated correctly.
Consistent removal of the explicit use_torch_sampler parameter maintains the pattern across test configurations.
tensorrt_llm/llmapi/llm_args.py (2)
2053-2057: LGTM! Well-designed API change from boolean to enum
The migration from use_torch_sampler: bool to sampler_type: SamplerType with the auto-detection mode is a clean API improvement that provides more flexibility and better defaults.
2342-2342: LGTM! Proper parameter propagation to PyTorchConfig
The parameter passing correctly propagates the sampler_type value to the backend configuration.
tests/unittest/_torch/test_return_logits.py (2)
19-19: LGTM! Test parameterization updated correctly
The parameterization properly tests both sampler types using string values that align with the enum options.
44-44: LGTM! Consistent parameter usage across test functions
Both test functions correctly pass the sampler_type parameter to the LLM constructor.
Also applies to: 111-111
tensorrt_llm/_torch/pyexecutor/_util.py (2)
592-609: LGTM! Clean implementation of sampler selection logic
The new sampler instantiation logic is well-structured:
- Properly checks for TRTLLMSampler or auto mode with beam search
- Includes appropriate debug logging for TRTLLMSampler selection
- Falls back to TorchSampler as the default
The removal of the complex auto-override logic in favor of this simpler approach improves code maintainability.
612-630: LGTM! Simplified decoding mode logic
The streamlined get_decoding_mode function is much cleaner, removing unnecessary complexity while maintaining the essential functionality of determining the appropriate decoding mode based on configuration and beam width.
tests/unittest/_torch/test_overlap_scheduler.py (4)
24-27: LGTM! Function signature properly updated
The function signature and implementation correctly use the new sampler_type parameter.
44-44: LGTM! Test parameterization follows the new enum pattern
The parameterization correctly tests both sampler types using their string representations.
52-52: LGTM! Simplified stop_words handling
The unconditional assignment of stop_words is appropriate since the test case structure is consistent regardless of the sampler type.
63-64: LGTM! Consistent sampler_type parameter usage
Both LLM creation calls correctly pass the sampler_type parameter.
Also applies to: 75-76
PR_Github #14987 [ run ] completed with state
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
22-22: Import consolidation handled correctly.
The duplicate import issue has been resolved as mentioned in the previous review comments.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
593-593: Fix line length formatting issue.
The line exceeds the 120-character limit as flagged by the linter.
```diff
-assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", (
+    "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/llmapi/llm_args.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/_torch/pyexecutor/_util.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/_torch/pyexecutor/_util.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
593-593: Line too long (158 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/pyexecutor/_util.py (4)
591-591: LGTM! Decoding mode extraction improves code organization.
The extraction of the decoding mode calculation to a separate variable enhances readability and prepares for the new sampler selection logic.
598-608: BREAKING CHANGE: Sampler selection logic updated to use SamplerType enum.
The implementation correctly replaces the boolean use_torch_sampler with a tri-state SamplerType enum. The logic properly handles:
- Explicit SamplerType.TRTLLMSampler selection
- Auto mode that chooses TRTLLMSampler for beam search scenarios
- Fallback to TorchSampler for other cases
The debug logging for decoding mode provides useful diagnostics.
614-629: Streamlined decoding mode inference logic.
The refactored get_decoding_mode function is much cleaner and more readable than the previous implementation. The logic correctly:
- Respects explicitly set decoding modes (when not Auto)
- Uses TopKTopP for beam width 1, BeamSearch otherwise
- Provides appropriate warning and override when beam width is 1 but BeamSearch is selected
This aligns well with the AI summary's description of simplifying decoding-mode inference.
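A self-contained sketch of the rule those bullets describe (toy enum; the real code uses the executor's DecodingMode bindings):

```python
from enum import Enum
import warnings


class DecodingMode(Enum):
    AUTO = "auto"
    TOP_K_TOP_P = "TopKTopP"
    BEAM_SEARCH = "BeamSearch"


def get_decoding_mode_sketch(configured: DecodingMode, beam_width: int) -> DecodingMode:
    """Respect an explicit mode; otherwise infer from beam_width."""
    if configured != DecodingMode.AUTO:
        mode = configured
    else:
        mode = DecodingMode.TOP_K_TOP_P if beam_width == 1 else DecodingMode.BEAM_SEARCH
    # BeamSearch with beam_width == 1 is downgraded, with a warning.
    if mode == DecodingMode.BEAM_SEARCH and beam_width == 1:
        warnings.warn("BeamSearch requested with beam_width == 1; overriding to TopKTopP")
        mode = DecodingMode.TOP_K_TOP_P
    return mode


assert get_decoding_mode_sketch(DecodingMode.AUTO, beam_width=4) == DecodingMode.BEAM_SEARCH
assert get_decoding_mode_sketch(DecodingMode.BEAM_SEARCH, beam_width=1) == DecodingMode.TOP_K_TOP_P
```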
632-656: Documentation improvement for _try_infer_num_experts.
The enhanced docstring clearly communicates the method's limitations and the need for future revision, which is helpful for maintainability.
/bot run --disable-fail-fast
PR_Github #15084 [ run ] triggered by Bot
Actionable comments posted: 1
🧹 Nitpick comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
1965-1970: Enum introduction looks good; consider casing consistency and header nit
The SamplerType definition itself is fine. Two small points:
- Value casing is inconsistent with other StrEnums in this module (e.g., STATIC/INFLIGHT are UPPERCASE). If you keep mixed casing for user-facing readability, add normalization (see next comment) to accept common variants.
- Coding guideline: prepend NVIDIA copyright header to source files.
Example header to add at the top of the file:

```python
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/pyexecutor/config.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (3 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unittest/api_stability/references/llm.yaml
- tensorrt_llm/_torch/pyexecutor/config.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tensorrt_llm/llmapi/llm_args.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tensorrt_llm/llmapi/llm_args.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
2348-2349: Verified sampler_type support and no legacy knobs remain
The PyTorchConfig constructor in tensorrt_llm/_torch/pyexecutor/config.py declares sampler_type: SamplerType = SamplerType.auto, and the instantiate_sampler logic in _util.py correctly branches on pytorch_backend_config.sampler_type. A search confirms no use_torch_sampler remnants exist. Everything is wired up as expected.
/bot run --disable-fail-fast
PR_Github #15092 [ run ] triggered by Bot
PR_Github #15084 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #15140 [ run ] triggered by Bot
PR_Github #15092 [ run ] completed with state
Force-pushed from f96ad39 to 461f7f7
/bot run --disable-fail-fast
PR_Github #15159 [ run ] triggered by Bot
PR_Github #15140 [ run ] completed with state
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
10-21: Remove duplicate ModelConfig import
ModelConfig is imported twice (absolute and relative). Keep only the relative import for consistency and to avoid shadowing.
Apply this diff:

```diff
-from tensorrt_llm._torch.model_config import ModelConfig
@@
-from ..model_config import ModelConfig
+from ..model_config import ModelConfig
```
♻️ Duplicate comments (1)
tensorrt_llm/llmapi/llm_args.py (1)
2064-2068: Make sampler_type robust (aliases, case-insensitive) and add deprecation shim for use_torch_sampler
To minimize breakage and improve UX, please:
- Accept common aliases and case-insensitive strings (e.g., "trtllm", "torch", "auto").
- Gracefully handle legacy use_torch_sampler by mapping to the new enum and logging a deprecation warning (pre-model validator). This avoids failures when older configs are passed in StrictBaseModel contexts.
Also, enrich the JSON schema metadata for clearer API docs.
Apply this small diff within the field to enrich schema metadata:

```diff
 sampler_type: SamplerType = Field(
     default=SamplerType.auto,
     description=
-    "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless BeamSearch is requested.",
+    "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless beam search is requested.",
+    json_schema_extra={"type": "Literal['TRTLLMSampler','TorchSampler','auto']"},
     status="prototype")
```

Add the following validators inside class TorchLlmArgs to normalize inputs and provide a backwards-compatibility shim:
```python
# Place inside class TorchLlmArgs
@model_validator(mode="before")
@classmethod
def _compat_use_torch_sampler(cls, data):
    # Backwards-compat for legacy configs
    if isinstance(data, dict) and "use_torch_sampler" in data:
        uts = data.pop("use_torch_sampler")
        # Only set sampler_type if the new field is not already provided
        if "sampler_type" not in data:
            data["sampler_type"] = (
                SamplerType.TorchSampler if uts else SamplerType.TRTLLMSampler
            )
        logger.warning(
            "use_torch_sampler is deprecated; use sampler_type "
            "(TRTLLMSampler | TorchSampler | auto) instead."
        )
    return data


@field_validator("sampler_type", mode="before")
@classmethod
def _normalize_sampler_type(cls, v):
    if isinstance(v, SamplerType):
        return v
    if isinstance(v, str):
        s = v.strip().lower()
        if s in {"trtllm", "trtllmsampler", "tensorrt-llm", "tensorrtllm"}:
            return SamplerType.TRTLLMSampler
        if s in {"torch", "torchsampler", "pytorch"}:
            return SamplerType.TorchSampler
        if s in {"auto", ""}:
            return SamplerType.auto
    raise ValueError(
        f"Invalid sampler_type: {v}. Allowed values: "
        f"{', '.join(m.value for m in SamplerType)}"
    )
```

Run this script to find lingering uses of the legacy flag that would benefit from the shim:
```bash
#!/bin/bash
# Search for legacy uses that could break StrictBaseModel without the shim
rg -n --hidden --no-ignore -S '\buse_torch_sampler\b' | sed -E 's/^/legacy_usage: /'
```
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
592-594: Wrap long assert to satisfy linters and readability (E501)
Line exceeds the typical 120-character limit; wrap the assertion message.
Apply this diff:

```diff
-    if mapping.cp_config.get('cp_type') == 'star_attention':
-        assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+    if mapping.cp_config.get('cp_type') == 'star_attention':
+        assert pytorch_backend_config.attn_backend == "FLASHINFER_STAR_ATTENTION", (
+            "attention backend of star attention should be 'FLASHINFER_STAR_ATTENTION'"
+        )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (4 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (3 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (0 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_disaggregated.py (5 hunks)
- tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
- tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (0 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (0 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/api_stability/api_stability_core.py (1 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (0 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (1 hunks)
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py (0 hunks)
- tests/unittest/llmapi/apps/utils.py (2 hunks)
- tests/unittest/llmapi/test_llm_pytorch.py (1 hunks)
💤 Files with no reviewable changes (5)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
- tests/unittest/_torch/speculative/test_eagle3.py
- tests/unittest/llmapi/apps/_test_trtllm_serve_multimodal_example.py
- tests/unittest/_torch/speculative/test_draft_target.py
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
🚧 Files skipped from review as they are similar to previous changes (14)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml
- tests/unittest/_torch/test_overlap_scheduler.py
- tests/unittest/llmapi/apps/utils.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- examples/llm-api/quickstart_advanced.py
- tests/unittest/api_stability/references/llm.yaml
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/unittest/_torch/test_return_logits.py
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tensorrt_llm/_torch/pyexecutor/config.py
- tests/integration/test_lists/qa/llm_function_sanity.txt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
- tests/unittest/api_stability/api_stability_core.py
- tensorrt_llm/llmapi/llm_args.py
- tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- tests/unittest/api_stability/api_stability_core.py
- tensorrt_llm/llmapi/llm_args.py
- tensorrt_llm/_torch/pyexecutor/_util.py
🧠 Learnings (1)
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
tensorrt_llm/_torch/pyexecutor/_util.py
🧬 Code Graph Analysis (1)
tensorrt_llm/_torch/pyexecutor/_util.py (2)
cpp/include/tensorrt_llm/executor/types.h (2)
tensorrt_llm (36-39), tensorrt_llm (41-663)
tensorrt_llm/llmapi/llm_args.py (1)
SamplerType (1950-1954)
🪛 Ruff (0.12.2)
tests/unittest/api_stability/api_stability_core.py
30-30: Multiple statements on one line (colon)
(E701)
tensorrt_llm/_torch/pyexecutor/_util.py
593-593: Line too long (158 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/llmapi/llm_args.py (2)
1970-1975: LGTM: Public SamplerType enum is clear and scoped correctly
Enum names/values are self-explanatory and align with downstream usage.
2353-2354: LGTM: Propagating sampler_type to PyTorchConfig
Correctly threads the new enum through to backend config.
tests/unittest/api_stability/api_stability_core.py (1)
30-30: LGTM: Importing SamplerType into test namespace
This ensures eval/annotation in references YAML can resolve SamplerType during API stability checks.
tensorrt_llm/_torch/pyexecutor/_util.py (2)
591-605: Sampler selection logic aligns with new sampler_type and decoding mode
- Honors explicit TRTLLMSampler.
- Honors auto by selecting TRTLLM for beam search and Torch for others.
- Preserves spec-decoder path and star-attention guard.
Looks correct and keeps behavior predictable.
611-629: LGTM: Simplified, consistent get_decoding_mode
- Respects explicit non-Auto decoding_mode.
- Uses TopKTopP for beam_width==1; BeamSearch otherwise.
- Downgrades BeamSearch to TopKTopP when beam_width==1 with a clear warning.
Matches stated behavior and avoids confusing combinations.
/bot run --disable-fail-fast
PR_Github #15290 [ run ] triggered by Bot
PR_Github #15271 [ run ] completed with state
PR_Github #15290 [ run ] completed with state
Signed-off-by: Daniel Campora <[email protected]>
Co-authored-by: Enwei Zhu <[email protected]> Signed-off-by: Daniel Cámpora <[email protected]>
Force-pushed from cc5d0ce to 6ce8051
/bot run --disable-fail-fast
PR_Github #15444 [ run ] triggered by Bot
PR_Github #15444 [ run ] completed with state
…mpler according to options (NVIDIA#6831) Signed-off-by: Daniel Campora <[email protected]>
…mpler according to options (NVIDIA#6831) Signed-off-by: Daniel Campora <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
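For example, the flags documented above can be combined in a single PR comment (the stage name reuses the example given for --stage-list):

```
/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"
```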
kill
Kill all running builds associated with the pull request.
skip
skip --comment COMMENT
Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.