
Conversation

Funatiq
Collaborator

@Funatiq Funatiq commented Jul 7, 2025

Description

  • Remove decoder_batch::Input and related interfaces
    • Modify MakeDecodingBatchInputOutput to accept DecoderInputBuffers directly, simplifying the interface.
    • Update GptDecoderBatched to work with the new DecoderInputBuffers structure.
  • Adjust Python bindings to reflect changes in the C++ interface.
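
The shape of the new interface can be sketched in Python, mirroring the bindings this PR exposes. This is a minimal stand-in, not the real implementation: only the `DecoderInputBuffers`/`make_decoding_batch_input` names and the `forwardBatchSlots`/`decoderLogits` members come from the PR; the field types and function body are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Minimal stand-in for the C++ DecoderInputBuffers exposed via pybind11/nanobind;
# the real class lives in cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h
# and holds tensors rather than plain lists.
@dataclass
class DecoderInputBuffers:
    forward_batch_slots: List[int] = field(default_factory=list)
    decoder_logits: List[Any] = field(default_factory=list)

# Old interface (removed): returned a freshly allocated decoder_batch::Input.
# New interface: fills the caller-owned buffers in place and returns nothing.
def make_decoding_batch_input(buffers: DecoderInputBuffers,
                              active_slots: List[int]) -> None:
    buffers.forward_batch_slots = list(active_slots)

buffers = DecoderInputBuffers()
make_decoding_batch_input(buffers, [0, 2, 3])
print(buffers.forward_batch_slots)  # [0, 2, 3]
```

The same caller-owned buffers can then be handed straight to the decoder's forward call, which is what removes the need for `decoder_batch::Input` and its ownership transfer.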

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since insufficient care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since insufficient care and validation can break the top of tree.

Summary by CodeRabbit

  • New Features

    • Added Python bindings for decoder buffer management classes, enabling direct access to decoder input/output buffers and slot buffers from Python.
    • Exposed buffer structures for easier integration and manipulation in Python-based workflows.
  • Refactor

    • Simplified and streamlined decoder input preparation by updating interfaces to modify buffers in place instead of returning new objects.
    • Unified and clarified buffer member usage and naming for logits and decoding steps across the codebase.
    • Removed deprecated or redundant classes and bindings, improving maintainability and consistency.
    • Updated decoder and runtime interfaces to use the new buffer structures, removing legacy input classes.
  • Bug Fixes

    • Updated internal references to buffer members to ensure correct handling of logits and decoding steps.
  • Chores

    • Integrated new buffer bindings into module initialization for both pybind11 and nanobind backends.
    • Added new source files for buffer bindings to build configurations.

@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from 965d11e to d00ffaa on July 7, 2025 19:51
@Funatiq
Collaborator Author

Funatiq commented Jul 8, 2025

/bot run --stage-list "A10-CPP-1, A30-CPP-1, A30-CPP-2, A30-CPP-3, H100_PCIe-CPP-1, H100_PCIe-CPP-2"

@tensorrt-cicd
Collaborator

PR_Github #11302 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11302 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8357 (Partly Tested) completed with status: 'FAILURE'

@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from d00ffaa to d2d3482 on July 8, 2025 13:43
@Funatiq
Collaborator Author

Funatiq commented Jul 8, 2025

/bot run --stage-list "A10-CPP-1, A30-CPP-1, A30-CPP-2, A30-CPP-3, H100_PCIe-CPP-1, H100_PCIe-CPP-2"

@tensorrt-cicd
Collaborator

PR_Github #11310 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11310 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8364 (Partly Tested) completed with status: 'FAILURE'

@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch 3 times, most recently from 8df6d26 to d7923e5 on July 14, 2025 20:13
@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from d7923e5 to 079793f on July 18, 2025 14:30
Contributor

coderabbitai bot commented Jul 18, 2025

📝 Walkthrough


This change refactors how decoder input buffers and logits are managed and exposed in both C++ and Python bindings. It eliminates the decoder_batch::Input class, updates interfaces to use DecoderInputBuffers directly, and adjusts related logic to operate on buffer objects in place. Python and nanobind bindings for buffer classes are restructured, with new files added for buffer bindings and previous bindings removed or relocated.

Changes

Cohort / File(s) Change Summary
DecoderInputBuffers Refactor
cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h
Added new type alias and members to DecoderInputBuffers, refactored logits storage, clarified comments, and supported multi-step decoding.
Decoder Batch Input Interface Simplification
cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h, cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp
Changed interface to operate on DecoderInputBuffers in place, removed ownership semantics, simplified method signatures, and updated internal logic.
GptDecoderBatched Interface Update
cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h, cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp
Updated method signatures to accept DecoderInputBuffers instead of decoder_batch::Input, adjusted internal member accesses, and updated includes.
Removal of decoder_batch::Input
cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h
Removed decoder_batch::Input class, updated interface methods to use DecoderInputBuffers.
Logits Container Usage Updates
cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp, cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp, cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp, cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp
Updated references to logits containers to use decoderLogits member in DecoderInputBuffers.
Inflight Batching Refactor
cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp, cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h
Removed mDecodingInputs member and related logic, updated to operate directly on buffer objects.
Test Adjustments
cpp/tests/batch_manager/guidedDecoderTest.cpp, cpp/tests/runtime/gptDecoderBatchedTest.cpp
Updated tests to use new buffer member names and in-place buffer modification pattern.
Pybind Buffer Bindings Addition
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp, cpp/tensorrt_llm/pybind/batch_manager/buffers.h, cpp/tensorrt_llm/pybind/bindings.cpp
Added new files and code to expose buffer classes and members to Python via pybind11.
Pybind Buffer Bindings Removal/Refactor
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp, cpp/tensorrt_llm/pybind/runtime/bindings.cpp
Removed previous buffer class bindings and decoder batch input binding, updated function signatures and argument order, adjusted BufferManager binding location.
Nanobind Buffer Bindings Addition
cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp, cpp/tensorrt_llm/nanobind/batch_manager/buffers.h, cpp/tensorrt_llm/nanobind/bindings.cpp
Introduced nanobind bindings for buffer classes, exposed members, and integrated into module initialization.
Nanobind Buffer Bindings Removal/Refactor
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp, cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
Removed previous buffer class bindings and decoder batch input binding, updated BufferManager binding location.
CMake Build Update
cpp/tensorrt_llm/nanobind/CMakeLists.txt
Added new source file for buffer bindings to build process.
Python Executor and Sampler Refactor
tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py, tensorrt_llm/_torch/pyexecutor/sampler.py
Changed batch input creation to mutate buffers in place, updated interface usage, and removed intermediate decoding input storage.
Miscellaneous Minor Updates
cpp/tests/executor/executorTest.cpp
Added logging of beam width during token comparison tests.

Sequence Diagram(s)

sequenceDiagram
    participant Python
    participant Pybind/Nanobind
    participant DecoderInputBuffers
    participant DecoderState
    participant Decoder

    Python->>Pybind/Nanobind: Call make_decoding_batch_input(decoderInputBuffers, decoderState, ...)
    Pybind/Nanobind->>DecoderInputBuffers: Mutate logits, batch slots, etc. in place
    Pybind/Nanobind->>DecoderState: Update generation steps, etc.
    Python->>Decoder: Call forward_async(decoderInputBuffers, decoderState)
    Decoder->>DecoderInputBuffers: Access batchLogits, batch slots, etc.
    Decoder->>DecoderState: Use state for decoding

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: nanobind bindings #6185: Introduces nanobind bindings and refactors batch manager classes including DecoderInputBuffers, removing decoder_batch::Input and updating related APIs, closely related to the current changes.

Suggested reviewers

  • DomBrown
  • chzblych



📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b803b0 and 9c5a145.

📒 Files selected for processing (11)
  • cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2 hunks)
  • cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (1 hunks)
  • cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2 hunks)
  • cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (5 hunks)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h (0 hunks)
💤 Files with no reviewable changes (1)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp
🚧 Files skipped from review as they are similar to previous changes (9)
  • cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp
  • cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp
  • cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp
  • cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp
  • cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h
  • cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h
  • cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h
  • cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tensorrt_llm/_torch/pyexecutor/sampler.py (1)

659-677: Remove commented code and finalize the interface transition.

The commented-out code indicates a transitional state. Since the new make_decoding_batch_input interface works correctly, the commented-out block should be removed to keep the codebase clean.

The new implementation correctly uses the updated interface with separated context and generation requests and in-place modification of DecoderInputBuffers.

-        # TODO: Enable this back once nanobind is merged and/or llm request is a pure python object
-        # self.algs.make_decoding_batch_input_output(
-        #     self.store["decoder_input_buffers"][self.micro_batch_idx],
-        #     self.store["decoder_state"],
-        #     scheduled_requests,
-        #     model_outputs["logits"],
-        #     beam_width,
-        #     num_context_logits_prefix_sum,
-        # )
-
         make_decoding_batch_input(
tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (1)

22-23: Add type annotation for decoder_state parameter.

For consistency with other parameters and to improve code clarity, consider adding a type annotation for the decoder_state parameter.

-        decoder_input_buffers: DecoderInputBuffers,
-        decoder_state,
+        decoder_input_buffers: DecoderInputBuffers,
+        decoder_state: 'DecoderState',

Note: You may need to import the appropriate DecoderState class or use a string annotation if there are circular import concerns.
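
One common way to satisfy that suggestion without creating a runtime import cycle is the `typing.TYPE_CHECKING` pattern. The sketch below is illustrative: the import path is a placeholder, not the real module location of `DecoderState`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only while type checking, so no circular import at runtime;
    # this module path is hypothetical, not the real one.
    from tensorrt_llm.bindings.internal.runtime import DecoderState

def make_decoding_batch_input(
    decoder_input_buffers,        # DecoderInputBuffers
    decoder_state: DecoderState,  # resolved lazily thanks to the __future__ import
) -> None:
    ...
```

With `from __future__ import annotations`, every annotation is stored as a string and only evaluated by type checkers, so the annotation stays accurate without importing the class at runtime.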

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44040ed and 079793f.

📒 Files selected for processing (19)
  • cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2 hunks)
  • cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (1 hunks)
  • cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2 hunks)
  • cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (5 hunks)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h (0 hunks)
  • cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/runtime/bindings.cpp (3 hunks)
  • cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (7 hunks)
  • cpp/tests/batch_manager/guidedDecoderTest.cpp (1 hunks)
  • cpp/tests/runtime/gptDecoderBatchedTest.cpp (6 hunks)
  • tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (3 hunks)
  • tensorrt_llm/_torch/pyexecutor/sampler.py (2 hunks)
💤 Files with no reviewable changes (1)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h
🧰 Additional context used
🧬 Code Graph Analysis (2)
cpp/tensorrt_llm/pybind/runtime/bindings.cpp (2)
cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (1)
  • batchLogits (55-55)
cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (1)
  • maxDecoderSteps (69-69)
cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2)
cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (1)
  • DecoderInputBuffers (36-82)
cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (4)
  • forwardAsync (172-191)
  • forwardAsync (172-173)
  • forward (193-199)
  • forward (193-193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (31)
cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1)

88-88: LGTM: Correct member access update

The change from inputBuffers.logits to inputBuffers.decoderLogits correctly aligns with the refactoring to separate decoder logits from batch logits in DecoderInputBuffers.

cpp/tests/batch_manager/guidedDecoderTest.cpp (1)

140-140: LGTM: Test updated for new interface

The test correctly updates to use decoderInputBuffers.decoderLogits instead of the old logits member, ensuring compatibility with the refactored DecoderInputBuffers structure.

cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1)

166-166: LGTM: Correct logits access update

The change from inputBuffers.logits.at(requestIdx) to inputBuffers.decoderLogits.at(requestIdx) correctly updates the guided decoder to use the new decoder logits container in DecoderInputBuffers.

cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1)

52-52: LGTM: Correct logits container update

The change from inputBuffers.logits.at(batchIdx) to inputBuffers.decoderLogits.at(batchIdx) correctly updates the logits post-processor to use the new decoder logits container in DecoderInputBuffers.

cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1)

82-82: LGTM: Consistent decoder logits access

The change from inputBuffers.logits to inputBuffers.decoderLogits correctly updates the context logits handler to use the new decoder logits container, maintaining consistency with the refactoring across all batch manager components.

cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp (2)

23-23: LGTM! Include addition is necessary for the refactoring.

The include for decoderBuffers.h is correctly added to support the direct use of DecoderInputBuffers class in the refactored interface.


137-138: LGTM! Method signature simplification aligns with the refactoring.

The removal of the max_num_sequences parameter is consistent with the shift from creating new decoding input objects to modifying existing DecoderInputBuffers in place. This simplifies the interface and reduces parameter overhead.

cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1)

2060-2062: LGTM! Clean refactoring that eliminates intermediate abstraction.

The changes successfully remove the decoder_batch::Input abstraction by:

  1. Having MakeDecodingBatchInputOutput modify decoderInputBuffers in-place rather than creating new objects
  2. Passing decoderInputBuffers directly to forwardAsync instead of using intermediate input objects

This simplifies the interface and eliminates the need to manage a vector of unique pointers to decoding input objects, which aligns perfectly with the PR's refactoring objectives.

cpp/tests/runtime/gptDecoderBatchedTest.cpp (4)

45-45: LGTM: TensorPtr alias updated to reflect interface refactoring.

The change from decoder_batch::Input::TensorPtr to ITensor::SharedPtr correctly aligns with the removal of the decoder_batch::Input class and simplifies the type alias.


355-357: LGTM: Updated to use in-place modification of DecoderInputBuffers.

The refactoring correctly updates the test to use the new interface where MakeDecodingBatchInputOutput::createDecoderBatchInputs modifies inputBuffers in place, and decoder.forward accepts DecoderInputBuffers directly. This simplifies the ownership semantics and aligns with the broader refactoring.


484-486: LGTM: Consistent interface changes in testDecoderWavefront.

The changes follow the same pattern as in testDecoder, correctly updating to use in-place modification of DecoderInputBuffers and direct parameter passing to decoder.forward.


641-643: LGTM: Consistent interface changes in testDecoderDraft.

The changes follow the same pattern as in the other test functions, correctly updating to use the new DecoderInputBuffers interface.

cpp/tensorrt_llm/pybind/runtime/bindings.cpp (3)

20-20: LGTM: Added necessary header for DecoderInputBuffers.

The inclusion of tensorrt_llm/batch_manager/decoderBuffers.h is required for the new DecoderInputBuffers Python binding.


52-52: LGTM: Added namespace alias for batch_manager.

The namespace alias tb for tensorrt_llm::batch_manager simplifies the code and follows the existing naming pattern.


381-393: LGTM: New Python binding for DecoderInputBuffers.

The new binding correctly exposes the DecoderInputBuffers class with its constructor and necessary member variables. This replaces the previous decoder_batch::Input binding and provides the required interface for the Python layer.

cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2)

55-58: LGTM: Updated method signatures to use DecoderInputBuffers.

The forwardAsync and forward method signatures have been correctly updated to use batch_manager::DecoderInputBuffers const& input instead of the previous decoder_batch::Input. The const-correctness is maintained appropriately.


81-81: LGTM: Updated private method signature for consistency.

The forwardDispatch method signature has been correctly updated to use batch_manager::DecoderInputBuffers const& input, maintaining consistency with the public interface changes.

tensorrt_llm/_torch/pyexecutor/sampler.py (3)

18-18: LGTM: Updated import to use new interface.

The import change from make_decoding_batch_input_output to make_decoding_batch_input correctly reflects the new interface that modifies DecoderInputBuffers in place.


20-20: LGTM: Added import for DecoderInputBuffers.

The import of DecoderInputBuffers from the runtime module is necessary for the updated interface.


679-682: LGTM: Updated decoder forward call to use DecoderInputBuffers directly.

The call to decoder.forward_async has been correctly updated to pass the DecoderInputBuffers directly, which aligns with the new interface that eliminates the need for unique pointer dereferencing.

tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (1)

65-70: LGTM! The in-place modification pattern is correctly implemented.

The changes properly update the decoder state and input buffers without creating new objects, which aligns with the refactoring objectives.

cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2)

41-41: Good addition of const tensor pointer type alias.

The TensorConstPtr type alias improves code clarity by explicitly distinguishing between mutable and immutable tensor pointers.


67-76: Well-structured changes for multi-step decoding support.

The addition of maxDecoderSteps with a sensible default and the restructuring of batchLogits as a nested vector of const pointers properly supports multiple decoding steps while maintaining immutability semantics.
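
The resulting layout can be pictured as a step-major nested container. The sketch below is illustrative only: the real `batchLogits` member holds `TensorConstPtr` values, and the placeholder strings and sizes here are made up.

```python
# batch_logits[step][batch_index] holds one (immutable) logits tensor for one
# request at one decoder step; maxDecoderSteps defaults to 1 but may be larger
# when a single forward pass covers multiple decoding steps.
max_decoder_steps = 2
num_requests = 3

batch_logits = [
    [f"logits(step={step}, request={req})" for req in range(num_requests)]
    for step in range(max_decoder_steps)
]

print(len(batch_logits), len(batch_logits[0]))  # 2 3
```

The outer index matching the decoder step is what lets `forwardDispatch` pick the per-step slots and logits together, as the updated `gptDecoderBatched.cpp` accesses do.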

cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (1)

116-118: Member access correctly updated for the new buffer structure.

The changes properly access forwardBatchSlots and batchLogits using the step index, which aligns with the multi-step decoding support in DecoderInputBuffers.

cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (1)

30-32: Clean interface refactoring with proper forward declarations.

The removal of decoder_batch::Input and replacement with batch_manager::DecoderInputBuffers simplifies the interface while the forward declaration maintains proper dependency management.

Also applies to: 61-67

cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (1)

49-49: Good improvement in batch slot sizing.

Using activeSlots.size() instead of maxNumSequences ensures the batch slots are sized exactly to the number of active requests, which is more efficient.

cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (2)

43-43: LGTM!

The type alias change correctly reflects the removal of the decoder_batch::Input dependency, simplifying the interface.


49-53: Interface changes align with in-place modification pattern.

The change from returning std::unique_ptr<runtime::decoder_batch::Input> to void and directly modifying DecoderInputBuffers& is consistent with the refactoring objectives. This eliminates unnecessary allocations and ownership transfers.

cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (3)

441-444: Python binding correctly updated to match C++ interface changes.

The function signature now properly accepts DecoderInputBuffers& and DecoderState& as parameters, maintaining consistency with the C++ API changes.


446-513: Implementation correctly handles both context and generation requests.

The function properly:

  • Processes only the last chunk of context requests
  • Handles beam width tiling appropriately for multi-beam search
  • Filters generation requests based on their state
  • Manages batch slots consistently with active slots

The in-place modification pattern is correctly implemented throughout.


502-502: No changes needed: batchLogits is the correct member.

The DecoderInputBuffers class defines two distinct members:

  • std::vector<TensorPtr> decoderLogits;
  • std::vector<std::vector<TensorConstPtr>> batchLogits;

The assignment on line 502 (decoderInputBuffers.batchLogits = logitsVec;) correctly targets the batched‐logits member. The decoderLogits vector is used elsewhere for per-request logits, so there’s no inconsistency here.

@Funatiq
Collaborator Author

Funatiq commented Jul 29, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #13383 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13383 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10009 completed with status: 'FAILURE'

@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from 079793f to 3ebe76b on July 29, 2025 17:21
@coderabbitai coderabbitai bot requested review from chzblych, DomBrown and syuoni on July 29, 2025 17:22
@Funatiq
Collaborator Author

Funatiq commented Jul 29, 2025

/bot run --stage-list "A10-CPP-1, A30-CPP-1, A30-CPP-2, A30-CPP-3, H100_PCIe-CPP-1, H100_PCIe-CPP-2"

@tensorrt-cicd
Collaborator

PR_Github #13400 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13400 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10023 (Partly Tested) completed with status: 'FAILURE'

@Funatiq Funatiq removed request for chzblych and syuoni on August 1, 2025 10:13
@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from 3ebe76b to 6522b34 on August 1, 2025 10:14
@Funatiq Funatiq changed the title from "refactor: decoding inputs, part 2" to "[None][refactor] decoding inputs, part 2" on Aug 1, 2025
@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from 6522b34 to 7654d08 on August 1, 2025 10:36
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp (1)

35-36: Missing namespace closing comment.

According to the coding guidelines, the closing brace of a namespace should carry a comment naming the namespace it closes.

Apply this diff to add the missing namespace closing comment:

 namespace tensorrt_llm::pybind::batch_manager
 {

The closing brace at line 74 should be:

-} // namespace tensorrt_llm::pybind::batch_manager
+} // namespace tensorrt_llm::pybind::batch_manager
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (1)

486-486: Missing namespace closing comment.

The closing brace should include a comment indicating which namespace it closes.

Apply this diff:

-} // namespace tensorrt_llm::pybind::batch_manager
+} // namespace tensorrt_llm::pybind::batch_manager
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1)

480-480: Missing namespace closing comment.

The closing brace should include a comment indicating which namespace it closes.

Apply this diff:

-} // namespace tensorrt_llm::nanobind::batch_manager
+} // namespace tensorrt_llm::nanobind::batch_manager
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6522b34 and 7654d08.

📒 Files selected for processing (29)
  • cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2 hunks)
  • cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (1 hunks)
  • cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2 hunks)
  • cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (5 hunks)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1 hunks)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h (0 hunks)
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/batch_manager/buffers.h (1 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/runtime/bindings.cpp (3 hunks)
  • cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (7 hunks)
  • cpp/tests/batch_manager/guidedDecoderTest.cpp (1 hunks)
  • cpp/tests/runtime/gptDecoderBatchedTest.cpp (6 hunks)
  • tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (3 hunks)
  • tensorrt_llm/_torch/pyexecutor/sampler.py (1 hunks)
💤 Files with no reviewable changes (1)
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h
✅ Files skipped from review due to trivial changes (5)
  • cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp
  • cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt
  • cpp/tensorrt_llm/pybind/CMakeLists.txt
  • cpp/tensorrt_llm/pybind/bindings.cpp
🚧 Files skipped from review as they are similar to previous changes (20)
  • cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp
  • cpp/tensorrt_llm/nanobind/bindings.cpp
  • cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp
  • cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp
  • tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py
  • cpp/tensorrt_llm/pybind/runtime/bindings.cpp
  • cpp/tests/batch_manager/guidedDecoderTest.cpp
  • cpp/tensorrt_llm/pybind/batch_manager/buffers.h
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
  • cpp/tests/runtime/gptDecoderBatchedTest.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
  • cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp
  • cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h
  • cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h
  • tensorrt_llm/_torch/pyexecutor/sampler.py
  • cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h
  • cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp
  • cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic...

Files:

  • cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp
  • cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp
  • cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp (4)

1-16: LGTM! Copyright header is compliant.

The NVIDIA copyright header includes the current year (2024) and follows the required format for TensorRT-LLM Open Source Software code.


40-51: LGTM! DecoderInputBuffers binding is well-structured.

The binding correctly exposes the constructor and all member variables with appropriate read-write access. The parameter names and types align with the C++ interface.


53-60: LGTM! DecoderOutputBuffers binding is correctly implemented.

The binding appropriately exposes read-write access to buffer members. The special handling for new_output_tokens_host using a lambda and tr::Torch::tensor conversion is correct for exposing the underlying tensor.


62-72: LGTM! SlotDecoderBuffers binding is properly structured.

The constructor parameters and member bindings are correctly exposed with appropriate access patterns.

cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (2)

1-16: LGTM! Copyright header is updated correctly.

The copyright header includes the updated year range (2022-2025) which is appropriate for the current changes.


410-483: Function refactoring aligns with PR objectives.

The make_decoding_batch_input function has been successfully refactored to:

  • Accept DecoderInputBuffers& and DecoderState& as mutable references (in-place modification)
  • Remove the return of std::unique_ptr<decoder_batch::Input>
  • Assign logits directly to decoderInputBuffers.batchLogits

The logic correctly processes context and generation requests, handles beam width tiling, and populates the batch slots. This aligns perfectly with the PR objective of removing the decoder_batch::Input class.

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)

1-16: LGTM! Copyright header is properly updated.

The copyright header includes the updated year range (2022-2025) consistent with the pybind11 version.


404-477: Excellent consistency with pybind11 implementation.

The nanobind version of make_decoding_batch_input correctly mirrors the pybind11 implementation:

  • Same function signature with mutable references
  • Identical logic for processing context and generation requests
  • Proper handling of beam width and logits tiling
  • Consistent in-place modification of decoderInputBuffers.batchLogits

The use of nanobind-specific syntax (nb::arg) is correct.

@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from 7654d08 to 5b803b0 Compare August 6, 2025 16:23
Funatiq added 8 commits August 7, 2025 14:46
- Modified MakeDecodingBatchInputOutput to accept DecoderInputBuffers directly, simplifying the interface.
- Removed decoder_batch::Input and related interfaces.
- Updated GptDecoderBatched to work with the new DecoderInputBuffers structure.
- Adjusted Python bindings to reflect changes in the C++ interface.

Signed-off-by: Robin Kobus <[email protected]>
- Modified MakeDecodingBatchInputOutput to accept DecoderInputBuffers directly, simplifying the interface.
- Removed decoder_batch::Input and related interfaces.
- Updated GptDecoderBatched to work with the new DecoderInputBuffers structure.
- Adjusted Python bindings to reflect changes in the C++ interface.

Signed-off-by: Robin Kobus <[email protected]>
…tOutput

- Removed maxNumSequences parameter from createDecoderBatchInputs and related function calls, streamlining the interface.
- Updated all relevant implementations and tests to reflect the changes in function signatures.

Signed-off-by: Robin Kobus <[email protected]>
- Move bindings for `DecoderInputBuffers`, `DecoderOutputBuffers`, and `SlotDecoderBuffers` to `buffers.cpp`.
- Adjust related function calls and Python bindings to reflect the new structure.

Signed-off-by: Robin Kobus <[email protected]>
Signed-off-by: Robin Kobus <[email protected]>
- Introduced logging to track the current beam width during token comparison tests, enhancing test traceability.

Signed-off-by: Robin Kobus <[email protected]>
…s beam width

- Modified the ChangeBeamWidth test to use the same number of return sequences as the beam width, improving test consistency.

Signed-off-by: Robin Kobus <[email protected]>
@Funatiq Funatiq force-pushed the dev/refactor_decoder_inputs_2 branch from 5b803b0 to 9c5a145 Compare August 7, 2025 14:46