[None][refactor] decoding inputs, part 2 #5799
Conversation
Force-pushed from 965d11e to d00ffaa
/bot run --stage-list "A10-CPP-1, A30-CPP-1, A30-CPP-2, A30-CPP-3, H100_PCIe-CPP-1, H100_PCIe-CPP-2"

PR_Github #11302 [ run ] triggered by Bot

PR_Github #11302 [ run ] completed with state
Force-pushed from d00ffaa to d2d3482
/bot run --stage-list "A10-CPP-1, A30-CPP-1, A30-CPP-2, A30-CPP-3, H100_PCIe-CPP-1, H100_PCIe-CPP-2"

PR_Github #11310 [ run ] triggered by Bot

PR_Github #11310 [ run ] completed with state
Force-pushed from 8df6d26 to d7923e5

Force-pushed from d7923e5 to 079793f
📝 Walkthrough

This change refactors how decoder input buffers and logits are managed and exposed in both C++ and Python bindings. It eliminates the `decoder_batch::Input` class and related interfaces.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Python
    participant Bindings as Pybind/Nanobind
    participant DecoderInputBuffers
    participant DecoderState
    participant Decoder
    Python->>Bindings: Call make_decoding_batch_input(decoderInputBuffers, decoderState, ...)
    Bindings->>DecoderInputBuffers: Mutate logits, batch slots, etc. in place
    Bindings->>DecoderState: Update generation steps, etc.
    Python->>Decoder: Call forward_async(decoderInputBuffers, decoderState)
    Decoder->>DecoderInputBuffers: Access batchLogits, batch slots, etc.
    Decoder->>DecoderState: Use state for decoding
```
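The in-place flow in the diagram can be illustrated with a minimal Python sketch. The class and function names mirror the PR's identifiers, but the fields and list-based "tensors" are simplified stand-ins, not the real API:

```python
from dataclasses import dataclass, field


@dataclass
class DecoderInputBuffers:
    # Simplified stand-ins for the real members (tensors in the actual code).
    forward_batch_slots: list = field(default_factory=list)  # [step][slot]
    batch_logits: list = field(default_factory=list)         # [step][request]
    max_decoder_steps: int = 1


def make_decoding_batch_input(buffers: DecoderInputBuffers,
                              active_slots: list, logits: list) -> None:
    """Mutate the existing buffers in place; nothing new is allocated or returned."""
    buffers.forward_batch_slots = [active_slots]  # one entry per decoder step
    buffers.batch_logits = [logits]
    buffers.max_decoder_steps = 1


buffers = DecoderInputBuffers()
make_decoding_batch_input(buffers, active_slots=[0, 2, 3],
                          logits=["l0", "l2", "l3"])
# The caller then passes the same `buffers` object on to the decoder's
# forward pass, which is the in-place pattern the diagram shows.
```

The point of the `None` return is ownership: the Python layer and the decoder share one buffer object rather than threading a freshly allocated input object through every call.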
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
tensorrt_llm/_torch/pyexecutor/sampler.py (1)

659-677: Remove commented-out code and finalize the interface transition.

The commented-out code indicates a transitional state. Since the new `make_decoding_batch_input` interface is working correctly, the commented code should be removed to clean up the codebase. The new implementation correctly uses the updated interface, with separated context and generation requests and in-place modification of `DecoderInputBuffers`.

```diff
-        # TODO: Enable this back once nanobind is merged and/or llm request is a pure python object
-        # self.algs.make_decoding_batch_input_output(
-        #     self.store["decoder_input_buffers"][self.micro_batch_idx],
-        #     self.store["decoder_state"],
-        #     scheduled_requests,
-        #     model_outputs["logits"],
-        #     beam_width,
-        #     num_context_logits_prefix_sum,
-        # )
         make_decoding_batch_input(
```

tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (1)
22-23: Add type annotation for the `decoder_state` parameter.

For consistency with other parameters and to improve code clarity, consider adding a type annotation for the `decoder_state` parameter.

```diff
-    decoder_input_buffers: DecoderInputBuffers,
-    decoder_state,
+    decoder_input_buffers: DecoderInputBuffers,
+    decoder_state: 'DecoderState',
```

Note: You may need to import the appropriate `DecoderState` class, or use a string annotation if there are circular import concerns.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2 hunks)
- cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (1 hunks)
- cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2 hunks)
- cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2 hunks)
- cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (5 hunks)
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h (0 hunks)
- cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp (2 hunks)
- cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/pybind/runtime/bindings.cpp (3 hunks)
- cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (7 hunks)
- cpp/tests/batch_manager/guidedDecoderTest.cpp (1 hunks)
- cpp/tests/runtime/gptDecoderBatchedTest.cpp (6 hunks)
- tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (3 hunks)
- tensorrt_llm/_torch/pyexecutor/sampler.py (2 hunks)
💤 Files with no reviewable changes (1)
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h
🧰 Additional context used
🧬 Code Graph Analysis (2)
cpp/tensorrt_llm/pybind/runtime/bindings.cpp (2)

- cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (1): batchLogits (55-55)
- cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (1): maxDecoderSteps (69-69)

cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2)

- cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (1): DecoderInputBuffers (36-82)
- cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (4): forwardAsync (172-191), forwardAsync (172-173), forward (193-199), forward (193-193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (31)
cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1)

88-88: LGTM: Correct member access update. The change from `inputBuffers.logits` to `inputBuffers.decoderLogits` correctly aligns with the refactoring to separate decoder logits from batch logits in `DecoderInputBuffers`.

cpp/tests/batch_manager/guidedDecoderTest.cpp (1)

140-140: LGTM: Test updated for new interface. The test correctly updates to use `decoderInputBuffers.decoderLogits` instead of the old `logits` member, ensuring compatibility with the refactored `DecoderInputBuffers` structure.

cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1)

166-166: LGTM: Correct logits access update. The change from `inputBuffers.logits.at(requestIdx)` to `inputBuffers.decoderLogits.at(requestIdx)` correctly updates the guided decoder to use the new decoder logits container in `DecoderInputBuffers`.

cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1)

52-52: LGTM: Correct logits container update. The change from `inputBuffers.logits.at(batchIdx)` to `inputBuffers.decoderLogits.at(batchIdx)` correctly updates the logits post-processor to use the new decoder logits container in `DecoderInputBuffers`.

cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1)

82-82: LGTM: Consistent decoder logits access. The change from `inputBuffers.logits` to `inputBuffers.decoderLogits` correctly updates the context logits handler to use the new decoder logits container, maintaining consistency with the refactoring across all batch manager components.

cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp (2)
23-23: LGTM! Include addition is necessary for the refactoring. The include for `decoderBuffers.h` is correctly added to support the direct use of the `DecoderInputBuffers` class in the refactored interface.

137-138: LGTM! Method signature simplification aligns with the refactoring. The removal of the `max_num_sequences` parameter is consistent with the shift from creating new decoding input objects to modifying existing `DecoderInputBuffers` in place. This simplifies the interface and reduces parameter overhead.

cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1)
2060-2062: LGTM! Clean refactoring that eliminates an intermediate abstraction. The changes successfully remove the `decoder_batch::Input` abstraction by:

- Having `MakeDecodingBatchInputOutput` modify `decoderInputBuffers` in place rather than creating new objects
- Passing `decoderInputBuffers` directly to `forwardAsync` instead of using intermediate input objects

This simplifies the interface and eliminates the need to manage a vector of unique pointers to decoding input objects, which aligns with the PR's refactoring objectives.
cpp/tests/runtime/gptDecoderBatchedTest.cpp (4)

45-45: LGTM: TensorPtr alias updated to reflect interface refactoring. The change from `decoder_batch::Input::TensorPtr` to `ITensor::SharedPtr` correctly aligns with the removal of the `decoder_batch::Input` class and simplifies the type alias.

355-357: LGTM: Updated to use in-place modification of DecoderInputBuffers. The refactoring correctly updates the test to use the new interface, where `MakeDecodingBatchInputOutput::createDecoderBatchInputs` modifies `inputBuffers` in place and `decoder.forward` accepts `DecoderInputBuffers` directly. This simplifies the ownership semantics and aligns with the broader refactoring.

484-486: LGTM: Consistent interface changes in testDecoderWavefront. The changes follow the same pattern as in testDecoder, correctly updating to use in-place modification of `DecoderInputBuffers` and direct parameter passing to `decoder.forward`.

641-643: LGTM: Consistent interface changes in testDecoderDraft. The changes follow the same pattern as in the other test functions, correctly updating to use the new `DecoderInputBuffers` interface.

cpp/tensorrt_llm/pybind/runtime/bindings.cpp (3)
20-20: LGTM: Added necessary header for DecoderInputBuffers. The inclusion of `tensorrt_llm/batch_manager/decoderBuffers.h` is required for the new `DecoderInputBuffers` Python binding.

52-52: LGTM: Added namespace alias for batch_manager. The namespace alias `tb` for `tensorrt_llm::batch_manager` simplifies the code and follows the existing naming pattern.

381-393: LGTM: New Python binding for DecoderInputBuffers. The new binding correctly exposes the `DecoderInputBuffers` class with its constructor and necessary member variables. This replaces the previous `decoder_batch::Input` binding and provides the required interface for the Python layer.

cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2)
55-58: LGTM: Updated method signatures to use DecoderInputBuffers. The `forwardAsync` and `forward` method signatures have been correctly updated to use `batch_manager::DecoderInputBuffers const& input` instead of the previous `decoder_batch::Input`. Const-correctness is maintained appropriately.

81-81: LGTM: Updated private method signature for consistency. The `forwardDispatch` method signature has been correctly updated to use `batch_manager::DecoderInputBuffers const& input`, maintaining consistency with the public interface changes.

tensorrt_llm/_torch/pyexecutor/sampler.py (3)
18-18: LGTM: Updated import to use new interface. The import change from `make_decoding_batch_input_output` to `make_decoding_batch_input` correctly reflects the new interface that modifies `DecoderInputBuffers` in place.

20-20: LGTM: Added import for DecoderInputBuffers. The import of `DecoderInputBuffers` from the runtime module is necessary for the updated interface.

679-682: LGTM: Updated decoder forward call to use DecoderInputBuffers directly. The call to `decoder.forward_async` has been correctly updated to pass the `DecoderInputBuffers` directly, which aligns with the new interface that eliminates the need for unique pointer dereferencing.

tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (1)

65-70: LGTM! The in-place modification pattern is correctly implemented. The changes properly update the decoder state and input buffers without creating new objects, which aligns with the refactoring objectives.
cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2)
41-41: Good addition of a const tensor pointer type alias. The `TensorConstPtr` type alias improves code clarity by explicitly distinguishing between mutable and immutable tensor pointers.

67-76: Well-structured changes for multi-step decoding support. The addition of `maxDecoderSteps` with a sensible default and the restructuring of `batchLogits` as a nested vector of const pointers properly support multiple decoding steps while maintaining immutability semantics.

cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (1)

116-118: Member access correctly updated for the new buffer structure. The changes properly access `forwardBatchSlots` and `batchLogits` using the step index, which aligns with the multi-step decoding support in `DecoderInputBuffers`.

cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2)
30-32: Clean interface refactoring with proper forward declarations. The removal of `decoder_batch::Input` and its replacement with `batch_manager::DecoderInputBuffers` simplifies the interface, while the forward declaration maintains proper dependency management.

Also applies to: 61-67

cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (1)

49-49: Good improvement in batch slot sizing. Using `activeSlots.size()` instead of `maxNumSequences` ensures the batch slots are sized exactly to the number of active requests, which is more efficient.

cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (2)
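The exact-sizing point above (batch slots sized to `activeSlots.size()` rather than to the `maxNumSequences` capacity) can be sketched in Python; a plain list stands in for a pre-allocated slots tensor, and the function name is illustrative:

```python
def fill_batch_slots(slots_buffer: list, active_slots: list) -> list:
    """Write the active slots into a pre-allocated buffer and return a view
    sized to len(active_slots), not to the buffer's full capacity."""
    num_active = len(active_slots)
    slots_buffer[:num_active] = active_slots  # in-place write into the buffer
    return slots_buffer[:num_active]          # sized exactly to active requests


capacity = [0] * 8                   # pre-allocated for maxNumSequences = 8
view = fill_batch_slots(capacity, [1, 4, 5])
print(len(view))  # 3
```

Sizing the returned view to the active requests means downstream consumers never iterate over unused capacity.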
43-43: LGTM! The type alias change correctly reflects the removal of the `decoder_batch::Input` dependency, simplifying the interface.

49-53: Interface changes align with the in-place modification pattern. The change from returning `std::unique_ptr<runtime::decoder_batch::Input>` to `void` while directly modifying `DecoderInputBuffers&` is consistent with the refactoring objectives. This eliminates unnecessary allocations and ownership transfers.

cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (3)
441-444: Python binding correctly updated to match C++ interface changes. The function signature now properly accepts `DecoderInputBuffers&` and `DecoderState&` as parameters, maintaining consistency with the C++ API changes.

446-513: Implementation correctly handles both context and generation requests. The function properly:

- Processes only the last chunk of context requests
- Handles beam width tiling appropriately for multi-beam search
- Filters generation requests based on their state
- Manages batch slots consistently with active slots

The in-place modification pattern is correctly implemented throughout.
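The beam-width tiling noted above can be sketched as follows; plain Python lists stand in for logits tensors, and the function name is illustrative rather than the real API:

```python
def tile_logits_for_beams(logits_row: list, beam_width: int) -> list:
    """Replicate one [vocab]-sized logits row to [beam_width, vocab], so a
    context request entering beam search starts with identical logits on
    every beam."""
    return [list(logits_row) for _ in range(beam_width)]


tiled = tile_logits_for_beams([0.1, 0.7, 0.2], beam_width=4)
# Four independent copies: mutating one beam's row leaves the others intact.
```

In the real code the tiling is done on tensors, but the shape transformation is the same: one context-phase row becomes `beam_width` rows before decoding.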
502-502: No changes needed: `batchLogits` is the correct member. The `DecoderInputBuffers` class defines two distinct members:

    std::vector<TensorPtr> decoderLogits;
    std::vector<std::vector<TensorConstPtr>> batchLogits;

The assignment on line 502 (`decoderInputBuffers.batchLogits = logitsVec;`) correctly targets the batched-logits member. The `decoderLogits` vector is used elsewhere for per-request logits, so there is no inconsistency here.
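The distinction between the two members can be made concrete with a small Python analogy; strings stand in for tensors, and the indexing mirrors the C++ containers described above:

```python
# decoderLogits: one logits tensor per request (flat, per-request indexing).
decoder_logits = ["logits_req0", "logits_req1"]

# batchLogits: a leading decoder-step axis, then one tensor per request.
batch_logits = [["logits_req0", "logits_req1"]]  # a single decoder step here

# Per-request access, as used by the guided decoder / logits post-processor:
assert decoder_logits[1] == "logits_req1"

# Step-indexed access, as used by the batched decoder forward pass:
step = 0
assert batch_logits[step][1] == "logits_req1"
```

Because the two containers answer different queries (per request vs. per step and request), assigning the batched view to `batchLogits` does not conflict with the per-request `decoderLogits`.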
/bot run

PR_Github #13383 [ run ] triggered by Bot

PR_Github #13383 [ run ] completed with state
Force-pushed from 079793f to 3ebe76b
/bot run --stage-list "A10-CPP-1, A30-CPP-1, A30-CPP-2, A30-CPP-3, H100_PCIe-CPP-1, H100_PCIe-CPP-2"

PR_Github #13400 [ run ] triggered by Bot

PR_Github #13400 [ run ] completed with state
Force-pushed from 3ebe76b to 6522b34

Force-pushed from 6522b34 to 7654d08
Actionable comments posted: 0
🧹 Nitpick comments (3)
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp (1)

35-36: Missing namespace closing comment. According to the coding guidelines, closing braces of namespaces should have a comment saying which namespace they close. The namespace opened as `namespace tensorrt_llm::pybind::batch_manager {` should be closed at line 74 with:

```cpp
} // namespace tensorrt_llm::pybind::batch_manager
```

cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (1)

486-486: Missing namespace closing comment. The closing brace should include a comment indicating which namespace it closes:

```cpp
} // namespace tensorrt_llm::pybind::batch_manager
```

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1)

480-480: Missing namespace closing comment. The closing brace should include a comment indicating which namespace it closes:

```cpp
} // namespace tensorrt_llm::nanobind::batch_manager
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h (2 hunks)
- cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h (1 hunks)
- cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (2 hunks)
- cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (2 hunks)
- cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp (5 hunks)
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1 hunks)
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h (0 hunks)
- cpp/tensorrt_llm/nanobind/CMakeLists.txt (1 hunks)
- cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (1 hunks)
- cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1 hunks)
- cpp/tensorrt_llm/nanobind/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
- cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/batch_manager/buffers.h (1 hunks)
- cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/pybind/runtime/bindings.cpp (3 hunks)
- cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (7 hunks)
- cpp/tests/batch_manager/guidedDecoderTest.cpp (1 hunks)
- cpp/tests/runtime/gptDecoderBatchedTest.cpp (6 hunks)
- tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py (3 hunks)
- tensorrt_llm/_torch/pyexecutor/sampler.py (1 hunks)
💤 Files with no reviewable changes (1)
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.h
✅ Files skipped from review due to trivial changes (5)
- cpp/tensorrt_llm/pybind/batch_manager/algorithms.cpp
- cpp/tensorrt_llm/batch_manager/handleGenerationLogits.cpp
- cpp/tensorrt_llm/nanobind/CMakeLists.txt
- cpp/tensorrt_llm/pybind/CMakeLists.txt
- cpp/tensorrt_llm/pybind/bindings.cpp
🚧 Files skipped from review as they are similar to previous changes (20)
- cpp/tensorrt_llm/batch_manager/guidedDecoder.cpp
- cpp/tensorrt_llm/nanobind/bindings.cpp
- cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp
- cpp/tensorrt_llm/batch_manager/handleContextLogits.cpp
- tensorrt_llm/_torch/pyexecutor/make_decoding_batch_input_output.py
- cpp/tensorrt_llm/pybind/runtime/bindings.cpp
- cpp/tests/batch_manager/guidedDecoderTest.cpp
- cpp/tensorrt_llm/pybind/batch_manager/buffers.h
- cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
- cpp/tests/runtime/gptDecoderBatchedTest.cpp
- cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
- cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
- cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h
- cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp
- cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h
- cpp/include/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.h
- tensorrt_llm/_torch/pyexecutor/sampler.py
- cpp/include/tensorrt_llm/batch_manager/decoderBuffers.h
- cpp/tensorrt_llm/batch_manager/makeDecodingBatchInputOutput.cpp
- cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic...
Files:
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
cpp/tensorrt_llm/pybind/batch_manager/buffers.cpp (4)
1-16: LGTM! Copyright header is compliant. The NVIDIA copyright header includes the current year (2024) and follows the required format for TensorRT-LLM Open Source Software code.

40-51: LGTM! DecoderInputBuffers binding is well-structured. The binding correctly exposes the constructor and all member variables with appropriate read-write access. The parameter names and types align with the C++ interface.

53-60: LGTM! DecoderOutputBuffers binding is correctly implemented. The binding appropriately exposes read-write access to buffer members. The special handling for `new_output_tokens_host` using a lambda and `tr::Torch::tensor` conversion is correct for exposing the underlying tensor.

62-72: LGTM! SlotDecoderBuffers binding is properly structured. The constructor parameters and member bindings are correctly exposed with appropriate access patterns.
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp (2)
1-16: LGTM! Copyright header is updated correctly. The copyright header includes the updated year range (2022-2025), which is appropriate for the current changes.

410-483: Function refactoring aligns with PR objectives. The `make_decoding_batch_input` function has been successfully refactored to:

- Accept `DecoderInputBuffers&` and `DecoderState&` as mutable references (in-place modification)
- Remove the return of `std::unique_ptr<decoder_batch::Input>`
- Assign logits directly to `decoderInputBuffers.batchLogits`

The logic correctly processes context and generation requests, handles beam width tiling, and populates the batch slots. This aligns with the PR objective of removing the `decoder_batch::Input` class.

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)
1-16: LGTM! Copyright header is properly updated. The copyright header includes the updated year range (2022-2025), consistent with the pybind11 version.

404-477: Excellent consistency with the pybind11 implementation. The nanobind version of `make_decoding_batch_input` correctly mirrors the pybind11 implementation:

- Same function signature with mutable references
- Identical logic for processing context and generation requests
- Proper handling of beam width and logits tiling
- Consistent in-place modification of `decoderInputBuffers.batchLogits`

The use of nanobind-specific syntax (`nb::arg`) is correct.
Force-pushed from 7654d08 to 5b803b0
Signed-off-by: Robin Kobus <[email protected]>
- Modified MakeDecodingBatchInputOutput to accept DecoderInputBuffers directly, simplifying the interface.
- Removed decoder_batch::Input and related interfaces.
- Updated GptDecoderBatched to work with the new DecoderInputBuffers structure.
- Adjusted Python bindings to reflect changes in the C++ interface.

Signed-off-by: Robin Kobus <[email protected]>
…tOutput

- Removed maxNumSequences parameter from createDecoderBatchInputs and related function calls, streamlining the interface.
- Updated all relevant implementations and tests to reflect the changes in function signatures.

Signed-off-by: Robin Kobus <[email protected]>

- Move bindings for `DecoderInputBuffers`, `DecoderOutputBuffers`, and `SlotDecoderBuffers` to `buffers.cpp`.
- Adjust related function calls and Python bindings to reflect the new structure.

Signed-off-by: Robin Kobus <[email protected]>

Signed-off-by: Robin Kobus <[email protected]>

- Introduced logging to track the current beam width during token comparison tests, enhancing test traceability.

Signed-off-by: Robin Kobus <[email protected]>

…s beam width

- Modified the ChangeBeamWidth test to use the same number of return sequences as the beam width, improving test consistency.

Signed-off-by: Robin Kobus <[email protected]>
Force-pushed from 5b803b0 to 9c5a145
Description

- Removed `decoder_batch::Input` and related interfaces.
- Modified `MakeDecodingBatchInputOutput` to accept `DecoderInputBuffers` directly, simplifying the interface.
- Updated `GptDecoderBatched` to work with the new `DecoderInputBuffers` structure.

Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores