feat: moe prepare support topk % 4 != 0 #5742
Conversation
/bot run

Force-pushed from 4a3cad6 to 2c7a3ac

/bot run

PR_Github #11754 [ run ] triggered by Bot
PR_Github #11754 [ run ] completed with state

Force-pushed from f7358bc to 6b718a3

/bot run

PR_Github #11869 [ run ] triggered by Bot
PR_Github #11869 [ run ] completed with state

Force-pushed from 6b718a3 to 78f9117
""" WalkthroughThe changes refactor packet and pipeline size management in the MoE preparation kernels by introducing a templated Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Host
    participant Kernel
    participant PipelineConfig
    Host->>PipelineConfig: Select config based on topK
    Host->>Kernel: Launch allToAllMetadataDevice<PipelineConfig, ExpertType, ScaleType>()
    Kernel->>PipelineConfig: Access size/type constants
    Kernel->>Kernel: Perform packet/unit operations using PipelineConfig
```
Actionable comments posted: 0

🧹 Nitpick comments (1)

cpp/tensorrt_llm/kernels/moePrepareKernels.cu (1)

642-643: Consider documenting the performance trade-offs between configurations. The `PipelineConfig<1, 64>` processes 4x more packets per step compared to `PipelineConfig<4, 16>`, which may have different performance characteristics. Consider adding a comment explaining this design choice. Also applies to: 653-654.
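If the maintainers take this suggestion, the documenting comment could be as short as the following sketch. The interpretation of the two template parameters as (units per packet, pipeline steps) is an assumption here, not confirmed by the PR:

```cpp
// Suggested documenting comment (parameter meanings are assumed):
// PipelineConfig<4, 16>: 4 units per packet over 16 steps; requires
//   topK % 4 == 0 so expert/scale data can move as int4/float4.
// PipelineConfig<1, 64>: 1 unit per packet over 64 steps; scalar
//   int/float path that handles any topK at some bandwidth cost.
```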
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (3)

- cpp/tensorrt_llm/kernels/moePrepareKernels.cu (14 hunks)
- cpp/tensorrt_llm/kernels/moePrepareKernels.h (1 hunk)
- tests/unittest/_torch/thop/test_moe_alltoall.py (1 hunk)
🔇 Additional comments (5)

tests/unittest/_torch/thop/test_moe_alltoall.py (1)

473-480: Test coverage looks good for the new feature. The test modifications appropriately cover the new functionality:

- Line 474: Changed `top_k` from 4 to 3 (not divisible by 4)
- Line 480: Added a new test case with `top_k=6` (not divisible by 4)

These changes align with the PR objective of supporting cases where `topk % 4 != 0`.

cpp/tensorrt_llm/kernels/moePrepareKernels.h (2)

47-57: Good refactoring with the templated PipelineConfig struct. The templated approach provides flexibility for different pipeline configurations based on runtime parameters. The computed constants are correctly derived from the template parameters.
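To make the pattern under review concrete, here is a minimal sketch of such a templated config. The parameter names, the 16-byte unit size, and the derived constants are assumptions for illustration; they are not the actual contents of moePrepareKernels.h:

```cpp
// Hypothetical sketch only: the real struct in moePrepareKernels.h may use
// different parameter names, unit sizes, and derived constants.
template <int kUnitsPerPacket, int kSteps>
struct PipelineConfigSketch
{
    static constexpr int kUnitSize = 16;                            // assumed bytes per unit
    static constexpr int kPacketSize = kUnitsPerPacket * kUnitSize; // bytes per packet
    static constexpr int kPipelineBytes = kPacketSize * kSteps;     // bytes in flight per pipeline
};

// The two specializations this PR dispatches between:
using VectorizedConfig = PipelineConfigSketch<4, 16>; // topK % 4 == 0, int4/float4 path
using ScalarConfig = PipelineConfigSketch<1, 64>;     // topK % 4 != 0, int/float path
```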
59-60: Verify that the 1 MB FIFO size is sufficient for all pipeline configurations. The fixed FIFO size of 1 MB replaces the previous dynamic calculation. Please ensure this size accommodates all possible `PipelineConfig` specializations, especially with larger packet sizes or step counts.

cpp/tensorrt_llm/kernels/moePrepareKernels.cu (2)
640-661: Correct implementation of conditional pipeline configuration. The logic properly handles both cases:

- `topK % 4 == 0`: uses `PipelineConfig<4, 16>` with vectorized types (int4/float4) for better performance
- `topK % 4 != 0`: uses `PipelineConfig<1, 64>` with scalar types (int/float)

The static assertions ensure FIFO size sufficiency for both configurations.
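A minimal sketch of what that guarded dispatch can look like; the `PipelineConfigSketch` shape, the `kFifoSizeBytes` name, and the launcher signature are hypothetical, while the 1 MB size and the two specializations come from the review itself:

```cpp
#include <cuda_runtime.h> // int4 / float4 vector types
#include <cstddef>

// Minimal config mirroring the sketch above; names are assumptions.
template <int kUnitsPerPacket, int kSteps>
struct PipelineConfigSketch
{
    static constexpr std::size_t kPipelineBytes
        = static_cast<std::size_t>(kUnitsPerPacket) * 16 * kSteps; // assumed 16-byte units
};

constexpr std::size_t kFifoSizeBytes = 1024 * 1024; // fixed 1 MB FIFO (from the review)

template <typename Config, typename ExpertType, typename ScaleType>
void launchAllToAllMetadata()
{
    // Compile-time guard: one full pipeline must fit inside the fixed FIFO.
    static_assert(Config::kPipelineBytes <= kFifoSizeBytes, "PipelineConfig exceeds the fixed FIFO size");
    // ...launch allToAllMetadataDevice<Config, ExpertType, ScaleType> here...
}

void dispatchByTopK(int topK)
{
    if (topK % 4 == 0)
    {
        launchAllToAllMetadata<PipelineConfigSketch<4, 16>, int4, float4>(); // vectorized path
    }
    else
    {
        launchAllToAllMetadata<PipelineConfigSketch<1, 64>, int, float>(); // scalar fallback
    }
}
```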
669-671: Workspace size calculation is correctly updated. The new calculation allocates a 1 MB FIFO buffer plus metadata (`MoeCommFifoConnInfo`) per rank, which aligns with the fixed FIFO size design.
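Under that description, the per-rank workspace arithmetic could reduce to something like the following. The function name, the placeholder struct fields, and the exact formula are guesses for illustration only; only `MoeCommFifoConnInfo` and the 1 MB FIFO come from the review:

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder stand-in for the real MoeCommFifoConnInfo metadata struct;
// the actual definition lives in the TensorRT-LLM kernels headers.
struct MoeCommFifoConnInfo
{
    volatile uint64_t head;
    volatile uint64_t tail;
};

constexpr std::size_t kFifoSizeBytes = 1024 * 1024; // fixed 1 MB FIFO

// Hypothetical shape of the per-rank workspace computation: one FIFO buffer
// plus one connection-info record for each peer rank.
std::size_t moePrepareWorkspaceSizePerRank(std::size_t epSize)
{
    return epSize * (kFifoSizeBytes + sizeof(MoeCommFifoConnInfo));
}
```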
Signed-off-by: Fred Wei <[email protected]>
Force-pushed from 78f9117 to 94f7f18
/bot run
Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)

- cpp/tensorrt_llm/kernels/moePrepareKernels.cu (14 hunks)
- cpp/tensorrt_llm/kernels/moePrepareKernels.h (1 hunk)

🚧 Files skipped from review as they are similar to previous changes (1)

- cpp/tensorrt_llm/kernels/moePrepareKernels.h
🔇 Additional comments (14)

cpp/tensorrt_llm/kernels/moePrepareKernels.cu (14)

322-327: LGTM: Good architectural improvement. The refactoring from a templated communicator type to a base pointer improves flexibility and reduces template complexity while maintaining type safety.
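The refactor pattern the reviewer is describing, reduced to a generic before/after sketch; all names here are illustrative, not taken from moePrepareKernels.cu:

```cpp
// Before: the communicator type is baked into the pipeline as a template
// parameter, so each communicator produces a distinct instantiation.
template <typename Communicator>
struct PipelineBefore
{
    Communicator* comm;
};

// After: the pipeline holds a pointer to a common base struct, so a single
// definition serves all communicators and template depth drops by one.
struct FifoCommBase
{
    int* fifoBuffer;
    int head;
    int tail;
};

struct PipelineAfter
{
    FifoCommBase* comm; // any concrete communicator works here
};
```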
334-334: Correct parameterization. Good use of template parameters to replace hardcoded constants. The initialization logic is preserved correctly.

346-351: Proper buffer arithmetic with template parameters. The pointer arithmetic correctly uses `PipelineConfig` constants, maintaining the original logic while adding flexibility.

369-369: Consistent template parameter usage. Buffer offset calculation properly uses the templated constants.

386-390: Correct packet retrieval logic. The receive packet logic properly uses template parameters while preserving the original buffer management semantics.

406-406: Proper buffer addressing. Packet pointer calculation correctly uses templated constants.

427-427: Well-designed template parameters. The addition of `ExpertType` and `ScaleType` template parameters enables support for both vectorized and scalar operations based on topK requirements.

436-444: Correct template parameter usage in kernel setup. The group size calculation and local array sizing properly use template parameters. The pipeline instantiation correctly uses the new template approach.

450-450: Correct unit count calculation. Unit count properly uses the template parameter for flexible pipeline configuration.

461-487: Proper templated memory operations. The memory access patterns correctly use `ExpertType` and `ScaleType` templates, enabling both vectorized and scalar operations. Buffer offset calculations using `PipelineConfig` constants are accurate.
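As a generic illustration of the vectorized-versus-scalar access pattern described here, assuming aligned buffers and a hypothetical function name:

```cpp
#include <cuda_runtime.h>

// Generic sketch: copy topK expert indices for one token, vectorized when
// ExpertType is int4 (topK % 4 == 0) and scalar when ExpertType is int.
// Assumes src and dst are aligned to sizeof(ExpertType).
template <typename ExpertType>
__device__ void copyExperts(int const* src, int* dst, int topK)
{
    constexpr int kWidth = sizeof(ExpertType) / sizeof(int); // 4 or 1
    for (int i = threadIdx.x; i < topK / kWidth; i += blockDim.x)
    {
        reinterpret_cast<ExpertType*>(dst)[i] = reinterpret_cast<ExpertType const*>(src)[i];
    }
}
```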
495-495: Correct static copy offset. Buffer offset calculation properly uses the template parameter.

528-543: Well-implemented receive path templating. The receive logic correctly uses templated types for both expert and scale data handling. The pointer arithmetic and type casting are properly implemented.

551-551: Consistent offset usage. Static copy operations correctly use template-based offset calculations.

670-670: Simplified MoE prepare workspace calculation is adequately covered. Automated tests in tests/unittest/_torch/thop/test_moe_alltoall.py invoke `get_moe_prepare_workspace_size_per_rank(ep_size)` and exercise the prepare kernel across multiple `ep_size` scenarios, confirming that the fixed 1 MB FIFO workspace suffices. No additional changes required.
PR_Github #12380 [ run ] triggered by Bot
PR_Github #12380 [ run ] completed with state
Signed-off-by: Fred Wei <[email protected]>
Signed-off-by: Shreyas Misra <[email protected]>
Signed-off-by: Fred Wei <[email protected]>
Signed-off-by: Ransiki Zhang <[email protected]>
feat: moe prepare support topk % 4 != 0
Summary by CodeRabbit