[https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels #5426
/bot run
PR_Github #9662 [ run ] triggered by Bot
PR_Github #9662 [ run ] completed with state
/bot run
PR_Github #9676 [ run ] triggered by Bot
PR_Github #9676 [ run ] completed with state
Force-pushed: 9f5941d to 3698cd4
/bot run --disable-fail-fast
PR_Github #9783 [ run ] triggered by Bot
… heuristic Signed-off-by: Perkz Zheng <[email protected]>
Signed-off-by: Perkz Zheng <[email protected]>
Signed-off-by: Perkz Zheng <[email protected]>
PR_Github #9783 [ run ] completed with state
Force-pushed: 3698cd4 to 462e339
/bot run
PR_Github #9808 [ run ] triggered by Bot
PR_Github #9808 [ run ] completed with state
…e for high-throughput MLA kernels (NVIDIA#5426) Signed-off-by: Perkz Zheng <[email protected]>
[https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels and update the heuristic
The trtllm-gen commit: 086115ea6893c06b351af19e01aa0288f61d6823
This PR adds high-throughput MLA kernels with multiCtasKvMode support, which can give a 1.45x speedup in one important large-EP case.
It also removes unused MLA kernels (such as the fp16 ones) to reduce the binary size.