[https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels #5426

PerkzZheng · 2025-06-24T08:20:45Z

[https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels && update the heuristic

The trtllm-gen commit: 086115ea6893c06b351af19e01aa0288f61d6823

This PR adds multiCtasKvMode together with the high-throughput MLA kernels, which gives a 1.45x speedup in one important large-EP (expert parallelism) case.

It also removes unused MLA kernels (like fp16) to reduce the binary size.
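For readers unfamiliar with the idea, here is a minimal sketch of what splitting the KV sequence across several CTAs can look like. It assumes that multiCtasKvMode means multiple CTAs each process a slice of the KV cache for the same query and their partial results are then reduced; the PR text does not spell this out, and the code is not the trtllm-gen kernel. All function names (`attention_reference`, `attention_partial`, `attention_multi_chunk`) are hypothetical, and the usual 1/sqrt(d) scaling is omitted for brevity.

```python
import math


def _scores(q, keys):
    # Dot product of the query with every key in the chunk.
    return [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


def attention_reference(q, keys, values):
    """Single-pass softmax(q . K^T) V over the whole KV sequence."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(dim)]


def attention_partial(q, keys, values):
    """Work of one hypothetical CTA: un-normalized output, local max, local sum."""
    scores = _scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return acc, m, sum(weights)


def attention_multi_chunk(q, keys, values, num_chunks):
    """Split KV into chunks, compute partials, then merge them (the reduction step)."""
    n = len(keys)
    chunk = math.ceil(n / num_chunks)
    partials = [attention_partial(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]
    global_max = max(m for _, m, _ in partials)
    dim = len(values[0])
    out = [0.0] * dim
    denom = 0.0
    for acc, m, s in partials:
        # Rescale each partial from its local max to the global max.
        scale = math.exp(m - global_max)
        denom += s * scale
        for d in range(dim):
            out[d] += acc[d] * scale
    return [o / denom for o in out]


# Small deterministic example: 7 KV positions split across 3 chunks.
q = [0.5, -1.0, 2.0]
keys = [[0.1 * i, 0.2 * (i % 3), -0.05 * i] for i in range(7)]
values = [[float(i), float(i * i % 5)] for i in range(7)]
ref = attention_reference(q, keys, values)
split = attention_multi_chunk(q, keys, values, num_chunks=3)
```

The merged result matches the single-pass result up to floating-point rounding, which is why this kind of split can raise parallelism over long KV sequences at the cost of an extra reduction.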

PerkzZheng · 2025-06-24T08:21:45Z

/bot run

tensorrt-cicd · 2025-06-24T08:26:49Z

PR_Github #9662 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-24T08:39:05Z

PR_Github #9662 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7104 completed with status: 'FAILURE'

PerkzZheng · 2025-06-24T09:21:36Z

/bot run

tensorrt-cicd · 2025-06-24T09:26:43Z

PR_Github #9676 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-24T11:38:03Z

PR_Github #9676 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7113 completed with status: 'FAILURE'

PerkzZheng · 2025-06-24T13:21:50Z

/bot run --disable-fail-fast

PerkzZheng · 2025-06-25T01:54:54Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-25T02:01:38Z

PR_Github #9783 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-25T05:14:34Z

PR_Github #9783 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7213 completed with status: 'FAILURE'

PerkzZheng · 2025-06-25T05:27:06Z

/bot run

tensorrt-cicd · 2025-06-25T05:35:50Z

PR_Github #9808 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-25T08:19:28Z

PR_Github #9808 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7235 completed with status: 'SUCCESS'

PerkzZheng force-pushed the user/perkzz/high-throughput-mla branch from 9f5941d to 3698cd4 on June 24, 2025 13:21

PerkzZheng added 3 commits June 25, 2025 05:11

1c140e6 support multiCtasKvMode for high-throughput MLA kernels && update the heuristic
Signed-off-by: Perkz Zheng <[email protected]>

a1fdeac update cubins
Signed-off-by: Perkz Zheng <[email protected]>

462e339 fix
Signed-off-by: Perkz Zheng <[email protected]>

PerkzZheng force-pushed the user/perkzz/high-throughput-mla branch from 3698cd4 to 462e339 on June 25, 2025 05:26

PerkzZheng requested review from kaiyux and qiaoxj07 June 25, 2025 05:27

qiaoxj07 approved these changes Jun 25, 2025

PerkzZheng merged commit 1f292ff into NVIDIA:main Jun 25, 2025
3 checks passed

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025

723f265 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

46c1236 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

3e1a691 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

aa2c642 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

a979635 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

2e1c91b [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

4fec463 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

c5a3da1 [https://jirasw.nvidia.com/browse/TRTLLM-4645] support multiCtasKvMode for high-throughput MLA kernels (NVIDIA#5426)
Signed-off-by: Perkz Zheng <[email protected]>
