

PerkzZheng
Collaborator

@PerkzZheng PerkzZheng commented Jun 24, 2025

[https://jirasw.nvidia.com/browse/TRTLLM-4645] Support multiCtasKvMode for high-throughput MLA kernels and update the heuristic

The trtllm-gen commit: 086115ea6893c06b351af19e01aa0288f61d6823

This PR adds high-throughput MLA kernels with multiCtasKvMode support. They give a 1.45x speedup in one important large-EP (expert-parallel) case.

It also removes unused MLA kernels (e.g., fp16) to reduce binary size.

@PerkzZheng
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9662 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9662 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7104 completed with status: 'FAILURE'

@PerkzZheng
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9676 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9676 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7113 completed with status: 'FAILURE'

@PerkzZheng PerkzZheng force-pushed the user/perkzz/high-throughput-mla branch from 9f5941d to 3698cd4 June 24, 2025 13:21
@PerkzZheng
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #9783 [ run ] triggered by Bot

Signed-off-by: Perkz Zheng <[email protected]>
@tensorrt-cicd
Collaborator

PR_Github #9783 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7213 completed with status: 'FAILURE'

@PerkzZheng PerkzZheng force-pushed the user/perkzz/high-throughput-mla branch from 3698cd4 to 462e339 June 25, 2025 05:26
@PerkzZheng
Collaborator Author

/bot run

@PerkzZheng PerkzZheng requested review from kaiyux and qiaoxj07 June 25, 2025 05:27
@tensorrt-cicd
Collaborator

PR_Github #9808 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9808 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7235 completed with status: 'SUCCESS'

@PerkzZheng PerkzZheng merged commit 1f292ff into NVIDIA:main Jun 25, 2025
3 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
…e for high-throughput MLA kernels (NVIDIA#5426)

Signed-off-by: Perkz Zheng <[email protected]>