[Feat] add chunked-attention kernels on Hopper (for llama4) #4291

PerkzZheng · 2025-05-14T07:10:27Z

Add chunked-attention kernels on Hopper (for llama4)

fmha_v2 commit: 6552b99d4820fa3f5e8a48a392681a8c128bf623
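For readers new to the feature: chunked attention, as used in Llama 4's local-attention layers, splits the sequence into fixed-size chunks, and each query attends causally only to keys inside its own chunk. The sketch below only illustrates those masking semantics in plain Python. The function names `allowed_key_range` and `chunked_causal_mask` and the toy sizes are illustrative assumptions. The sketch makes no claim about how the fmha_v2 Hopper kernels added here implement chunking.

```python
def allowed_key_range(q: int, chunk_size: int) -> tuple[int, int]:
    """Half-open [start, end) range of keys that query position q may attend to.

    Hypothetical helper: keys must lie in the same chunk as q (start) and
    must not come after q (causal, so end = q + 1).
    """
    chunk_start = (q // chunk_size) * chunk_size
    return chunk_start, q + 1


def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """mask[q][k] is True when query q may attend to key k."""
    mask = []
    for q in range(seq_len):
        start, end = allowed_key_range(q, chunk_size)
        mask.append([start <= k < end for k in range(seq_len)])
    return mask


if __name__ == "__main__":
    # Toy example: 6 tokens split into chunks of 3.
    for row in chunked_causal_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
    # Prints:
    # x.....
    # xx....
    # xxx...
    # ...x..
    # ...xx.
    # ...xxx
```

When `chunk_size` is at least the sequence length, the mask reduces to an ordinary causal mask. Each query attends to at most `chunk_size` keys no matter how long the sequence grows.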

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests. Also runs the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous because, without careful user validation, it can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action also kills all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous because, without careful user validation, it can break the top of tree.

PerkzZheng · 2025-05-14T07:10:47Z

/bot run --disable-fail-fast

PerkzZheng · 2025-05-14T07:24:42Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-14T07:30:42Z

PR_Github #5135 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-14T09:11:07Z

PR_Github #5135 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3745 completed with status: 'FAILURE'

PerkzZheng · 2025-05-14T11:43:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-14T11:49:07Z

PR_Github #5168 [ run ] triggered by Bot

PerkzZheng · 2025-05-14T14:08:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-14T14:13:52Z

PR_Github #5186 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-14T14:13:54Z

PR_Github #5168 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-05-14T22:38:15Z

PR_Github #5186 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3784 completed with status: 'FAILURE'

PerkzZheng · 2025-05-15T00:26:37Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-15T00:36:57Z

PR_Github #5223 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-15T06:29:12Z

PR_Github #5223 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3815 completed with status: 'SUCCESS'

cpp/tensorrt_llm/common/attentionOp.cpp

yuxianq

LGTM, please ensure that #4117 is merged first.

PerkzZheng · 2025-05-15T10:23:33Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-15T10:34:42Z

PR_Github #5337 [ run ] triggered by Bot

PerkzZheng · 2025-05-15T13:52:27Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-15T13:57:41Z

PR_Github #5354 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-15T13:57:50Z

PR_Github #5337 [ run ] completed with state ABORTED

cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_common.h

mikeiovine · 2025-05-15T17:55:50Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-15T17:57:00Z

PR_Github #5354 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3907 completed with status: 'FAILURE'

tensorrt-cicd · 2025-05-15T18:01:00Z

PR_Github #5388 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-15T21:48:33Z

PR_Github #5388 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3932 completed with status: 'FAILURE'

PerkzZheng · 2025-05-16T05:21:25Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-16T05:26:46Z

PR_Github #5454 [ run ] triggered by Bot

PerkzZheng · 2025-05-16T09:54:22Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-16T10:01:06Z

PR_Github #5501 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-16T10:01:16Z

PR_Github #5454 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-05-16T23:19:36Z

PR_Github #5501 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4008 completed with status: 'FAILURE'

PerkzZheng · 2025-05-18T08:59:26Z

/bot run --only-multi-gpu-test

tensorrt-cicd · 2025-05-18T09:04:35Z

PR_Github #5608 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-18T16:31:36Z

PR_Github #5608 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4093 (Partly Tested) completed with status: 'FAILURE'

EmmaQiaoCh · 2025-05-19T09:09:31Z

/bot run --only-multi-gpu-test

tensorrt-cicd · 2025-05-19T09:15:32Z

PR_Github #5705 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-19T16:38:12Z

PR_Github #5705 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4171 (Partly Tested) completed with status: 'SUCCESS'

schetlur-nv · 2025-05-19T16:45:05Z

/bot reuse-pipeline

tensorrt-cicd · 2025-05-19T16:50:37Z

PR_Github #5753 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd · 2025-05-19T17:02:40Z

PR_Github #5753 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #5705 (Partly Tested) for commit 0d57171

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from d9137aa to dede72b May 14, 2025 07:24

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from dede72b to 9948fe4 May 14, 2025 11:42

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from 9948fe4 to 03cdd5a May 14, 2025 14:08

PerkzZheng force-pushed the user/perkzz/chunked-attention branch 2 times, most recently from 899df5b to 38d516c May 15, 2025 00:25

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from 38d516c to c7d0860 May 15, 2025 06:30

PerkzZheng requested review from mikeiovine and yuxianq May 15, 2025 06:33

yuxianq reviewed May 15, 2025

cpp/tensorrt_llm/common/attentionOp.cpp (outdated, resolved)

yuxianq approved these changes May 15, 2025

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from c7d0860 to c8e678e May 15, 2025 10:22

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from c8e678e to 0ecb8fa May 15, 2025 13:48

mikeiovine approved these changes May 15, 2025

cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_common.h (outdated, resolved)

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from 0ecb8fa to 982a864 May 16, 2025 05:13

PerkzZheng mentioned this pull request May 16, 2025

Feat: support MTP for fmha_v2 based MLA kernels. #4117

Closed

PerkzZheng force-pushed the user/perkzz/chunked-attention branch from 982a864 to 8155e75 May 16, 2025 09:53

PerkzZheng added 2 commits May 19, 2025 17:09

update cubins

a2fadea

Signed-off-by: Perkz Zheng <[email protected]>

add mtp for fmha_v2 MLA kernels and add chunked-attention support for…

5303507

… hopper fmha kernels Signed-off-by: Perkz Zheng <[email protected]>

EmmaQiaoCh force-pushed the user/perkzz/chunked-attention branch from 8155e75 to 5303507 May 19, 2025 09:09

Merge branch 'main' into user/perkzz/chunked-attention

0d57171

schetlur-nv merged commit 1c5b0d6 into NVIDIA:main May 19, 2025
1 of 2 checks passed

QiJune mentioned this pull request Jul 16, 2025

add release notes for 0.21 release #6049

Merged
