[Feat] add chunked-attention kernels on Hopper (for llama4) #4291
/bot run --disable-fail-fast |
d9137aa
to
dede72b
Compare
/bot run --disable-fail-fast |
PR_Github #5135 [ run ] triggered by Bot |
PR_Github #5135 [ run ] completed with state |
dede72b
to
9948fe4
Compare
/bot run --disable-fail-fast |
PR_Github #5168 [ run ] triggered by Bot |
9948fe4
to
03cdd5a
Compare
/bot run --disable-fail-fast |
PR_Github #5186 [ run ] triggered by Bot |
PR_Github #5168 [ run ] completed with state |
PR_Github #5186 [ run ] completed with state |
899df5b
to
38d516c
Compare
/bot run --disable-fail-fast |
PR_Github #5223 [ run ] triggered by Bot |
PR_Github #5223 [ run ] completed with state |
38d516c
to
c7d0860
Compare
LGTM, please ensure that #4117 is merged first.
c7d0860
to
c8e678e
Compare
/bot run --disable-fail-fast |
PR_Github #5337 [ run ] triggered by Bot |
c8e678e
to
0ecb8fa
Compare
/bot run --disable-fail-fast |
PR_Github #5354 [ run ] triggered by Bot |
PR_Github #5337 [ run ] completed with state |
cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_common.h
Outdated
/bot run --disable-fail-fast |
PR_Github #5354 [ run ] completed with state |
PR_Github #5388 [ run ] triggered by Bot |
PR_Github #5388 [ run ] completed with state |
0ecb8fa
to
982a864
Compare
/bot run --disable-fail-fast |
PR_Github #5454 [ run ] triggered by Bot |
982a864
to
8155e75
Compare
/bot run --disable-fail-fast |
PR_Github #5501 [ run ] triggered by Bot |
PR_Github #5454 [ run ] completed with state |
PR_Github #5501 [ run ] completed with state |
/bot run --only-multi-gpu-test |
PR_Github #5608 [ run ] triggered by Bot |
PR_Github #5608 [ run ] completed with state |
Signed-off-by: Perkz Zheng <[email protected]>
… hopper fmha kernels Signed-off-by: Perkz Zheng <[email protected]>
8155e75
to
5303507
Compare
/bot run --only-multi-gpu-test |
PR_Github #5705 [ run ] triggered by Bot |
PR_Github #5705 [ run ] completed with state |
/bot reuse-pipeline |
PR_Github #5753 [ reuse-pipeline ] triggered by Bot |
PR_Github #5753 [ reuse-pipeline ] completed with state |
Add chunked-attention kernels on Hopper (for llama4)
fmha_v2 commit: 6552b99d4820fa3f5e8a48a392681a8c128bf623
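For readers unfamiliar with the term, the following is a minimal reference sketch of what a chunked causal attention mask looks like. It is written under stated assumptions and is not the kernel code from this PR: it assumes that "chunked attention" means each query attends causally only to keys in its own fixed-size chunk. The function name `chunked_causal_mask` and the parameter `chunk_size` are hypothetical and chosen for illustration only.

```python
def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """Return mask[q][k] == True where query q may attend to key k.

    Assumption (not taken from the PR): a query attends to a key only if
    the key is not in the future (k <= q) and both positions fall into the
    same chunk of `chunk_size` consecutive tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        [(k <= q) and (q // chunk_size == k // chunk_size) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Print a small mask: 'x' marks allowed positions, '.' marks masked ones.
    for row in chunked_causal_mask(seq_len=6, chunk_size=4):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumption the mask is block-diagonal with lower-triangular blocks, so a fused kernel could in principle skip key/value tiles that lie entirely outside the query's chunk. Whether the Hopper kernels in this PR do so is not stated here.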
Description
Please explain the issue and the solution in short.
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
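The subcommand layout in the help text can be sketched as a small parser. This is an illustration only: the usage line resembles argparse output, but the bot's real implementation is not shown in this PR, so the helper names `build_bot_parser`, `parse_bot_comment`, and `_csv` are hypothetical, as is the choice to split quoted comma-separated values into lists.

```python
import argparse
import shlex


def _csv(value: str) -> list[str]:
    """Split a comma-separated value such as "A30, H100_PCIe" into items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_bot_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the subcommands and flags listed in the help."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    # Boolean switches documented for `run`.
    for flag in ("--disable-fail-fast", "--skip-test", "--add-multi-gpu-test",
                 "--only-multi-gpu-test", "--disable-multi-gpu-test",
                 "--post-merge"):
        run.add_argument(flag, action="store_true")
    # Options that take a quoted, comma-separated list (assumed parsing).
    for flag in ("--stage-list", "--gpu-type", "--extra-stage"):
        run.add_argument(flag, type=_csv, default=[])

    sub.add_parser("kill")

    skip = sub.add_parser("skip")
    skip.add_argument("--comment", required=True)  # help text says it is required

    sub.add_parser("reuse-pipeline")
    return parser


def parse_bot_comment(comment: str) -> argparse.Namespace:
    """Parse a PR comment of the form '/bot <subcommand> [options]'."""
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/bot":
        raise ValueError("not a /bot command")
    return build_bot_parser().parse_args(tokens[1:])


if __name__ == "__main__":
    print(parse_bot_comment("/bot run --disable-fail-fast"))
    print(parse_bot_comment('/bot run --gpu-type "A30, H100_PCIe"'))
```

With this sketch, the comment `/bot run --only-multi-gpu-test` used earlier in this conversation would parse to the `run` subcommand with `only_multi_gpu_test` set, and `skip` without `--comment` would be rejected by the parser.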