[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow #5615

DylanChen-NV · 2025-06-30T13:47:57Z

[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow

Description

Adapt the existing FP8 row-wise dense GEMM kernels (sm89/sm90/sm120) to the torch workflow.
Expose the FP8 row-wise GEMM as a TunableRunner so that it can be autotuned for better performance.
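
For readers unfamiliar with autotuned runners, the general idea is to profile a set of candidate kernel configurations once per problem shape and cache the winner. The sketch below is a minimal, self-contained illustration of that idea only. `TinyAutotuner`, `fake_cost`, and the `(tile_m, tile_n)` tactics are hypothetical names invented for this example; they are not the TensorRT-LLM `TunableRunner` interface, and the cost model is made up so the example runs without a GPU.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) of the GEMM


class TinyAutotuner:
    """Hypothetical profile-and-cache helper; NOT the TensorRT-LLM API."""

    def __init__(self, measure: Callable[[Hashable, Shape], float]):
        self._measure = measure                 # returns a cost (lower is better)
        self._cache: Dict[Shape, Hashable] = {}  # best tactic per shape
        self.profile_calls = 0                  # how many measurements were taken

    def choose(self, shape: Shape, tactics: Sequence[Hashable]) -> Hashable:
        # Profile every candidate the first time a shape is seen,
        # then reuse the cached winner on later calls.
        if shape not in self._cache:
            timings = []
            for tactic in tactics:
                self.profile_calls += 1
                timings.append((self._measure(tactic, shape), tactic))
            self._cache[shape] = min(timings, key=lambda pair: pair[0])[1]
        return self._cache[shape]


def fake_cost(tactic: Tuple[int, int], shape: Shape) -> float:
    """Made-up cost model (assumption): padded work plus a per-tile overhead."""
    m, n, k = shape
    tile_m, tile_n = tactic
    padded = (-m % tile_m) * n + (-n % tile_n) * m
    tiles = ((m + tile_m - 1) // tile_m) * ((n + tile_n - 1) // tile_n)
    return float(padded * k + tiles * 1000)


if __name__ == "__main__":
    tuner = TinyAutotuner(fake_cost)
    tactics = [(64, 64), (128, 128), (16, 256)]
    print(tuner.choose((16, 4096, 4096), tactics))
    print(tuner.choose((4096, 4096, 4096), tactics))
```

In a real runner the `measure` callback would time the actual kernel on the device; swapping in a deterministic cost function here just keeps the sketch reproducible.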

Test Coverage

tests/unittest/_torch/thop/test_fp8_rowwise_linear.py: tests the output correctness of the row-wise torch op.
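
As a rough picture of what such a correctness check compares against, the following pure-Python sketch emulates a row-wise FP8 (E4M3) GEMM and its full-precision reference. It assumes the common row-wise scheme: one scale per activation row and one scale per weight row (output channel), with the weight stored as `[N][K]`. All function names here are hypothetical and the code is not taken from the PR or its test file.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest E4M3-representable value (saturating)."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), FP8_E4M3_MAX)
    _, e = math.frexp(a)        # a = m * 2**e with 0.5 <= m < 1
    e = max(e, -5)              # below 2**-6 the grid is subnormal (fixed step)
    step = 2.0 ** (e - 4)       # 3 mantissa bits -> 8 steps per binade
    return sign * round(a / step) * step


def quantize_rowwise(mat):
    """Return (quantized rows, per-row scales) so that row ~= q_row * scale."""
    q_rows, scales = [], []
    for row in mat:
        amax = max(abs(x) for x in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        q_rows.append([round_to_e4m3(x / scale) for x in row])
        scales.append(scale)
    return q_rows, scales


def fp8_rowwise_gemm(a, w):
    """a: [M][K], w: [N][K] -> [M][N], rescaled by both row scales."""
    a_q, a_s = quantize_rowwise(a)
    w_q, w_s = quantize_rowwise(w)
    return [
        [sum(x * y for x, y in zip(a_q[i], w_q[j])) * a_s[i] * w_s[j]
         for j in range(len(w))]
        for i in range(len(a))
    ]


def matmul_ref(a, w):
    """Full-precision reference: a @ w^T."""
    return [[sum(x * y for x, y in zip(ra, rw)) for rw in w] for ra in a]


if __name__ == "__main__":
    a = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]
    w = [[0.5, 0.25, 0.125], [1.5, 2.5, 3.5]]
    print(fp8_rowwise_gemm(a, w))
    print(matmul_ref(a, w))
```

A unit test in this style would assert that the quantized output stays within a tolerance of the reference; the tolerance has to allow for the roughly 6% worst-case relative rounding error of each E4M3 operand.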

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

DylanChen-NV · 2025-07-01T08:01:57Z

/bot run

tensorrt-cicd · 2025-07-01T08:07:09Z

PR_Github #10461 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-01T08:18:13Z

PR_Github #10461 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7742 completed with status: 'FAILURE'

DylanChen-NV · 2025-07-01T08:28:17Z

/bot run

tensorrt-cicd · 2025-07-01T08:33:26Z

PR_Github #10464 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-01T08:45:34Z

PR_Github #10464 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7746 completed with status: 'FAILURE'

DylanChen-NV · 2025-07-01T08:54:49Z

/bot run

tensorrt-cicd · 2025-07-01T09:01:25Z

PR_Github #10473 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-01T11:01:30Z

PR_Github #10473 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7751 completed with status: 'FAILURE'

DylanChen-NV · 2025-07-01T11:41:41Z

/bot run

tensorrt-cicd · 2025-07-01T11:48:04Z

PR_Github #10503 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-01T14:49:55Z

PR_Github #10503 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7776 completed with status: 'FAILURE'

DylanChen-NV · 2025-07-01T14:58:15Z

/bot run

tensorrt-cicd · 2025-07-01T15:03:37Z

PR_Github #10512 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-01T17:15:07Z

PR_Github #10512 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7785 completed with status: 'FAILURE'

DylanChen-NV · 2025-07-02T03:24:39Z

/bot run

tensorrt-cicd · 2025-07-02T03:30:01Z

PR_Github #10562 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-02T05:45:23Z

PR_Github #10562 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7816 completed with status: 'SUCCESS'

tomeras91

Overall LGTM. Added a few nit comments.

What's more important: Can you edit the PR description, remove the irrelevant info from the description template and add important information on what is added in this PR?

cpp/tensorrt_llm/thop/fp8RowwiseGemm.cpp

tensorrt_llm/_torch/modules/linear.py

DylanChen-NV · 2025-07-04T04:01:23Z

> Overall LGTM. Added a few nit comments.
>
> What's more important: Can you edit the PR description, remove the irrelevant info from the description template and add important information on what is added in this PR?

@tomeras91 Thanks for the comments. I’ve implemented all the suggested changes. Please feel free to approve or let me know if further changes are needed.

DylanChen-NV · 2025-07-07T02:31:33Z

/bot run

tensorrt-cicd · 2025-07-07T02:37:08Z

PR_Github #11086 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-07T09:10:25Z

PR_Github #11086 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8195 completed with status: 'SUCCESS'

DylanChen-NV · 2025-07-07T09:24:28Z

/bot reuse-pipeline

tensorrt-cicd · 2025-07-07T09:30:00Z

PR_Github #11131 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd · 2025-07-07T09:41:55Z

PR_Github #11131 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #11086 for commit cea6589

Signed-off-by: Dylan Chen <[email protected]>

DylanChen-NV · 2025-07-07T09:47:15Z

/bot reuse-pipeline

tensorrt-cicd · 2025-07-07T09:52:04Z

PR_Github #11133 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd · 2025-07-07T10:01:36Z

PR_Github #11133 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #11086 for commit 819c9b8

DylanChen-NV requested a review from a team as a code owner June 30, 2025 13:47

DylanChen-NV requested review from dongxuy04 and liji-nv June 30, 2025 13:47

DylanChen-NV changed the title ~~[TRTLLM-5812][feat] support rowwise in torch flow~~ [TRTLLM-5812][feat] support FP8 row-wise GEMM in torch flow Jun 30, 2025

DylanChen-NV changed the title ~~[TRTLLM-5812][feat] support FP8 row-wise GEMM in torch flow~~ [TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow Jun 30, 2025

juney-nvidia requested review from Naveassaf and Tracin and removed request for dongxuy04 and liji-nv June 30, 2025 20:23

DylanChen-NV force-pushed the row_wise_torch_flow branch from 20840d9 to 41579d6 Compare July 1, 2025 08:01

DylanChen-NV force-pushed the row_wise_torch_flow branch from e1a100c to f6230e9 Compare July 2, 2025 03:19

Tracin approved these changes Jul 2, 2025

tomeras91 reviewed Jul 3, 2025

cpp/tensorrt_llm/thop/fp8RowwiseGemm.cpp (outdated, resolved)

tensorrt_llm/_torch/modules/linear.py (outdated, resolved)

tomeras91 approved these changes Jul 6, 2025

DylanChen-NV force-pushed the row_wise_torch_flow branch from 11461a1 to d4950c7 Compare July 7, 2025 02:31

byshiue approved these changes Jul 7, 2025

DylanChen-NV force-pushed the row_wise_torch_flow branch from d4950c7 to cea6589 Compare July 7, 2025 09:24

byshiue enabled auto-merge (squash) July 7, 2025 09:38

DylanChen-NV added 11 commits July 7, 2025 17:46

support rowwise in torch flow + autotune

0f28774

Signed-off-by: Dylan Chen <[email protected]>

support fp8 kv

7ba7e37

Signed-off-by: Dylan Chen <[email protected]>

add test

a36fefe

Signed-off-by: Dylan Chen <[email protected]>

add fp8RowwiseGemm.cpp

257d344

Signed-off-by: Dylan Chen <[email protected]>

add doc for qwen3

a7e3c3d

Signed-off-by: Dylan Chen <[email protected]>

fix build

28031d7

Signed-off-by: Dylan Chen <[email protected]>

skip test for sm100

b023077

Signed-off-by: Dylan Chen <[email protected]>

fix test

15be209

Signed-off-by: Dylan Chen <[email protected]>

modify test threshold

6cb6343

Signed-off-by: Dylan Chen <[email protected]>

refine

a0ff0db

Signed-off-by: Dylan Chen <[email protected]>

refine

819c9b8

Signed-off-by: Dylan Chen <[email protected]>

DylanChen-NV force-pushed the row_wise_torch_flow branch from cea6589 to 819c9b8 Compare July 7, 2025 09:47

byshiue merged commit 5ca2b9b into NVIDIA:main Jul 7, 2025
3 checks passed

zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025

[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (NV…

7585241

…IDIA#5615) Signed-off-by: Dylan Chen <[email protected]> Signed-off-by: Yuxin <[email protected]>
