[TRTLLM-5589] feat: Minor optimizations for tunable FP8 batched GEMM op. #5139

hyukn · 2025-06-12T01:30:40Z

Revise parts of the tunable FP8 batched GEMM op, originally added in #4872, to reduce Python overhead.
The class TuningConfig is hashable, so instances with the same contents produce the same hash value. As a result, there are no cache key misses, but it can add extra overhead during the inference phase.
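
To illustrate the trade-off described above, here is a minimal sketch in plain Python. It is not the code from this PR: `SketchTuningConfig`, `make_config`, `SHARED_CONFIG`, and the cache contents are hypothetical names invented for the example. It only shows that a hashable config with equal contents maps to the same cache entry, and that rebuilding such an object on every call is one place where per-call Python overhead can come from.

```python
from dataclasses import dataclass
from typing import Tuple
import timeit


@dataclass(frozen=True)
class SketchTuningConfig:
    """Hypothetical stand-in for a hashable tuning config (not the real TuningConfig)."""
    dynamic_dims: Tuple[int, ...] = ()
    bucket_sizes: Tuple[int, ...] = ()


def make_config() -> SketchTuningConfig:
    # Builds a new object every time it is called.
    return SketchTuningConfig(dynamic_dims=(0, 1), bucket_sizes=(8, 16, 32, 64))


# frozen=True with the default eq=True makes the dataclass hashable by field
# values, so a freshly built config with equal contents hits the same entry.
cache = {make_config(): "tuned-kernel-choice"}
assert hash(make_config()) == hash(make_config())
assert cache[make_config()] == "tuned-kernel-choice"

# Building the config once and reusing it avoids constructing the object on
# each call. The dict lookup still hashes the key each time.
SHARED_CONFIG = make_config()


def lookup_rebuilding() -> str:
    return cache[make_config()]


def lookup_reusing() -> str:
    return cache[SHARED_CONFIG]


if __name__ == "__main__":
    rebuild = timeit.timeit(lookup_rebuilding, number=100_000)
    reuse = timeit.timeit(lookup_reusing, number=100_000)
    print(f"rebuild per call: {rebuild:.4f}s, reuse shared: {reuse:.4f}s")
```

Both lookups return the same cached value; the only difference is how much work is done per call, which the timing loop makes visible on a given machine.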

tensorrt_llm/_torch/custom_ops/torch_custom_ops.py

hyukn · 2025-06-12T02:27:49Z

/bot run

tensorrt-cicd · 2025-06-12T02:33:28Z

PR_Github #8578 [ run ] triggered by Bot

hyukn · 2025-06-12T04:16:44Z

/bot run

tensorrt-cicd · 2025-06-12T04:22:53Z

PR_Github #8600 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-12T05:45:50Z

PR_Github #8600 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6241 completed with status: 'ABORTED'

hyukn · 2025-06-13T02:45:16Z

/bot run

tensorrt-cicd · 2025-06-13T02:51:07Z

PR_Github #8724 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-13T07:45:35Z

PR_Github #8724 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6328 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

hyukn · 2025-06-16T01:31:47Z

/bot run

tensorrt-cicd · 2025-06-16T01:37:54Z

PR_Github #8950 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-16T04:36:52Z

PR_Github #8950 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6530 completed with status: 'FAILURE'

hyukn · 2025-06-16T07:51:10Z

/bot run

tensorrt-cicd · 2025-06-16T07:56:43Z

PR_Github #8995 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-16T15:34:33Z

PR_Github #8995 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6567 completed with status: 'FAILURE'

hyukn · 2025-06-17T00:18:23Z

/bot run

tensorrt-cicd · 2025-06-17T00:24:57Z

PR_Github #9071 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T01:07:34Z

PR_Github #9071 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6633 completed with status: 'FAILURE'

hyukn · 2025-06-17T01:52:42Z

/bot run

tensorrt-cicd · 2025-06-17T01:58:13Z

PR_Github #9083 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T02:35:49Z

PR_Github #9083 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6645 completed with status: 'FAILURE'

hyukn · 2025-06-17T03:38:22Z

/bot run

tensorrt-cicd · 2025-06-17T03:43:54Z

PR_Github #9116 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T04:23:25Z

PR_Github #9116 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6670 completed with status: 'FAILURE'

hyukn · 2025-06-17T09:29:34Z

/bot run

tensorrt-cicd · 2025-06-17T09:35:44Z

PR_Github #9183 [ run ] triggered by Bot

hyukn · 2025-06-17T11:02:07Z

/bot run

tensorrt-cicd · 2025-06-17T11:07:44Z

PR_Github #9197 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T11:07:48Z

PR_Github #9183 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #6728 completed with status: 'FAILURE'

DomBrown · 2025-06-17T16:59:14Z

/bot run

tensorrt-cicd · 2025-06-17T17:20:43Z

PR_Github #9233 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T17:24:20Z

PR_Github #9233 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6771 completed with status: 'FAILURE'

DomBrown · 2025-06-17T19:00:34Z

/bot run

tensorrt-cicd · 2025-06-17T19:06:30Z

PR_Github #9242 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T21:30:09Z

PR_Github #9242 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6779 completed with status: 'FAILURE'

Signed-off-by: Yukun He <[email protected]>

hyukn · 2025-06-18T01:15:31Z

/bot run

tensorrt-cicd · 2025-06-18T01:20:39Z

PR_Github #9263 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-18T04:49:56Z

PR_Github #9263 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6796 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

hyukn requested a review from DomBrown June 12, 2025 01:30

hyukn requested a review from a team as a code owner June 12, 2025 01:30

hyukn requested review from pcastonguay and liji-nv June 12, 2025 01:30

hyukn commented Jun 12, 2025

hyukn requested a review from litaotju June 12, 2025 01:42

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch 2 times, most recently from 26bb347 to 2d0ac26 Compare June 12, 2025 02:26

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch 2 times, most recently from 2c17b89 to f52d7d0 Compare June 12, 2025 04:16

DomBrown approved these changes Jun 12, 2025

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch 2 times, most recently from 2769352 to fc0c839 Compare June 13, 2025 02:44

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from fc0c839 to d0e3b33 Compare June 16, 2025 01:31

litaotju reviewed Jun 16, 2025

tensorrt_llm/_torch/custom_ops/torch_custom_ops.py (outdated, resolved)

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from d0e3b33 to 0894c7e Compare June 16, 2025 07:51

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from e3258ad to 5e301d1 Compare June 17, 2025 03:38

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from 5e301d1 to c7510d0 Compare June 17, 2025 09:29

[TRTLLM-5589] feat: minor optimizations for tunable FP8 batched GEMM op.

7f869dd

Signed-off-by: Yukun He <[email protected]>

hyukn force-pushed the feat/improve_fp8_bmm_tuning branch from c7510d0 to 7f869dd Compare June 18, 2025 01:15

litaotju approved these changes Jun 18, 2025

hyukn enabled auto-merge (squash) June 18, 2025 06:33

hyukn merged commit 6711ad9 into NVIDIA:main Jun 18, 2025
3 checks passed
