[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) #4335

farazkh80 · 2025-05-15T00:15:06Z

Description

This PR adds end-to-end tests for Mixtral 8x7B FP8 to the TensorRT-LLM (with torch backend) test suite, to be run on SM120.
Cutlass MoE GEMM did not support FP8 on SM120, so to make this work the Cutlass MoE GEMM path now uses the Ada (SM89) kernels for FP8 MoE GEMM.
The tests will be used by QA as part of the B40 bring-up (RTX6000 Pro, SM120) effort.
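
The fallback described above can be pictured as a small dispatch rule. The following is a minimal Python sketch, not the C++ dispatch code in TensorRT-LLM. The function name `select_moe_gemm_kernel_arch` is hypothetical, and the pass-through behaviour for every case other than SM120 with FP8 is an assumption made for illustration.

```python
def select_moe_gemm_kernel_arch(device_sm: int, dtype: str) -> int:
    """Return the SM version whose kernels should serve a MoE GEMM call.

    Hypothetical helper: only the SM120 + FP8 -> SM89 rule comes from the
    PR description; the pass-through for every other case is an assumption.
    """
    if device_sm == 120 and dtype == "fp8":
        # Cutlass MoE GEMM has no FP8 support for SM120 here,
        # so reuse the Ada (SM89) kernels instead.
        return 89
    # Assumed default: use kernels built for the device's own architecture.
    return device_sm


print(select_moe_gemm_kernel_arch(120, "fp8"))   # SM120 FP8 falls back to Ada
print(select_moe_gemm_kernel_arch(120, "bf16"))  # other dtypes are unchanged
```

Keeping the rule in one place means the FP8 special case can be dropped once native SM120 FP8 kernels exist, without touching callers.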

Test Coverage

Single node tests

test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-BF16-llama-3.1-model/Meta-Llama-3.1-8B]

These tests will be included in the SM120 verification plan for QA sign-off.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
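
As a rough illustration of the `run` subcommand documented above, the sketch below assembles a `/bot run` comment from a list of stage names. The helper `bot_run_comment` is hypothetical and covers only `--disable-fail-fast` and `--stage-list`; the quoting and comma separation follow the examples in the help text.

```python
from typing import Optional, Sequence


def bot_run_comment(stage_list: Optional[Sequence[str]] = None,
                    disable_fail_fast: bool = False) -> str:
    """Build the text of a `/bot run` PR comment (hypothetical helper)."""
    parts = ["/bot", "run"]
    if disable_fail_fast:
        parts.append("--disable-fail-fast")
    if stage_list:
        # The help text shows stages as one quoted, comma-separated value.
        parts.append('--stage-list "{}"'.format(", ".join(stage_list)))
    return " ".join(parts)


print(bot_run_comment(["RTXPro6000-PyTorch-[Post-Merge]-1"]))
print(bot_run_comment(disable_fail_fast=True))
```

The flag is emitted only when at least one stage is given, so the helper never produces a bare `--stage-list` with no value.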

farazkh80 · 2025-05-15T02:06:09Z

/bot run

tensorrt-cicd · 2025-05-15T02:16:18Z

PR_Github #5238 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-15T06:29:08Z

PR_Github #5238 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3826 completed with status: 'SUCCESS'

farazkh80 · 2025-05-15T17:04:34Z

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

tensorrt-cicd · 2025-05-15T17:12:29Z

PR_Github #5381 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-15T19:57:47Z

PR_Github #5381 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3926 (Partly Tested) completed with status: 'FAILURE'

farazkh80 · 2025-05-16T11:50:52Z

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

tensorrt-cicd · 2025-05-16T11:56:40Z

PR_Github #5507 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-16T14:32:56Z

PR_Github #5507 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4013 (Partly Tested) completed with status: 'FAILURE'

farazkh80 · 2025-05-16T18:27:54Z

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

tensorrt-cicd · 2025-05-16T18:33:49Z

PR_Github #5529 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-16T20:22:37Z

PR_Github #5529 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4030 (Partly Tested) completed with status: 'SUCCESS'

farazkh80 · 2025-05-16T20:46:52Z

/bot run

tensorrt-cicd · 2025-05-16T20:53:03Z

PR_Github #5538 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-16T22:55:36Z

PR_Github #5538 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4039 completed with status: 'FAILURE'

farazkh80 · 2025-05-17T14:25:40Z

/bot run --stage-list

tensorrt-cicd · 2025-05-17T14:33:25Z

PR_Github #5572 Bot args parsing error: usage: /bot run [--reuse-test [optionalpipeline-id]] [--disable-fail-fast]
[--skip-test] [--stage-list "A10-1, xxx"]
[--gpu-type "A30, H100_PCIe"] [--test-backend "pytorch, cpp"]
[--multi-gpu-test] [--add-multi-gpu-test]
[--only-multi-gpu-test] [--disable-multi-gpu-test]
[--post-merge] [--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
[--memory-profiling] [--disable-incremental-build]
[--enable-publish-last-known-good] [--debug]
/bot run: error: argument --stage-list: expected one argument
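
The error text above has the shape of a Python `argparse` message for an option that was given without its value. A minimal sketch that reproduces that class of failure follows. It assumes the bot parses commands with `argparse`, which the source does not state, and the parser here is a hypothetical stand-in that models only two of the listed flags.

```python
import argparse


def build_run_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the bot's parser; only two flags are modelled.
    parser = argparse.ArgumentParser(prog="/bot run")
    parser.add_argument("--stage-list", type=str, default=None)
    parser.add_argument("--disable-fail-fast", action="store_true")
    return parser


parser = build_run_parser()

# A value after --stage-list parses cleanly.
ok = parser.parse_args(["--stage-list", "RTXPro6000-PyTorch-[Post-Merge]-1"])
print(ok.stage_list)

# Omitting the value makes argparse print a usage error and raise SystemExit.
try:
    parser.parse_args(["--stage-list"])
except SystemExit as exc:
    print("parse failed with exit code", exc.code)
```

In this sketch, a bare `--stage-list` is rejected before any pipeline logic could run, which would be consistent with the bot replying with a usage message instead of triggering a build.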

farazkh80 · 2025-05-17T14:39:50Z

/bot run

tensorrt-cicd · 2025-05-17T14:46:03Z

PR_Github #5574 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-17T14:48:36Z

PR_Github #5574 [ run ] completed with state FAILURE

farazkh80 · 2025-05-18T14:21:55Z

/bot run

tensorrt-cicd · 2025-05-18T14:27:44Z

PR_Github #5617 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-18T16:17:06Z

PR_Github #5617 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4102 completed with status: 'FAILURE'

farazkh80 · 2025-05-18T21:02:06Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-05-18T21:08:04Z

PR_Github #5630 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-19T00:17:39Z

PR_Github #5630 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4114 completed with status: 'SUCCESS'

farazkh80 · 2025-05-19T00:36:19Z

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

tensorrt-cicd · 2025-05-19T00:43:20Z

PR_Github #5640 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-19T01:42:38Z

PR_Github #5640 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4122 (Partly Tested) completed with status: 'SUCCESS'

farazkh80 · 2025-05-19T01:46:55Z

/bot reuse-pipeline

tensorrt-cicd · 2025-05-19T01:53:07Z

PR_Github #5651 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd · 2025-05-19T02:00:20Z

PR_Github #5651 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #5640 (Partly Tested) for commit 9865d64

farazkh80 force-pushed the moe-fp8-fix branch from 5b328b2 to 51b6b79 May 15, 2025 17:03

farazkh80 changed the title ~~Add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm~~ # [TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) May 15, 2025

farazkh80 changed the title ~~# [TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120)~~ [TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) May 15, 2025

farazkh80 force-pushed the moe-fp8-fix branch from 96dd3f9 to cebd2cd May 16, 2025 03:36

farazkh80 added 3 commits May 16, 2025 11:50

add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm

e086d37

Signed-off-by: Faraz Khoubsirat <[email protected]>

update cutlass versions

436fcbd

Signed-off-by: Faraz Khoubsirat <[email protected]>

added internal cutlass with fix and docker update

4d246a1

Signed-off-by: Faraz Khoubsirat <[email protected]>

farazkh80 force-pushed the moe-fp8-fix branch from cbf78cf to 4d246a1 May 16, 2025 11:50

farazkh80 requested a review from pamelap-nvidia May 16, 2025 11:51

Merge branch 'main' into moe-fp8-fix

e7ced10

added mixtral to pro 6000

4c8a3b2

Signed-off-by: Faraz Khoubsirat <[email protected]>

Merge branch 'main' into moe-fp8-fix

11a47c6

Merge branch 'main' into moe-fp8-fix

2c12776

farazkh80 requested a review from kaiyux May 19, 2025 01:33

Merge branch 'main' into moe-fp8-fix

9865d64

kaiyux requested review from zongfeijing and nv-guomingz and removed request for kaiyux May 19, 2025 01:50

pamelap-nvidia approved these changes May 19, 2025

nv-guomingz approved these changes May 19, 2025

farazkh80 merged commit 7656af1 into NVIDIA:main May 19, 2025
3 checks passed
