Conversation

@dongxuy04 (Collaborator) commented May 20, 2025

Refactor FusedMoe for redundant expert

Description

Based on @wm2012011492's investigation experiments and side-branch commit, this PR refactors the FusedMoe code in preparation for the upcoming redundant-expert feature.

Here we separate the concepts of Experts and Expert Slots.

  • Expert is a model-side concept: MoE weights in the checkpoint are keyed by expert id, and the MoE model's router computes the expert ids for each token.
  • Expert Slot is an engine-side concept: the FusedMoE module loads weights into Expert Slots (extending Module Weights to Weight Slots), and the fused_moe op computes based on Expert Slots (rather than directly on Experts).

For this PR we only separate the concepts; each Expert id is still identical to its Expert Slot id, so no mapping is needed yet. A later PR may use different ids and introduce an id mapping.
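To make the distinction concrete, here is a minimal sketch of the separation; the names (num_experts, num_slots, expert_to_slot) are hypothetical and illustrative, not the actual FusedMoE identifiers:

```python
import torch

# Hypothetical names for illustration; not the actual FusedMoE API.
num_experts = 8            # model side: experts defined by the checkpoint
num_slots = num_experts    # engine side: weight slots held by FusedMoE

# In this PR the mapping is the identity, so expert id == slot id.
expert_to_slot = torch.arange(num_experts)

# Router output: expert ids selected per token (e.g. top-2 routing).
token_expert_ids = torch.tensor([[0, 3], [5, 1]])

# The fused_moe op would index weights by slot id, not expert id.
token_slot_ids = expert_to_slot[token_expert_ids]
assert torch.equal(token_slot_ids, token_expert_ids)  # identity for now

# A later PR could make num_slots > num_experts (redundant experts)
# and fill expert_to_slot with a non-trivial mapping.
```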

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
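For example, an illustrative combination of the flags documented above (using the example values from the flag descriptions):

/bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"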

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

@dongxuy04 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #5888 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5888 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4314 completed with status: 'SUCCESS'

@dongxuy04 dongxuy04 requested a review from wm2012011492 May 21, 2025 02:11
@dongxuy04 dongxuy04 marked this pull request as ready for review May 21, 2025 02:19
@jinyangyuan-nvidia (Collaborator) left a comment

LGTM

@dongxuy04 dongxuy04 requested a review from HuiGao-NV May 21, 2025 02:58
@juney-nvidia juney-nvidia requested review from hlu1 and litaotju May 21, 2025 03:30
@juney-nvidia juney-nvidia changed the title refactor: FusedMoe for redundant expert feat: large-scale EP(part 3 - refactor: FusedMoe for redundant expert) May 21, 2025
@juney-nvidia (Collaborator)

@HuiGao-NV Hi Jerry, can you prioritize reviewing this PR from Dongxu?

Thanks
June

@juney-nvidia juney-nvidia merged commit 4018806 into NVIDIA:main May 21, 2025
2 of 3 checks passed
@juney-nvidia (Collaborator)

To unblock the large-scale EP progress, I just merged this PR. Feel free to comment, and if there are any big conflicts with the ongoing FusedMoE refactoring work, please let Dongxu and me know so we can coordinate.

@HuiGao-NV @hlu1

June

@juney-nvidia juney-nvidia changed the title feat: large-scale EP(part 3 - refactor: FusedMoe for redundant expert) feat: large-scale EP(part 3: refactor - FusedMoe for redundant expert) May 21, 2025