Conversation

@dongxuy04 (Collaborator) commented May 20, 2025

Refactor FusedMoe for redundant expert

Description

Based on @wm2012011492's investigation experiments and side-branch commit, this PR refactors the FusedMoe code in preparation for the upcoming redundant-expert feature.

Here we separate the concepts of Experts and Expert Slots.

  • Expert is a model-side concept: MoE weights in the checkpoint are keyed by expert id, and the MoE model's router computes the expert ids for each token.
  • Expert Slot is an engine-side concept: the FusedMoE module loads weights into Expert Slots (extending Module Weights to Weight Slots), and the fused_moe op computes based on Expert Slots (rather than directly on Experts).

For this PR we only separate the concepts; each Expert id is still identical to its Expert Slot id, so no mapping is needed yet. A later PR may use different ids and introduce an id mapping.
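To make the distinction concrete, here is a minimal sketch of the separation; the names (num_experts, num_slots, expert_to_slot) are hypothetical and illustrative, not the actual FusedMoE identifiers:

```python
import torch

# Hypothetical names for illustration; not the actual FusedMoE API.
num_experts = 8            # model side: experts defined by the checkpoint
num_slots = num_experts    # engine side: weight slots held by FusedMoE

# In this PR the mapping is the identity, so expert id == slot id.
expert_to_slot = torch.arange(num_experts)

# Router output: expert ids selected per token (e.g. top-2 routing).
token_expert_ids = torch.tensor([[0, 3], [5, 1]])

# The fused_moe op would index weights by slot id, not expert id.
token_slot_ids = expert_to_slot[token_expert_ids]
assert torch.equal(token_slot_ids, token_expert_ids)  # identity for now

# A later PR could make num_slots > num_experts (redundant experts)
# and fill expert_to_slot with a non-trivial mapping.
```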

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
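For example, an illustrative combination of the flags documented above (using the example values from the flag descriptions):

/bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"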

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

@dongxuy04 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #5888 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5888 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4314 completed with status: 'SUCCESS'

@dongxuy04 dongxuy04 requested a review from wm2012011492 May 21, 2025 02:11
@dongxuy04 dongxuy04 marked this pull request as ready for review May 21, 2025 02:19
@jinyangyuan-nvidia (Collaborator) left a comment

LGTM

@dongxuy04 dongxuy04 requested a review from HuiGao-NV May 21, 2025 02:58
@juney-nvidia juney-nvidia requested review from hlu1 and litaotju May 21, 2025 03:30
@juney-nvidia juney-nvidia changed the title refactor: FusedMoe for redundant expert feat: large-scale EP(part 3 - refactor: FusedMoe for redundant expert) May 21, 2025
@juney-nvidia (Collaborator)

@HuiGao-NV Hi Jerry, can you prioritize reviewing this PR from Dongxu?

Thanks
June

@juney-nvidia juney-nvidia merged commit 4018806 into NVIDIA:main May 21, 2025
2 of 3 checks passed
@juney-nvidia (Collaborator)

To unblock the large-scale EP progress, I just merged this PR. Feel free to comment, and if there are any big conflicts with the ongoing FusedMoE refactoring work, please let Dongxu and me know so we can coordinate.

@HuiGao-NV @hlu1

June

@juney-nvidia juney-nvidia changed the title feat: large-scale EP(part 3 - refactor: FusedMoe for redundant expert) feat: large-scale EP(part 3: refactor - FusedMoe for redundant expert) May 21, 2025