[Feat][EPLB][Perf] Enable Round-robin expert placement strategy while eplb is enabled. #25798
Description:
PR-23745 introduced the round-robin expert placement strategy for MoE models with multiple expert groups, providing a simple yet effective way to distribute experts evenly across devices.
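For intuition, here is a minimal sketch (illustrative only, not the vLLM source; function names are hypothetical) of how the two strategies map global expert IDs onto expert-parallel (EP) ranks:

```python
# Illustrative sketch of "linear" vs "round_robin" expert placement.
# Function names are hypothetical, not vLLM's actual expert-map code.

def linear_placement(num_experts: int, ep_size: int, ep_rank: int) -> list[int]:
    """Contiguous blocks: rank r hosts experts [r*k, (r+1)*k)."""
    per_rank = num_experts // ep_size
    return list(range(ep_rank * per_rank, (ep_rank + 1) * per_rank))

def round_robin_placement(num_experts: int, ep_size: int, ep_rank: int) -> list[int]:
    """Strided assignment: rank r hosts experts r, r + ep_size, r + 2*ep_size, ..."""
    return list(range(ep_rank, num_experts, ep_size))

# 16 experts over 4 EP ranks. With grouped routing, "linear" keeps a whole
# expert group on one device, while "round_robin" strides each group across
# all devices, which tends to even out per-device load:
for r in range(4):
    print(r, linear_placement(16, 4, r), round_robin_placement(16, 4, r))
```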
This PR extends that work by ensuring full compatibility with EPLB (Expert Parallel Load Balancing). With this enhancement, round-robin placement can now be seamlessly combined with dynamic expert load balancing, enabling more flexible expert scheduling while maintaining balanced utilization and performance.
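The key point is that the placement strategy only fixes the initial physical-to-logical expert mapping; EPLB then permutes that mapping at runtime based on observed load. Below is a hedged sketch of that interaction (the greedy rebalancer and all names are illustrative, not vLLM's actual EPLB algorithm):

```python
import numpy as np

def rebalance(mapping: np.ndarray, load: np.ndarray) -> np.ndarray:
    """Toy rebalance: redistribute logical experts so the per-rank sum of
    observed load is as even as possible (capacity-limited greedy)."""
    num_ranks, per_rank = mapping.shape
    experts = mapping.flatten()
    order = experts[np.argsort(-load[experts])]  # hottest experts first
    rank_load = np.zeros(num_ranks)
    buckets: list[list[int]] = [[] for _ in range(num_ranks)]
    for e in order:
        # Send each expert to the least-loaded rank that still has a free slot.
        r = int(np.argmin([rank_load[i] if len(buckets[i]) < per_rank else np.inf
                           for i in range(num_ranks)]))
        buckets[r].append(int(e))
        rank_load[r] += load[e]
    return np.array(buckets)

# Round-robin initial map for 16 experts on 4 ranks: row r = [r, r+4, r+8, r+12].
init = np.arange(16).reshape(-1, 4).T
load = np.random.rand(16)  # stand-in for measured per-expert load
print(rebalance(init, load))
```

Because the round-robin map already spreads every expert group across all ranks, the balancer starts from a nearly even state instead of having to undo a skewed linear layout.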
Performance
Conclusion: with the configuration listed below and EPLB enabled, the round-robin strategy improves average throughput and end-to-end latency by approximately 3% over the default linear strategy.
Test Platform:
vLLM version: vllm/vllm-openai:nightly-8c546102658f97b10d13bcf25193b65edc6ea6ff
Model: DeepSeek-V2-Chat-0628
GPU: H20 × 8
Serving config:
```bash
python3 -u -m vllm.entrypoints.openai.api_server \
    --model ${MODEL_PATH} \
    --trust-remote-code \
    --gpu-memory-utilization 0.85 \
    -tp 8 \
    --enable-expert-parallel \
    --enable-eplb \
    --expert-placement-strategy "round_robin"
```
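Once the server is up, a quick smoke test against the OpenAI-compatible endpoint (assuming the openai Python client; the model name is an assumption and must match the `--model` value used above):

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; the api_key is unused for a local server.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V2-Chat-0628",  # must match the served model name
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```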
Benchmark config (input_len=1024, output_len=128, request_rate=4, max_concurrency=4, num_prompts=32):
```bash
python3 ./bench_serving.py \
    --backend vllm \
    --dataset-name random \
    --model ${MODEL_PATH} \
    --random-input-len 1024 \
    --random-output-len 128 \
    --random-range-ratio 0.5 \
    --tokenizer ./tokenizer \
    --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 4 \
    --max-concurrency 4 \
    --num-prompts 32 \
    --base-url http://127.0.0.1:8000 \
    --port 8000
```
Accuracy Test
Tested with DeepSeek-V2-Chat-0628 on H20 × 8 with the following serving command:
Note: DeepSeek-V2 performs poorly on our chosen dataset regardless of configuration; this test is only meant to confirm that the PR has no impact on accuracy.