Conversation

@ZJY0516 (Contributor) commented Aug 29, 2025

Purpose

Enable encoder DP for MiniCPM-V

Related to: #22743

Test Plan

MiniCPM-V-2_6

vllm serve ./hf/MiniCPM-V-2_6/ -tp 2 --trust-remote-code --mm_encoder_tp_mode data

vllm bench serve \
--endpoint-type openai-chat \
--endpoint /v1/chat/completions \
--model "./hf/MiniCPM-V-2_6/" \
--dataset-name random-mm \
--random-mm-base-items-per-request 3 \
--tokenizer ~/hf/MiniCPM-V-2_6/ --trust-remote-code \
--max-concurrency 1 --num-prompts 50

Test Result

Encoder DP
============ Serving Benchmark Result ============
Successful requests:                     50        
Maximum request concurrency:             1         
Benchmark duration (s):                  92.09     
Total input tokens:                      51034     
Total generated tokens:                  3652      
Request throughput (req/s):              0.54      
Output token throughput (tok/s):         39.66     
Total Token throughput (tok/s):          593.85    
---------------Time to First Token----------------
Mean TTFT (ms):                          987.47    
Median TTFT (ms):                        955.18    
P99 TTFT (ms):                           1295.46   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          11.86     
Median TPOT (ms):                        11.86     
P99 TPOT (ms):                           11.96     
---------------Inter-token Latency----------------
Mean ITL (ms):                           11.69     
Median ITL (ms):                         11.83     
P99 ITL (ms):                            12.59     
==================================================

Default
============ Serving Benchmark Result ============
Successful requests:                     50        
Maximum request concurrency:             1         
Benchmark duration (s):                  95.14     
Total input tokens:                      51034     
Total generated tokens:                  3730      
Request throughput (req/s):              0.53      
Output token throughput (tok/s):         39.21     
Total Token throughput (tok/s):          575.64    
---------------Time to First Token----------------
Mean TTFT (ms):                          1025.98   
Median TTFT (ms):                        1021.61   
P99 TTFT (ms):                           1637.95   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          11.90     
Median TPOT (ms):                        11.87     
P99 TPOT (ms):                           12.15     
---------------Inter-token Latency----------------
Mean ITL (ms):                           11.74     
Median ITL (ms):                         11.85     
P99 ITL (ms):                            12.62     
==================================================
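The headline deltas between the two runs above can be computed directly from the tables (an illustrative script, numbers copied verbatim from the benchmark output):

```python
# Compare headline metrics from the two benchmark runs above.
encoder_dp = {"duration_s": 92.09, "mean_ttft_ms": 987.47,
              "p99_ttft_ms": 1295.46, "total_tok_per_s": 593.85}
default = {"duration_s": 95.14, "mean_ttft_ms": 1025.98,
           "p99_ttft_ms": 1637.95, "total_tok_per_s": 575.64}

for key in encoder_dp:
    base, dp = default[key], encoder_dp[key]
    change = (dp - base) / base * 100
    print(f"{key}: {base} -> {dp} ({change:+.1f}%)")
```

With these numbers, encoder DP reduces mean TTFT by about 3.8% and P99 TTFT by about 20.9%, while total token throughput improves by about 3.2%; TPOT and ITL are essentially unchanged, as expected for a change that only affects the vision encoder.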


@gemini-code-assist (bot) left a comment
Code Review

This pull request correctly enables encoder data parallelism for MiniCPM-V models by passing a use_data_parallel flag to the Idefics2VisionTransformer. However, there are two key issues. First, the supports_encoder_tp_data flag is only set for MiniCPMV4_0, which means this feature will likely not be enabled for versions 2.5, 2.6, and 4.5, despite code changes suggesting they should support it. This is a correctness issue. Second, as detailed in a specific comment, there is significant code duplication in init_vision_module across multiple classes, which impacts maintainability.

Comment on lines +1355 to +1360
model = Idefics2VisionTransformer(
    config.vision_config,
    quant_config=quant_config,
    prefix=prefix,
    use_data_parallel=self.use_data_parallel,
)
Severity: high

This init_vision_module implementation is nearly identical across MiniCPMV2_5, MiniCPMV2_6, MiniCPMV4_0, and MiniCPMV4_5. The only significant difference is the conditional logic for quant_config in the v4.x models. This duplication increases maintenance effort. Consider refactoring this into a shared method in a base class to improve code reuse and maintainability.
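The suggested refactor can be sketched as a single base-class helper, assuming the only per-version difference is whether quant_config is forwarded. The stub class and the `include_quant_config` knob are hypothetical illustrations, not the actual vLLM code:

```python
class Idefics2VisionTransformer:
    """Stub standing in for vLLM's real vision transformer."""
    def __init__(self, vision_config, quant_config=None, prefix="",
                 use_data_parallel=False):
        self.vision_config = vision_config
        self.quant_config = quant_config
        self.prefix = prefix
        self.use_data_parallel = use_data_parallel


class MiniCPMVBaseModel:
    # In the real model this would be derived from --mm_encoder_tp_mode.
    use_data_parallel = True

    def init_vision_module(self, config, quant_config, prefix,
                           include_quant_config=True):
        # Shared implementation: v2.5/v2.6 always forward quant_config,
        # while the v4.x variants forward it only conditionally, hence
        # the hypothetical include_quant_config flag.
        return Idefics2VisionTransformer(
            config.vision_config,
            quant_config=quant_config if include_quant_config else None,
            prefix=prefix,
            use_data_parallel=self.use_data_parallel,
        )
```

Each MiniCPMV subclass would then either inherit this method unchanged or call it with its own conditional, removing the four near-identical copies.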

Signed-off-by: zjy0516 <[email protected]>
@ZJY0516 ZJY0516 requested a review from hmellor as a code owner August 29, 2025 16:13
@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 29, 2025
@ZJY0516 (Contributor, Author) commented Aug 30, 2025

@DarkLight1337 Could you please take a look at this pr?

@DarkLight1337 (Member) left a comment
Thanks, can you move the flag introduced by #23325 from the V4 model to the base model so that this is actually enabled for the other models?

Signed-off-by: zjy0516 <[email protected]>
@ZJY0516 (Contributor, Author) commented Aug 30, 2025

> Thanks, can you move the flag introduced by #23325 from the V4 model to the base model so that this is actually enabled for the other models?

Done
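The change being confirmed here amounts to hoisting the class attribute. A minimal sketch, using the class and flag names that appear in this conversation (the class bodies are placeholders, not the merged diff):

```python
# Before this commit, only the v4.0 subclass advertised encoder-DP
# support; moving the flag to the shared base class lets v2.5, v2.6,
# and v4.5 inherit it as well.

class MiniCPMVBaseModel:
    supports_encoder_tp_data = True  # moved here from MiniCPMV4_0


class MiniCPMV2_6(MiniCPMVBaseModel):
    pass  # now inherits the flag


class MiniCPMV4_0(MiniCPMVBaseModel):
    pass  # no longer needs to set the flag itself
```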

Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 30, 2025 06:03
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 30, 2025
@vllm-bot vllm-bot merged commit 3a6acad into vllm-project:main Aug 30, 2025
37 of 41 checks passed
@ZJY0516 ZJY0516 deleted the encoder-dp branch August 30, 2025 13:32
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: zjy0516 <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: zjy0516 <[email protected]>
Signed-off-by: Jiangyun Zhu <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>

Labels

documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)

3 participants