[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL #28271
Conversation
Code Review
This pull request refactors the split_qkv method in Qwen2_5_VisionAttention for the Qwen2.5-VL model. The change removes redundant tensor parallelism communication logic, specifically an all_gather_interleave operation followed by a re-slicing of the Q, K, and V tensors. As the model uses head-parallel attention where each tensor-parallel rank already computes its own subset of heads, these communication steps were unnecessary and introduced performance overhead. The removal of this logic is correct, simplifies the code, and should improve performance by eliminating redundant synchronization. The change is well-justified and looks good.
@tjtanaa Thanks for the reminder! I reviewed PRs #16907 and #16974 mentioned by @DarkLight1337. It seems that the part actually causing the breakage is the gather function we previously considered redundant. For now, we haven't observed this optimization causing any Omni-related breakage on Ascend. This PR also enables Omni to benefit from the optimization.
@ywang96 can you ask the Qwen team for comment?
@tjtanaa @DarkLight1337 I conducted a correctness verification for Qwen2.5-Omni-3B under tp=2; the command and results are shown below. It seems that this PR's removal of the gather and slice operations did not cause any breakage for Qwen2.5-Omni in multi-TP settings. Could you please take a look at this verification? I'm not sure whether it is sufficient. Do I need to run additional tests to cover scenarios that might potentially cause a break? Please feel free to let me know if you think further testing is needed. Thanks!
Command: uv run lm_eval --model vllm-vlm --model_args "pretrained=Qwen/Qwen2.5-Omni-3B,model=Qwen/Qwen2.5-Omni-3B,tensor_parallel_size=2" --tasks mmmu_val,chartqa --batch_size 32 --apply_chat_template
Before / After: (results tables)
DarkLight1337
left a comment
Thanks for testing this! After discussing offline with @ywang96, we think it is ok to merge this until proven otherwise.
@DarkLight1337 Sorry to bother you again. The CI is able to finish running, but there are 3 failing tests. At first glance, they don't seem related to this PR. Should we fix them?
Retrying the tests to see if they are flaky
Sadly, it failed again. I will try to find the root cause. Or should I maybe try rebasing this PR?
Yeah, can you rebase?
Signed-off-by: gcanlin <[email protected]>
Head branch was pushed to by a user without write access
Force-pushed from bfc2bd2 to 525f05a
Purpose
This code path uses head-parallel attention, where each rank holds full Q/K/V vectors for its own subset of heads. All attention backends operate per-rank on local heads. Therefore, the previous
tp_size > 1 communication logic (all_gather_interleave and re-slicing) is unnecessary and introduces redundant synchronization overhead. Removing it preserves correctness and improves performance.
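To make the head-parallel argument concrete, here is a small self-contained check (a sketch under assumed toy shapes and a contiguous head sharding, not vLLM's actual sharding scheme): attention computed on a rank's own slice of heads matches the corresponding slice of full multi-head attention, so no cross-rank gather of Q/K/V is needed before attention.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, num_heads, head_dim, tp_size = 16, 8, 32, 2  # assumed toy sizes

# Full Q/K/V for all heads: [num_heads, seq_len, head_dim]
q = torch.randn(num_heads, seq_len, head_dim)
k = torch.randn(num_heads, seq_len, head_dim)
v = torch.randn(num_heads, seq_len, head_dim)

# Reference: attention over all heads at once.
full_out = F.scaled_dot_product_attention(q, k, v)

# "Head-parallel": each rank attends only over its own contiguous chunk of heads.
per_rank_out = [
    F.scaled_dot_product_attention(q_r, k_r, v_r)
    for q_r, k_r, v_r in zip(q.chunk(tp_size), k.chunk(tp_size), v.chunk(tp_size))
]

# Concatenating the per-rank outputs reproduces the full result, because
# attention heads are independent of one another.
assert torch.allclose(full_out, torch.cat(per_rank_out, dim=0), atol=1e-6)
```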
Test Plan
It has been tested in vllm-ascend, and we indeed observed improved performance.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.