Skip to content

Conversation

@gcanlin
Copy link
Contributor

@gcanlin gcanlin commented Nov 7, 2025

Purpose

This code path uses head-parallel attention, where each rank holds full Q/K/V vectors for its own subset of heads. All attention backends operate per-rank on local heads. Therefore, the previous tp_size > 1 communication logic (all_gather_interleave and re-slicing) is unnecessary and introduces redundant synchronization overhead. Removing it preserves correctness and improves performance.

Test Plan

It has been tested in vllm-ascend. And we indeed observed improved performance.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gcanlin gcanlin requested a review from sighingnow as a code owner November 7, 2025 06:36
@mergify mergify bot added the qwen Related to Qwen models label Nov 7, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the split_qkv method in Qwen2_5_VisionAttention for the Qwen2.5-VL model. The change removes redundant tensor parallelism communication logic, specifically an all_gather_interleave operation followed by a re-slicing of the Q, K, and V tensors. As the model uses head-parallel attention where each tensor-parallel rank already computes its own subset of heads, these communication steps were unnecessary and introduced performance overhead. The removal of this logic is correct, simplifies the code, and should improve performance by eliminating redundant synchronization. The change is well-justified and looks good.

@gcanlin gcanlin changed the title [Refactor] Remove redundant TP gather/split in split_qkv in Qwen2_5-VL [Refactor] Remove redundant TP gather/split in split_qkv in QwenVL Nov 7, 2025
@tjtanaa
Copy link
Collaborator

tjtanaa commented Nov 7, 2025

@gcanlin I think this cannot be removed for correctness of the Omni model

There was a same PR #21493 opened before, for more comments about this change.

CC @DarkLight1337

@gcanlin
Copy link
Contributor Author

gcanlin commented Nov 8, 2025

@tjtanaa Thanks for the reminder! I reviewed the PR #16907 and #16974 mentioned by @DarkLight1337. It seems that the part actually causing the break is the gather function we previously considered redundant. For now, we haven’t observed this optimization causing any Omni-related break on Ascend. We also enable Omni to benefit from this optimization as well in this PR.

@DarkLight1337
Copy link
Member

@ywang96 can you ask Qwen team for comment?

@gcanlin
Copy link
Contributor Author

gcanlin commented Nov 12, 2025

@tjtanaa @DarkLight1337 I conducted a correctness verification for Qwen2.5-Omni-3B under tp=2, and the specific command and results are shown below. It seems that this PR’s removal of gather and slice operations did not cause any breakage for Qwen2.5-Omni in multi-TP settings. Could you please take a look for this verification? I’m not sure whether this verification is sufficient. Do I need to run additional tests to cover scenarios that might potentially cause a break? Please feel free to let me know if you think further testing is needed. Thanks!

Command

uv run lm_eval --model vllm-vlm --model_args "pretrained=Qwen/Qwen2.5-Omni-3B,model=Qwen/Qwen2.5-Omni-3B,tensor_parallel_size=2"  --tasks mmmu_val,chartqa  --batch_size 32 --apply_chat_template

Before

|                 Tasks                 |Version|Filter|n-shot|     Metric      |   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|-----------------|---|-----:|---|-----:|
|chartqa                                |      0|none  |     0|anywhere_accuracy|↑  |0.8092|±  |0.0079|
|                                       |       |none  |     0|exact_match      |↑  |0.5336|±  |0.0100|
|                                       |       |none  |     0|relaxed_accuracy |↑  |0.8020|±  |0.0080|
|mmmu_val                               |      0|none  |      |acc              |↑  |0.4689|±  |0.0160|
| - Art and Design                      |      0|none  |      |acc              |↑  |0.6000|±  |0.0404|
|  - Art                                |      0|none  |     0|acc              |↑  |0.6000|±  |0.0910|
|  - Art Theory                         |      0|none  |     0|acc              |↑  |0.7667|±  |0.0785|
|  - Design                             |      0|none  |     0|acc              |↑  |0.8000|±  |0.0743|
|  - Music                              |      0|none  |     0|acc              |↑  |0.2333|±  |0.0785|
| - Business                            |      0|none  |      |acc              |↑  |0.3933|±  |0.0402|
|  - Accounting                         |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Economics                          |      0|none  |     0|acc              |↑  |0.3000|±  |0.0851|
|  - Finance                            |      0|none  |     0|acc              |↑  |0.3333|±  |0.0875|
|  - Manage                             |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
|  - Marketing                          |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
| - Health and Medicine                 |      0|none  |      |acc              |↑  |0.4667|±  |0.0411|
|  - Basic Medical Science              |      0|none  |     0|acc              |↑  |0.5333|±  |0.0926|
|  - Clinical Medicine                  |      0|none  |     0|acc              |↑  |0.3667|±  |0.0895|
|  - Diagnostics and Laboratory Medicine|      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
|  - Pharmacy                           |      0|none  |     0|acc              |↑  |0.5000|±  |0.0928|
|  - Public Health                      |      0|none  |     0|acc              |↑  |0.5000|±  |0.0928|
| - Humanities and Social Science       |      0|none  |      |acc              |↑  |0.7083|±  |0.0420|
|  - History                            |      0|none  |     0|acc              |↑  |0.6333|±  |0.0895|
|  - Literature                         |      0|none  |     0|acc              |↑  |0.7333|±  |0.0821|
|  - Psychology                         |      0|none  |     0|acc              |↑  |0.7000|±  |0.0851|
|  - Sociology                          |      0|none  |     0|acc              |↑  |0.7667|±  |0.0785|
| - Science                             |      0|none  |      |acc              |↑  |0.3933|±  |0.0398|
|  - Biology                            |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
|  - Chemistry                          |      0|none  |     0|acc              |↑  |0.3000|±  |0.0851|
|  - Geography                          |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Math                               |      0|none  |     0|acc              |↑  |0.5000|±  |0.0928|
|  - Physics                            |      0|none  |     0|acc              |↑  |0.2667|±  |0.0821|
| - Tech and Engineering                |      0|none  |      |acc              |↑  |0.3667|±  |0.0327|
|  - Agriculture                        |      0|none  |     0|acc              |↑  |0.4000|±  |0.0910|
|  - Architecture and Engineering       |      0|none  |     0|acc              |↑  |0.3000|±  |0.0851|
|  - Computer Science                   |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Electronics                        |      0|none  |     0|acc              |↑  |0.1000|±  |0.0557|
|  - Energy and Power                   |      0|none  |     0|acc              |↑  |0.4000|±  |0.0910|
|  - Materials                          |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Mechanical Engineering             |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|

|             Groups             |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------------------------|------:|------|------|------|---|-----:|---|-----:|
|mmmu_val                        |      0|none  |      |acc   |↑  |0.4689|±  |0.0160|
| - Art and Design               |      0|none  |      |acc   |↑  |0.6000|±  |0.0404|
| - Business                     |      0|none  |      |acc   |↑  |0.3933|±  |0.0402|
| - Health and Medicine          |      0|none  |      |acc   |↑  |0.4667|±  |0.0411|
| - Humanities and Social Science|      0|none  |      |acc   |↑  |0.7083|±  |0.0420|
| - Science                      |      0|none  |      |acc   |↑  |0.3933|±  |0.0398|
| - Tech and Engineering         |      0|none  |      |acc   |↑  |0.3667|±  |0.0327|

After

|                 Tasks                 |Version|Filter|n-shot|     Metric      |   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|-----------------|---|-----:|---|-----:|
|chartqa                                |      0|none  |     0|anywhere_accuracy|↑  |0.8088|±  |0.0079|
|                                       |       |none  |     0|exact_match      |↑  |0.5328|±  |0.0100|
|                                       |       |none  |     0|relaxed_accuracy |↑  |0.8004|±  |0.0080|
|mmmu_val                               |      0|none  |      |acc              |↑  |0.4700|±  |0.0160|
| - Art and Design                      |      0|none  |      |acc              |↑  |0.6000|±  |0.0404|
|  - Art                                |      0|none  |     0|acc              |↑  |0.6000|±  |0.0910|
|  - Art Theory                         |      0|none  |     0|acc              |↑  |0.7667|±  |0.0785|
|  - Design                             |      0|none  |     0|acc              |↑  |0.8000|±  |0.0743|
|  - Music                              |      0|none  |     0|acc              |↑  |0.2333|±  |0.0785|
| - Business                            |      0|none  |      |acc              |↑  |0.3933|±  |0.0402|
|  - Accounting                         |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Economics                          |      0|none  |     0|acc              |↑  |0.3000|±  |0.0851|
|  - Finance                            |      0|none  |     0|acc              |↑  |0.3333|±  |0.0875|
|  - Manage                             |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
|  - Marketing                          |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
| - Health and Medicine                 |      0|none  |      |acc              |↑  |0.4667|±  |0.0411|
|  - Basic Medical Science              |      0|none  |     0|acc              |↑  |0.5333|±  |0.0926|
|  - Clinical Medicine                  |      0|none  |     0|acc              |↑  |0.3667|±  |0.0895|
|  - Diagnostics and Laboratory Medicine|      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
|  - Pharmacy                           |      0|none  |     0|acc              |↑  |0.5000|±  |0.0928|
|  - Public Health                      |      0|none  |     0|acc              |↑  |0.5000|±  |0.0928|
| - Humanities and Social Science       |      0|none  |      |acc              |↑  |0.7083|±  |0.0420|
|  - History                            |      0|none  |     0|acc              |↑  |0.6333|±  |0.0895|
|  - Literature                         |      0|none  |     0|acc              |↑  |0.7333|±  |0.0821|
|  - Psychology                         |      0|none  |     0|acc              |↑  |0.7000|±  |0.0851|
|  - Sociology                          |      0|none  |     0|acc              |↑  |0.7667|±  |0.0785|
| - Science                             |      0|none  |      |acc              |↑  |0.3933|±  |0.0398|
|  - Biology                            |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|
|  - Chemistry                          |      0|none  |     0|acc              |↑  |0.3000|±  |0.0851|
|  - Geography                          |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Math                               |      0|none  |     0|acc              |↑  |0.5000|±  |0.0928|
|  - Physics                            |      0|none  |     0|acc              |↑  |0.2667|±  |0.0821|
| - Tech and Engineering                |      0|none  |      |acc              |↑  |0.3714|±  |0.0330|
|  - Agriculture                        |      0|none  |     0|acc              |↑  |0.4000|±  |0.0910|
|  - Architecture and Engineering       |      0|none  |     0|acc              |↑  |0.3000|±  |0.0851|
|  - Computer Science                   |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Electronics                        |      0|none  |     0|acc              |↑  |0.1333|±  |0.0631|
|  - Energy and Power                   |      0|none  |     0|acc              |↑  |0.4000|±  |0.0910|
|  - Materials                          |      0|none  |     0|acc              |↑  |0.4667|±  |0.0926|
|  - Mechanical Engineering             |      0|none  |     0|acc              |↑  |0.4333|±  |0.0920|

|             Groups             |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------------------------|------:|------|------|------|---|-----:|---|-----:|
|mmmu_val                        |      0|none  |      |acc   |↑  |0.4700|±  |0.0160|
| - Art and Design               |      0|none  |      |acc   |↑  |0.6000|±  |0.0404|
| - Business                     |      0|none  |      |acc   |↑  |0.3933|±  |0.0402|
| - Health and Medicine          |      0|none  |      |acc   |↑  |0.4667|±  |0.0411|
| - Humanities and Social Science|      0|none  |      |acc   |↑  |0.7083|±  |0.0420|
| - Science                      |      0|none  |      |acc   |↑  |0.3933|±  |0.0398|
| - Tech and Engineering         |      0|none  |      |acc   |↑  |0.3714|±  |0.0330|

Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing this! After discussing offline with @ywang96 , we think it is ok to merge this until proven otherwise.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 12, 2025 05:33
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 12, 2025
@gcanlin
Copy link
Contributor Author

gcanlin commented Nov 12, 2025

@DarkLight1337 Sorry to bother you again. The CI can finish running, but there are 3 failing tests. At first glance, they don’t seem related to this PR. Should we fix them?

@DarkLight1337
Copy link
Member

Retrying the tests to see if they are flaky

@gcanlin
Copy link
Contributor Author

gcanlin commented Nov 12, 2025

Sadly it failed again. I will try to find the root cause. Or maybe should I try rebasing this PR?

@DarkLight1337
Copy link
Member

Yeah can you rebase?

auto-merge was automatically disabled November 12, 2025 13:50

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 12, 2025 13:54
@DarkLight1337 DarkLight1337 merged commit bc5bd45 into vllm-project:main Nov 12, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants