
Conversation

@lgeiger (Contributor) commented Oct 29, 2025

Purpose

#23207 broke Qwen3VL, since it also relies on Qwen2_5_VisionAttention but still used the old signature:

RuntimeError: vllm::flash_attn_maxseqlen_wrapper() Expected a value of type 'Tensor' for argument 'max_seqlen' but instead found type 'int'

This PR updates Qwen3VL and Qwen3Omni to fix this.
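
For context, a minimal sketch of the pattern involved (illustrative only, not the exact vLLM code): the compiled attention path now expects max_seqlen and seqlens as tensors, so the helper that derives them from cu_seqlens should stop converting them to a Python int or list.

```python
import torch

# Illustrative sketch only (not the actual vLLM implementation): keep the
# values derived from cu_seqlens as tensors instead of calling .item() or
# .tolist(), so the compiled flash-attention wrapper receives a Tensor.
def compute_attn_mask_seqlen(
    cu_seqlens: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
    seqlens = cu_seqlens[1:] - cu_seqlens[:-1]  # per-sequence lengths
    max_seqlen = seqlens.max()  # 0-d tensor instead of .item()
    return max_seqlen, seqlens  # previously returned a Python int / list
```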

Test Plan

vllm serve Qwen/Qwen3-VL-2B-Instruct-FP8

Note: I didn't test Qwen3OmniMoE since I didn't have a large enough GPU on hand today, but the code changes are the same as for Qwen3VL, so I believe it should be fine.

/cc @Lucaskabela @ywang96

I'm happy to make a follow-up PR to add torch.compile support to the other modules in Qwen3VL tomorrow once I get a chance to run some benchmarks.

@lgeiger lgeiger requested a review from sighingnow as a code owner October 29, 2025 02:01
Comment on lines -839 to +840
- max_seqlen, seqlens = (
-     torch.zeros(1, device=cu_seqlens.device),
-     torch.zeros(1, device=cu_seqlens.device),
- )
+ max_seqlen = torch.zeros([], device=cu_seqlens.device)
+ seqlens = torch.zeros(1, device=cu_seqlens.device)
@lgeiger (Contributor, Author)

max_seqlen is actually a scalar value, so this updates the dummy values to match.
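
A quick standalone check of the shape difference (plain PyTorch, not vLLM code): torch.zeros([]) creates a 0-d scalar tensor, while torch.zeros(1) creates a 1-element 1-d tensor, so the 0-d dummy matches the scalar shape of the real max_seqlen.

```python
import torch

scalar = torch.zeros([])  # 0-d tensor: shape torch.Size([]), a true scalar
vector = torch.zeros(1)   # 1-d tensor: shape torch.Size([1])

print(scalar.shape, scalar.dim())  # torch.Size([]) 0
print(vector.shape, vector.dim())  # torch.Size([1]) 1
```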

@mergify mergify bot added the qwen (Related to Qwen models) label Oct 29, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a runtime error with torch.compile in Qwen3VL and Qwen3Omni models. The error was caused by passing a Python int instead of a torch.Tensor for the max_seqlen argument in the vision attention mechanism.

The changes correctly modify the compute_attn_mask_seqlen methods in qwen2_5_vl.py, qwen3_omni_moe_thinker.py, and qwen3_vl.py to return torch.Tensor objects for max_seqlen and seqlens, instead of Python scalars or lists. This is achieved by removing .item() and .tolist() calls. The type hints for the forward methods are also updated accordingly.

These changes are consistent, targeted, and directly resolve the reported torch.compile issue. The code quality is good, and I have no further suggestions.

@Lucaskabela Lucaskabela (Contributor) left a comment

Thanks for the fix @lgeiger! I was planning on making my way through to Qwen3, but didn't realize this PR had (indirectly) impacted those files.

Given that no test failed, do we have CI coverage for Qwen3 models? (cc @ywang96 @ProExpertProg @zou3519)

If not, it might be worth adding it to the basic initialization test suite.

@lgeiger (Contributor, Author) commented Oct 29, 2025

> Thanks for the fix @lgeiger! I was planning on making my way through to Qwen3, but didn't realize this PR had (indirectly) impacted those files.

No worries! The addition of torch.compile to the ViT part is amazing. Looking forward to your Qwen3 PRs 🎉

@Lucaskabela (Contributor)

Also, I would definitely welcome adding torch.compile to the vision component of Qwen3, so let me know if you need review/support on that or if you otherwise want to own it :) It should be relatively straightforward.

The only "gotcha" is that right now the torch.compile integration compiles each instantiation of the VisionBlock separately. I am planning on putting out a fix for that shortly, but in the meantime don't be surprised to see 30+ compile logs when applying torch.compile to the VisionBlock (it shouldn't affect correctness or runtime, but it adds slightly to compile time and isn't needed, hence why I plan to fix this soon in a follow-up).
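
To make that gotcha concrete, here is a hypothetical, minimal reproduction of the pattern being described (generic PyTorch, not vLLM's actual integration; the Block module and its sizes are made up for illustration): wrapping each block instance in torch.compile gives every instance its own compiled module, so a stack of N identical blocks can trigger N separate compilations.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not vLLM code): every Block instance is wrapped in
# torch.compile on its own, so each one may be traced and compiled
# independently the first time it runs.
class Block(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

blocks = nn.ModuleList(torch.compile(Block(64)) for _ in range(4))

x = torch.randn(2, 64)
for block in blocks:
    x = block(x)  # the first call on each wrapped block can compile separately
```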

@ywang96 ywang96 (Member) left a comment

Thanks for fixing this! Stamping it now and I will test Qwen3-Omni on my end.

> Given that no test failed, do we have CI coverage for Qwen3 models?

Yeah, we didn't add them to CI back then because the model support PR was merged before the model release, but we should definitely add one now that the smaller versions of the models are available. I will take that as an action item.

@ywang96 ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 29, 2025
@ywang96 (Member) commented Oct 29, 2025

@lgeiger @Lucaskabela I added a fix for the missing self.vllm_config attribute on Qwen3Omni, since this model's encoder execution in fact inherits it from Qwen2.5Omni. That's something to also keep in mind when we add support for Qwen3-VL, and I wonder if we should just make this an interface requirement for multimodal models.

Signed-off-by: Roger Wang <[email protected]>
@ywang96 ywang96 enabled auto-merge (squash) October 29, 2025 05:19
@ywang96 ywang96 merged commit 0d8161b into vllm-project:main Oct 29, 2025
54 checks passed
@lgeiger lgeiger deleted the fix-qwen3 branch October 29, 2025 08:53
@ywang96 ywang96 added this to the v0.11.1 milestone Oct 29, 2025
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

Labels

qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed)

3 participants