[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE #25055

Isotr0py · 2025-09-17T07:55:54Z

Purpose

Follow-up to [Model] Support Qwen3-VL Model Series #24727: this PR implements the corresponding Triton kernel for interleaved MRoPE.
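For readers unfamiliar with the layout, here is a minimal pure-Python sketch of how an interleaved MRoPE assignment might differ from the chunked one. It is not the Triton kernel from this PR. The function names, the small `section` sizes, and the rule that H and W take every third frequency index are assumptions for illustration and should be checked against the model code.

```python
# Illustrative only: maps each rotary frequency index to the position axis
# (T = temporal, H = height, W = width) whose position id drives it.
AXES = "THW"


def chunked_layout(mrope_section):
    """Chunked MRoPE: one contiguous block of indices per axis."""
    return "".join(AXES[a] * n for a, n in enumerate(mrope_section))


def interleaved_layout(mrope_section):
    """Interleaved MRoPE (assumed rule): start with T everywhere, then give
    H and W every third index until their sections are used up."""
    layout = ["T"] * sum(mrope_section)
    for axis, offset in ((1, 1), (2, 2)):  # H starts at index 1, W at index 2
        for i in range(offset, mrope_section[axis] * 3, 3):
            layout[i] = AXES[axis]
    return "".join(layout)


section = [4, 2, 2]  # hypothetical small section sizes for (T, H, W)
print(chunked_layout(section))      # TTTTHHWW
print(interleaved_layout(section))  # THWTHWTT
```

Under this assumed rule each axis keeps the same number of frequency indices as in the chunked layout; only their order changes, so a kernel can select the right cos/sin row per index instead of copying contiguous slices.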

Test Plan

pytest -s -v tests/kernels/core/test_mrope.py

Test Result

The tests should pass.

Signed-off-by: Isotr0py <[email protected]>

Isotr0py · 2025-09-17T17:26:42Z

Benchmark

server

vllm serve /home/mozf/LLM/Qwen3-VL-4B-Instruct/

client

vllm bench serve \
  --backend openai-chat \
  --endpoint-type openai-chat \
  --model /home/mozf/LLM/Qwen3-VL-4B-Instruct/ \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path lmarena-ai/VisionArena-Chat \
  --hf-split train \
  --num-prompts 200 \
  --max-concurrency 64

Results

Main

============ Serving Benchmark Result ============
Successful requests:                     993       
Maximum request concurrency:             64        
Benchmark duration (s):                  186.79    
Total input tokens:                      94247     
Total generated tokens:                  119911    
Request throughput (req/s):              5.32      
Output token throughput (tok/s):         641.94    
Total Token throughput (tok/s):          1146.49   
---------------Time to First Token----------------
Mean TTFT (ms):                          1005.91   
Median TTFT (ms):                        708.54    
P99 TTFT (ms):                           5738.68   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          95.17     
Median TPOT (ms):                        90.26     
P99 TPOT (ms):                           290.06    
---------------Inter-token Latency----------------
Mean ITL (ms):                           91.23     
Median ITL (ms):                         29.79     
P99 ITL (ms):                            455.95    
==================================================

PR

============ Serving Benchmark Result ============
Successful requests:                     990       
Maximum request concurrency:             64        
Benchmark duration (s):                  185.86    
Total input tokens:                      93035     
Total generated tokens:                  119586    
Request throughput (req/s):              5.33      
Output token throughput (tok/s):         643.43    
Total Token throughput (tok/s):          1144.00   
---------------Time to First Token----------------
Mean TTFT (ms):                          959.61    
Median TTFT (ms):                        703.75    
P99 TTFT (ms):                           5533.73   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          94.51     
Median TPOT (ms):                        90.73     
P99 TPOT (ms):                           234.72    
---------------Inter-token Latency----------------
Mean ITL (ms):                           91.84     
Median ITL (ms):                         30.29     
P99 ITL (ms):                            458.38    
==================================================
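To make the two tables easier to compare, the sketch below computes the relative change of a few latency metrics from Main to PR. The numbers are copied from the tables above; the dictionary layout and the `pct_change` helper are illustrative, not part of the vLLM benchmark tooling. Each table reflects a single run with slightly different request and token counts, so small differences may be noise.

```python
# Latency metrics (ms) copied from the Main and PR result tables.
main = {
    "Mean TTFT": 1005.91,
    "P99 TTFT": 5738.68,
    "Mean TPOT": 95.17,
    "P99 TPOT": 290.06,
    "Mean ITL": 91.23,
}
pr = {
    "Mean TTFT": 959.61,
    "P99 TTFT": 5533.73,
    "Mean TPOT": 94.51,
    "P99 TPOT": 234.72,
    "Mean ITL": 91.84,
}


def pct_change(before, after):
    """Relative change in percent; negative means the PR run was lower."""
    return (after - before) / before * 100.0


for name in main:
    print(f"{name:<10} {pct_change(main[name], pr[name]):+.2f}%")
```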

Signed-off-by: Isotr0py <[email protected]>

DarkLight1337 · 2025-09-18T04:16:03Z

Can you also check lm-eval?

ywang96 · 2025-09-18T04:26:10Z

@Isotr0py @DarkLight1337 BTW let's not post the actual eval numbers since the model hasn't been released - just make sure the results match

Isotr0py · 2025-09-18T05:51:57Z

> just make sure the results match

Oh, I just realized there are no generation tests for Qwen3-VL yet. Let me add some later today (after my seminar).

Isotr0py · 2025-09-19T08:27:52Z

I have confirmed that the Qwen3-VL generation test added by #25185 still passes with the Triton kernel.

Isotr0py added 2 commits September 17, 2025 15:38

init

ec5e0ba

Signed-off-by: Isotr0py <[email protected]>

init test

b04bf4e

Signed-off-by: Isotr0py <[email protected]>

mergify bot added the qwen label (Related to Qwen models) Sep 17, 2025

Isotr0py added 5 commits September 17, 2025 18:21

fix sin cos layout

4c1642e

Signed-off-by: Isotr0py <[email protected]>

include qwen3_vl_moe

a16d76b

Signed-off-by: Isotr0py <[email protected]>

update test

df4b50e

Signed-off-by: Isotr0py <[email protected]>

consolidate kernel

1130de2

Signed-off-by: Isotr0py <[email protected]>

remove redundant computation for interleaved mrope

2d79d75

Signed-off-by: Isotr0py <[email protected]>

Isotr0py changed the title from "[WIP][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE" to "[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE" Sep 17, 2025

Isotr0py marked this pull request as ready for review September 17, 2025 17:23

Isotr0py requested review from mgoin, tlrmchlsmth, WoosukKwon and yewentao256 as code owners September 17, 2025 17:23

Isotr0py added 3 commits September 18, 2025 01:57

fix t

f292633

Signed-off-by: Isotr0py <[email protected]>

fix t

4bd57ed

Signed-off-by: Isotr0py <[email protected]>

code format

9fb001d

Signed-off-by: Isotr0py <[email protected]>

Isotr0py requested a review from ywang96 September 18, 2025 02:33

Merge branch 'main' into qwen3-vl-mrope

0f274b1

DarkLight1337 approved these changes Sep 19, 2025

DarkLight1337 enabled auto-merge (squash) September 19, 2025 08:39

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 19, 2025

DarkLight1337 merged commit cea91a3 into vllm-project:main Sep 19, 2025
46 of 48 checks passed

Isotr0py deleted the qwen3-vl-mrope branch September 19, 2025 10:52

debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025

[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (vllm-project#25055)

cef294a

Signed-off-by: Isotr0py <[email protected]>

ywang96 mentioned this pull request Sep 20, 2025

[Model] Support Qwen3-VL Model Series #24727

Merged

12 tasks

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (vllm-project#25055)

5769dfe

Signed-off-by: Isotr0py <[email protected]>

charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025

[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (vllm-project#25055)

7f3073c

Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: charlifu <[email protected]>
