
@Isotr0py (Member) commented Sep 17, 2025

Purpose

Add a Triton kernel for Qwen3-VL interleaved MRoPE.

Test Plan

pytest -s -v tests/kernels/core/test_mrope.py

Test Result

Tests should pass.
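The PR title refers to Qwen3-VL's interleaved MRoPE. As a rough illustration only (this is not the kernel from this PR), the pure-Python sketch below shows which position axis, temporal (T), height (H) or width (W), each rotary frequency slot would read under an interleaved layout, compared with a chunked TTT...HHH...WWW layout. The assignment rule (H on every third slot starting at 1, W on every third slot starting at 2, T elsewhere) and the function names `interleaved_axis_map`, `chunked_axis_map` and `slot_positions` are assumptions for illustration, and the section sizes are example values.

```python
# Hypothetical illustration of an interleaved MRoPE frequency layout.
# Each rotary frequency slot i is fed by one of three position ids:
# T (temporal), H (height) or W (width). The rule below is an assumption.


def interleaved_axis_map(mrope_section: list[int]) -> list[str]:
    """Return the axis (T/H/W) assumed to feed each frequency slot.

    Assumed rule: slot i uses H when i % 3 == 1 and i < 3 * h,
    W when i % 3 == 2 and i < 3 * w, and T otherwise.
    """
    t, h, w = mrope_section
    num_freqs = t + h + w
    # The strided pattern only yields exactly h H-slots and w W-slots if they fit.
    assert 3 * max(h, w) <= num_freqs, "H/W sections must fit the stride-3 pattern"
    axes = []
    for i in range(num_freqs):
        if i % 3 == 1 and i < 3 * h:
            axes.append("H")
        elif i % 3 == 2 and i < 3 * w:
            axes.append("W")
        else:
            axes.append("T")
    return axes


def chunked_axis_map(mrope_section: list[int]) -> list[str]:
    """Chunked MRoPE layout for comparison: all T slots, then H, then W."""
    t, h, w = mrope_section
    return ["T"] * t + ["H"] * h + ["W"] * w


def slot_positions(pos_thw: tuple[int, int, int], axes: list[str]) -> list[int]:
    """Pick, per slot, the position id that would multiply that slot's inverse frequency."""
    index = {"T": 0, "H": 1, "W": 2}
    return [pos_thw[index[a]] for a in axes]


if __name__ == "__main__":
    small = [4, 2, 2]  # example section sizes, not a real model config
    print("".join(chunked_axis_map(small)))      # TTTTHHWW
    print("".join(interleaved_axis_map(small)))  # THWTHWTT
    print(slot_positions((7, 1, 2), interleaved_axis_map(small)))  # [7, 1, 2, 7, 1, 2, 7, 7]
```

In this sketch, interleaving spreads the H and W slots across the low and high ends of the frequency range instead of giving each axis one contiguous band; how the Triton kernel in this PR lays out its computation is not described here.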



Signed-off-by: Isotr0py <[email protected]>
@mergify bot added the qwen label Sep 17, 2025
@Isotr0py changed the title from [WIP][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE to [Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE Sep 17, 2025
@Isotr0py marked this pull request as ready for review September 17, 2025 17:23
@Isotr0py (Member, Author)

Benchmark

server

vllm serve /home/mozf/LLM/Qwen3-VL-4B-Instruct/

client

vllm bench serve  \
--backend openai-chat   \
--endpoint-type openai-chat \
--model /home/mozf/LLM/Qwen3-VL-4B-Instruct/   \
--endpoint /v1/chat/completions   \
--dataset-name hf   \
--dataset-path lmarena-ai/VisionArena-Chat   \
--hf-split train   \
--num-prompts 200 \
--max-concurrency 64

Results

Main

============ Serving Benchmark Result ============
Successful requests:                     993       
Maximum request concurrency:             64        
Benchmark duration (s):                  186.79    
Total input tokens:                      94247     
Total generated tokens:                  119911    
Request throughput (req/s):              5.32      
Output token throughput (tok/s):         641.94    
Total Token throughput (tok/s):          1146.49   
---------------Time to First Token----------------
Mean TTFT (ms):                          1005.91   
Median TTFT (ms):                        708.54    
P99 TTFT (ms):                           5738.68   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          95.17     
Median TPOT (ms):                        90.26     
P99 TPOT (ms):                           290.06    
---------------Inter-token Latency----------------
Mean ITL (ms):                           91.23     
Median ITL (ms):                         29.79     
P99 ITL (ms):                            455.95    
==================================================

PR

============ Serving Benchmark Result ============
Successful requests:                     990       
Maximum request concurrency:             64        
Benchmark duration (s):                  185.86    
Total input tokens:                      93035     
Total generated tokens:                  119586    
Request throughput (req/s):              5.33      
Output token throughput (tok/s):         643.43    
Total Token throughput (tok/s):          1144.00   
---------------Time to First Token----------------
Mean TTFT (ms):                          959.61    
Median TTFT (ms):                        703.75    
P99 TTFT (ms):                           5533.73   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          94.51     
Median TPOT (ms):                        90.73     
P99 TPOT (ms):                           234.72    
---------------Inter-token Latency----------------
Mean ITL (ms):                           91.84     
Median ITL (ms):                         30.29     
P99 ITL (ms):                            458.38    
==================================================
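To read the two runs side by side, it can help to compute the relative change per metric. The sketch below does this for a handful of metrics copied verbatim from the Main and PR outputs above; the helper name `rel_change` is hypothetical. The two runs completed slightly different numbers of requests (993 vs 990), so treat the deltas as indicative rather than exact.

```python
# Metrics copied verbatim from the "Main" and "PR" benchmark outputs above.
MAIN = {
    "Output token throughput (tok/s)": 641.94,
    "Mean TTFT (ms)": 1005.91,
    "P99 TTFT (ms)": 5738.68,
    "Mean TPOT (ms)": 95.17,
    "P99 TPOT (ms)": 290.06,
    "Mean ITL (ms)": 91.23,
}
PR = {
    "Output token throughput (tok/s)": 643.43,
    "Mean TTFT (ms)": 959.61,
    "P99 TTFT (ms)": 5533.73,
    "Mean TPOT (ms)": 94.51,
    "P99 TPOT (ms)": 234.72,
    "Mean ITL (ms)": 91.84,
}


def rel_change(before: float, after: float) -> float:
    """Percent change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100.0


# Print one aligned row per metric: Main value, PR value, and percent change.
for name, before in MAIN.items():
    after = PR[name]
    print(f"{name:<34} {before:>9.2f} -> {after:>9.2f}  ({rel_change(before, after):+.1f}%)")
```

With these numbers, mean TTFT drops by about 4.6% and P99 TPOT by about 19.1%, while output throughput, mean TPOT and mean ITL each move by less than 1%.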

Signed-off-by: Isotr0py <[email protected]>
@Isotr0py requested a review from ywang96 September 18, 2025 02:33
@DarkLight1337 (Member)

Can you also check lm-eval?

@ywang96 (Member) commented Sep 18, 2025

@Isotr0py @DarkLight1337 BTW, let's not post the actual eval numbers since the model hasn't been released; just make sure the results match.

@Isotr0py (Member, Author)

> just make sure the results match

Oh, I just realized there are no generation tests for Qwen3-VL yet. Let me add some later today (after my seminar).

@Isotr0py (Member, Author)

I have confirmed that the Qwen3-VL generation test added in #25185 still passes with the Triton kernel.

@DarkLight1337 enabled auto-merge (squash) September 19, 2025 08:39
@github-actions bot added the ready label Sep 19, 2025
@DarkLight1337 merged commit cea91a3 into vllm-project:main Sep 19, 2025
46 of 48 checks passed
@Isotr0py deleted the qwen3-vl-mrope branch September 19, 2025 10:52
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025