[Usage]: Qwen2VL model mrope implementation in cuda graph

### Anything you want to discuss about vllm.

In Qwen2VL's mrope implementation, vLLM decides whether the input positions are multimodal with this check:
![image](https://github.com/user-attachments/assets/6dfc96d9-5162-4fbf-8759-031de22405e0)

This decision is made at RUNTIME. So, when the input is text-only, the input positions have shape (seqlen).
However, vLLM's CUDA graph uses positions of shape (3, seqlen):
![image](https://github.com/user-attachments/assets/8d96aff4-a931-4804-9879-64ad1666544b)

Does that mean we cannot use CUDA graph for Qwen2VL with text-only input? Otherwise we would pass positions of shape (seqlen), but the CUDA graph would treat them as (3, seqlen).

However, I ran some tests, and there seems to be no difference in the final results between CUDA graph and eager mode with text-only input, so I was wondering why.
PS. I profiled the whole process with nsys, and CUDA graph mode DOES launch two more kernels than eager mode.
Left is CUDA graph, right is eager.
![image](https://github.com/user-attachments/assets/ddfb1034-2877-4c69-b2cf-846da3463a3e)
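One possible explanation I have not verified against vLLM's source: if the runner fills the (3, seqlen) buffer with three identical rows for text-only input, then mrope would produce the same rotation angles as plain 1D rope, and the outputs would match. The sketch below only illustrates that arithmetic in plain Python. The names `rope_angles` and `mrope_angles`, the tiny `rotary_dim=8`, and `mrope_section=[1, 1, 2]` are made up for illustration and are not vLLM's API or Qwen2VL's real config.

```python
import math  # stdlib only; no torch needed for this illustration


def rope_angles(positions, rotary_dim=8, base=10000.0):
    """Plain 1D rope: angles[t][i] = positions[t] * base**(-2*i/rotary_dim)."""
    half = rotary_dim // 2
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(half)]
    return [[p * f for f in inv_freq] for p in positions]


def mrope_angles(positions_3d, mrope_section, rotary_dim=8, base=10000.0):
    """Hypothetical mrope: three position rows (temporal, height, width).

    mrope_section says how many of the rotary_dim//2 frequencies are taken
    from each row, in order.
    """
    assert len(positions_3d) == 3
    assert sum(mrope_section) == rotary_dim // 2
    per_row = [rope_angles(row, rotary_dim, base) for row in positions_3d]
    merged = []
    for t in range(len(positions_3d[0])):
        token_angles, start = [], 0
        for row_idx, width in enumerate(mrope_section):
            # Frequency slice [start, start + width) comes from row `row_idx`.
            token_angles.extend(per_row[row_idx][t][start:start + width])
            start += width
        merged.append(token_angles)
    return merged


# Text-only: a (seqlen,) position vector, and the same vector stacked 3 times.
text_positions = [0, 1, 2, 3, 4]
flat = rope_angles(text_positions)
stacked = mrope_angles([text_positions] * 3, mrope_section=[1, 1, 2])
same = flat == stacked

# Image-like: the three rows differ, so the angles no longer match 1D rope.
image_like = mrope_angles([[0, 1], [0, 5], [0, 7]], mrope_section=[1, 1, 2])
differs = image_like != rope_angles([0, 1])

print(same, differs)
```

If that assumption holds, the extra kernels seen in the CUDA graph trace could simply be the mrope path doing redundant work on identical rows, but that is a guess and would need to be confirmed in the model runner code.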



### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.


[Usage]: Qwen2VL model mrope implementation in cuda graph #9546
