[Bugfix] Incorrect MM data format in vllm bench throughput
#26395
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 for request in requests:
-    prompts.append(
-        TokensPrompt(
-            prompt_token_ids=request.prompt["prompt_token_ids"],
-            multi_modal_data=request.multi_modal_data,
-        )
+    prompt = (
+        TokensPrompt(prompt_token_ids=request.prompt["prompt_token_ids"])
+        if "prompt_token_ids" in request.prompt
```
Async throughput benchmark never sends any prompts
Inside the request loop a prompt object is built but never appended to `prompts`. The later loop `for i, (prompt, sp, lr) in enumerate(zip(prompts, sampling_params, lora_requests)):` therefore iterates zero times, so no calls to `llm.generate` are issued and the benchmark completes without exercising the engine, returning a meaningless near-zero latency. This regresses all `vllm bench throughput` runs that rely on `run_vllm_async`.
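For illustration, a minimal sketch of the corrected loop this review implies (a sketch only, reusing the names `requests`, `prompts`, and `TokensPrompt` from the diff above; the surrounding benchmark code is assumed, not shown):

```python
# Sketch, not the actual patch: build each prompt and append it, otherwise
# the later zip() over `prompts` iterates zero times and nothing is generated.
for request in requests:
    prompt = TokensPrompt(prompt_token_ids=request.prompt["prompt_token_ids"])
    if request.multi_modal_data is not None:
        # Attach the key only when real data exists; TokensPrompt marks
        # multi_modal_data as NotRequired, so a None value is invalid.
        prompt["multi_modal_data"] = request.multi_modal_data
    prompts.append(prompt)  # the append the review reports as missing
```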
Code Review
This pull request fixes a bug where `multi_modal_data` could be incorrectly passed as `None` in `vllm bench throughput`, causing an error. The change in `run_vllm_async` correctly avoids this by only setting `multi_modal_data` if it has a value. However, the added assertion that checks the type of `multi_modal_data` is too restrictive and could cause crashes with certain datasets. A similar bug of passing `None` also appears to exist in the `run_vllm` function, which might be worth addressing for consistency.
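As a condensed sketch of the pattern this review describes (variable names here are hypothetical), the point is to omit the `NotRequired` key entirely rather than pass `None` through:

```python
# Buggy pattern: forwards None when the request carries no multi-modal data.
prompt = TokensPrompt(
    prompt_token_ids=token_ids,
    multi_modal_data=request.multi_modal_data,  # may be None, which is invalid
)

# Fixed pattern: construct without the key, then set it only when data exists.
prompt = TokensPrompt(prompt_token_ids=token_ids)
if request.multi_modal_data is not None:
    prompt["multi_modal_data"] = request.multi_modal_data
```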
Purpose
FIX #26320
According to the definition of `TextPrompt`/`TokensPrompt`, `multi_modal_data` is marked as `NotRequired[MultiModalDataDict]`, which means that if the item exists in the dictionary, it must be a `MultiModalDataDict`, not `None`. However, in `vllm bench throughput` the data is passed as `None`, which results in this error. cc @huydhn
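To make the typing point concrete, here is a minimal, self-contained sketch of how `NotRequired` behaves (a simplified stand-in, not vLLM's actual definitions):

```python
from typing import TypedDict

from typing_extensions import NotRequired  # typing.NotRequired on Python 3.11+


class TokensPrompt(TypedDict):  # simplified stand-in for vLLM's TokensPrompt
    prompt_token_ids: list[int]
    # NotRequired: the key may be absent entirely, but if it is present,
    # its value must match the annotated type -- None is not allowed.
    multi_modal_data: NotRequired[dict]


ok = TokensPrompt(prompt_token_ids=[1, 2, 3])  # key omitted: valid
ok_mm = TokensPrompt(prompt_token_ids=[1, 2], multi_modal_data={})  # valid
bad = TokensPrompt(prompt_token_ids=[1, 2], multi_modal_data=None)  # type error
```

Note that a plain `TypedDict` does not validate values at runtime, which is why the `None` only surfaces as an error once the engine's multi-modal input processing touches it.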
Test Plan
Test Result