[Doc] Add multi-image input example and update supported models #8181
Merged: DarkLight1337 merged 11 commits into vllm-project:main from DarkLight1337:mm-support-docs on Sep 5, 2024.
Commits (all by DarkLight1337):
c0447dd Indicate more information about supported modalities
1b97498 Clarify
d170164 Remove outdated note about single-image input
61905b8 Update docs and add example
53d4bea Add missing references
1861b42 Clean up
5df8488 Use generate method
9cdbf0e Remove some unnecessary lines
f747678 Further compress the lines
7950c27 Add new example to the tests
97b6006 Use a default argument
Diff (new file, hunk @@ -0,0 +1,95 @@):

```python
"""
This example shows how to use vLLM for running offline inference with
multi-image input on vision language models, using the chat template defined
by the model.
"""
from argparse import Namespace
from typing import List

from vllm import LLM
from vllm.multimodal.utils import fetch_image
from vllm.utils import FlexibleArgumentParser

QUESTION = "What is the content of each image?"
IMAGE_URLS = [
    "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/7/77/002_The_lion_king_Snyggve_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent.jpg",
]


def _load_phi3v(image_urls: List[str]):
    return LLM(
        model="microsoft/Phi-3.5-vision-instruct",
        trust_remote_code=True,
        max_model_len=4096,
        limit_mm_per_prompt={"image": len(image_urls)},
    )


def run_phi3v_generate(question: str, image_urls: List[str]):
    llm = _load_phi3v(image_urls)

    placeholders = "\n".join(f"<|image_{i}|>"
                             for i, _ in enumerate(image_urls, start=1))
    prompt = f"<|user|>\n{placeholders}\n{question}<|end|>\n<|assistant|>\n"

    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": {
            "image": [fetch_image(url) for url in image_urls]
        },
    })

    for o in outputs:
        generated_text = o.outputs[0].text
        print(generated_text)


def run_phi3v_chat(question: str, image_urls: List[str]):
    llm = _load_phi3v(image_urls)

    outputs = llm.chat([{
        "role":
        "user",
        "content": [
            {
                "type": "text",
                "text": question,
            },
            *({
                "type": "image_url",
                "image_url": {
                    "url": image_url
                },
            } for image_url in image_urls),
        ],
    }])

    for o in outputs:
        generated_text = o.outputs[0].text
        print(generated_text)


def main(args: Namespace):
    method = args.method

    if method == "generate":
        run_phi3v_generate(QUESTION, IMAGE_URLS)
    elif method == "chat":
        run_phi3v_chat(QUESTION, IMAGE_URLS)
    else:
        raise ValueError(f"Invalid method: {method}")


if __name__ == "__main__":
    parser = FlexibleArgumentParser(
        description='Demo on using vLLM for offline inference with '
        'vision language models that support multi-image input')
    parser.add_argument("--method",
                        type=str,
                        default="generate",
                        choices=["generate", "chat"],
                        help="The method to run in `vllm.LLM`.")

    args = parser.parse_args()
    main(args)
```
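The `generate` path in this example depends on the Phi-3.5-vision prompt format. Each image gets a 1-based `<|image_i|>` placeholder on its own line, placed before the question. The sketch below pulls that string construction out into a standalone function, so the format can be checked without downloading the model. The helper names `build_phi3v_prompt` and `check_image_count` are hypothetical. The count check only mirrors the intent of `limit_mm_per_prompt={"image": ...}` (a prompt should not carry more images than the configured limit) and is not vLLM's own validation.

```python
from typing import List


def build_phi3v_prompt(question: str, num_images: int) -> str:
    """Hypothetical helper: rebuild the prompt used in run_phi3v_generate.

    Placeholders are numbered from 1 and joined with newlines, matching
    enumerate(image_urls, start=1) in the example above.
    """
    placeholders = "\n".join(f"<|image_{i}|>"
                             for i in range(1, num_images + 1))
    return f"<|user|>\n{placeholders}\n{question}<|end|>\n<|assistant|>\n"


def check_image_count(image_urls: List[str], limit: int) -> None:
    """Hypothetical pre-flight check: raise if more images are supplied
    than the per-prompt image limit chosen for the LLM instance."""
    if len(image_urls) > limit:
        raise ValueError(
            f"{len(image_urls)} images supplied, but the limit is {limit}")


if __name__ == "__main__":
    # Placeholder URLs for illustration only; nothing is fetched here.
    urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]
    check_image_count(urls, limit=len(urls))
    print(build_phi3v_prompt("What is the content of each image?", len(urls)))
```

Because the example sets the limit to `len(image_urls)`, the check always passes there. It becomes useful when one `LLM` instance, loaded with a fixed limit, is reused for prompts that carry varying numbers of images.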
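The `chat` path takes a different approach. It passes an OpenAI-style message whose `content` is a list of parts: one `text` part, followed by one `image_url` part per image. vLLM then takes care of the image placeholders and the model's chat template. Here is a minimal sketch that builds the same message structure as `run_phi3v_chat` in a standalone function. The name `build_multi_image_messages` is hypothetical. The result is plain data that would be passed to `llm.chat(...)`.

```python
from typing import Any, Dict, List


def build_multi_image_messages(question: str,
                               image_urls: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: build the single-turn conversation used in
    run_phi3v_chat. The text part comes first, then one image_url part per
    image, in the order the URLs are given."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": question}]
    content.extend({
        "type": "image_url",
        "image_url": {"url": url},
    } for url in image_urls)
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    messages = build_multi_image_messages(
        "What is the content of each image?",
        ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    )
    # With a loaded model, this list would be passed as llm.chat(messages).
    print(messages)
```

Building the conversation as plain data keeps it easy to inspect or unit-test before any model is loaded.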