docs/source/getting_started/quickstart.rst (4 additions & 4 deletions)
@@ -138,10 +138,10 @@ Since this server is compatible with OpenAI API, you can use it as a drop-in rep
A more detailed client example can be found `here <https://github.com/vllm-project/vllm/blob/main/examples/openai_completion_client.py>`__.
-OpenAI Chat API with vLLM
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+OpenAI Chat Completions API with vLLM
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-vLLM is designed to also support the OpenAI Chat API. The chat interface is a more dynamic, interactive way to communicate with the model, allowing back-and-forth exchanges that can be stored in the chat history. This is useful for tasks that require context or more detailed explanations.
+vLLM is designed to also support the OpenAI Chat Completions API. The chat interface is a more dynamic, interactive way to communicate with the model, allowing back-and-forth exchanges that can be stored in the chat history. This is useful for tasks that require context or more detailed explanations.

You can use the `create chat completion <https://platform.openai.com/docs/api-reference/chat/completions/create>`_ endpoint to interact with the model:
@@ -157,7 +157,7 @@ You can use the `create chat completion <https://platform.openai.com/docs/api-re
$ ]
$ }'

-Alternatively, you can use the `openai` python package:
+Alternatively, you can use the ``openai`` python package:
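The Python example that follows this line in the file is not shown in the diff. For reference, a minimal sketch of such a call with the ``openai`` package, assuming the server's default ``http://localhost:8000/v1`` base URL and a placeholder model name:

.. code-block:: python

    from openai import OpenAI

    # vLLM ignores the API key unless the server was started with --api-key.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    chat_response = client.chat.completions.create(
        model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder: use the model you served
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke."},
        ],
    )
    print(chat_response.choices[0].message.content)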
-Since OpenAI Vision API is based on `Chat Completions <https://platform.openai.com/docs/api-reference/chat>`_ API,
+Since OpenAI Vision API is based on `Chat Completions API <https://platform.openai.com/docs/api-reference/chat>`_,
a chat template is **required** to launch the API server.
Although Phi-3.5-Vision comes with a chat template, for other models you may have to provide one if the model's tokenizer does not come with it.
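A minimal sketch of supplying one at launch via the ``--chat-template`` server flag (the model and template path are stand-ins):

.. code-block:: console

    $ vllm serve <model> --chat-template ./path/to/chat_template.jinja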
@@ -243,6 +243,10 @@ To consume the server, you can use the OpenAI client like in the example below:
A full code example can be found in `examples/openai_api_client_for_multimodal.py <https://github.com/vllm-project/vllm/blob/main/examples/openai_api_client_for_multimodal.py>`_.
+.. tip::
+    There is no need to place image placeholders in the text content of the API request - they are already represented by the image content.
+    In fact, you can place image placeholders in the middle of the text by interleaving text and image content.
+
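As an illustrative sketch of such interleaving (the URLs and wording are made-up placeholders, not quoted from the example file):

.. code-block:: python

    # Text and image parts may be interleaved; no "<image>" tokens are
    # needed in the text itself.
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image:"},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
            {"type": "text", "text": "and how does it differ from this one?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
        ],
    }]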
.. note::
    By default, the timeout for fetching images through http url is ``5`` seconds. You can override this by setting the environment variable:
@@ -251,5 +255,49 @@ A full code example can be found in `examples/openai_api_client_for_multimodal.p
$ export VLLM_IMAGE_FETCH_TIMEOUT=<timeout>
-.. note::
-    There is no need to format the prompt in the API request since it will be handled by the server.
+Chat Embeddings API
+^^^^^^^^^^^^^^^^^^^
+
+vLLM's Chat Embeddings API is a superset of OpenAI's `Embeddings API <https://platform.openai.com/docs/api-reference/embeddings>`_,
+where a list of ``messages`` can be passed instead of batched ``inputs``. This enables multi-modal inputs to be passed to embedding models.
+
+.. tip::
+    The schema of ``messages`` is exactly the same as in Chat Completions API.
+
+In this example, we will serve the ``TIGER-Lab/VLM2Vec-Full`` model.
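The serving command and client code that follow are not shown in the diff. As a sketch of how this can look end to end (the launch flags and the raw ``requests`` call are assumptions for illustration, not quoted from the PR):

.. code-block:: console

    $ vllm serve TIGER-Lab/VLM2Vec-Full --task embedding --trust-remote-code

.. code-block:: python

    import requests

    image_url = "https://example.com/image.jpg"  # placeholder image

    # Chat Embeddings: POST messages (not batched inputs) to /v1/embeddings.
    response = requests.post(
        "http://localhost:8000/v1/embeddings",
        json={
            "model": "TIGER-Lab/VLM2Vec-Full",
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": "Represent the given image."},
                ],
            }],
            "encoding_format": "float",
        },
    )
    response.raise_for_status()
    print(response.json()["data"][0]["embedding"])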
Please see the [OpenAI API Reference](https://platform.openai.com/docs/api-reference) for more information on the API. We support all parameters except:
-- Chat: `tools`, and `tool_choice`.
-- Completions: `suffix`.

-vLLM also provides experimental support for OpenAI Vision API compatible inference. See more details in [Using VLMs](../models/vlm.rst).
+- [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Using VLMs](../models/vlm.rst).
+  - *Note: `image_url.detail` parameter is not supported.*
+- We also support `audio_url` content type for audio files.
+  - Refer to [vllm.entrypoints.chat_utils](https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/chat_utils.py) for the exact schema.
+  - *TODO: Support `input_audio` content type as defined [here](https://github.com/openai/openai-python/blob/v1.52.2/src/openai/types/chat/chat_completion_content_part_input_audio_param.py).*
+- *Note: `parallel_tool_calls` and `user` parameters are ignored.*
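A sketch of the `audio_url` content type from the list above in practice (the model name and URL are placeholders; the authoritative schema lives in `vllm.entrypoints.chat_utils`):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The audio clip is passed as an `audio_url` content part alongside text.
completion = client.chat.completions.create(
    model="fixie-ai/ultravox-v0_3",  # placeholder: any audio-capable served model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is said in this clip?"},
            {"type": "audio_url", "audio_url": {"url": "https://example.com/clip.wav"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```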