Open
Labels
- feature request (New feature or request)
- stale (Over 90 days of inactivity)
Description
🚀 The feature, motivation and pitch
#10235 adds support for computing the prefills of multiple requests concurrently, which reduces the TTFT (time to first token) of shorter requests while longer requests are running on the same vLLM server.
This feature is currently only supported in V0, but it would be very useful for me in V1, as V0 is about to be deprecated.
This feature should also solve #21495.
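To illustrate why this matters for TTFT, here is a minimal toy simulation. It is not vLLM's scheduler: the function `simulate_ttft`, its parameters, and the per-step token budget are all hypothetical names and assumptions made up for this sketch. It models each scheduler step as having a fixed token budget and compares giving the whole budget to the oldest request against sharing it between several prefills.

```python
def simulate_ttft(prompt_lens, budget, max_concurrent):
    """Toy model (hypothetical, not vLLM code).

    Returns, for each request, the step at which its prefill finishes,
    used here as a rough proxy for TTFT. Assumes budget >= max_concurrent.
    """
    remaining = list(prompt_lens)
    finish_step = [None] * len(prompt_lens)
    step = 0
    while any(r > 0 for r in remaining):
        step += 1
        # Oldest unfinished requests, up to the concurrency limit.
        active = [i for i, r in enumerate(remaining) if r > 0][:max_concurrent]
        # Serve the shortest first so unused budget flows to longer prefills.
        active.sort(key=lambda i: remaining[i])
        left = budget
        for pos, i in enumerate(active):
            # Even share of the budget still available in this step.
            share = left // (len(active) - pos)
            used = min(share, remaining[i])
            remaining[i] -= used
            left -= used
            if remaining[i] == 0:
                finish_step[i] = step
    return finish_step


# A long prompt (4096 tokens) arrives just before a short one (100 tokens).
one_at_a_time = simulate_ttft([4096, 100], budget=512, max_concurrent=1)
concurrent = simulate_ttft([4096, 100], budget=512, max_concurrent=2)
print(one_at_a_time)  # [8, 9]
print(concurrent)     # [9, 1]
```

In this toy model the short request finishes its prefill in step 1 instead of step 9, while the long request is delayed by a single step. Real behaviour depends on the actual scheduler, batch sizes, and hardware.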
Alternatives
No response
Additional context
No response