
API causes slowdown in batch request handling #1707

@jpeig

Description

When I use the API server and submit multiple prompts in a single request to take advantage of batched inference, I get the following error:

"multiple prompts in a batch is not currently supported"

What's the point of vLLM if you can't send batches to the API?

Of course, I can send multiple separate requests, but those are handled sequentially and see none of the speed improvements, e.g.:
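```python
# Sequential fallback (sketch, same endpoint assumptions as above):
# one prompt per request, each call blocking until the previous one
# completes, so the engine never sees more than one request at a time.
completions = []
for prompt in prompts:
    r = requests.post(
        "http://localhost:8000/v1/completions",
        json={
            "model": "meta-llama/Llama-2-7b-hf",
            "prompt": prompt,
            "max_tokens": 32,
        },
    )
    completions.append(r.json())
```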

Correct me if I'm wrong...

Metadata

Labels

bug (Something isn't working) · unstale (Received activity after being labelled stale)
