
API causes slowdown in batch request handling #1707

@jpeig

Description

When I use the API server and submit multiple prompts in a single request to take advantage of batched inference, I get the following error:

"multiple prompts in a batch is not currently supported"

What's the point of vLLM if you can't send batches to the API?

Of course, I can send multiple separate requests, but those are handled sequentially and see none of the speed improvements, e.g.:
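```python
# Sequential fallback (sketch, same endpoint assumptions as above):
# one prompt per request, each call blocking until the previous one
# completes, so the engine never sees more than one request at a time.
completions = []
for prompt in prompts:
    r = requests.post(
        "http://localhost:8000/v1/completions",
        json={
            "model": "meta-llama/Llama-2-7b-hf",
            "prompt": prompt,
            "max_tokens": 32,
        },
    )
    completions.append(r.json())
```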

Correct me if I'm wrong...

Metadata

Labels

bug (Something isn't working) · unstale (Received activity after being labelled stale)
