Has anyone tested LoRA throughput? #3316

I installed vLLM from the latest code and found that it supports the Qwen2 series models.

I tested Qwen1.8B with a concurrency of 16 and got the following results:

With the LoRA weights merged into Qwen1.8B, latency (ms):
min: 222, average: 400, max: 418

Without merging the LoRA weights into Qwen1.8B, applying the LoRA adapter dynamically per request, latency (ms):
min: 307, average: 780, max: 874
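
For anyone who wants to reproduce numbers like these, here is a rough sketch of how min/average/max latency at a fixed concurrency could be collected. It is not the script I used: `fake_request` is a hypothetical stand-in for a real call to the serving endpoint, and the prompt count of 64 is arbitrary.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(prompt: str) -> str:
    """Hypothetical stand-in for one call to the serving endpoint."""
    time.sleep(0.01)  # pretend the server takes about 10 ms
    return prompt.upper()


def timed_call(send, prompt):
    """Return the wall-clock latency of one request in milliseconds."""
    start = time.perf_counter()
    send(prompt)
    return (time.perf_counter() - start) * 1000.0


def measure_latency(send, prompts, concurrency=16):
    """Send all prompts using a fixed number of concurrent workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda p: timed_call(send, p), prompts))
    return {
        "min": min(latencies),
        "average": statistics.mean(latencies),
        "max": max(latencies),
    }


stats = measure_latency(fake_request, [f"prompt {i}" for i in range(64)])
print({name: round(value, 1) for name, value in stats.items()})
```

Swapping `fake_request` for a function that hits the merged model or the dynamic LoRA endpoint should give comparable figures for both setups, as long as the prompts and sampling settings are the same.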

The vLLM dynamic LoRA path is much slower than the merged version. Is this expected?
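
To make the comparison concrete, here is a small NumPy sketch of the two ways a LoRA layer can be evaluated. It only illustrates the standard LoRA formula (W + (alpha / r) * B A); the sizes are made up, and it says nothing about how vLLM's kernels are actually implemented. The point is that the merged path does one matmul per layer, while the unmerged path adds two small matmuls on every call, which is one possible source of overhead.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 8, 16.0  # illustrative sizes, not Qwen's

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((rank, d_in))   # LoRA down-projection
B = rng.standard_normal((d_out, rank))  # LoRA up-projection
scale = alpha / rank
x = rng.standard_normal(d_in)           # one input activation

# Merged: fold the adapter into the base weight once, then one matmul per call.
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

# Unmerged: base matmul plus two extra small matmuls on every call.
y_unmerged = W @ x + scale * (B @ (A @ x))

print(np.allclose(y_merged, y_unmerged))
```

Both paths give the same output, so any latency difference comes from the extra work and bookkeeping in the unmerged path, not from a difference in the result.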

