
Has anyone tested LoRA throughput? #3316

@white-wolf-tech


I installed vLLM from the latest code and found that it supports the Qwen2 series of models.

I tested Qwen1.8B with a concurrency of 16 and got the following results.

With the LoRA weights merged into Qwen1.8B, latency (ms):
min: 222, average: 400, max: 418

Without merging the LoRA weights into Qwen1.8B, selecting the LoRA adapter dynamically per request, latency (ms):
min: 307, average: 780, max: 874

The vLLM dynamic LoRA path is much slower than the merged version (about 1.95x the average latency). Is this expected?
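For anyone who wants to reproduce numbers like the ones above, here is a minimal sketch of the kind of harness that could produce them. It is not the exact script behind these results. The names `timed_call`, `measure_latency`, and `fake_send_request` are hypothetical, and `fake_send_request` only sleeps in place of a real request to the server.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send_request, payload):
    """Return the wall-clock latency of one request in milliseconds."""
    start = time.perf_counter()
    send_request(payload)
    return (time.perf_counter() - start) * 1000.0


def measure_latency(send_request, payloads, concurrency=16):
    """Send all payloads through a fixed-size thread pool and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda p: timed_call(send_request, p), payloads))
    return {
        "min": min(latencies),
        "average": statistics.mean(latencies),
        "max": max(latencies),
    }


def fake_send_request(payload):
    # Hypothetical stand-in: replace with a real call to the serving endpoint.
    time.sleep(0.01)


if __name__ == "__main__":
    stats = measure_latency(fake_send_request, ["hello"] * 64, concurrency=16)
    print("min: {min:.0f}, average: {average:.0f}, max: {max:.0f}".format(**stats))
```

To compare the two setups, `fake_send_request` would be swapped for a function that sends the same prompts to the merged model in one run and to the base model with the LoRA adapter selected per request in the other, keeping the concurrency at 16 in both.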
