I installed vLLM from the latest code and found that it supports the Qwen2 series of models.
I tested Qwen-1.8B with 16 concurrent requests and got the following results:

With the LoRA weights merged into Qwen-1.8B, latency (ms):
min: 222, average: 400, max: 418

Without merging the LoRA weights into Qwen-1.8B, loading the LoRA adapter dynamically per request, latency (ms):
min: 307, average: 780, max: 874

The vLLM LoRA path is much slower than the merged version. Is this expected?
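
For reference, this is roughly how the two modes compare (a minimal sketch; the model paths, adapter name, and sampling settings below are placeholders, not the exact setup used for the numbers above):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

# Merged baseline: LoRA weights already folded into the base checkpoint,
# so inference runs as a plain dense model.
merged_llm = LLM(model="/models/qwen-1_8b-lora-merged")
merged_out = merged_llm.generate(["Hello"], sampling_params)

# Dynamic LoRA: base model with the adapter attached per request.
# (In a real benchmark each mode would run in its own process.)
lora_llm = LLM(model="/models/qwen-1_8b", enable_lora=True)
lora_out = lora_llm.generate(
    ["Hello"],
    sampling_params,
    lora_request=LoRARequest("my_adapter", 1, "/models/qwen-1_8b-lora"),
)
```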