
[Usage]: vLLM's performance significantly slows down in the newer version when using multi-LoRA inference #9452

@zhengwei-gao

Description

Test Conditions

  • Dataset: Alpaca (Chinese)
  • Total Requests: 500
  • Request Rate: 16 req/s (see the pacing sketch after this list)
  • Model: Qwen2.5-32B-Instruct
  • Hardware: A100_40G × 2
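
For context on how such a rate is typically generated: vLLM's `benchmark_serving.py` paces requests with exponentially distributed inter-arrival gaps, i.e. a Poisson arrival process. Below is a minimal sketch of that pacing, assuming a caller-supplied `send_request` coroutine (hypothetical here), not the exact benchmark code used:

```python
import asyncio
import random

REQUEST_RATE = 16.0  # req/s, matching the test conditions above

async def fire_requests(prompts, send_request):
    """Dispatch prompts at ~REQUEST_RATE req/s with Poisson arrivals.

    `send_request` is a caller-supplied coroutine (a placeholder here)
    that posts one prompt to the server and awaits the response.
    """
    tasks = []
    for prompt in prompts:
        tasks.append(asyncio.create_task(send_request(prompt)))
        # Exponential inter-arrival gap -> Poisson arrival process.
        await asyncio.sleep(random.expovariate(REQUEST_RATE))
    await asyncio.gather(*tasks)
```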

Results

I modified the benchmark code to run the Alpaca dataset. The benchmark durations (in seconds) are as follows:

| vLLM Version   | 0.5.0 | 0.5.4 | 0.6.2 |
|----------------|-------|-------|-------|
| Only Base      | 49.33 | 48.47 | 37.93 |
| Base with LoRA | 58.96 | 71.81 | 83.56 |

The results indicate a significant regression in multi-LoRA inference in newer versions: the LoRA run takes 83.56 s on 0.6.2 versus 58.96 s on 0.5.0 (roughly 42% slower), even though the base-only run got faster over the same versions.
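
For anyone reproducing the base-vs-LoRA comparison offline, here is a minimal sketch using vLLM's `LLM` entry point and `LoRARequest`. The adapter name and path are placeholders, the prompt is illustrative, and defaults (e.g. `max_loras`) are left untouched; this is not the exact setup from the benchmark above:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Matches the test conditions: Qwen2.5-32B-Instruct on 2 x A100 40GB.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",
    tensor_parallel_size=2,
    enable_lora=True,
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

prompts = ["Give three tips for staying healthy."]  # e.g. an Alpaca-style instruction

# "Only Base": no lora_request argument.
base_out = llm.generate(prompts, sampling_params)

# "Base with LoRA": attach an adapter per request.
# "alpaca-lora" and "/path/to/adapter" are placeholders.
lora_out = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("alpaca-lora", 1, "/path/to/adapter"),
)
print(lora_out[0].outputs[0].text)
```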
