Test Conditions
- Dataset: Alpaca (Chinese)
- Total Requests: 500
- Request Rate: 16 req/s
- Model: Qwen2.5-32B-Instruct
- Hardware: A100_40G × 2
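For reference, below is a minimal sketch of how the LoRA path was exercised under these conditions, using vLLM's offline `LLM` API. This is not the exact serving setup used for the 16 req/s runs; the adapter name, path, and rank are hypothetical placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Engine configured to match the test conditions above:
# Qwen2.5-32B-Instruct on 2 x A100 40G with LoRA enabled.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",
    tensor_parallel_size=2,
    enable_lora=True,
    max_loras=1,
    max_lora_rank=64,  # assumption: adjust to the adapter's actual rank
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# "Only Base": no lora_request, so requests take the base-model path.
base_outputs = llm.generate(["你好，请介绍一下你自己。"], sampling_params)

# "Base with LoRA": attach a (hypothetical) adapter via LoRARequest.
lora_outputs = llm.generate(
    ["你好，请介绍一下你自己。"],
    sampling_params,
    lora_request=LoRARequest("alpaca_zh", 1, "/path/to/lora_adapter"),
)
```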
Results
I modified the benchmark code to run the Alpaca (Chinese) dataset (a loading sketch is included after the results). The total benchmark duration (in seconds) for each configuration is as follows:
| vLLM Version   | 0.5.0 | 0.5.4 | 0.6.2 |
|----------------|-------|-------|-------|
| Only Base      | 49.33 | 48.47 | 37.93 |
| Base with LoRA | 58.96 | 71.81 | 83.56 |
The results indicate that while the base model gets faster in newer vLLM versions (49.33 s on 0.5.0 down to 37.93 s on 0.6.2), multi-LoRA inference regresses significantly: the same workload takes 58.96 s on 0.5.0 but 83.56 s on 0.6.2, roughly 40% slower.
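For context on the benchmark modification, the Alpaca records were flattened into plain prompt strings before being sent to the benchmark client. A rough sketch, assuming the standard Alpaca JSON schema (`instruction` / `input` / `output`) and a hypothetical local file path:

```python
import json

def load_alpaca_prompts(path: str, limit: int = 500) -> list[str]:
    """Load Alpaca-style records and flatten them into prompt strings.

    Assumes the standard Alpaca schema: each record has an "instruction"
    and an optional "input" field. The file path used here is a placeholder.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    prompts = []
    for rec in records[:limit]:
        if rec.get("input"):
            prompts.append(f"{rec['instruction']}\n{rec['input']}")
        else:
            prompts.append(rec["instruction"])
    return prompts

# 500 requests, matching the test conditions above.
prompts = load_alpaca_prompts("alpaca_data_zh.json", limit=500)
```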