[Performance]: The impact of CPU on vLLM performance is significant.

### Proposal to improve performance

We ran the same GPU model on two machines with different CPUs.
Setup: the GPU is an RTX 3090, and the CPU was upgraded from a Xeon Gold 6240 to an i9-12900K. The results are as follows.
a. vLLM achieved a 3.8x speedup in the agent scenario.
b. TGI achieved a 1.23x speedup in the agent scenario.
c. vLLM still has latency issues, but the time has dropped from 300 ms to 100 ms.
e. GPU utilization increased from 70% to 90%.
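
As a quick sanity check, the figures above can be turned into ratios. This is only a sketch of the arithmetic: the helper names `speedup` and `idle_fraction` are hypothetical, and reading idle GPU time as time spent waiting on the host CPU is an assumption, not something measured here.

```python
def speedup(before: float, after: float) -> float:
    """Ratio of the old value to the new one (higher means faster)."""
    return before / after


def idle_fraction(gpu_utilization_pct: float) -> float:
    """Share of wall time during which the GPU is not busy."""
    return 1.0 - gpu_utilization_pct / 100.0


# Figures reported in this issue.
latency_speedup = speedup(300.0, 100.0)  # 300 ms -> 100 ms
idle_before = idle_fraction(70.0)        # 70% utilization on the Xeon machine
idle_after = idle_fraction(90.0)         # 90% utilization on the i9 machine

print(f"latency speedup: {latency_speedup:.1f}x")
print(f"GPU idle share: {idle_before:.0%} -> {idle_after:.0%}")
```

Under that assumption, the GPU sat idle for roughly a third of the time on the slower CPU, which would be consistent with the host side being the bottleneck.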

The stress test data shows that vLLM depends heavily on CPU performance.
Which CPU-side factors matter most here, and how can they be optimized?
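
One way to start separating CPU effects from GPU effects is to compare single-threaded Python speed on the two machines. This rests on an assumption that is not confirmed in this issue: that part of vLLM's per-step work (for example scheduling and input preparation) runs in Python on the host. A minimal sketch follows; `python_loop_seconds` is a hypothetical helper and not part of vLLM.

```python
import time


def python_loop_seconds(iterations: int = 1_000_000, repeats: int = 5) -> float:
    """Best-of-N wall time for a pure-Python loop (a rough single-core proxy)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        total = 0
        for i in range(iterations):
            total += i * i  # cheap interpreter-bound work, no GPU involved
        best = min(best, time.perf_counter() - start)
    return best


print(f"pure-Python loop: {python_loop_seconds() * 1000:.1f} ms")
```

Running this on both machines gives a rough ratio of single-core interpreter speed. If that ratio is far smaller than the 3.8x observed for vLLM, other factors such as memory bandwidth or PCIe transfer may also be involved.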

