[Performance]: The impact of CPU on vLLM performance is significant. #8147

@skylee-01

Proposal to improve performance

We ran the same GPU on two machines with different CPUs and drew the following conclusions.
Experimental results: the GPU is an RTX 3090, and the CPU was upgraded from a Xeon Gold 6240 to an i9-12900K. The impact was as follows:
a. vLLM achieved a 3.8x speedup in the agent scenario.
b. TGI achieved a 1.23x speedup in the agent scenario.
c. vLLM still has latency issues, but the latency dropped from 300 ms to 100 ms.
d. GPU utilization increased from 70% to 90%.

The stress test data shows that vLLM performance depends heavily on CPU performance.
Which CPU-side factors matter most for vLLM, and how can they be optimized?
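
One rough way to compare the two hosts on the CPU side is a pure-Python, single-threaded microbenchmark. This is only a sketch under an assumption that this issue does not confirm: that per-step Python work on a single core is what keeps the GPU waiting. `single_thread_score` is a hypothetical helper written for illustration and is not part of vLLM or TGI.

```python
import time


def single_thread_score(iterations: int = 2_000_000) -> float:
    """Return pure-Python loop iterations per second on one core.

    Hypothetical helper for comparing hosts; it does not measure vLLM itself.
    """
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        # Cheap integer work so the interpreter loop dominates the timing.
        total += (i * i) % 7
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    # Take the best of several runs to reduce noise from other processes.
    best = max(single_thread_score() for _ in range(5))
    print(f"{best:,.0f} iterations/s")
```

Running the same script on both machines gives a ratio that can be set against the observed 3.8x (vLLM) and 1.23x (TGI) speedups. If the ratio is far smaller than the vLLM speedup, other factors such as memory bandwidth or core count would also be worth checking.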

Labels: performance (Performance-related issues), stale (Over 90 days of inactivity)
