Closed
Labels
performance (Performance-related issues), stale (Over 90 days of inactivity)
Description
Proposal to improve performance
We ran the same GPU on two machines with different CPUs and drew the following conclusions from the experiments.
Experimental results: the GPU is a 3090, and the CPU was upgraded from a Xeon Gold 6240 to an i9-12900K. The impact is as follows.
a. vLLM achieved a 3.8x speedup in the agent scenario.
b. TGI achieved a 1.23x speedup in the agent scenario.
c. vLLM still has latency issues, but the latency dropped from 300 ms to 100 ms.
e. GPU utilization has increased from 70% to 90%.
The stress test data shows that vLLM depends heavily on CPU performance.
What are the main CPU-side factors that affect vLLM performance, and how can they be optimized?
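To put the reported numbers side by side, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model that is not taken from vLLM itself: GPU work per request is fixed, and any time the GPU is not busy is spent waiting on the CPU. The function names are hypothetical and exist only for this illustration.

```python
def idle_fraction(gpu_utilization: float) -> float:
    """Share of wall-clock time the GPU is not busy (utilization given as 0..1)."""
    if not 0.0 < gpu_utilization <= 1.0:
        raise ValueError("gpu_utilization must be in (0, 1]")
    return 1.0 - gpu_utilization


def speedup_from_utilization(before: float, after: float) -> float:
    """Throughput ratio under the assumed model: fixed GPU work, only idle time shrinks."""
    # Validate both inputs with the same range check.
    idle_fraction(before)
    idle_fraction(after)
    return after / before


# Figures reported above: utilization 70% -> 90%, latency 300 ms -> 100 ms.
old_util, new_util = 0.70, 0.90
print(f"GPU idle share before: {idle_fraction(old_util):.0%}, after: {idle_fraction(new_util):.0%}")
print(f"speedup implied by utilization alone: {speedup_from_utilization(old_util, new_util):.2f}x")
print(f"latency ratio: {300 / 100:.1f}x")
```

Under this assumed model, the utilization change alone would account for a smaller speedup than the 3.8x measured for vLLM, which suggests that GPU utilization does not capture all of the CPU-side cost. This is only an estimate under the stated assumption, not a measurement.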