[Frontend] speed up import time of vllm.config #18036
Conversation
Force-pushed from c1b18c2 to 6921702
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from bb4c8d5 to 054f562
@aarnphm this is ready for review, thanks! cc @simon-mo @Chen-0210
I'm a bit hesitant to optimize this file lazily, given that it touches a lot of components within vLLM.
Also, let's try to keep type-hint changes to a minimum.
This PR will require running the whole suite to make sure it doesn't introduce any regressions.
Force-pushed from 1ec7328 to 6a65c3d
Head branch was pushed to by a user without write access
by changing submodules to lazily import expensive modules like `vllm.model_executor.layers.quantization`, or by only importing them for type checkers when they aren't used at runtime. Contributes to vllm-project#14924. Signed-off-by: David Xia <[email protected]>
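The type-checker half of that commit message is the standard `typing.TYPE_CHECKING` gate: the heavy import is only evaluated by static analysis tools, never at `import` time. A minimal sketch (the `expensive_pkg` module and `HeavyClass` name are hypothetical stand-ins, not from this PR):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers (mypy, pyright),
    # never at runtime, so it adds nothing to import time.
    from expensive_pkg import HeavyClass  # hypothetical heavy module

def describe(obj: "HeavyClass") -> str:
    # The annotation is a string, so HeavyClass need not be
    # importable when this function is defined or called.
    return type(obj).__name__
```

Because the annotation is quoted, the function stays callable even in environments where the heavy dependency isn't installed.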
@aarnphm thanks for reviewing again. I rebased away the conflict and fixed the pre-commit Python formatting check. All checks pass now, and it's ready for another review. 🙏
On my M1 Mac (64 GB memory, Python 3.12, editable install of vLLM following these docs):

before: with master commit 3443aaf
after: with master commit 7108934

~2.15% speed-up in the average import time ((4.332 − 4.239) ÷ 4.332)
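A quick sanity check of the quoted percentage from the before/after means:

```python
def relative_speedup(before: float, after: float) -> float:
    """Fractional reduction in mean import time."""
    return (before - after) / before

# Before/after mean import times (seconds) from the benchmark above.
pct = relative_speedup(4.332, 4.239) * 100
print(f"{pct:.2f}%")  # 2.15%
```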
by changing some modules in `vllm/multimodal` to lazily import expensive modules like `transformers`, or by only importing them for type checkers when they aren't used at runtime. Contributes to #14924.
I ran `python -X importtime -c 'import vllm' 2> import.log && tuna import.log` on the main branch. The visualized call tree shows `vllm.config` accounts for the majority of the total import time at 55.5%. On this branch, `vllm.config`'s share decreased to 52.5%.

`python -c 'import vllm'` on a Google Compute Engine `a2-highgpu-1g` (12 vCPUs, 85 GB memory) instance with 1 A100 GPU: ~3% decrease in mean time

before (main branch commit 94d8ec8)
after (my PR commit 054f562)
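The mean-time comparison above can be reproduced with a small harness. This is a sketch under my own assumptions (the `runs` count and use of `sys.executable` are choices for illustration, not from the PR):

```python
import statistics
import subprocess
import sys
import time

def mean_import_time(module: str, runs: int = 5) -> float:
    """Average wall-clock time of `python -c 'import <module>'` over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)

# e.g. run this once on each checkout and compare:
# print(mean_import_time("vllm"))
```

Each sample spawns a fresh interpreter, so module caching in the parent process doesn't skew the numbers.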