
Conversation

@davidxia (Contributor) commented May 13, 2025

This PR reduces import time by changing some modules in `vllm/multimodal` to lazily import expensive modules like `transformers`, or to import them only for type checkers when they are not used at runtime.

contributes to #14924
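
A minimal sketch of the two patterns (illustrative only; `build_batch` is a hypothetical function, not code from this diff):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime, so this
    # module no longer imports transformers just to annotate types.
    from transformers import BatchFeature


def build_batch(data: dict) -> "BatchFeature":
    # Deferred import: transformers is loaded on the first call,
    # not when this module is imported.
    from transformers import BatchFeature
    return BatchFeature(data=data)
```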

On the main branch I ran `python -X importtime -c 'import vllm' 2> import.log && tuna import.log`. The visualized call tree shows that `vllm.config` accounts for the majority of the total import time at 55.5%.

[screenshot: tuna call tree on main; vllm.config at 55.5% of import time]

On this branch, vllm.config's share decreased to 52.5%.

[screenshot: tuna call tree on this branch; vllm.config at 52.5% of import time]

Benchmarking `python -c 'import vllm'` on a Google Compute Engine a2-highgpu-1g instance (12 vCPUs, 85 GB memory, 1 A100 GPU) shows a ~3% decrease in mean import time.

before (main branch commit 94d8ec8)

$ hyperfine 'python -c "import vllm"' --warmup 3 --runs 100
Benchmark 1: python -c "import vllm"
  Time (mean ± σ):      9.643 s ±  0.140 s    [User: 10.393 s, System: 1.913 s]
  Range (min … max):    9.485 s … 10.084 s    100 runs

after (my PR commit 054f562)

$ hyperfine 'python -c "import vllm"' --warmup 3 --runs 100
Benchmark 1: python -c "import vllm"
  Time (mean ± σ):      9.355 s ±  0.096 s    [User: 10.284 s, System: 1.740 s]
  Range (min … max):    9.205 s …  9.711 s    100 runs
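
Besides timing, one can check directly whether a given heavy dependency is pulled in eagerly. A minimal sketch (diagnostic only; vllm may still import transformers through other code paths):

```python
import sys

import vllm  # noqa: F401

# True means the dependency was imported eagerly somewhere on the import
# path; False means it is deferred until first use.
print("transformers imported eagerly:", "transformers" in sys.modules)
```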


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@davidxia force-pushed the patch4 branch 3 times, most recently from c1b18c2 to 6921702 on May 15, 2025 03:36

mergify bot commented May 15, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @davidxia.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label May 15, 2025
@mergify bot removed the needs-rebase label May 15, 2025
@davidxia marked this pull request as ready for review May 15, 2025 22:36
@davidxia force-pushed the patch4 branch 4 times, most recently from bb4c8d5 to 054f562 on May 21, 2025 18:26
@davidxia (Contributor, Author) commented

@aarnphm this is ready for review, thanks! cc @simon-mo @Chen-0210


mergify bot commented May 23, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @davidxia.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@aarnphm (Collaborator) left a comment


I'm a bit hesitant to optimize this file lazily, given that it touches a lot of components within vLLM.

Also, let's try to keep the type-hint changes to a minimum.

This PR will require running the whole test suite to make sure it doesn't introduce any regressions.


mergify bot commented May 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @davidxia.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify bot commented May 29, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @davidxia.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label May 29, 2025
@mergify bot removed the needs-rebase label May 29, 2025
@davidxia force-pushed the patch4 branch 2 times, most recently from 1ec7328 to 6a65c3d on May 29, 2025 13:04

mergify bot commented Jun 1, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @davidxia.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Jun 1, 2025
@aarnphm enabled auto-merge (squash) June 22, 2025 03:13
@github-actions bot added the ready label Jun 22, 2025
auto-merge was automatically disabled June 23, 2025 23:13

Head branch was pushed to by a user without write access

This change makes submodules lazily import expensive modules like `vllm.model_executor.layers.quantization`, or import them only for type checkers when they are not used at runtime.

contributes to vllm-project#14924

Signed-off-by: David Xia <[email protected]>
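
One common way to defer a submodule import like this is a module-level `__getattr__` (PEP 562). A minimal sketch of the idea, not necessarily the exact mechanism used in this commit:

```python
import importlib
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from vllm.model_executor.layers import quantization  # noqa: F401

# Attribute name -> fully qualified submodule that backs it.
_LAZY_SUBMODULES = {
    "quantization": "vllm.model_executor.layers.quantization",
}


def __getattr__(name: str):
    # PEP 562: called only when normal module attribute lookup fails,
    # so the submodule is imported on first access instead of eagerly.
    if name in _LAZY_SUBMODULES:
        return importlib.import_module(_LAZY_SUBMODULES[name])
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```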
@davidxia (Contributor, Author) commented

@aarnphm thanks for reviewing again. I rebased away the conflict and fixed the pre-commit Python formatting check. All checks pass now, and it's ready for another review. 🙏

@aarnphm merged commit 7108934 into vllm-project:main on Jun 25, 2025
68 checks passed
@davidxia deleted the patch4 branch June 25, 2025 11:29
@davidxia (Contributor, Author) commented

On my M1 Mac with 64 GB memory, Python 3.12, and an editable install of vLLM following these docs:

before (main branch commit 3443aaf)

$ hyperfine 'python -c "import vllm.config"' --warmup 10 --runs 100
Benchmark 1: python -c "import vllm.config"
  Time (mean ± σ):      4.332 s ±  0.088 s    [User: 5.246 s, System: 2.792 s]
  Range (min … max):    4.151 s …  4.748 s    100 runs

after (main branch commit 7108934)

$ hyperfine 'python -c "import vllm.config"' --warmup 10 --runs 100
Benchmark 1: python -c "import vllm.config"
  Time (mean ± σ):      4.239 s ±  0.040 s    [User: 5.104 s, System: 2.981 s]
  Range (min … max):    4.145 s …  4.389 s    100 runs

~2.15% decrease in mean import time ((4.332 − 4.239) ÷ 4.332 ≈ 0.0215)

Labels: frontend, ready

2 participants