Conversation

@Isotr0py Isotr0py (Member) commented Mar 10, 2025

@github-actions github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@Isotr0py Isotr0py requested review from jeejeelee and mgoin March 10, 2025 08:56
@NickLucche NickLucche (Collaborator) left a comment

Thanks a lot for this!

I do agree the solution is a bit involved right now, but perhaps we can find a way to simplify it a bit.
Apart from the comments I left, I would also:

  • add a test for `unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit`
  • make sure `VLLM_ATTENTION_BACKEND=XFORMERS python -m pytest -v tests/models/encoder_decoder/language/test_bart.py` also still works (it is not the case for me locally right now)

@Isotr0py Isotr0py (Member, Author) commented Mar 10, 2025

> add a test for `unsloth/Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit`

Mllama is quite large for testing, and we won't run inference with it on CI. I think testing on Whisper or BART would be a better choice.
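
For reference, a minimal smoke test along those lines could look like the sketch below. The checkpoint, prompt, and length limit are placeholders (assumptions, not this PR's actual test plan); vLLM's in-flight BnB path is selected with `quantization="bitsandbytes"` together with `load_format="bitsandbytes"`.

```python
# Hedged sketch: load a small encoder-decoder model with in-flight
# bitsandbytes 4-bit quantization and run a single generation as a
# smoke test, instead of pulling the much larger Mllama checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/bart-large-cnn",  # assumed small encoder-decoder stand-in
    quantization="bitsandbytes",      # quantize linear layers with BnB 4-bit
    load_format="bitsandbytes",       # BnB loader must be selected explicitly
    max_model_len=1024,
)
outputs = llm.generate(
    "vLLM is a high-throughput inference engine for large language models.",
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```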

@NickLucche NickLucche (Collaborator) commented

Sure, but we already have tests for it. They're guarded by the 48 GB requirement, so they won't run on L4, but it's still useful to be able to run them locally with a single command.

@jeejeelee jeejeelee (Collaborator) left a comment

LGTM, also cc @mgoin

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 11, 2025 15:02
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 11, 2025
@vllm-bot vllm-bot merged commit e392d85 into vllm-project:main Mar 12, 2025
46 of 48 checks passed
@Isotr0py Isotr0py deleted the refactor-x-qkv branch March 12, 2025 03:40
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Mar 14, 2025
…B 4-bit quantization (vllm-project#14545)

@gshtras gshtras (Collaborator) commented Mar 21, 2025

This PR introduced a regression, breaking support for amd/Llama-3.2-11B-Vision-Instruct-FP8-KV quantized models. @Isotr0py

@Isotr0py Isotr0py (Member, Author) commented
> This PR introduced a regression, breaking support for amd/Llama-3.2-11B-Vision-Instruct-FP8-KV quantized models.

Oh, FP8 needs process_weights_after_loading to be called; let me think of a way to handle this for QKVCrossParallelLinear.
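
A minimal sketch of the idea, assuming illustrative names (this is not the actual vLLM implementation): a composite layer that wraps separate Q and KV projections can forward the post-load hook to each sub-layer's quant method, mirroring what the model loader already does for ordinary linear layers, so schemes like FP8 that finalize weights after loading keep working.

```python
import torch.nn as nn


class QKVCrossParallelLinearSketch(nn.Module):
    """Illustrative wrapper around separate Q and KV projections."""

    def __init__(self, q_proj: nn.Module, kv_proj: nn.Module):
        super().__init__()
        self.q_proj = q_proj    # decoder hidden states -> Q
        self.kv_proj = kv_proj  # encoder hidden states -> fused K/V

    def process_weights_after_loading(self) -> None:
        # Delegate the post-load hook to each wrapped projection's quant
        # method (e.g. FP8 weight finalization), if one is attached.
        for proj in (self.q_proj, self.kv_proj):
            quant_method = getattr(proj, "quant_method", None)
            if quant_method is not None:
                quant_method.process_weights_after_loading(proj)
```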

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…B 4-bit quantization (vllm-project#14545)

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025