Merged

@@ -10,7 +10,7 @@
 from vllm.config import KVTransferConfig
 from vllm.multimodal.utils import encode_image_base64

-MODEL_NAME = "Qwen/Qwen2.5-VL-3B-Instruct"
+MODEL_NAME = "RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w4a16"
@tlrmchlsmth (Member) commented on Aug 4, 2025:
Looks like this model also breaks the CI unfortunately:

```
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683] ValueError: Failed to find a kernel that can implement the WNA16 linear layer. Reasons:
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  MacheteLinearKernel requires capability 90, current compute capability is 89
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  AllSparkLinearKernel cannot implement due to: For Ampere GPU, AllSpark does not support group_size = 128. Only group_size = -1 are supported.
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  MarlinLinearKernel cannot implement due to: Weight output_size_per_partition = 6840 is not divisible by min_thread_n = 64. Consider reducing tensor_parallel_size or running with --quantization gptq.
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  Dynamic4bitLinearKernel cannot implement due to: Only CPU is supported
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  BitBLASLinearKernel cannot implement due to: bitblas is not installed. Please install bitblas by running `pip install bitblas>=0.1.0`
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  ConchLinearKernel cannot implement due to: conch-triton-kernels is not installed, please install it via `pip install conch-triton-kernels` and try again!
[2025-08-03T23:44:26Z] (EngineCore_0 pid=14405) ERROR 08-03 16:44:26 [core.py:683]  ExllamaLinearKernel cannot implement due to: Exllama only supports float16 activations
```
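The failure above is a capability mismatch rather than a model bug: the w4a16 checkpoint needs a compatible quantized-linear kernel, and the preferred MacheteLinearKernel requires compute capability 90 (9.0, Hopper) while the CI GPU reports 89 (8.9, Ada). A minimal sketch, not part of the PR, of gating the model choice on capability; the model names come from the diff, while `pick_model` is a hypothetical helper:

```python
# Sketch: choose the quantized checkpoint only when the GPU can run it.
# The (9, 0) threshold comes from the error log above: MacheteLinearKernel
# requires compute capability 90, but the CI GPU reports 89.

QUANTIZED_MODEL = "RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w4a16"
UNQUANTIZED_MODEL = "Qwen/Qwen2.5-VL-3B-Instruct"

def pick_model(capability):
    """Return the w4a16 model on Hopper-or-newer GPUs, else fall back."""
    return QUANTIZED_MODEL if capability >= (9, 0) else UNQUANTIZED_MODEL

print(pick_model((8, 9)))  # Ada-class CI GPU: falls back to unquantized
print(pick_model((9, 0)))  # Hopper: quantized model is usable
```

At runtime the capability tuple would come from `torch.cuda.get_device_capability()`.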

A reviewer (Member) commented:

Did we change the device running this test?


 SAMPLING_PARAMS = SamplingParams(temperature=0.0, top_k=1, max_tokens=128)

@@ -130,6 +130,8 @@ def test_shared_storage_connector_hashes(tmp_path):
         model=MODEL_NAME,
         max_model_len=8192,
         max_num_seqs=1,
+        gpu_memory_utilization=0.4,
+        enforce_eager=True,
         kv_transfer_config=kv_transfer_config,
         limit_mm_per_prompt={"image": 2},
     )
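For context, the two added kwargs can be read alongside the existing engine config. A hedged sketch of the assembled arguments: the rationale comments are inferences, not stated in the PR. `gpu_memory_utilization=0.4` caps this engine's share of GPU memory, which helps when a test runs more than one engine on the same device, and `enforce_eager=True` skips CUDA-graph capture, trading some throughput for a faster, cheaper startup in CI:

```python
# Engine kwargs mirroring the diff above. In the real test, MODEL_NAME and
# kv_transfer_config are defined earlier in the file; values here copy the diff.
engine_kwargs = dict(
    model="RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w4a16",
    max_model_len=8192,
    max_num_seqs=1,
    gpu_memory_utilization=0.4,  # cap this engine's share of VRAM (inferred)
    enforce_eager=True,          # no CUDA graphs: cheaper startup (inferred)
    limit_mm_per_prompt={"image": 2},
)
# With vLLM installed and a GPU available:
# llm = LLM(kv_transfer_config=kv_transfer_config, **engine_kwargs)
```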