Your current environment
vLLM 0.6.1.post2
🐛 Describe the bug
I used an AWQ-quantized model from the Hugging Face Hub, so the weights were already quantized. I loaded it with `dtype=half` and inference is really fast. However, when I loaded the base (unquantized) model and let vLLM quantize it in-flight with bitsandbytes, throughput was significantly slower than with the AWQ model loaded directly from the hub. A sketch of the two load paths follows.
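For reference, a minimal sketch of what I mean by the two paths, assuming the vLLM 0.6.x `LLM` API; the model IDs are placeholders, since the exact checkpoint isn't named here:

```python
from vllm import LLM, SamplingParams

# Run these in separate processes to avoid holding both models in GPU memory.

# Path 1: pre-quantized AWQ checkpoint from the hub, loaded in half precision.
llm_awq = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumption: any AWQ repo
    quantization="awq",
    dtype="half",
)

# Path 2: base (unquantized) model, quantized in-flight via bitsandbytes.
llm_bnb = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumption: matching base repo
    quantization="bitsandbytes",
    load_format="bitsandbytes",  # required alongside quantization="bitsandbytes"
)

prompts = ["Hello, my name is"]
params = SamplingParams(max_tokens=64)
print(llm_awq.generate(prompts, params))
print(llm_bnb.generate(prompts, params))
```

With the same prompts and sampling parameters, path 1 is much faster than path 2 for me.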
Any ideas why the bitsandbytes path is so much slower? Is this expected?