Labels: feature request, stale (over 90 days of inactivity)
Description
🚀 The feature, motivation and pitch
I would like to be able to use in-flight BNB quantization of Mixtral models like `mistralai/Mixtral-8x7B-Instruct-v0.1` with `--quantization bitsandbytes`. It currently doesn't work on vLLM 0.8.4.
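For concreteness, a minimal sketch of what I am trying to run, using the offline `LLM` API as the Python equivalent of the `--quantization bitsandbytes` flag (the prompt and sampling details are just illustrative, not taken from any docs for this model):

```python
# Minimal repro sketch: in-flight BNB quantization of Mixtral via the
# offline LLM API, equivalent to passing --quantization bitsandbytes
# on the command line.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization="bitsandbytes",  # in-flight BNB quantization
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```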
Alternatives
No response
Additional context
Since #2673 a hacky workaround has been implemented to be able to use quantization with Mixtral models here: `bitsandbytes` is not part of the list of Mixtral-supported quantization methods, so vLLM falls back to the `QuantMixtralForCausalLM` (`MixtralForCausalLM` from `mixtral_quant.py`) implementation, which doesn't rely on fused MoE. But weight loading with BitsAndBytes later fails here with:
`AttributeError: Model MixtralForCausalLM does not support BitsAndBytes quantization yet. No 'packed_modules_mapping' found.`
`MixtralForCausalLM` from `mixtral_quant.py` does indeed not have the `packed_modules_mapping` attribute.
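For context, the BitsAndBytes loader looks for a class-level `packed_modules_mapping` on the model. A minimal sketch of what such an attribute looks like, assuming the same attention packing as the fused `MixtralForCausalLM`; the exact contents below are my assumption, not copied from vLLM source:

```python
import torch.nn as nn

class QuantMixtralForCausalLM(nn.Module):
    # Maps each packed/fused module name to the original checkpoint
    # sub-module names it replaces; the BNB loader consults this to route
    # quantized weights. Contents are assumed, mirroring the attention
    # packing of the fused Mixtral implementation.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }
```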
I tried two things:
- adding a `packed_modules_mapping` attribute, similar to the one on the `MixtralForCausalLM` class, to `QuantMixtralForCausalLM` => it now fails here, I'm not sure why
- adding `bitsandbytes` to the list of Mixtral-supported quantization methods => it fails here because `FusedMoE` is not a `LinearBase` layer (see the sketch after this list)
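The second failure is easy to demonstrate in isolation: `FusedMoE` does not subclass `LinearBase`, and under my reading the BNB path only targets `LinearBase` layers (this is a quick check of that assumption, not a quote of the failing code):

```python
# Quick check illustrating why adding "bitsandbytes" to the supported list
# isn't enough: Mixtral's expert block is a FusedMoE module, which is not a
# LinearBase layer, and the BNB weight-loading path only handles LinearBase
# subclasses.
from vllm.model_executor.layers.fused_moe import FusedMoE
from vllm.model_executor.layers.linear import LinearBase

print(issubclass(FusedMoE, LinearBase))  # False -> the BNB path rejects it
```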
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.