Labels: feature request, stale (over 90 days of inactivity)
Description
🚀 The feature, motivation and pitch
I would like to be able to use in-flight BNB quantization of Mixtral models like `mistralai/Mixtral-8x7B-Instruct-v0.1` with `--quantization bitsandbytes`. It currently doesn't work on vLLM 0.8.4.
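For concreteness, a minimal sketch of what I am trying to run, using the offline `LLM` API as the Python equivalent of the `--quantization bitsandbytes` flag (the prompt and sampling details are just illustrative, not taken from any docs for this model):

```python
# Minimal repro sketch: in-flight BNB quantization of Mixtral via the
# offline LLM API, equivalent to passing --quantization bitsandbytes
# on the command line.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization="bitsandbytes",  # in-flight BNB quantization
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```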
Alternatives
No response
Additional context
Since #2673 a hacky workaround has been implemented to be able to use quantization with Mixtral models here: `bitsandbytes` is not part of the list of Mixtral-supported quantization methods, so vLLM falls back to the `QuantMixtralForCausalLM` (`MixtralForCausalLM` from `mixtral_quant.py`) implementation, which doesn't rely on fused MoE. But weight loading with BitsAndBytes later fails here with:
`AttributeError: Model MixtralForCausalLM does not support BitsAndBytes quantization yet. No 'packed_modules_mapping' found.`
`MixtralForCausalLM` from `mixtral_quant.py` does indeed not have the `packed_modules_mapping` attribute.
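For context, the BitsAndBytes loader looks for a class-level `packed_modules_mapping` on the model. A minimal sketch of what such an attribute looks like, assuming the same attention packing as the fused `MixtralForCausalLM`; the exact contents below are my assumption, not copied from vLLM source:

```python
import torch.nn as nn

class QuantMixtralForCausalLM(nn.Module):
    # Maps each packed/fused module name to the original checkpoint
    # sub-module names it replaces; the BNB loader consults this to route
    # quantized weights. Contents are assumed, mirroring the attention
    # packing of the fused Mixtral implementation.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }
```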
I tried two things:
- adding a `packed_modules_mapping` attribute, similar to the one on the `MixtralForCausalLM` class, to `QuantMixtralForCausalLM` => it now fails here, I'm not sure why
- adding `bitsandbytes` to the list of Mixtral-supported quantization methods => it fails here because `FusedMoE` is not a `LinearBase` layer (see the sketch after this list)
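The second failure is easy to demonstrate in isolation: `FusedMoE` does not subclass `LinearBase`, and under my reading the BNB path only targets `LinearBase` layers (this is a quick check of that assumption, not a quote of the failing code):

```python
# Quick check illustrating why adding "bitsandbytes" to the supported list
# isn't enough: Mixtral's expert block is a FusedMoE module, which is not a
# LinearBase layer, and the BNB weight-loading path only handles LinearBase
# subclasses.
from vllm.model_executor.layers.fused_moe import FusedMoE
from vllm.model_executor.layers.linear import LinearBase

print(issubclass(FusedMoE, LinearBase))  # False -> the BNB path rejects it
```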
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.