model = AutoModelForCausalLM.from_pretrained("{your_username}/opt-125m-gptq", device_map="auto", quantization_config=gptq_config)
```
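The snippet above assumes a `gptq_config` built beforehand. A minimal sketch of constructing one, assuming the `GPTQConfig` class from `transformers` (the specific arguments here are illustrative, not taken from this PR):

```python
from transformers import GPTQConfig

# Minimal sketch: bits=4, since only 4-bit models are supported for now
# (see the note below). Other arguments are left at their defaults.
gptq_config = GPTQConfig(bits=4)
```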
Note that only 4-bit models are supported for now. Furthermore, it is recommended to deactivate the exllama kernels if you are fine-tuning a quantized model with `peft`.
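A sketch of what deactivating them could look like at load time, reusing the `disable_exllama` flag referenced in the diff below (the exact arguments are assumptions):

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Sketch: turn off the exllama kernels before fine-tuning with peft.
gptq_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    "{your_username}/opt-125m-gptq", device_map="auto", quantization_config=gptq_config
)
```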
You can find the benchmark of these kernels [here](https://github.com/huggingface/optimum/tree/main/tests/benchmark#gptq-benchmark).
#### Fine-tune a quantized model
With the official support of adapters in the Hugging Face ecosystem, you can fine-tune models that have been quantized with GPTQ.
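For instance, here is a minimal sketch of attaching a LoRA adapter with the `peft` library; the `target_modules` chosen below are an assumption for OPT-style attention layers, not something this PR prescribes:

```python
from peft import LoraConfig, get_peft_model

# Sketch: wrap the quantized model in a LoRA adapter. Only the adapter
# weights are trained; the GPTQ-quantized base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed projection names for OPT
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```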
**`src/transformers/modeling_utils.py`** (1 addition, 1 deletion)
```diff
@@ -2759,7 +2759,7 @@ def from_pretrained(
             logger.warning(
                 "You passed `quantization_config` to `from_pretrained` but the model you're loading already has a "
                 "`quantization_config` attribute and has already quantized weights. However, loading attributes"
-                " (e.g. disable_exllama, use_cuda_fp16, max_input_length) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored."
+                " (e.g. disable_exllama, use_cuda_fp16, max_input_length, use_exllama_v2) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored."
```
f"You need optimum > 1.13.2 and auto-gptq > 0.4.2 . Make sure to have that version installed - detected version : optimum {optimum_version} and autogptq {autogptq_version}"
434
+
)
435
+
self.disable_exllama=True
436
+
logger.warning("You have activated exllamav2 kernels. Exllama kernels will be disabled.")
437
+
ifnotself.disable_exllama:
438
+
logger.warning(
439
+
"""You have activated exllama backend. Note that you can get better inference
440
+
speed using exllamav2 kernel by setting `use_exllama_v2=True`.`disable_exllama` will be deprecated
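Based on the flag this hunk introduces, opting into the exllamav2 kernels would look something like the sketch below (assuming `use_exllama_v2` is exposed on `GPTQConfig`, as the warning text suggests):

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Sketch: request the exllamav2 kernels. Per the hunk above, this requires
# optimum > 1.13.2 and auto-gptq > 0.4.2, and it disables the v1 exllama
# kernels (disable_exllama is forced to True).
gptq_config = GPTQConfig(bits=4, use_exllama_v2=True)
model = AutoModelForCausalLM.from_pretrained(
    "{your_username}/opt-125m-gptq", device_map="auto", quantization_config=gptq_config
)
```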