Recently, HuggingFace Transformers added int8 quantization support for all HuggingFace models. This feature can reduce the size of large models by up to a factor of 2 without a significant loss in performance. Is it possible for DeepSpeed inference to support int8 quantization for BLOOM? According to the DeepSpeed inference tutorial, DeepSpeed inference supports fp32, fp16, and int8 parameters. But when I tried BLOOM with the inference script and changed dtype to torch.int8 on line 194, the following error was raised.
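For context, here is a minimal sketch of the failing call; the checkpoint name and surrounding setup are illustrative stand-ins, not the exact contents of the inference script:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; the actual script loads a BLOOM model the same way.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")

# Changing dtype from torch.float16 to torch.int8 here triggers the
# RuntimeError below during DeepSpeed's weight quantization pass.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # tensor-parallel degree
    dtype=torch.int8,                 # fp32/fp16 work; int8 fails for BLOOM
    replace_with_kernel_inject=True,  # inject DeepSpeed inference kernels
)
```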
```
  File "site-packages/deepspeed/runtime/weight_quantizer.py", line 163, in model_quantize
    return quantized_module, torch.cat(all_scales)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
```
Any chance DeepSpeed inference could support int8 quantization for BLOOM?
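For reference, the HuggingFace Transformers int8 feature mentioned above looks roughly like this (a sketch assuming a recent transformers release with the bitsandbytes integration installed; the checkpoint name is illustrative):

```python
from transformers import AutoModelForCausalLM

# bitsandbytes-backed int8 loading in HF Transformers: weights are
# quantized to int8 at load time, roughly halving memory vs. fp16.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",  # illustrative checkpoint
    device_map="auto",       # required for 8-bit loading
    load_in_8bit=True,
)
```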