Recently, HuggingFace Transformers added int8 quantization support for all HuggingFace models. This feature can reduce the size of large models by up to a factor of 2 without a significant loss in performance. Is it possible for DeepSpeed inference to support int8 quantization for BLOOM? According to the DeepSpeed inference tutorial, DeepSpeed inference supports fp32, fp16, and int8 parameters. But when I tried BLOOM with the inference script and changed dtype to torch.int8 on line 194, the following error was raised.
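For context, here is a minimal sketch of the failing call; the checkpoint name and surrounding setup are illustrative stand-ins, not the exact contents of the inference script:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; the actual script loads a BLOOM model the same way.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")

# Changing dtype from torch.float16 to torch.int8 here triggers the
# RuntimeError below during DeepSpeed's weight quantization pass.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # tensor-parallel degree
    dtype=torch.int8,                 # fp32/fp16 work; int8 fails for BLOOM
    replace_with_kernel_inject=True,  # inject DeepSpeed inference kernels
)
```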
```
  File "site-packages/deepspeed/runtime/weight_quantizer.py", line 163, in model_quantize
    return quantized_module, torch.cat(all_scales)
RuntimeError: torch.cat(): expected a non-empty list of Tensors
```
Any chance DeepSpeed inference could support int8 quantization for BLOOM?
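For reference, the HuggingFace Transformers int8 feature mentioned above looks roughly like this (a sketch assuming a recent transformers release with the bitsandbytes integration installed; the checkpoint name is illustrative):

```python
from transformers import AutoModelForCausalLM

# bitsandbytes-backed int8 loading in HF Transformers: weights are
# quantized to int8 at load time, roughly halving memory vs. fp16.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",  # illustrative checkpoint
    device_map="auto",       # required for 8-bit loading
    load_in_8bit=True,
)
```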