Closed
Description
Following the example from neural-compressor/docs/source/3x/PT_StaticQuant.md, I get a crash in `MinMaxObserver` when calling `prepare(model)`:

`NotImplementedError: MinMaxObserver's qscheme only support torch.per_tensor_symmetric and torch.per_tensor_affine.`

The qscheme requested prior to the error is `per_channel_symmetric`, and I am using the default `StaticQuantConfig`.

I think `per_channel_symmetric` quantization is applied to weights only, and with the `minmax` algorithm it is simply the absolute max of each channel, so I am not sure why this observer was used in this case.
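To make the expectation above concrete, here is a minimal sketch of what per-channel symmetric minmax scale computation amounts to, assuming signed int8 with `qmax = 127` (the function name and helper are illustrative, not Neural Compressor API):

```python
import torch

def per_channel_symmetric_scales(weight: torch.Tensor, qmax: int = 127) -> torch.Tensor:
    # Symmetric per-channel quantization: for each output channel (dim 0),
    # the scale is simply the absolute max over the remaining dims
    # divided by the integer range.
    abs_max = weight.abs().amax(dim=tuple(range(1, weight.dim())))
    return abs_max / qmax

w = torch.tensor([[1.0, -2.0],
                  [0.5, 0.25]])
scales = per_channel_symmetric_scales(w)
# channel 0 abs max = 2.0 -> scale 2.0/127; channel 1 abs max = 0.5 -> 0.5/127
```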
Environment:
- Linux Ubuntu 22.04.1 LTS
- Python 3.10
- torch 2.4.0+cpu
- neural_compressor 3.0
To reproduce:
```python
import torch
from torch import nn
from neural_compressor.torch.export import export
from neural_compressor.torch.quantization import StaticQuantConfig, prepare, convert

def main():
    # Prepare the float model and example inputs for export
    model = nn.Linear(5, 5)
    example_inputs = (torch.rand(size=(1, 5)),)

    # Export eager model into FX graph model
    exported_model = export(model=model, example_inputs=example_inputs)

    # Quantize the model
    quant_config = StaticQuantConfig()
    prepared_model = prepare(exported_model, quant_config=quant_config)

    # Calibrate
    for _ in range(100):
        prepared_model(torch.rand_like(example_inputs[0]))
    q_model = convert(prepared_model)

    # Compile the quantized model and replace the Q/DQ pattern with Q-operator
    from torch._inductor import config
    config.freezing = True
    opt_model = torch.compile(q_model)

if __name__ == "__main__":
    main()
```
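For context, the error text matches the guard in PyTorch's own `MinMaxObserver`, which only accepts per-tensor qschemes; the per-channel counterpart is `PerChannelMinMaxObserver`. A small sketch using the `torch.ao.quantization` observers directly (not Neural Compressor) shows the difference:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver

# MinMaxObserver rejects per-channel qschemes at construction time,
# raising the same NotImplementedError reported above.
try:
    MinMaxObserver(qscheme=torch.per_channel_symmetric)
except NotImplementedError as e:
    print(e)

# PerChannelMinMaxObserver is the observer meant for per-channel
# symmetric weight quantization: one scale per output channel (dim 0).
obs = PerChannelMinMaxObserver(
    ch_axis=0, dtype=torch.qint8, qscheme=torch.per_channel_symmetric
)
obs(torch.randn(5, 5))
scales, zero_points = obs.calculate_qparams()
print(scales.shape)  # one scale per channel
```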