NotImplementedError is raised in static INT8 Quantization with PT2E Backend default recipe #1984

@haitamhawa

Description

Following example from neural-compressor/docs/source/3x/PT_StaticQuant.md:

I get a crash in MinMaxObserver when calling prepare(model):
"NotImplementedError: MinMaxObserver's qscheme only support torch.per_tensor_symmetric and torch.per_tensor_affine."
The qscheme requested prior to the error is per_channel_symmetric.
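The failure can be reproduced at the observer level, independent of neural_compressor: a minimal sketch (assuming only PyTorch's public MinMaxObserver class, the same class named in the traceback) showing that it rejects any per-channel qscheme at construction time:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# MinMaxObserver only supports per-tensor qschemes; passing a
# per-channel qscheme raises NotImplementedError in __init__.
try:
    MinMaxObserver(qscheme=torch.per_channel_symmetric)
except NotImplementedError as err:
    print(type(err).__name__, "-", err)
```

So whatever recipe the default StaticQuantConfig expands to appears to pair a per-channel qscheme with the per-tensor observer.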

I am using the default StaticQuantConfig.

I think per_channel_symmetric quantization is applied to weights only.
And with the minmax algorithm it is simply the absolute max per channel, so I'm not sure why an observer is needed in this case.
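To illustrate the point above: a minimal sketch in plain Python (the helper name and qmax default are hypothetical, not neural_compressor API) of why per-channel symmetric minmax for static weights needs no running observer state, the scale being just the per-output-channel absolute max over the fixed weight tensor:

```python
# Hypothetical helper: per-channel symmetric int8 scale is the
# per-output-channel absolute maximum divided by the integer bound.
def per_channel_symmetric_scales(weight, qmax=127):
    """weight: list of output-channel rows, each a list of floats."""
    return [max(abs(v) for v in row) / qmax for row in weight]

w = [[0.5, -1.27, 0.3],   # channel 0: absmax 1.27
     [2.54, 0.1, -0.2]]   # channel 1: absmax 2.54
print(per_channel_symmetric_scales(w))
```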

Environment:

  • Linux Ubuntu 22.04.1 LTS
  • Python 3.10
  • torch 2.4.0+cpu
  • neural_compressor 3.0

To reproduce:

import torch
from torch import nn
from neural_compressor.torch.export import export
from neural_compressor.torch.quantization import StaticQuantConfig, prepare, convert

def main():
    # Prepare the float model and example inputs for export model
    model = nn.Linear(5, 5)
    example_inputs = (torch.rand(size=(1, 5)),)

    # Export eager model into FX graph model
    exported_model = export(model=model, example_inputs=example_inputs)
    # Quantize the model
    quant_config = StaticQuantConfig()
    prepared_model = prepare(exported_model, quant_config=quant_config)
    # Calibrate
    for _ in range(100):
        prepared_model(torch.rand_like(example_inputs[0]))

    q_model = convert(prepared_model)
    # Compile the quantized model and replace the Q/DQ pattern with Q-operator
    from torch._inductor import config

    config.freezing = True
    opt_model = torch.compile(q_model)

if __name__ == "__main__":
    main()
