# NXP eIQ Neutron Quantization

The eIQ Neutron NPU requires the operators delegated to it to be quantized. To quantize a PyTorch model for the Neutron backend, use the `NeutronQuantizer` from `backends/nxp/quantizer/neutron_quantizer.py`.
The `NeutronQuantizer` is configured to quantize the model with a quantization scheme supported by the eIQ Neutron NPU.

### Supported Quantization Schemes

The Neutron delegate supports the following quantization schemes:

- Static quantization with 8-bit symmetric weights and 8-bit asymmetric activations (via the PT2E quantization flow), with per-tensor granularity.
    - The following operators are currently supported (a small example model built from these operators follows the list):
      - `aten.abs.default`
      - `aten.adaptive_avg_pool2d.default`
      - `aten.addmm.default`
      - `aten.add.Tensor`
      - `aten.avg_pool2d.default`
      - `aten.cat.default`
      - `aten.conv1d.default`
      - `aten.conv2d.default`
      - `aten.dropout.default`
      - `aten.flatten.using_ints`
      - `aten.hardtanh.default`
      - `aten.hardtanh_.default`
      - `aten.linear.default`
      - `aten.max_pool2d.default`
      - `aten.mean.dim`
      - `aten.pad.default`
      - `aten.permute.default`
      - `aten.relu.default` and `aten.relu_.default`
      - `aten.reshape.default`
      - `aten.view.default`
      - `aten.softmax.int`
      - `aten.tanh.default` and `aten.tanh_.default`
      - `aten.sigmoid.default`
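
As an illustration, here is a minimal toy model (hypothetical, not part of the ExecuTorch repository) built only from operators on the list above, so the `NeutronQuantizer` can annotate it end to end:

```python
import torch

class TinyConvNet(torch.nn.Module):
    """Toy model for a (1, 3, 32, 32) input, using only supported operators:
    aten.conv2d, aten.relu, aten.max_pool2d, aten.flatten and aten.linear."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)  # -> aten.conv2d.default
        self.pool = torch.nn.MaxPool2d(2)                 # -> aten.max_pool2d.default
        self.fc = torch.nn.Linear(8 * 15 * 15, 10)        # -> aten.linear.default

    def forward(self, x):
        x = torch.relu(self.conv(x))                      # -> aten.relu.default
        x = self.pool(x)                                  # 30x30 -> 15x15 spatial
        x = torch.flatten(x, 1)                           # -> aten.flatten.using_ints
        return self.fc(x)
```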

### Static 8-bit Quantization Using the PT2E Flow

To perform 8-bit quantization with the PT2E flow, take the following steps before exporting the model to edge:

1) Create an instance of the `NeutronQuantizer` class.
2) Use `torch.export.export` to export the model to ATen Dialect.
3) Call `prepare_pt2e` with the `NeutronQuantizer` instance to annotate the model with observers for quantization.
4) Because the quantization is static, run the prepared model on representative samples to calibrate the activation ranges of the quantized tensors.
5) Call `convert_pt2e` to quantize the model.
6) Export and lower the model using the standard flow.

The output of `convert_pt2e` is a PyTorch model that can be exported and lowered using the normal flow. Because it is still a regular PyTorch model, it can also be evaluated for accuracy with standard PyTorch techniques (see the sketch after the example below).

```python
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from executorch.backends.nxp.neutron_partitioner import NeutronPartitioner
from executorch.backends.nxp.nxp_backend import generate_neutron_compile_spec
from executorch.exir import to_edge_transform_and_lower
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

quantizer = NeutronQuantizer()  # (1)

training_ep = torch.export.export(model, sample_inputs).module()  # (2)
prepared_model = prepare_pt2e(training_ep, quantizer)  # (3)

for cal_sample in [torch.randn(1, 3, 224, 224)]:  # Replace with representative model inputs
    prepared_model(cal_sample)  # (4) Calibrate

quantized_model = convert_pt2e(prepared_model)  # (5)

compile_spec = generate_neutron_compile_spec(
    "imxrt700",
    operators_not_to_delegate=None,
    neutron_converter_flavor="SDK_25_06",
)

et_program = to_edge_transform_and_lower(  # (6)
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[NeutronPartitioner(compile_spec=compile_spec)],
).to_executorch()
```
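
Since `quantized_model` is still a regular PyTorch module, a quick accuracy check can be run before lowering. Here is a minimal sketch, continuing from the example above (the random input and the SQNR metric are illustrative placeholders; in practice, iterate over a real validation set and track a task metric such as top-1 accuracy):

```python
# Compare the float and the quantized model on one input.
eval_input = torch.randn(1, 3, 224, 224)  # placeholder; use real validation data

with torch.no_grad():
    float_out = model(eval_input)
    quant_out = quantized_model(eval_input)

# Signal-to-quantization-noise ratio as a rough closeness measure.
sqnr = 10 * torch.log10(
    float_out.pow(2).mean() / (float_out - quant_out).pow(2).mean()
)
print(f"SQNR: {sqnr.item():.1f} dB")
```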

See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.