Commit c7159c8

Add Pruning performance/accuracy data (#295)
* Add Pruning performance/accuracy data
* Update doc to add supported MLPerf models and validated frameworks
1 parent f790e37 commit c7159c8

File tree: 4 files changed (+592, -186 lines)


README.md

Lines changed: 263 additions & 7 deletions
@@ -33,12 +33,14 @@ Click the image to enlarge it.
 #### Supported Frameworks

 Supported Intel-optimized DL frameworks are:
-* [TensorFlow\*](https://github.com/Intel-tensorflow/tensorflow), including [1.15.0 UP3](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up3), [1.15.0 UP2](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up2), [1.15.0 UP1](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up1), [2.1.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.1.0), [2.2.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.2.0), [2.3.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.3.0), [2.4.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.4.0), [2.5.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.5.0)
+* [TensorFlow\*](https://github.com/Intel-tensorflow/tensorflow), including [1.15.0 UP3](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up3), [1.15.0 UP2](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up2), [1.15.0 UP1](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up1), [2.1.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.1.0), [2.2.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.2.0), [2.3.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.3.0), [2.4.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.4.0), [2.5.0](https://github.com/Intel-tensorflow/tensorflow/tree/v2.5.0), [Official TensorFlow 2.6.0](https://github.com/tensorflow/tensorflow/tree/v2.6.0)

 > **Note**: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running LPOT quantization or deploying the quantized model.

+> **Note**: Starting with official TensorFlow 2.6.0, oneDNN support has been upstreamed. Users only need to download the official TensorFlow binary for the CPU device and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running LPOT quantization or deploying the quantized model.
+
 * [PyTorch\*](https://pytorch.org/), including [1.5.0+cpu](https://download.pytorch.org/whl/torch_stable.html), [1.6.0+cpu](https://download.pytorch.org/whl/torch_stable.html), [1.8.0+cpu](https://download.pytorch.org/whl/torch_stable.html)
-* [Apache\* MXNet](https://mxnet.apache.org), including [1.6.0](https://github.com/apache/incubator-mxnet/tree/1.6.0), [1.7.0](https://github.com/apache/incubator-mxnet/tree/1.7.0)
+* [Apache\* MXNet](https://mxnet.apache.org), including [1.6.0](https://github.com/apache/incubator-mxnet/tree/1.6.0), [1.7.0](https://github.com/apache/incubator-mxnet/tree/1.7.0), [1.8.0](https://github.com/apache/incubator-mxnet/tree/1.8.0)
 * [ONNX\* Runtime](https://github.com/microsoft/onnxruntime), including [1.6.0](https://github.com/microsoft/onnxruntime/tree/v1.6.0), [1.7.0](https://github.com/microsoft/onnxruntime/tree/v1.7.0), [1.8.0](https://github.com/microsoft/onnxruntime/tree/v1.8.0)

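Both TensorFlow notes amount to exporting a single environment variable before TensorFlow is loaded. A minimal sketch (the flag values come from the notes above; the script structure is illustrative — pick the variable that matches your TensorFlow build):

```python
import os

# These flags are read when TensorFlow is first loaded, so they must be
# set at the very top of the quantization script, before any TF import.
os.environ["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"  # Intel Optimized TensorFlow 2.5.0
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"        # official TensorFlow 2.6.0

# import tensorflow as tf  # import TensorFlow only after the flags are set
```

The same effect can be had with `export TF_ENABLE_ONEDNN_OPTS=1` in the shell before launching Python.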
@@ -152,7 +154,7 @@ python setup.py install
 **Deep Dive**

-* [Quantization](docs/Quantization.md) are processes that enable inference and training by performing computations at low-precision data types, such as fixed-point integers. LPOT supports Post-Training Quantization ([PTQ](docs/PTQ.md)) and Quantization-Aware Training ([QAT](docs/QAT.md)). Note that ([Dynamic Quantization](docs/dynamic_quantization.md)) currently has limited support.
+* [Quantization](docs/Quantization.md) is a process that enables inference and training by performing computations at low-precision data types, such as fixed-point integers. LPOT supports Post-Training Quantization ([PTQ](docs/PTQ.md)) with [different quantization capabilities](docs/backend_quant.md) and Quantization-Aware Training ([QAT](docs/QAT.md)). Note that [Dynamic Quantization](docs/dynamic_quantization.md) currently has limited support.
 * [Pruning](docs/pruning.md) provides a common method for introducing sparsity in weights and activations.
 * [Benchmarking](docs/benchmark.md) introduces how to utilize the benchmark interface of LPOT.
 * [Mixed precision](docs/mixed_precision.md) introduces how to enable mixed precision, including BF16, int8, and FP32, on Intel platforms during tuning.
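The PTQ flow listed above is driven by a user-supplied yaml configuration. A representative sketch, hedged: the field layout follows the conf.yaml templates shipped in LPOT's examples, but the model name, sampling size, and accuracy criterion here are illustrative values, not defaults:

```yaml
model:                           # the model to quantize
  name: resnet50_v1_5            # illustrative name
  framework: tensorflow          # one of the supported frameworks above

quantization:
  calibration:
    sampling_size: 100           # number of calibration samples (assumed value)

tuning:
  accuracy_criterion:
    relative: 0.01               # tolerate at most 1% relative accuracy drop
  exit_policy:
    timeout: 0                   # 0 = run until the criterion is met
```

The tuning section is what lets LPOT trade quantization aggressiveness against the stated accuracy budget.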
@@ -198,10 +200,13 @@ Intel® Low Precision Optimization Tool requires installing the pertinent Intel-
 </thead>
 <tbody>
 <tr>
-<td class="tg-nrix" rowspan="18">Cascade Lake<br><br>Cooper Lake<br><br>Skylake<br><br>Ice Lake</td>
-<td class="tg-nrix" rowspan="18">CentOS 8.3<br><br>Ubuntu 18.04</td>
-<td class="tg-nrix" rowspan="18">3.6<br><br>3.7<br><br>3.8<br><br>3.9</td>
-<td class="tg-cly1" rowspan="9">TensorFlow</td>
+<td class="tg-nrix" rowspan="20">Cascade Lake<br><br>Cooper Lake<br><br>Skylake<br><br>Ice Lake</td>
+<td class="tg-nrix" rowspan="20">CentOS 8.3<br><br>Ubuntu 18.04</td>
+<td class="tg-nrix" rowspan="20">3.6<br><br>3.7<br><br>3.8<br><br>3.9</td>
+<td class="tg-cly1" rowspan="10">TensorFlow</td>
+<td class="tg-7zrl">2.6.0</td>
+</tr>
+<tr>
 <td class="tg-7zrl">2.5.0</td>
 </tr>
 <tr>
@@ -243,6 +248,9 @@ Intel® Low Precision Optimization Tool requires installing the pertinent Intel-
 </tr>
 <tr>
 <td class="tg-cly1" rowspan="2">MXNet</td>
+<td class="tg-7zrl">1.8.0</td>
+</tr>
+<tr>
 <td class="tg-7zrl">1.7.0</td>
 </tr>
 <tr>
@@ -265,6 +273,79 @@ Intel® Low Precision Optimization Tool requires installing the pertinent Intel-
 
 Intel® Low Precision Optimization Tool provides numerous examples that demonstrate minimal accuracy loss with the best performance gain. A full quantized model list on various frameworks is available in the [Model List](docs/full_model_list.md).
 
+#### Validated MLPerf Models
+
+<table>
+<thead>
+<tr>
+<th>Model</th>
+<th>Framework</th>
+<th>Support</th>
+<th>Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td rowspan="2">ResNet50 v1.5</td>
+<td>TensorFlow</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/tensorflow/image_recognition">Link</a></td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/pytorch/ipex/image_recognition/imagenet/cpu/ptq">Link</a></td>
+</tr>
+<tr>
+<td>DLRM</td>
+<td>PyTorch</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/pytorch/fx/recommendation">Link</a></td>
+</tr>
+<tr>
+<td rowspan="2">BERT-large</td>
+<td>TensorFlow</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/tensorflow/nlp/bert_large_squad">Link</a></td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/pytorch/eager/language_translation/ptq">Link</a></td>
+</tr>
+<tr>
+<td rowspan="2">SSD-ResNet34</td>
+<td>TensorFlow</td>
+<td>WIP</td>
+<td></td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/pytorch/fx/object_detection/ssd_resnet34/ptq">Link</a></td>
+</tr>
+<tr>
+<td>RNN-T</td>
+<td>PyTorch</td>
+<td>WIP</td>
+<td></td>
+</tr>
+<tr>
+<td rowspan="2">3D-UNet</td>
+<td>TensorFlow</td>
+<td>WIP</td>
+<td></td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>Yes</td>
+<td><a href="https://github.com/intel/lpot/tree/master/examples/pytorch/eager/medical_imaging/3d-unet">Link</a></td>
+</tr>
+</tbody>
+</table>
+
+#### Validated Quantized Models
+
 <table class="docutils">
 <thead>
 <tr>
@@ -531,6 +612,181 @@ Intel® Low Precision Optimization Tool provides numerous examples to show promi
 </tbody>
 </table>
 
+#### Validated Pruning Models
+
+<table>
+<thead>
+<tr>
+<th rowspan="2">Tasks</th>
+<th rowspan="2">FWK</th>
+<th rowspan="2">Model</th>
+<th rowspan="2">fp32 baseline</th>
+<th colspan="3">gradient sensitivity with 20% sparsity</th>
+<th colspan="3">+onnx dynamic quantization on pruned model</th>
+</tr>
+<tr>
+<td>accuracy%</td>
+<td>drop%</td>
+<td>perf gain (ratio)</td>
+<td>accuracy%</td>
+<td>drop%</td>
+<td>perf gain (ratio)</td>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>SST-2</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>accuracy = 92.32</td>
+<td>accuracy = 91.97</td>
+<td>-0.38</td>
+<td>1.30x</td>
+<td>accuracy = 92.20</td>
+<td>-0.13</td>
+<td>1.86x</td>
+</tr>
+<tr>
+<td>QQP</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>[accuracy, f1] = [91.10, 88.05]</td>
+<td>[accuracy, f1] = [89.97, 86.54]</td>
+<td>[-1.24, -1.71]</td>
+<td>1.32x</td>
+<td>[accuracy, f1] = [89.75, 86.60]</td>
+<td>[-1.48, -1.65]</td>
+<td>1.81x</td>
+</tr>
+</tbody>
+</table>
+
+<table>
+<thead>
+<tr>
+<th rowspan="2">Tasks</th>
+<th rowspan="2">FWK</th>
+<th rowspan="2">Model</th>
+<th rowspan="2">fp32 baseline</th>
+<th colspan="2">Pattern Lock on 70% Unstructured Sparsity</th>
+<th colspan="2">Pattern Lock on 50% 1:2 Structured Sparsity</th>
+</tr>
+<tr>
+<td>accuracy%</td>
+<td>drop%</td>
+<td>accuracy%</td>
+<td>drop%</td>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>MNLI</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>[m, mm] = [84.57, 84.79]</td>
+<td>[m, mm] = [82.45, 83.27]</td>
+<td>[-2.51, -1.80]</td>
+<td>[m, mm] = [83.20, 84.11]</td>
+<td>[-1.62, -0.80]</td>
+</tr>
+<tr>
+<td>SST-2</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>accuracy = 92.32</td>
+<td>accuracy = 91.51</td>
+<td>-0.88</td>
+<td>accuracy = 92.20</td>
+<td>-0.13</td>
+</tr>
+<tr>
+<td>QQP</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>[accuracy, f1] = [91.10, 88.05]</td>
+<td>[accuracy, f1] = [90.48, 87.06]</td>
+<td>[-0.68, -1.12]</td>
+<td>[accuracy, f1] = [90.92, 87.78]</td>
+<td>[-0.20, -0.31]</td>
+</tr>
+<tr>
+<td>QNLI</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>accuracy = 91.54</td>
+<td>accuracy = 90.39</td>
+<td>-1.26</td>
+<td>accuracy = 90.87</td>
+<td>-0.73</td>
+</tr>
+<tr>
+<td>QnA</td>
+<td>PyTorch</td>
+<td>bert-base</td>
+<td>[em, f1] = [79.34, 87.10]</td>
+<td>[em, f1] = [77.27, 85.75]</td>
+<td>[-2.61, -1.54]</td>
+<td>[em, f1] = [78.03, 86.50]</td>
+<td>[-1.65, -0.69]</td>
+</tr>
+</tbody>
+</table>
+
+<table>
+<thead>
+<tr>
+<th>Framework</th>
+<th>Model</th>
+<th>fp32 baseline</th>
+<th>Compression</th>
+<th>dataset</th>
+<th>acc(drop)%</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>PyTorch</td>
+<td>resnet18</td>
+<td>69.76</td>
+<td>30% sparsity on magnitude</td>
+<td>ImageNet</td>
+<td>69.47(-0.42)</td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>resnet18</td>
+<td>69.76</td>
+<td>30% sparsity on gradient sensitivity</td>
+<td>ImageNet</td>
+<td>68.85(-1.30)</td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>resnet50</td>
+<td>76.13</td>
+<td>30% sparsity on magnitude</td>
+<td>ImageNet</td>
+<td>76.11(-0.03)</td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>resnet50</td>
+<td>76.13</td>
+<td>30% sparsity on magnitude and post training quantization</td>
+<td>ImageNet</td>
+<td>76.01(-0.16)</td>
+</tr>
+<tr>
+<td>PyTorch</td>
+<td>resnet50</td>
+<td>76.13</td>
+<td>30% sparsity on magnitude and quantization aware training</td>
+<td>ImageNet</td>
+<td>75.90(-0.30)</td>
+</tr>
+</tbody>
+</table>
+
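The "30% sparsity on magnitude" rows in the tables above refer to unstructured magnitude pruning: the smallest-magnitude weights are zeroed until the target sparsity is reached. A minimal sketch of the idea — not LPOT's implementation; the helper name and values are illustrative:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Threshold is the k-th smallest absolute value; ties at the threshold
    # may prune slightly more than k weights.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.1, -0.5, 0.05, 2.0, -0.01, 0.3, 0.7, -0.2, 0.9, 1.1], 0.3)
# the three smallest-magnitude weights (0.1, 0.05, -0.01) are now zero
```

Gradient-sensitivity pruning, also shown in the tables, ranks weights by an importance estimate from gradients instead of raw magnitude, but the zero-out step is the same.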
## Additional Content
* [Release Information](releases_info.md)

docs/backend_quant.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ This document provides a quantization support matrix for the following framework
 + Symmetric Quantization
   + int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
 + Asymmetric Quantization
-  + uint8: scale = (rmax - rmin) / (max(uint8) - min(uint8)); zero_point = min(uint8) - round(rmin / scale)
+  + uint8: scale = (rmax - rmin) / (max(uint8) - min(uint8)); zero_point = min(uint8) - round(rmin / scale)

 ### Reference
 + oneDNN: [Lower Numerical Precision Deep Learning Inference and Training](https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html)
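The scale and zero-point formulas in this hunk can be checked numerically. A small sketch (function names are illustrative; the arithmetic follows the formulas above, with int8 in [-128, 127] and uint8 in [0, 255]):

```python
def symmetric_int8_scale(rmin, rmax):
    # scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
    qmax, qmin = 127, -128  # int8 range
    return 2.0 * max(abs(rmin), abs(rmax)) / (qmax - qmin - 1)

def asymmetric_uint8_params(rmin, rmax):
    # scale = (rmax - rmin) / (max(uint8) - min(uint8))
    # zero_point = min(uint8) - round(rmin / scale)
    qmax, qmin = 255, 0  # uint8 range
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = qmin - round(rmin / scale)
    return scale, zero_point

# A tensor spanning [-1.0, 1.5] gets scale = 2.5/255 and zero_point = 102:
scale, zp = asymmetric_uint8_params(-1.0, 1.5)
```

The symmetric variant keeps zero_point at 0, which is why only a scale is computed for int8.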
