Commit c7f995b

Squashed commit of the following:
- 23fe77e (Uri Livne, 2024-08-11): [SW-193273] Merge from public GitHub to Gerrit; merged from the INC public master branch, top commit 7056720.
- f02e9bd (Asaf Karnieli, 2024-08-13): [ALGO-801] Add an additional mark_step in QDQ due to a difference in results.
- 775d5a2 (Roi Tiefenbrunn, 2024-08-11): [SW-174155] Fix a race condition when reading scales: implement an inter-process reader-writer lock and apply the locking mechanism in save_file/load_file.
- a529cf4 (Asaf Karnieli, 2024-08-11): [ALGO-801] Add a fake-quant option to linear and matmul layers.
- 09c6312 (Uri Livne, 2024-08-12): [SW-192770] Remove the regression detection script; it is maintained by QA in another path.
- ac48710 (Roi Tiefenbrunn, 2024-08-06): [SW-195526] Rename LOG_LEVEL_HQT to LOG_LEVEL_INC; rename 'HQT' occurrences in fp8_tests.py and logger.py.
- c7aa37c (Roi Tiefenbrunn, 2024-08-06): [SW-195525] INC logger: support values '1' and '0' for the ENABLE_CONSOLE env var.
- b42b018 (yan tomsinsky, 2024-08-05): [SW-195483] Remove hard-coded strings from the FP8 config in INC.
- c6af377 (Roi Tiefenbrunn, 2024-08-05): [SW-194203] Add a flag to recalculate scales: support 'recalc_scales' in Fp8cfg::parse and read it from the HQT config instead of passing it to get_config in scale.py.
- 55e1387 (Roi Tiefenbrunn, 2024-08-01): [SW-186675] Update the default 'allowlist' configuration: default allowlist types are now empty, allowing quantization of all models; refactor the parse function for more dynamic code and consistency.
- 3f1d5c0 (Eran Geva, 2024-08-04): [SW-192999] Bump the INC version to 3.0.
- c19fcbd (Nir David, 2024-08-01): Adjust INC to run from vLLM with the old PA.
- ff114b7 (Roi Tiefenbrunn, 2024-07-30): [SW-194748] Switch the tester.py framework from HQT to INC; every call to the HQT package now uses INC instead.
- 7949907 (Eran Geva, 2024-07-29): [SW-194599] Fix setup.py get_build_version.
- ad0625b (yan tomsinsky, 2024-07-09): [SW-189684] Add descriptions to functions in HQT.
- 7bf9521 (Roi Tiefenbrunn, 2024-07-29): [SW-193263] Switch HQT unit tests to run on INC; point tests at the correct INC package and add an __init__.py with the content needed for test_layers' tests.
- a5b6ef8 (Uri Livne, 2024-07-28): [SW-184689] Adjust the correct condition for the one-step flow, aligned to 1.17.
- ae9d934 (xinhe3, 2024-07-16): [SW-192931] Align setup.py with GitHub INC and remove fp8_convert.
- a92d70a (xinhe3, 2024-07-16): [SW-192917] Update all HQT logic files with the pre-commit check (cherry-picked from 099e984).
- 56a1a7e (xinhe3, 2024-07-18): [SW-193292] Align INC PyTorch requirements with the OHF requirement (peft==0.11.1) (cherry-picked from aa26f16).
- 3f61954 (Witold Szczurek, 2024-07-22): [SW-187215] Add the valid_seq_len feature to the patched SDPA module.
- 039af39 (Nir David, 2024-07-25): [SW-194200] Save the scale file only when there are new scales.
- 4f8b257 (Zhou Yuwen, 2024-07-15): [SW-192809] Fix a json_file bug when instantiating the FP8Config class (cherry-picked from dc4b5f5).
- 3572617 (Nir David, 2024-07-25): [SW-194177] Integrate the new vLLM-PA algorithm with HQT.
- 5e3a679 (Dudi Lester, 2024-07-11): [SW-191415] Update the FP8 maxAbs observer to use torch.copy_.
- 7f62871 (Asaf Karnieli, 2024-07-21): [ALGO-790] Add GPTQ quantization support for Gaudi.
- abaa038 (Uri Livne, 2024-07-11): [SW-192358] Remove HQT references in INC.
- 56c03d8 (yan tomsinsky, 2024-07-09): [SW-190303] Implement the HPUWeightOnlyLinear class in INC.
- 969f467 (Zhou Yuwen, 2024-06-12): [SW-184943] Enhance INC WOQ model loading: support loading Hugging Face WOQ models, abstract a WeightOnlyLinear base class, add INCWeightOnlyLinear and HPUWeightOnlyLinear subclasses, load WOQ linear weights module by module, and save HPU-format tensors so they can be reused on reload.
- 6404b06 (xinhe3, 2024-07-09): [SW-191945] Align requirement_pt.txt in Gerrit INC with GitHub INC.
- 7e1e78f (Uri Livne, 2024-07-09): [SW-184689] Use finalize_calibration internally for the one-step flow.
- 997bf9b (Uri Livne, 2024-07-08): [SW-190899] Install packages according to configuration.
- 1ed690c (Uri Livne, 2024-06-23): [SW-187731] Save the original module as a member of the patched module; this allows direct use of the original module's methods, which solves a torch.compile issue.
- adfe13b (smarkovichgolan, 2024-07-03): Fix errors in regression_detection.
- 222402e (Danny Semiat, 2024-06-20): [SW-177468] Remove unused code and clean up.
- 7329e4f (Uri Livne, 2024-07-07): [SW-184714] Add an internal folder to fp8 quant; this folder is used for experiments and is not meant for users.
- da4bcd2 (Uri Livne, 2024-07-06): [SW-184714] Port HQT code into INC; the HQT lib content was copied as-is under fp8_quant, and tests were copied to the 3.x torch location.
- 768c2a4 (Uri Livne, 2024-07-03): [SW-191317] Raise an exception according to the HQT config object.
- 52a98f4 (Uri Livne, 2024-06-19): [SW-189361] Fix the whitelist extend.
- abd570b (Zhou Yuwen, 2024-05-22): [SW-177474] Add the HQT FP8 porting code.
- 254de6d (Ron Ben Moshe, 2024-06-06): [SW-183320] Update setup.py.
- f23f1fa (yan tomsinsky, 2024-05-19): [SW-184941] INC CI, CD and promotion.
- d7ad2d1 (Uri Livne, 2024-04-24): [SW-181785] Remove torch from INC requirements.
- 31d8bb9 (Mengni Wang, 2024-06-11): Add UT and remove unused code for torch MX quant (#1854).
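Commit 775d5a2 above describes guarding scale-file reads and writes with an inter-process reader-writer lock. A minimal sketch of that idea on top of POSIX advisory locks follows; the `save_file`/`load_file` names mirror the commit message, but the body is illustrative only, not INC's actual implementation:

```python
# Hypothetical sketch of an inter-process reader-writer lock for a scale
# file, using POSIX advisory locks (fcntl.flock). Not INC's real code.
import fcntl
import json


def load_file(path):
    """Read scales under a shared (reader) lock; many readers may hold it."""
    with open(path, "r") as f:
        fcntl.flock(f, fcntl.LOCK_SH)  # blocks only while a writer holds LOCK_EX
        try:
            return json.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def save_file(path, scales):
    """Write scales under an exclusive (writer) lock."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # waits until all readers release
        try:
            f.seek(0)
            f.truncate()
            json.dump(scales, f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Because `flock` locks are advisory and per-file, every process touching the scale file must go through these two helpers for the exclusion to hold.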
1 parent 2bb257e commit c7f995b

File tree

37 files changed: +1506 −224 lines


examples/fp8_sample/README.md

Lines changed: 96 additions & 0 deletions
### Usage demo:

#### two steps to get quantized model

```diff
import torch
+ from neural_compressor.torch.quantization import FP8Config, convert, prepare, finalize_calibration
import habana_frameworks.torch.core as htcore

class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 5)
        self.fc2 = torch.nn.Linear(5, 10)

    def forward(self, inp):
        x1 = self.fc1(inp)
        x2 = self.fc2(x1)
        return x2

model = M().eval()

+ config = FP8Config.from_json_file(args.quant_config)  # args.quant_config is the path of the JSON config file

+ if config.measure:
+     model = prepare(model, config)

+ if config.quantize:
+     htcore.hpu_initialize()
+     model = convert(model, config)

# user code run
with torch.no_grad():
    model.to("hpu")
    output = model(torch.randn(1, 10).to("hpu"))
    print(output)

+ if config.measure:
+     finalize_calibration(model)
```

The whole script and configs are available in [sample_two_steps.py](./sample_two_steps.py), [maxabs_measure.json](./maxabs_measure.json) and [maxabs_quant.json](./maxabs_quant.json).

First, measure the tensor quantization statistics:

```shell
python sample_two_steps.py --quant_config=maxabs_measure.json
```

Then quantize the model based on the previous measurements:

```shell
python sample_two_steps.py --quant_config=maxabs_quant.json
```

#### one step to get quantized model

```diff
import torch
+ from neural_compressor.torch.quantization import FP8Config, convert, prepare, finalize_calibration
import habana_frameworks.torch.core as htcore

class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 5)
        self.fc2 = torch.nn.Linear(5, 10)

    def forward(self, inp):
        x1 = self.fc1(inp)
        x2 = self.fc2(x1)
        return x2

model = M().to("hpu")

+ config = FP8Config.from_json_file(args.quant_config)  # args.quant_config is the path of the JSON config file
+ model = prepare(model, config)

# user code run to do calibration
with torch.no_grad():
    output = model(torch.randn(1, 10).to("hpu"))
    print(output)

+ finalize_calibration(model)
+ model = convert(model)

# user code to run benchmark for quantized model
with torch.no_grad():
    output = model(torch.randn(1, 10).to("hpu"))
    print(output)
```

The whole script is available in [sample_one_step.py](./sample_one_step.py).

```shell
python sample_one_step.py --quant_config=quant_config.json
```
Lines changed: 7 additions & 0 deletions

```json
{
    "mode": "MEASURE",
    "observer": "maxabs",
    "allowlist": {"types": [], "names": []},
    "blocklist": {"types": [], "names": []},
    "dump_stats_path": "./hqt_output/measure"
}
```
Lines changed: 8 additions & 0 deletions

```json
{
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    "allowlist": {"types": [], "names": []},
    "blocklist": {"types": [], "names": []},
    "dump_stats_path": "./hqt_output/measure"
}
```
Lines changed: 8 additions & 0 deletions

```json
{
    "mode": "AUTO",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    "allowlist": {"types": [], "names": []},
    "blocklist": {"types": [], "names": []},
    "dump_stats_path": "./hqt_output/measure"
}
```
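The three configs above differ mainly in their `"mode"` field (MEASURE, QUANTIZE, AUTO), and per commit 55e1387 an empty `"allowlist"` types list means no type restriction. A rough sketch of loading and sanity-checking a config of this shape; `parse_fp8_config` and `VALID_MODES` are hypothetical names, not INC's actual Fp8cfg API:

```python
# Illustrative loader for config files shaped like the JSON above.
# Hypothetical helper, not INC's real Fp8cfg::parse.
import json

VALID_MODES = {"MEASURE", "QUANTIZE", "AUTO"}


def parse_fp8_config(path):
    """Load a JSON config and check the fields the samples rely on."""
    with open(path) as f:
        cfg = json.load(f)
    if cfg.get("mode") not in VALID_MODES:
        raise ValueError(f"unknown mode: {cfg.get('mode')!r}")
    # Empty allowlist types mean "quantize all supported ops".
    cfg.setdefault("allowlist", {"types": [], "names": []})
    cfg.setdefault("blocklist", {"types": [], "names": []})
    return cfg
```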
Lines changed: 56 additions & 0 deletions

```python
import argparse
import torch
import habana_frameworks.torch.core as htcore
htcore.hpu_set_env()

from neural_compressor.torch.quantization import FP8Config, convert, finalize_calibration, prepare

torch.manual_seed(1)


# 1. python sample_one_step.py --quant_config=quant_config.json


class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 5)
        self.fc2 = torch.nn.Linear(5, 10)

    def forward(self, inp):
        x1 = self.fc1(inp)
        x2 = self.fc2(x1)
        return x2


def eval_func(model):
    # user's eval func
    input = torch.randn(1, 10)
    model(input.to("hpu"))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Habana FP8 sample code.", formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("--quant_config", type=str, help="json file of quantization config")
    args = parser.parse_args()

    model = M().eval().to("hpu")
    htcore.hpu_initialize()

    config = FP8Config.from_json_file(args.quant_config)
    model = prepare(model, config)

    # for calibration
    with torch.no_grad():
        # model.to("hpu")
        output = model(torch.randn(1, 10).to("hpu"))

    model = convert(model)
    print(model)

    # for benchmark
    with torch.no_grad():
        output = model(torch.randn(1, 10).to("hpu"))
    print(output)
```
Lines changed: 50 additions & 0 deletions

```python
import argparse
import torch
import habana_frameworks.torch.core as htcore
htcore.hpu_set_env()

from neural_compressor.torch.quantization import FP8Config, convert, finalize_calibration, prepare

torch.manual_seed(1)

# 1. python sample_two_steps.py --quant_config=maxabs_measure.json
# 2. python sample_two_steps.py --quant_config=maxabs_quant.json


class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 5)
        self.fc2 = torch.nn.Linear(5, 10)

    def forward(self, inp):
        x1 = self.fc1(inp)
        x2 = self.fc2(x1)
        return x2


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Habana FP8 sample code.", formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("--quant_config", type=str, help="json file of quantization config")
    args = parser.parse_args()

    model = M().eval()
    config = FP8Config.from_json_file(args.quant_config)

    if config.measure:
        model = prepare(model, config)

    if config.quantize:
        htcore.hpu_initialize()
        model = convert(model, config)
        print(model)

    with torch.no_grad():
        model.to("hpu")
        output = model(torch.randn(1, 10).to("hpu"))
        print(output)

    if config.measure:
        finalize_calibration(model)
```
Lines changed: 9 additions & 2 deletions

```diff
@@ -1,16 +1,23 @@
-#!/usr/bin/env python
+#
 # -*- coding: utf-8 -*-
 #
+<<<<<<<< HEAD:neural_compressor/evaluation/hf_eval/hf_datasets/__init__.py
 # Copyright (c) 2022 Intel Corporation
+========
+# Copyright (c) 2018 Intel Corporation
+>>>>>>>> 23fe77ec31ed8ef87e5b0717d7ab41eb0b34afc8:examples/3.x_api/tensorflow/semantic_image_segmentation/3dunet-mlperf/quantization/ptq/nnUNet/__init__.py
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
-#     http://www.apache.org/licenses/LICENSE-2.0
+# http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+#
+
+#
```

Note that this hunk commits unresolved merge-conflict markers (`<<<<<<<<`/`========`/`>>>>>>>>`) into the file.
