Add outlier in AWQ test cases #3106
Conversation
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3106
✅ No failures as of commit 1764d26 with merge base d407246. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
@Xia-Weiwen could you please review this PR?
LGTM
@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq. Here we are currently using the default, not hqq: Line 63 in 2fe0ca0
We just need to update it / or add a test for
Just to make sure I understand, could you explain how mixed (AWQ + HQQ) PTQ works at a high level? They look like opposite approaches: calibration-based vs. calibration-free. Does it work like:
? By the way, SINQ (https://huggingface.co/papers/2509.22944) looks very interesting. It is a calibration-free PTQ method like HQQ, and uses 2-axis (row/column) scale factors for efficient outlier handling.
Oh, so hqq just replaces the original naive choose_qparams with its own algorithm for choosing qparams, see ao/torchao/quantization/quantize_/workflows/int4/int4_tile_packed_to_4d_tensor.py Line 148 in 28612d0
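To make the "same quantize path, different choose_qparams" idea concrete, here is a minimal sketch. The helper names are hypothetical (not the torchao API), and the second chooser is only a naive error-minimizing range search standing in for HQQ's actual half-quadratic optimization:

```python
import torch

def affine_qdq(w, scale, zero_point, qmin, qmax):
    # quantize then dequantize with affine qparams
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

def choose_qparams_minmax(w, n_bits=4):
    # naive default: qparams straight from the per-row min/max range
    qmin, qmax = 0, 2 ** n_bits - 1
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale)
    return scale, zero_point

def choose_qparams_search(w, n_bits=4, steps=20):
    # stand-in for a smarter chooser: shrink the min/max range and keep
    # the candidate with the lowest reconstruction error (HQQ's real
    # algorithm solves a half-quadratic optimization instead)
    qmin, qmax = 0, 2 ** n_bits - 1
    best = None
    for f in torch.linspace(0.5, 1.0, steps):
        w_min = w.amin(dim=1, keepdim=True) * f
        w_max = w.amax(dim=1, keepdim=True) * f
        scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
        zero_point = torch.round(-w_min / scale)
        err = (w - affine_qdq(w, scale, zero_point, qmin, qmax)).pow(2).sum()
        if best is None or err < best[0]:
            best = (err, scale, zero_point)
    return best[1], best[2]
```

Since the shrink factor 1.0 is among the candidates, the search can never do worse than the min/max default on the measured reconstruction error; only the qparams choice changes, the quantize/dequantize step is identical.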
and awq applies an additional pre_scaling_factor on top of the scale/zero_point for quantization, so the process is:
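That process can be sketched roughly like this (a hypothetical helper, not the actual torchao AWQ code; in the real flow the activation side absorbs the inverse pre-scale):

```python
import torch

def awq_style_qdq(w, pre_scale, n_bits=4):
    # sketch: divide by the per-input-channel pre-scaling factor first,
    # then compute the usual affine scale/zero_point on the pre-scaled weight
    qmin, qmax = 0, 2 ** n_bits - 1
    w_scaled = w / pre_scale
    w_min = w_scaled.amin(dim=1, keepdim=True)
    w_max = w_scaled.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w_scaled / scale) + zero_point, qmin, qmax)
    # dequantize and undo the pre-scale so W @ x is preserved end to end
    return (q - zero_point) * scale * pre_scale
```

The pre-scale shrinks salient input channels before qparams are computed, so their quantization error is reduced at the cost of the others; with pre_scale all ones this degenerates to the plain affine path.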
I think it's a problem because in principle awq should not regress the metric that we are measuring: Lines 130 to 131 in 28612d0
SINQ seems to be pretty new and looks like it has good improvements at lower bit widths; contributions are welcome as well.
Thanks for the context. I'm not sure yet, but I'm assuming it's an FBGEMM dispatch error; let me take a closer look.
Yeah, SINQ (https://github.com/huawei-csl/SINQ) is pretty new, but the 2-axis scale factor (Fig. 1) looked pretty cool at first glance. If we can reuse the weight-only API, the 2-axis scale factor seems like a good choice in my view. But not yet, because I am still learning the Marlin implementation.
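For intuition, a 2-axis (row/column) scale factorization could look something like the following Sinkhorn-style alternating normalization. This is purely illustrative and not the actual SINQ algorithm; all names are made up:

```python
import torch

def two_axis_normalize(w, iters=10):
    # factor w as (r * m * c): alternately absorb the per-row and
    # per-column RMS into the two scale vectors, leaving a normalized
    # core m to quantize while r and c stay in high precision
    r = torch.ones(w.shape[0], 1)
    c = torch.ones(1, w.shape[1])
    for _ in range(iters):
        m = w / (r * c)
        r = r * m.pow(2).mean(dim=1, keepdim=True).sqrt()  # absorb row RMS
        m = w / (r * c)
        c = c * m.pow(2).mean(dim=0, keepdim=True).sqrt()  # absorb col RMS
    m = w / (r * c)
    return m, r, c
```

The appeal is that an outlier row or column is tamed by its own scale factor instead of inflating a single shared scale, so the core matrix m is better conditioned for low-bit quantization.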
Summary:
Similar to #3101, this test injects outliers to better reflect real input distributions. The toy model is updated to pass dtype and device directly at the callsite.
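This is not the PR's actual helper, but the idea of injecting outliers into a calibration tensor can be sketched like this:

```python
import torch

def inject_outliers(x, outlier_fraction=0.01, magnitude=10.0):
    # randomly pick ~outlier_fraction of the entries and blow them up, so
    # the calibration input has heavy-tailed activations like real data
    mask = torch.rand_like(x) < outlier_fraction
    return torch.where(mask, x * magnitude, x)
```

Without such outliers, AWQ's activation-aware pre-scaling has nothing to exploit, so a test on purely Gaussian inputs can miss regressions in the salient-channel handling.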
Test plan:
pytest -sv test/prototype/test_awq.py