
Conversation

namgyu-youn
Contributor

@namgyu-youn namgyu-youn commented Oct 1, 2025

Summary:
Similar to #3101, this test injects outliers to reflect the input distribution. The toy model is updated to pass dtype and device directly at the call site.

Test plan:
pytest -sv test/prototype/test_awq.py
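
For reference, a minimal sketch of the outlier-injection idea (hypothetical shapes and magnitudes; the real test is in test/prototype/test_awq.py):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical calibration batch: mostly well-behaved activations ...
x = torch.randn(32, 512, dtype=torch.bfloat16, device=device)

# ... with a few random channels scaled up to mimic activation outliers,
# so AWQ calibration sees a realistically skewed input distribution.
outlier_idx = torch.randperm(512)[:8]
x[:, outlier_idx] *= 100.0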


pytorch-bot bot commented Oct 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3106

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 1764d26 with merge base d407246:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 1, 2025
@namgyu-youn
Contributor Author

@Xia-Weiwen could you please review this PR?

Collaborator

@Xia-Weiwen Xia-Weiwen left a comment


LGTM

@jerryzh168
Contributor

@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq: here we are currently using the default, not hqq:

Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),

We just need to update or add a test for Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")
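
Something like the following sketch, assuming the test parametrizes over a list of base configs (the list name and surrounding test code are hypothetical):

from torchao.quantization import Int4WeightOnlyConfig

base_configs = [
    # existing case: default (affine) qparams algorithm
    Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),
    # new case: same packing, but choose qparams with HQQ
    Int4WeightOnlyConfig(
        group_size=128,
        int4_packing_format="tile_packed_to_4d",
        int4_choose_qparams_algorithm="hqq",
    ),
]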

@jerryzh168 jerryzh168 added the topic: not user facing Use this tag if you don't want this PR to show up in release notes label Oct 8, 2025
@namgyu-youn
Contributor Author

namgyu-youn commented Oct 8, 2025

@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq: here we are currently using the default, not hqq:

Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),

We just need to update or add a test for Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")

Just to make it clear, could you explain how mixed (AWQ + HQQ) PTQ works at a high level? They look like opposites: calibration-based vs. calibration-free. Does it work like:

  1. Scale weight & activation (AWQ with calibration)
  2. HQQ simulation: compute error
    (loop 1 → 2)
  3. Quantize with the best scale using real HQQ

?

btw, SINQ (https://huggingface.co/papers/2509.22944) looks very interesting. It is a calibration-free PTQ like HQQ, and uses a 2-axis (row/column) scale factor for efficient outlier observation.
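
Roughly, the 2-axis idea replaces a single per-group scale with per-row and per-column scale vectors. A minimal dequantization sketch (hypothetical shapes, not SINQ's actual algorithm):

import torch

# Hypothetical int4 weight, stored as int8 for simplicity
q = torch.randint(-8, 8, (256, 512), dtype=torch.int8)

# Two scale vectors instead of one: rows and columns each get a factor,
# so an outlier can be absorbed along either axis.
s_row = torch.rand(256, 1) + 0.5   # per-output-channel scales
s_col = torch.rand(1, 512) + 0.5   # per-input-channel scales

w_dequant = s_row * q.float() * s_col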

@jerryzh168
Contributor

jerryzh168 commented Oct 8, 2025

@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq: here we are currently using the default, not hqq:

Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),

We just need to update or add a test for Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")

Just to make it clear, could you explain how mixed (AWQ + HQQ) PTQ works at a high level? They look like opposites: calibration-based vs. calibration-free. Does it work like:

  1. Scale weight & activation (AWQ with calibration)
  2. HQQ simulation: compute error
    (loop 1 → 2)
  3. Quantize with the best scale using real HQQ

?

btw, SINQ (huggingface.co/papers/2509.22944, arxiv.org/abs/2509.22944) looks very interesting, I feel. SINQ is a calibration-free PTQ like HQQ, and uses a 2-axis (row/column) scale factor for efficient outlier observation.

oh, so hqq just replaces the original naive choose_affine_qparams with its own choose_qparams algorithm; see

if int4_choose_qparams_algorithm == Int4ChooseQParamsAlgorithm.HQQ:

and awq is applying an additional pre_scaling_factor on top of the scale/zero_point for quantization

so the process is:

import torch.nn.functional as F

# AWQ loop to search for the pre-scale factor for activation and weight
for act_pre_scale in scale_factor_options:
    weight = original_weight * act_pre_scale
    qw = quantize_with_hqq(weight)  # HQQ picks qparams for the pre-scaled weight
    activation = original_activation / act_pre_scale
    awq_result = F.linear(activation, qw)
    ref_result = F.linear(original_activation, original_weight)
    loss = loss_fn(ref_result, awq_result)
    ...  # keep the scale that gives the best loss

I think it's a problem because in principle awq should not regress the metric that we are measuring:

loss_awq = (ref_out - awq_out).pow(2).mean().item()
loss_base = (ref_out - baseline_out).pow(2).mean().item()
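
Expanding the pseudocode into a self-contained toy that shows why the metric should not regress (fake_quantize is a stand-in for the real HQQ quantizer; all names and shapes here are illustrative):

import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor) -> torch.Tensor:
    # Stand-in for quantize_with_hqq: symmetric int4 round-trip per row.
    scale = w.abs().amax(dim=1, keepdim=True) / 7
    return (w / scale).round().clamp(-8, 7) * scale

original_weight = torch.randn(64, 128)
original_activation = torch.randn(16, 128)
ref_result = F.linear(original_activation, original_weight)

best_loss, best_scale = float("inf"), None
for exponent in torch.linspace(0, 1, 10):
    # Candidate per-input-channel pre-scale from activation magnitudes.
    # exponent == 0 gives an all-ones scale, i.e. the plain quantization
    # baseline, so the best loss found can never exceed the baseline loss.
    act_pre_scale = original_activation.abs().mean(dim=0).clamp(min=1e-4) ** exponent
    qw = fake_quantize(original_weight * act_pre_scale)
    awq_result = F.linear(original_activation / act_pre_scale, qw)
    loss = (ref_result - awq_result).pow(2).mean().item()
    if loss < best_loss:
        best_loss, best_scale = loss, act_pre_scale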

@jerryzh168
Contributor

SINQ seems to be pretty new and looks like it gives good improvements at lower bit widths; contributions are welcome as well

@jerryzh168 jerryzh168 merged commit 2d31ac3 into pytorch:main Oct 8, 2025
18 of 20 checks passed
@namgyu-youn
Contributor Author

namgyu-youn commented Oct 8, 2025

oh, so hqq just replaces the original naive choose_affine_qparams with its own choose_qparams algorithm; see

if int4_choose_qparams_algorithm == Int4ChooseQParamsAlgorithm.HQQ:

and awq is applying an additional pre_scaling_factor on top of the scale/zero_point for quantization

so the process is:

import torch.nn.functional as F

# AWQ loop to search for the pre-scale factor for activation and weight
for act_pre_scale in scale_factor_options:
    weight = original_weight * act_pre_scale
    qw = quantize_with_hqq(weight)  # HQQ picks qparams for the pre-scaled weight
    activation = original_activation / act_pre_scale
    awq_result = F.linear(activation, qw)
    ref_result = F.linear(original_activation, original_weight)
    loss = loss_fn(ref_result, awq_result)
    ...  # keep the scale that gives the best loss

I think it's a problem because in principle awq should not regress the metric that we are measuring:

loss_awq = (ref_out - awq_out).pow(2).mean().item()
loss_base = (ref_out - baseline_out).pow(2).mean().item()

Thanks for the context. I'm not sure, but I suspect an FBGEMM dispatch error; let me take a closer look.

@namgyu-youn
Contributor Author

namgyu-youn commented Oct 8, 2025

SINQ seems to be pretty new and looks like it gives good improvements at lower bit widths; contributions are welcome as well

Yeah, SINQ (https://github.com/huawei-csl/SINQ) is pretty new, but the 2-axis scale factor (Fig. 1) looked pretty cool at first glance.

[Figure 1 from the SINQ paper: 2-axis (row/column) scale factors]

If we can reuse the weight-only API, the 2-axis scale factor seems like a reasonable choice to me. But not just yet, since I am still learning the Marlin implementation.

@namgyu-youn namgyu-youn deleted the awq-test branch October 8, 2025 17:32