
Conversation

namgyu-youn
Contributor

@namgyu-youn namgyu-youn commented Oct 1, 2025

Summary:
Similar to #3101, this test injects outliers to reflect the input distribution. The toy model is updated to pass dtype and device directly at the call site.

Test plan:
pytest -sv test/prototype/test_awq.py
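
For reference, a minimal sketch of the outlier-injection idea (hypothetical shapes and magnitudes; the real test is in test/prototype/test_awq.py):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical calibration batch: mostly well-behaved activations ...
x = torch.randn(32, 512, dtype=torch.bfloat16, device=device)

# ... with a few random channels scaled up to mimic activation outliers,
# so AWQ calibration sees a realistically skewed input distribution.
outlier_idx = torch.randperm(512)[:8]
x[:, outlier_idx] *= 100.0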


pytorch-bot bot commented Oct 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3106

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 1764d26 with merge base d407246:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 1, 2025
@namgyu-youn
Contributor Author

@Xia-Weiwen could you please review this PR?

Collaborator

@Xia-Weiwen Xia-Weiwen left a comment


LGTM

@jerryzh168
Contributor

@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq: here we are currently using the default, not hqq:

Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),

We just need to update or add a test for Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")
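
Something like the following sketch, assuming the test parametrizes over a list of base configs (the list name and surrounding test code are hypothetical):

from torchao.quantization import Int4WeightOnlyConfig

base_configs = [
    # existing case: default (affine) qparams algorithm
    Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),
    # new case: same packing, but choose qparams with HQQ
    Int4WeightOnlyConfig(
        group_size=128,
        int4_packing_format="tile_packed_to_4d",
        int4_choose_qparams_algorithm="hqq",
    ),
]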

@jerryzh168 jerryzh168 added the topic: not user facing Use this tag if you don't want this PR to show up in release notes label Oct 8, 2025
@namgyu-youn
Contributor Author

namgyu-youn commented Oct 8, 2025

@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq: here we are currently using the default, not hqq:

Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),

We just need to update or add a test for Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")

Just to make it clear, could you explain how mixed (AWQ + HQQ) PTQ works at a high level? They look like opposites: calibration-based vs. calibration-free. Does it work like:

  1. Scale weight & activation (AWQ with calibration)
  2. HQQ simulation: compute error
    (loop 1 → 2)
  3. Quantize with the best scale using real HQQ

?

btw, SINQ (https://huggingface.co/papers/2509.22944) looks very interesting. It is a calibration-free PTQ like HQQ, and uses a 2-axis (row/column) scale factor for efficient outlier observation.
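
Roughly, the 2-axis idea replaces a single per-group scale with per-row and per-column scale vectors. A minimal dequantization sketch (hypothetical shapes, not SINQ's actual algorithm):

import torch

# Hypothetical int4 weight, stored as int8 for simplicity
q = torch.randint(-8, 8, (256, 512), dtype=torch.int8)

# Two scale vectors instead of one: rows and columns each get a factor,
# so an outlier can be absorbed along either axis.
s_row = torch.rand(256, 1) + 0.5   # per-output-channel scales
s_col = torch.rand(1, 512) + 0.5   # per-input-channel scales

w_dequant = s_row * q.float() * s_col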

@jerryzh168
Contributor

jerryzh168 commented Oct 8, 2025

@namgyu-youn if you have time, it would be very helpful if you could help us debug why the unit test doesn't work for hqq: here we are currently using the default, not hqq:

Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d"),

We just need to update or add a test for Int4WeightOnlyConfig(group_size=128, int4_packing_format="tile_packed_to_4d", int4_choose_qparams_algorithm="hqq")

Just to make it clear, could you explain how mixed (AWQ + HQQ) PTQ works at a high level? They look like opposites: calibration-based vs. calibration-free. Does it work like:

  1. Scale weight & activation (AWQ with calibration)
  2. HQQ simulation: compute error
    (loop 1 → 2)
  3. Quantize with the best scale using real HQQ

?

btw, SINQ (huggingface.co/papers/2509.22944, arxiv.org/abs/2509.22944) looks very interesting, I feel. SINQ is a calibration-free PTQ like HQQ, and uses a 2-axis (row/column) scale factor for efficient outlier observation.

oh, so hqq just replaces the original naive choose_affine_qparams with its own choose_qparams algorithm; see

if int4_choose_qparams_algorithm == Int4ChooseQParamsAlgorithm.HQQ:

and awq is applying an additional pre_scaling_factor on top of the scale/zero_point for quantization

so the process is:

import torch.nn.functional as F

# AWQ loop to search for the pre-scale factor for activation and weight
for act_pre_scale in scale_factor_options:
    weight = original_weight * act_pre_scale
    qw = quantize_with_hqq(weight)  # HQQ picks qparams for the pre-scaled weight
    activation = original_activation / act_pre_scale
    awq_result = F.linear(activation, qw)
    ref_result = F.linear(original_activation, original_weight)
    loss = loss_fn(ref_result, awq_result)
    ...  # keep the scale that gives the best loss

I think it's a problem because in principle awq should not regress the metric that we are measuring:

loss_awq = (ref_out - awq_out).pow(2).mean().item()
loss_base = (ref_out - baseline_out).pow(2).mean().item()
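
Expanding the pseudocode into a self-contained toy that shows why the metric should not regress (fake_quantize is a stand-in for the real HQQ quantizer; all names and shapes here are illustrative):

import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor) -> torch.Tensor:
    # Stand-in for quantize_with_hqq: symmetric int4 round-trip per row.
    scale = w.abs().amax(dim=1, keepdim=True) / 7
    return (w / scale).round().clamp(-8, 7) * scale

original_weight = torch.randn(64, 128)
original_activation = torch.randn(16, 128)
ref_result = F.linear(original_activation, original_weight)

best_loss, best_scale = float("inf"), None
for exponent in torch.linspace(0, 1, 10):
    # Candidate per-input-channel pre-scale from activation magnitudes.
    # exponent == 0 gives an all-ones scale, i.e. the plain quantization
    # baseline, so the best loss found can never exceed the baseline loss.
    act_pre_scale = original_activation.abs().mean(dim=0).clamp(min=1e-4) ** exponent
    qw = fake_quantize(original_weight * act_pre_scale)
    awq_result = F.linear(original_activation / act_pre_scale, qw)
    loss = (ref_result - awq_result).pow(2).mean().item()
    if loss < best_loss:
        best_loss, best_scale = loss, act_pre_scale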

@jerryzh168
Contributor

SINQ seems to be pretty new and looks like it gives good improvements at lower bit widths; contributions are welcome as well

@jerryzh168 jerryzh168 merged commit 2d31ac3 into pytorch:main Oct 8, 2025
18 of 20 checks passed
@namgyu-youn
Contributor Author

namgyu-youn commented Oct 8, 2025

oh, so hqq just replaces the original naive choose_affine_qparams with its own choose_qparams algorithm; see

if int4_choose_qparams_algorithm == Int4ChooseQParamsAlgorithm.HQQ:

and awq is applying an additional pre_scaling_factor on top of the scale/zero_point for quantization

so the process is:

import torch.nn.functional as F

# AWQ loop to search for the pre-scale factor for activation and weight
for act_pre_scale in scale_factor_options:
    weight = original_weight * act_pre_scale
    qw = quantize_with_hqq(weight)  # HQQ picks qparams for the pre-scaled weight
    activation = original_activation / act_pre_scale
    awq_result = F.linear(activation, qw)
    ref_result = F.linear(original_activation, original_weight)
    loss = loss_fn(ref_result, awq_result)
    ...  # keep the scale that gives the best loss

I think it's a problem because in principle awq should not regress the metric that we are measuring:

loss_awq = (ref_out - awq_out).pow(2).mean().item()
loss_base = (ref_out - baseline_out).pow(2).mean().item()

Thanks for the context. I'm not sure, but I suspect an FBGEMM dispatch error; let me take a closer look.

@namgyu-youn
Contributor Author

namgyu-youn commented Oct 8, 2025

SINQ seems to be pretty new and looks like it gives good improvements at lower bit widths; contributions are welcome as well

Yeah, SINQ (https://github.com/huawei-csl/SINQ) is pretty new, but the 2-axis scale factor (Fig. 1) looked pretty cool at first glance.

[Figure 1 from the SINQ paper: 2-axis (row/column) scale factors]

If we can reuse the weight-only API, the 2-axis scale factor seems like a reasonable choice to me. But not just yet, since I am still learning the Marlin implementation.

@namgyu-youn namgyu-youn deleted the awq-test branch October 8, 2025 17:32