Add torchao_convert to PARQ's QuantOptimizer #2947
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2947

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 4cfa628 with merge base b99904b.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks! Overall looks great! Left some comments.
Looks good from my side, will let @metascroy do a final pass/stamp
Overall, I think it looks great! Approving PR, but address the comments above before landing.
Thanks for the comments @andrewor14 @metascroy :) I think I've addressed them all now, but let me know if I missed anything!
```python
block_size = (1, group_size)  # one scale/zero_point per group of `group_size` columns
target_dtype = torch.int8
q_args = (weight, mapping_type, block_size, target_dtype, config.b)  # config.b: target bit width
if config.version == 2:
```
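For intuition, a minimal plain-torch sketch of what `block_size = (1, group_size)` implies: each row of the weight is split into contiguous groups of `group_size` columns, and each group gets its own scale. This is illustrative only (symmetric mapping assumed), not torchao's implementation:

```python
import torch

weight = torch.randn(4, 16)
group_size = 8
b = 4  # bit width, as in config.b
qmin, qmax = -(2 ** (b - 1)), 2 ** (b - 1) - 1

# block_size = (1, group_size): reshape to (rows, n_groups, group_size)
grouped = weight.reshape(weight.shape[0], -1, group_size)
scale = (grouped.abs().amax(dim=-1, keepdim=True) / qmax).clamp_min(1e-9)
q = (grouped / scale).round().clamp(qmin, qmax).to(torch.int8)
dq = (q.float() * scale).reshape_as(weight)  # dequantized reconstruction
```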
For this, it's probably fine to break BC soon since it's a prototype feature?
Got it, thanks. I kept the `version=1` convention for this config to preserve the old functionality (`StretchedAffineQuantizedTensor`). The new `version=2` will convert tensors to `IntxUnpackedToInt8Tensor`, which is consistent with other config classes like `IntxWeightOnlyConfig`.
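A minimal sketch of the analogous `IntxWeightOnlyConfig` v2 usage for comparison (import paths and kwargs follow my reading of the current torchao API; treat the exact signature as an assumption):

```python
import torch
from torchao.quantization import IntxWeightOnlyConfig, quantize_
from torchao.quantization.granularity import PerGroup

model = torch.nn.Sequential(torch.nn.Linear(128, 256))
# With version=2, each Linear weight is swapped for an IntxUnpackedToInt8Tensor
config = IntxWeightOnlyConfig(weight_dtype=torch.int4, granularity=PerGroup(32), version=2)
quantize_(model, config)
print(type(model[0].weight))
```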
@metascroy Should we remove `StretchedAffineQuantizedTensor` entirely?
I have no objections to removing it. I assumed you wanted it for `Int4CPULayout`, but if you don't need that, I think it's fine to remove.
On the ExecuTorch side, `IntxUnpackedToInt8Tensor` will be sufficient.
I think you're right about CPU support. I'll leave it for now but consider removing it in the future.
`Int4CPULayout` has also been moved to v2 already (#2845). You can set `int4_packing_format` to `plain_int32` to get it.
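A hedged sketch of what that would look like (the `int4_packing_format` value comes from the comment above; the remaining kwargs are assumptions against the v2 API):

```python
import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# int4 weight-only typically expects bf16 weights
model = torch.nn.Sequential(torch.nn.Linear(128, 256)).to(torch.bfloat16)
# "plain_int32" selects the CPU-friendly packing that replaced Int4CPULayout
config = Int4WeightOnlyConfig(group_size=32, int4_packing_format="plain_int32", version=2)
quantize_(model, config)
```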
The original plan was to manually replace each weight tensor with an instance of `IntxUnpackedToInt8Tensor`. However, I found that this already happens in `IntxWeightOnlyConfig` and `Int8DynamicActivationIntxWeightConfig` when they are initialized with `version=2`. This PR therefore:

- calls `quantize_` once per regularized param_group in `QuantOptimizer.torchao_convert`
- for `StretchedUnifTorchaoQuantizer`, fetches the qparams, scale, and zero_point to initialize `IntxUnpackedToInt8Tensor` (as suggested by Scott)

The PR also adds the `num_steps` property to `QuantOptimizer` to mirror D81526700.
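A hedged end-to-end sketch of the new conversion hook (param-group keys and import paths follow the PARQ prototype README; the `StretchedUnifTorchaoQuantizer` constructor args and the exact `torchao_convert` signature are assumptions):

```python
import torch
from torchao.prototype.parq.optim import ProxHardQuant, QuantOptimizer
from torchao.prototype.parq.quant import StretchedUnifTorchaoQuantizer

model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.Linear(256, 10))

# Tag the parameters PARQ should quantize via the "quant_bits" group key
params_quant = [model[0].weight, model[1].weight]
params_other = [p for p in model.parameters() if not any(p is q for q in params_quant)]
param_groups = [
    {"params": params_quant, "quant_bits": 2},
    {"params": params_other},
]

base_optimizer = torch.optim.AdamW(param_groups, lr=1e-3)
quantizer = StretchedUnifTorchaoQuantizer(b=2)  # ctor args assumed
optimizer = QuantOptimizer(base_optimizer, quantizer, ProxHardQuant())

# ... training loop calling optimizer.step() as usual ...

# New in this PR: one quantize_ call per regularized param_group, swapping
# each trained weight for a torchao tensor subclass ready for export
optimizer.torchao_convert(model)
```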