Conversation

danielpatrickhug
Contributor

This PR migrates several quantization classes from GPTQ.py to the new quantization.quantize_linear module, ahead of deleting the legacy GPTQ module following PR #914. It also updates the imports and the README to mirror the new structure, and updates or removes tests in both test_qat and test_quant_api. @HDCharles @andrewor14 @jerryzh168 assisted and advised on how to execute this migration.

pytorch-bot bot commented Oct 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1115

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures

As of commit aa105d6 with merge base 629e142:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 18, 2024
-from torchao.quantization.GPTQ import (
+from torchao.quantization.quantize_linear import (
Contributor

This will be BC-breaking, since these imports are used by other repos like torchchat, so we'd probably want to maintain backward compatibility by keeping the old import path as well.

@andrewor14
Contributor

Hi Daniel, thanks for cleaning this up. It looks great to me overall! I just had some comments about maintaining BC to give us time to migrate some known use cases.

 def test_8da4w_quantizer(self):
     from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
-    from torchao.quantization.GPTQ import Int8DynActInt4WeightLinear
+    from torchao.quantization.quantize_linear import Int8DynActInt4WeightLinear
Contributor

nit: can we call this quantized_linear instead?

Contributor

or maybe even _quantized_linear to discourage people from using it


# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

Contributor

Can we keep this file and keep the legacy imports for now? It will help us migrate some known use cases. E.g. something like

from torchao.quantization.quantized_linear import (
    Int4WeightOnlyQuantizer,
    Int8DynActInt4WeightQuantizer,
    ...
)

group_quantize_tensor_symmetric,
)


Contributor

maybe add a comment to say everything in this file is intended to be deprecated and eventually removed? Then we should also add the new alternative of running these using the quantize_ API, e.g.

from torchao.quantization import (
    quantize_,
    int4_weight_only,
    int8_dynamic_activation_int4_weight,
)
quantize_(model, int4_weight_only())
quantize_(model, int8_dynamic_activation_int4_weight())

Contributor Author

Hey, can you elaborate on this step a little bit more? I'm a bit confused.

Contributor

Yeah sure, I mean we can express everything in this file using the new quantize_ API:

# Before:
quantizer = Int4WeightOnlyQuantizer()  # or Int8DynActInt4WeightQuantizer
model = quantizer.quantize(model)

# After:
quantize_(model, int4_weight_only())  # or int8_dynamic_activation_int4_weight

We want to deprecate the Quantizer APIs eventually in favor of the quantize_ API, so we want to discourage people from using the quantizers in this file. I think adding a comment here explaining this context will help with that.
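Beyond a comment, a module can actively warn when the legacy classes are used. A minimal sketch of that idea, using PEP 562's module-level `__getattr__` (module and class names here are stand-ins, not torchao's actual layout):

```python
import sys
import types
import warnings

# Placeholder for the real class; in torchao this would be the actual quantizer.
class Int4WeightOnlyQuantizer:
    pass

# Stand-in "legacy" module; in torchao this pattern could live in the legacy file.
legacy = types.ModuleType("legacy_gptq")
_real = {"Int4WeightOnlyQuantizer": Int4WeightOnlyQuantizer}

def _legacy_getattr(name):
    # PEP 562 module-level __getattr__: only called when normal attribute
    # lookup fails, so it can emit a warning and then delegate to the real class.
    if name in _real:
        warnings.warn(
            f"legacy_gptq.{name} is deprecated; use the quantize_ API instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return _real[name]
    raise AttributeError(name)

legacy.__getattr__ = _legacy_getattr
sys.modules["legacy_gptq"] = legacy

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = legacy.Int4WeightOnlyQuantizer  # old access path still works, but warns

print(cls is Int4WeightOnlyQuantizer)  # True
print(len(caught) == 1)                # True
```

The old import path keeps working, but every lookup of a deprecated name surfaces a `DeprecationWarning` pointing users at the `quantize_` API.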

Contributor Author

Thank you, I added the comments to the top of the file noting the deprecation.

__all__ += [
"Int8DynActInt4WeightQuantizer",
"Int8DynActInt4WeightGPTQQuantizer",

Contributor

nit: remove this empty line?

"Int4WeightOnlyQuantizer",
"WeightOnlyInt4Linear",
"Int8DynActInt4WeightLinear",
"Int8DynActInt4WeightQuantizer",
Contributor

Maybe we shouldn't expose these if they weren't exposed before, since these are meant to be legacy APIs?

)

@unittest.skip("skipping until we get checkpoints for gpt-fast")
def test_quantizer_int4_weight_only(self):
Contributor

Does this PR just move the location of these classes? If so, this test should still pass, right? Or am I missing something? @danielpatrickhug @HDCharles Is there a reason we want to remove this test?

Contributor

We don't want two int4 implementations.

@andrewor14 left a comment (Contributor)

Looks great. Thanks for the refactor!

get_groupwise_affine_qparams,
groupwise_affine_quantize_tensor,
)
from torchao.quantization.prototype.qat.utils import (
Contributor

Sorry, this was recently moved to torchao.quantization.qat.utils.

Contributor

Oh, actually you'll face some conflicts with this file. Can you make the changes in torchao/quantization/qat/linear.py instead? Sorry for the merge conflicts.

@andrewor14
Contributor

Oh, I just realized: the latest changes don't actually remove the classes from GPTQ.py. We should still do that, so we don't end up with duplicate copies of the same classes. What I meant was: remove those classes from GPTQ.py, then add the following imports to GPTQ.py to maintain the legacy import paths:

from torchao.quantization._quantized_linear import (
    Int4WeightOnlyQuantizer,
    Int8DynActInt4WeightLinear,
    Int8DynActInt4WeightQuantizer,
    WeightOnlyInt4Linear,
)
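This re-export pattern can be sketched in plain Python (module names here are illustrative stand-ins, not torchao's actual layout): the legacy module only imports from the new home, so both import paths resolve to a single implementation.

```python
import sys
import types

# "New" module owning the single implementation.
new_mod = types.ModuleType("quantized_linear")

class Int8DynActInt4WeightLinear:  # placeholder for the real class
    pass

new_mod.Int8DynActInt4WeightLinear = Int8DynActInt4WeightLinear
sys.modules["quantized_linear"] = new_mod

# "Legacy" module that only re-exports, so old imports keep resolving.
legacy = types.ModuleType("GPTQ")
legacy.Int8DynActInt4WeightLinear = new_mod.Int8DynActInt4WeightLinear
legacy.__all__ = ["Int8DynActInt4WeightLinear"]
sys.modules["GPTQ"] = legacy

# Both paths resolve to the very same class object: one implementation,
# two import locations.
same = (
    sys.modules["GPTQ"].Int8DynActInt4WeightLinear
    is sys.modules["quantized_linear"].Int8DynActInt4WeightLinear
)
print(same)  # True
```

Because the re-export is an `is`-identical object rather than a copy, isinstance checks and patches in downstream repos like torchchat keep working regardless of which path they import from.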

