Commit bcf77e5

make quantize_.set_inductor_config None by default for future deprecation
Summary: We want to migrate this functionality to individual workflows; see #1715 for the migration plan. This PR is step 1: it enables distinguishing whether the user specified this argument or not. After this PR, we can control the behavior per workflow, for example setting this functionality to False for future training workflows.

Test Plan: CI
Parent: 12e830b
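Step 1 of the migration relies on a standard sentinel-default pattern: defaulting the argument to `None` lets the callee distinguish "caller passed a value" from "caller used the default". A minimal standalone sketch of the idea (`quantize_stub` is a hypothetical stand-in, not torchao's actual `quantize_`):

```python
import warnings

def quantize_stub(model, set_inductor_config=None):
    """Hypothetical stand-in showing the None-sentinel deprecation pattern."""
    if set_inductor_config is not None:
        # The caller passed the argument explicitly: warn about the upcoming removal.
        warnings.warn(
            "`set_inductor_config` will be removed in a future release; "
            "see https://github.com/pytorch/ao/issues/1715"
        )
    else:
        # Not specified: keep the historical default of True for now.
        set_inductor_config = True
    return set_inductor_config
```

With the old `bool = True` default, `quantize_(m)` and `quantize_(m, set_inductor_config=True)` were indistinguishable inside the function; the `None` default is what makes per-workflow behavior possible later.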

File tree: 2 files changed (+14, −2 lines)

torchao/quantization/README.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -386,6 +386,9 @@ The benchmarks below were run on a single NVIDIA-A6000 GPU.
 You can try out these apis with the `quantize_` api as above alongside the constructor `codebook_weight_only`; an example can be found in `torchao/_models/llama/generate.py`.
 
 ### Automatic Inductor Configuration
+
+:warning: <em>This functionality is being migrated from the top level `quantize_` API to individual workflows, see https://github.com/pytorch/ao/issues/1715 for more details.</em>
+
 The `quantize_` and `autoquant` apis now automatically use our recommended inductor configuration settings. You can replicate these settings for your own experiments with `torchao.quantization.utils.recommended_inductor_config_setter`, or disable them by passing `set_inductor_config=False` to `quantize_` or `autoquant`. You can also overwrite individual settings after they are assigned, as long as you do so before passing any inputs to the torch.compile-d model. Previous flows that manually set a variety of inductor configurations are therefore outdated, though continuing to set those same configurations manually is unlikely to cause issues.
 
 ## (To be moved to prototype) A16W4 WeightOnly Quantization with GPTQ
```
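The README text above can be made concrete in a short configuration sketch. It requires `torch` and `torchao` to be installed; `quantize_`, `int8_weight_only`, and `recommended_inductor_config_setter` are the real torchao APIs mentioned in this commit, while the specific `torch._inductor.config` field overridden at the end is only an illustrative knob, not necessarily one torchao assigns:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only
from torchao.quantization.utils import recommended_inductor_config_setter

model = torch.nn.Sequential(torch.nn.Linear(128, 128))

# Option 1: quantize without touching inductor config, then opt in manually.
quantize_(model, int8_weight_only(), set_inductor_config=False)
recommended_inductor_config_setter()

# Option 2: overwrite an individual setting afterwards; this must happen
# before the first inputs are passed to the compiled model.
torch._inductor.config.coordinate_descent_tuning = False  # illustrative knob
compiled = torch.compile(model)
```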

torchao/quantization/quant_api.py

Lines changed: 11 additions & 2 deletions

```diff
@@ -488,7 +488,7 @@ def quantize_(
     model: torch.nn.Module,
     config: Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]],
     filter_fn: Optional[Callable[[torch.nn.Module, str], bool]] = None,
-    set_inductor_config: bool = True,
+    set_inductor_config: Optional[bool] = None,
     device: Optional[torch.types.Device] = None,
 ):
     """Convert the weight of linear modules in the model with `config`, model is modified inplace
@@ -498,7 +498,7 @@ def quantize_(
         config (Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]]): either (1) a workflow configuration object or (2) a function that applies tensor subclass conversion to the weight of a module and returns the module (e.g. converts the weight tensor of linear to an affine quantized tensor). Note: (2) will be deleted in a future release.
         filter_fn (Optional[Callable[[torch.nn.Module, str], bool]]): function that takes an nn.Module instance and the fully qualified name of the module, and returns True if we want to run `config` on
             the weight of the module
-        set_inductor_config (bool, optional): Whether to automatically use recommended inductor config settings (defaults to True)
+        set_inductor_config (bool, optional): Whether to automatically use recommended inductor config settings (defaults to None)
         device (device, optional): Device to move module to before applying `filter_fn`. This can be set to `"cuda"` to speed up quantization. The final model will be on the specified `device`.
             Defaults to None (do not change device).
@@ -522,6 +522,15 @@ def quantize_(
         quantize_(m, int4_weight_only(group_size=32))
 
     """
+    if set_inductor_config is not None:
+        warnings.warn(
+            "The `set_inductor_config` argument to `quantize_` will be removed in a future release. This functionality is being migrated to individual workflows. Please see https://github.com/pytorch/ao/issues/1715 for more details."
+        )
+    else:
+        # For now, default to True so that behavior is unchanged when the
+        # argument is not specified.
+        set_inductor_config = True
+
     if set_inductor_config:
         torchao.quantization.utils.recommended_inductor_config_setter()
```
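During the deprecation window, downstream projects can use the standard `warnings` filters to turn the new warning into a hard error, so call sites that still pass the argument explicitly fail fast in CI. A standalone sketch (`legacy_quantize` is a hypothetical stand-in; the torchao code emits a plain `UserWarning`, whereas `FutureWarning` here is a choice made for the sketch):

```python
import warnings

def legacy_quantize(set_inductor_config=None):
    # Same sentinel shape as the commit: warn only on explicit use of the argument.
    if set_inductor_config is not None:
        warnings.warn("`set_inductor_config` is deprecated", FutureWarning)
        return set_inductor_config
    return True  # unchanged default behavior

def find_explicit_callers():
    # Promote FutureWarning to an exception so lingering callers raise.
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        legacy_quantize()  # argument not specified: passes silently
        try:
            legacy_quantize(set_inductor_config=True)
        except FutureWarning:
            return "caught explicit caller"
    return "no explicit callers"
```

Running `find_explicit_callers()` demonstrates that only the explicit call trips the filter, which is exactly the distinction the `None` default makes possible.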
