Add quantize_ nn.Parameter support #3083
Conversation
Summary: This PR adds a simple 2d and 3d MoE implementation and tests `quantize_` on them to see if we get the same results.
Test Plan: `pytest test/prototype/test_parameter.py -k test_quantize_parameter`
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3083
Note: Links to docs will display an error until the docs builds have been completed.
❌ 8 New Failures as of commit f68f572 with merge base 5346f0e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
current AOBaseConfig is more for linear weights, can it be extended to param config cleanly?
Would it work to stick with:

def handle_module(model, fqn, config):
    if has_parameter(model, fqn):
        ... new behavior for parameters, apply parameter swap config ...
    elif has_parameter(model, fqn + '.weight'):
        ... old behavior, apply parameter swap config ...
    elif has_module(model, fqn):
        ... old behavior, apply module swap ...
Yeah, we can do this. Do you think we should keep the
Yes I believe so, especially in the case of the Config object itself. We attach everything to the weight parameter for nn.Linear, so this allows us to specify the parameter name instead of assuming it's "weight". The only thing that does not map cleanly IMO is the
I think we should define the transform for parameters as the base case (aka
IMO we should change the current name and keep the old name for BC: ParamOrModuleFqnToConfig = ...
# for bc
ModuleFqnToConfig = ParamOrModuleFqnToConfig
To me it seems that the transform has to be for modules, because it is inplace. The user can target a parameter if they want to, but the transform function always runs on the module that owns the parameter.
torchao/quantization/quant_api.py
Outdated
# skip if not direct child
if "." not in name:
    for pattern in config.param_fqn_to_config:
        if re.match(pattern, f"{fqn}.{name}"):
so it applies to all params, regardless of what it is? e.g. bias? should we be more specific in what people are configuring?
I think we should consider the regex syntax separately, I can remove from this PR.
One thing I would like would be for quantize_ to log the modules/params it's swapping so it's easy to see what the difference is.
Does this mean we need to refactor all supported configs to use this structure?
torchao/quantization/quant_api.py
Outdated
class ModuleOrParamFqnToConfig(AOBaseConfig):
    """Configuration class for applying different quantization configs to modules or parameters based on their fully qualified names (FQNs).

    This extends the functionality of ModuleFqnToConfig to support parameter-level quantization configurations
nit: comment seems stale
@jcaip would this be simpler than having two transform registration systems?
cc @vkuzo Hmm, I think the pseudocode mentioned here vs the logic in the PR and having two transform registration systems are a bit orthogonal. It's possible to have one registration system with the logic in the PR as well. I'm assuming your main concern is with having two registration systems? Let me know if that's not the case. IMO it's about the same complexity to have one registration system vs two. My main preference for having two registration systems is that it reduces the amount of work we have to do to enable other Configs for parameter quantization - we just need to add the decorator to our
yes, and even further IMO we should have a single "modify module inplace" paradigm instead of having one paradigm for modules and one for parameters
IMO we should go for the solution where the resulting code is the simplest, if that involves manual work that seems OK to me, and we can parallelize the conversions if you don't want to do them alone. Reducing the work to convert but ending up with two systems seems like trading dev time now for increased system complexity later.
OK I'll update the PR to use a single registration system.
One thing I want to point out is that it's difficult to support stuff like our vLLM integration, where we pass in a parameter that's not tied to any module, with a single "modify module inplace" paradigm.
I think "everything is parameters" is also a valid solution, I just don't think we should have both - let's pick one? |
torchao/quantization/quant_api.py
Outdated
`module_fqn_to_config`: typing.OrderedDict[str, Optional[AOBaseConfig]]: an
ordered dictionary from
(1). fully qualified name (fqn) of module or
module_fqn_to_config (OrderedDict[str, Optional[AOBaseConfig]]): An ordered dictionary mapping
nit: use `typing.OrderedDict` since it's different from `collections.OrderedDict`
torchao/quantization/quant_api.py
Outdated
Raises:
    NotImplementedError: If a configuration type doesn't have a registered parameter handler.
"""
top_level_named_parameters_list = [
nit: is this the same as list(dict(mod_containing_param.named_parameters()).items())
torchao/quantization/quant_api.py
Outdated
for name, param in top_level_named_parameters_list:
    for pattern, param_config in config.module_or_param_fqn_to_config.items():
        full_param_fqn = f"{fqn}.{name}"
        if (pattern == full_param_fqn) or (
btw, if we want exact match (`==`) to take precedence, I think it has to be a separate check,
if pattern == full_param_fqn:
    ...
elif pattern.startswith("re:") and ...:
    ...
A test with a model containing a linear1 module and a config of
{"re:linear.*": config1, "linear1": config2}
checking that linear1 is quantized with config2 instead of config1 should catch it
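A minimal sketch of such a test, assuming the fqn-to-config API from this PR (shown with the final `FqnToConfig` name); the toy model, import paths, and the specific float8 configs are illustrative:
```python
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    PerTensor,
    quantize_,
)
# FqnToConfig is the config class introduced in this PR; the import path is assumed
from torchao.quantization import FqnToConfig


class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(128, 128, bias=False)
        self.linear2 = torch.nn.Linear(128, 128, bias=False)


model = ToyModel().to(torch.bfloat16).to("cuda")
quantize_(
    model,
    FqnToConfig({
        "re:linear.*": Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()),
        "linear1": Float8DynamicActivationFloat8WeightConfig(granularity=PerTensor()),
    }),
)

# exact match should take precedence over the regex, so linear1 ends up
# per-tensor (a single scale), while linear2 only matches the regex and is per-row
assert model.linear1.weight.scale.numel() == 1
assert model.linear2.weight.scale.numel() == 128
```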
test/quantization/test_quant_api.py
Outdated
"0": Float8DynamicActivationFloat8WeightConfig( | ||
granularity=PerRow(), | ||
), | ||
"re:.*weight": Float8DynamicActivationFloat8WeightConfig( |
we should test the reverse order I think, to make sure `0` takes precedence
quantize_(
    model,
    quant_config,
)
checks?
torchao/quantization/quant_api.py
Outdated
`module_fqn_to_config`: typing.OrderedDict[str, Optional[AOBaseConfig]]: an
ordered dictionary from
(1). fully qualified name (fqn) of module or
module_fqn_to_config (OrderedDict[str, Optional[AOBaseConfig]]): An ordered dictionary mapping
also to correct the naming, we can add a `module_or_param_fqn_to_config` field and use that for version 2, and go through the normal version update path like other configs as well I think
how about just fqn_to_config
yeah sounds good
@jcaip can you add `ModuleOrParamFqnToConfig` to torchao docs as well? I would like to link to it in transformer docs
Looks good overall, just one main question about how the default `filter_fn` interacts with the config
(fqn-configuration)=
### 3. FQN Configuration

For granular control, use `ModuleFqnToConfig`:
Looks like we also document this in serving.md, can you update that doc as well?
assert isinstance(model.shared_expert.gate_proj.weight, Float8Tensor)
assert model.shared_expert.gate_proj.weight.scale.numel() == 1

def test_quantize_modle_exact_match_preference(self):
nit: typo modle
""" | ||
torch._C._log_api_usage_once("torchao.quantization.quantize_") | ||
|
||
filter_fn = _is_linear if filter_fn is None else filter_fn |
is this default `filter_fn` going to have unexpected consequences if people are using `FqnToConfig`? E.g. let's say someone literally just wants to quantize a very specific parameter:
quantize_(model, FqnToConfig({"layers.0.some.parameter": Int4WeightOnlyConfig()}))
If I'm reading the code correctly, right now we do the replacement if either (1) we match the filter_fn, or (2) we match the fqn. Would the above unexpectedly quantize all the other linear layers in the model?
In this case, replacement won't do anything as the other linear layers aren't specified in the config. I can add a test for this though.
Yeah I think it would be good to verify this, from the code it seems we do the replacement if we match either the `filter_fn` or the config (not and). Would also be good to clearly document the semantics of `filter_fn` in the docstring in this case
yeah, I think the semantics should be:
- if both fqn_to_config and filter_fn are specified, both have to match for the config to be applied (AND, not OR)
- else, use whichever one is specified
it seems like we should consider breaking BC here and change the default `filter_fn` to `is_linear`, so that if the user passes in `filter_fn == None` then only `fqn_to_config` is applied?
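A rough sketch of the proposed semantics; `_should_transform` and `_fqn_matches_config` are hypothetical helpers standing in for the PR's actual matching logic, not functions from torchao:
```python
def _should_transform(module, fqn, config, filter_fn=None):
    # hypothetical helper: True if this fqn (or one of the module's parameters)
    # matches an entry in the fqn_to_config mapping
    matches_config = _fqn_matches_config(module, fqn, config)
    if filter_fn is not None:
        # both the user-supplied filter and the fqn config must match (AND, not OR)
        return filter_fn(module, fqn) and matches_config
    # filter_fn == None: fqn_to_config alone decides what gets transformed
    return matches_config
```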
In my mind, if someone specifies an fqn in the config, it's pretty clear that they want to quantize it. So I think AND is kind of a footgun here, especially if the default `filter_fn` is `is_linear`, i.e. a first-time user wants to quantize a parameter, adds an entry to FqnToConfig, and the new param doesn't get quantized because the default filter_fn is is_linear. I guess we can just throw a warning in this instance though.
cc @jerryzh168 what do you think? I'll defer to whatever's most popular with the team.
Sounds good to me, I'll update the PR
agreed on removing `filter_fn` longer term
I think it is used pretty widely though, so maybe not in this PR and we do it separately with a proper deprecation? We can punt in this PR by just throwing an exception if `fqn_to_config` is provided along with a non-default `filter_fn`.
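A sketch of what that punt could look like inside `quantize_`; the exact check and error message are illustrative, not the PR's code:
```python
# hypothetical guard near the top of quantize_(model, config, filter_fn=None)
if isinstance(config, FqnToConfig) and filter_fn is not None:
    raise ValueError(
        "Passing a custom filter_fn together with FqnToConfig is not supported; "
        "target modules/parameters via fqn or regex keys in the config instead."
    )
```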
`filter_fn` has a lot of internal uses, and it's how many users apply quantization/QAT to linear and embedding separately today. We should do a careful deprecation of this and make sure existing use cases have a good alternative
@andrewor14 , any thoughts on "We can punt in this PR by just throwing an exception if fqn_to_config is provided along with a non-default filter_fn."?
> We can punt in this PR by just throwing an exception if fqn_to_config is provided along with a non-default filter_fn

Yeah sounds good to me
torchao/quantization/quant_api.py
Outdated
regex patterns (as strings) to quantization configurations.
The patterns can be one of the follows:
(1). fully qualified name (fqn) of module or paramter or
typo: paramter
torchao/quantization/quant_api.py
Outdated
`module_fqn_to_config`: typing.OrderedDict[str, Optional[AOBaseConfig]]: an
ordered dictionary from
(1). fully qualified name (fqn) of module or
module_fqn_to_config (OrderedDict[str, Optional[AOBaseConfig]]): An ordered dictionary mapping
the docstring still references the old arg name I think
torchao/quantization/quant_api.py
Outdated
torch._C._log_api_usage_once("torchao.quantization.FqnToConfig")
if len(self.module_fqn_to_config) > 0 and len(self.fqn_to_config) > 0:
    warnings.warn(
        "Both module_fqn_to_config and fqn_to_config are specified, only fqn_to_config will be used"
I feel this is going to be a silent error for some users, should we just ban this case for simplicity? It's not for BC
Yeah, we should just ValueError here.
torchao/quantization/quant_api.py
Outdated
warnings.warn(
    "Both module_fqn_to_config and fqn_to_config are specified, only fqn_to_config will be used"
)
if len(self.module_fqn_to_config) > 0 and len(self.fqn_to_config) == 0:
nit: if you throw an error above then this can become:
if len(self.module_fqn_to_config) > 0:
    assert len(self.fqn_to_config) == 0
    self.fqn_to_config = self.module_fqn_to_config
and you don't need the rest of the cases (probably don't need to update `self.module_fqn_to_config` to match `self.fqn_to_config`?)
torchao/quantization/quant_api.py
Outdated
return handler(module, c)

return module

def select_module_if_filter_fn_or_contains_params_matching_pattern(
private?
Looks good to me! I'll let Jerry/Vasiliy stamp since they reviewed this in more detail
torchao/quantization/quant_api.py
Outdated
Args:
    fqn (str): The fully qualified name to match against the config patterns.
    config (FqnToConfig): The FqnToConfig object containing mapping of FQNs or regex patterns to quantization configs.
torchao/quantization/quant_api.py
remove?
""" | ||
torch._C._log_api_usage_once("torchao.quantization.quantize_") | ||
|
||
filter_fn = _is_linear if filter_fn is None else filter_fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it would be good to verify this, from the code it seems we do the replacement if we match either the filter_fn
or the config
(not and). Would also be good to clearly document the semantics of filter_fn
in the docstring in this case
return found, c


def _select_module_if_filter_fn_or_contains_params_matching_pattern(
IMO should be AND, not OR
_module_fqn_to_config_handler,
filter_fn,
_fqn_to_config_handler,
partial(
seems like we are passing one callable and one callable wrapping a callable into a function, which seems a bit hard to follow. Have we considered just writing this directly instead?
I can write this as a lambda, if that's a bit clearer to you?
lambda mod, fqn: filter_fn(mod, fqn) and select_with_module(mod, fqn, config=config)
This PR adds support for quantizing `nn.Parameter` to `quantize_`. `ModuleFqnToConfig` has been renamed to `FqnToConfig`, which now accepts both module fqns and parameter fqns. `ModuleFqnToConfig` has been aliased to maintain BC.

API examples

For example, for a toy nn.Linear model:
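The sketch below is illustrative rather than taken verbatim from the PR; the toy model, fqn keys, and import path for `FqnToConfig` (the class introduced here) are assumptions:
```python
import torch
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)
# FqnToConfig is introduced in this PR; the import path is assumed
from torchao.quantization import FqnToConfig


class ToyLinearModel(torch.nn.Module):
    def __init__(self, k=256, n=256):
        super().__init__()
        self.linear1 = torch.nn.Linear(k, n, bias=False)
        self.linear2 = torch.nn.Linear(n, k, bias=False)

    def forward(self, x):
        return self.linear2(self.linear1(x))


model = ToyLinearModel().to(torch.bfloat16).to("cuda")

quantize_(
    model,
    FqnToConfig({
        # exact parameter fqn
        "linear1.weight": Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()),
        # regex over fqns, prefixed with "re:"
        "re:linear2.*": Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()),
    }),
)
```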
The keys to `FqnToConfig` can be one of the following (in order of precedence): the exact fqn of a module or parameter, a regex of a module fqn (prefixed with `re:`), or a regex of a parameter fqn (prefixed with `re:`).

To enable support for parameter fqns for a particular config, we need to add the `parameter_name` kwarg into the config signature and update `CUSTOM_PARAM_QUANTIZATION_SUPPOTED_CONFIGS`. See the changes here for more details. `Float8DynamicActivationFloat8WeightConfig` has been enabled by this PR, but other configs will throw a `NotImplementedError`.

Test Plan
How do our configs translate for MoEs?
Currently, we define a bunch of configs that are for dense nn.Linear modules, how do these configs translate in the case of MoE inference?
Some background on MoE inference
There are two ways that forwards is implemented for MoE:
1. As a for loop of 2d `nn.Linear` matmuls - in this case, we break down the 3d weight x activation matmul into a for loop of 2d weight x activation matmuls. This can be seen here. In this case, I argue that the semantics of the configs do not change at all from the normal `nn.Linear` case, as we are just doing a bunch of normal 2d linear matmuls.
2. As a single 3d matmul (bmm) - for this case, we'd need to add additional op support (bmm) for forwards (see the sketch below). Depending on whether the subclass is an AQT subclass or non-AQT subclass this will be added differently.
I plan to only support parameter quantization for non-AQT subclasses, my reasoning being that those are the most popular / important configs anyway (Float8Dynamic, Int4WeightOnly).
Below is a breakdown of what Configs map to AQT / non-AQT subclasses:
For these, the majority of the semantics remain the same; the only semantic that really changes is `PerRow` granularity, and there's a very natural extension of `PerRow` to the 3d case (apply on the last dimension). I took a look at the keys of the non-AQT configs below and what they would mean for MoEs.
Float8DynamicActivationFloat8WeightConfig
- `activation_dtype`, `weight_dtype`, `activation_value_lb`, `activation_value_ub` all do not change meaning semantically.
- `granularity=PerTensor()` does not change semantic meaning - we still use a single tensor to scale the entire weight tensor.
- `granularity=PerRow()` does change meaning - we now calculate a scale for each row along the last dimension [-1], i.e. for a weight of (E, N, K) we would expect PerRow to create scales of block size (1, 1, K) (see the sketch after this list).
- `mm_config`, `kernel_preference`, and `set_inductor_config` stay the same as well.
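A sketch of that PerRow extension to a 3d weight, assuming a float8 rowwise scheme; it mirrors the (1, 1, K) block size described above but is not the exact torchao kernel code:
```python
import torch

E, N, K = 4, 32, 16
w = torch.randn(E, N, K, dtype=torch.bfloat16)

# one scale per row along the last dim -> scales of shape (E, N, 1),
# i.e. each scale covers a (1, 1, K) block of the weight
amax = w.abs().amax(dim=-1, keepdim=True).float().clamp(min=1e-12)
scale = amax / torch.finfo(torch.float8_e4m3fn).max
w_fp8 = (w / scale).to(torch.float8_e4m3fn)

assert scale.shape == (E, N, 1)
```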
stay the same as well.Float8StaticActivationFloat8WeightConfig
scale
should be passed in as a 3d tensor instead of a 2d tensor in the case ofPerRow
granularityFloat8DynamicActivationInt4WeightConfig
int4_packing_format - Only "preshuffled" is supported and Int4PreshuffledTensor supports 3d weights.
Int4WeightOnlyConfig
- `group_size`, `int4_packing_format`, `int4_choose_qparams_algorithm`, `set_inductor_config` are the only things that are set for the v2 config. I don't think the semantics of these change, although there are some packing formats that do not support 3d weights; it looks like `Int4PackingFormat.PLAIN_INT32` and `Int4PackingFormat.MARLIN_SPARSE` do not.