
Conversation

shiyang-weng
Contributor

@shiyang-weng shiyang-weng commented Jul 17, 2025

For float8_e4m3fn, support

register_qlinear_weight_prepack
_register_qlinear_unary_fusion
_register_qlinear_binary_fusion
quant_lift_up

in Inductor.

For FP8, there are the following issues:

  1. q/dq switches to quantize_affine_float8/dequantize_affine_float8.
  2. The q/dq API changes: the fp8 q/dq ops require the scale to be a tensor.
  3. pt2e does not support float8 yet.

Based on these issues:

  1. The fp8 q/dq pattern needs to be matched separately.
  2. The scale needs to be handled separately.
  3. We implement a function (fp8_convert_) that adds q/dq before the linear layers in the model, and add it to test/quantization/pt2e/test_x86inductor_fusion.py. The resulting q/dq-linear pattern is sketched below.
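For reference, a minimal sketch of the q/dq-linear pattern the fusion passes are meant to match. It uses plain tensor ops rather than the real quantize_affine_float8/dequantize_affine_float8 primitives, and the names fp8_quant, fp8_dequant, and QDQLinear are illustrative only; the tensor-valued scale mirrors the requirement described above.

import torch
import torch.nn.functional as F

FP8_DTYPE = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8_DTYPE).max

def fp8_quant(t, scale):
    # Divide by the tensor-valued scale, clamp to the e4m3fn range, cast to fp8.
    return (t / scale).clamp(-FP8_MAX, FP8_MAX).to(FP8_DTYPE)

def fp8_dequant(t, scale):
    # Cast back to float and multiply by the same tensor-valued scale.
    return t.to(torch.float32) * scale

class QDQLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.x_scale = torch.tensor(2.0)  # activation scale as a tensor
        self.w_scale = torch.tensor(2.0)  # weight scale as a tensor
        self.weight = fp8_quant(torch.randn(out_features, in_features), self.w_scale)

    def forward(self, x):
        # q/dq around the activation and dq on the fp8 weight, followed by a plain
        # linear; this is the pattern the Inductor passes are meant to rewrite into
        # a fused qlinear op.
        x = fp8_dequant(fp8_quant(x, self.x_scale), self.x_scale)
        w = fp8_dequant(self.weight, self.w_scale)
        return F.linear(x, w)

out = QDQLinear(16, 8)(torch.randn(4, 16))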


pytorch-bot bot commented Jul 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2565

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 4fb5f7a with merge base 8e2ca35:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@shiyang-weng shiyang-weng marked this pull request as draft July 17, 2025 02:59
@meta-cla meta-cla bot added the CLA Signed label Jul 17, 2025
@shiyang-weng shiyang-weng marked this pull request as ready for review August 1, 2025 01:29
@shiyang-weng
Contributor Author

@jerryzh168 Could you help review this PR?

@shiyang-weng
Contributor Author

@jerryzh168 Could you help review this PR?

@jerryzh168
Contributor

jerryzh168 commented Aug 5, 2025

@shiyang-weng the registration PR was reverted in #2672; we'd need to land that again without breaking BC.

@Xia-Weiwen
Collaborator

@shiyang-weng the registration PR was reverted in #2672; we'd need to land that again without breaking BC.

Hi @jerryzh168, could you please provide a reproducer so that we can fix that? Thanks.

@jerryzh168
Contributor

Yeah, this is the test:

def test_expected_kernels_on_gpu(self, granularity, torch_compile_mode):

It was added in that PR; it needs an H100 GPU to run.

@Xia-Weiwen Xia-Weiwen requested a review from Copilot September 23, 2025 03:05
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds float8_e4m3fn support to PyTorch Inductor for qlinear operations, implementing quantization patterns specifically for FP8 data types. The implementation handles differences in FP8 quantization API requirements, including tensor-based scales and modified quantize/dequantize operations.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File: torchao/quantization/pt2e/inductor_passes/x86.py
Adds FP8 quantization support with new patterns, updates existing functions to handle FP8 operations, and modifies view-operation handling.

File: test/quantization/pt2e/test_x86inductor_fusion.py
Adds comprehensive test coverage for FP8 quantization patterns and refactors test helpers to support FP8.


Comment on lines +1107 to +1108
x_zp = kwargs["x_zp"] if "x_zp" in kwargs else None
w_zp = kwargs["w_zp"] if "w_zp" in kwargs else None

Copilot AI Sep 23, 2025


[nitpick] The extraction of qparams has inconsistent patterns. The first two use tuple unpacking while x_zp and w_zp use conditional extraction. For better maintainability and consistency, consider using the same pattern for all parameters.

Suggested change
- x_zp = kwargs["x_zp"] if "x_zp" in kwargs else None
- w_zp = kwargs["w_zp"] if "w_zp" in kwargs else None
+ x_zp = kwargs.get("x_zp")
+ w_zp = kwargs.get("w_zp")


is_tensor_overload,
is_fp8,
) in linear_weight_prepack_cases:
if is_fp8 and not is_tensor_overload:

Copilot AI Sep 23, 2025


[nitpick] This skip condition appears in multiple places (lines 1429 and 1506). Consider extracting this logic into a helper function or constant to avoid code duplication and improve maintainability.

Suggested change
- if is_fp8 and not is_tensor_overload:
+ if _should_skip_fp8_case(is_fp8, is_tensor_overload):

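For completeness, a sketch of what such a helper could look like; the name _should_skip_fp8_case comes from the suggestion above and is hypothetical.

def _should_skip_fp8_case(is_fp8, is_tensor_overload):
    # fp8 q/dq patterns are only registered for the tensor-overload variants,
    # so any fp8 case without a tensor overload is skipped.
    return is_fp8 and not is_tensor_overload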

if output_dtype == torch.float8_e4m3fn:
# For float8, torchao.quantize_affine_float8 requires tensor as scale
# Support scale node is full firstly
assert kwargs["o_inv_scale"].target is torch.ops.aten.full.default

Copilot AI Sep 23, 2025


The assertion assumes kwargs["o_inv_scale"] is always a node object, but there's no validation that it has a target attribute. This could cause an AttributeError if the object doesn't have this attribute.

Suggested change
- assert kwargs["o_inv_scale"].target is torch.ops.aten.full.default
+ assert hasattr(kwargs["o_inv_scale"], "target") and kwargs["o_inv_scale"].target is torch.ops.aten.full.default, (
+     "Expected kwargs['o_inv_scale'] to be a node object with 'target' attribute set to torch.ops.aten.full.default"
+ )


# check if scale created by torch.tensor
return (
len(node.all_input_nodes) == 2
and node.all_input_nodes[1].target == torch.tensor

Copilot AI Sep 23, 2025


[nitpick] Using torch.tensor as a target comparison might be fragile since it's comparing against a function object. Consider using a more robust method to identify tensor creation nodes, such as checking the function name or using a more specific target.

Suggested change
- and node.all_input_nodes[1].target == torch.tensor
+ and torch.fx.node._qualified_name(node.all_input_nodes[1].target) == "torch.tensor"


Comment on lines +104 to +113
class FP8QDQLinear(torch.nn.Module):
def __init__(self, in_features, out_features, has_bias):
super().__init__()
self.qtype = torch.float8_e4m3fn
self.weight = torch.randn((out_features, in_features)).to(self.qtype)
self.weight_scale = 2.0
self.scale = 2.0
self.bias = None
if has_bias:
self.bias = torch.randn((out_features,))

Copilot AI Sep 23, 2025


[nitpick] The hardcoded scale values (2.0) should be configurable parameters or documented constants to improve maintainability and make the test more flexible.

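One way to address this, sketched under the assumption that callers keep using 2.0 as the default: make the scales constructor arguments with a documented default. DEFAULT_FP8_SCALE is a hypothetical name.

import torch

DEFAULT_FP8_SCALE = 2.0  # documented default used by the existing tests

class FP8QDQLinear(torch.nn.Module):
    def __init__(self, in_features, out_features, has_bias,
                 scale=DEFAULT_FP8_SCALE, weight_scale=DEFAULT_FP8_SCALE):
        super().__init__()
        self.qtype = torch.float8_e4m3fn
        self.weight = torch.randn((out_features, in_features)).to(self.qtype)
        self.weight_scale = weight_scale
        self.scale = scale
        self.bias = torch.randn((out_features,)) if has_bias else None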

Comment on lines +1954 to +1957
if is_fp8:
# fp8_convert_ not support dynamic and qat yet
assert not is_dynamic
assert not is_qat

Copilot AI Sep 23, 2025


[nitpick] This assertion pattern appears multiple times in the test file (lines 206-208 and 1954-1957). Consider extracting this validation into a helper function to reduce code duplication.

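A sketch of the suggested extraction; the helper name is hypothetical.

def _assert_fp8_static_ptq_only(is_fp8, is_dynamic, is_qat):
    # fp8_convert_ currently covers static post-training quantization only.
    if is_fp8:
        assert not is_dynamic, "fp8_convert_ does not support dynamic quantization yet"
        assert not is_qat, "fp8_convert_ does not support QAT yet"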

@shiyang-weng
Contributor Author

This PR is used to support fp8 on PT, but it is not in PT 2.8, so I added a version check to the UT (a sketch of such a guard is shown below).
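A minimal sketch of such a guard, assuming the required fp8 q/dq support only exists in torch builds newer than 2.8; the class name and exact version cutoff are assumptions.

import unittest
import torch
from packaging import version

# Skip FP8 fusion tests on torch builds that predate the fp8 q/dq support.
_TORCH_SUPPORTS_FP8_QDQ = version.parse(torch.__version__.split("+")[0]) > version.parse("2.8.0")

@unittest.skipIf(not _TORCH_SUPPORTS_FP8_QDQ, "requires a torch build newer than 2.8 for fp8 q/dq")
class TestFP8QLinearFusion(unittest.TestCase):
    pass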

@shiyang-weng shiyang-weng marked this pull request as draft September 25, 2025 06:09
@shiyang-weng
Contributor Author

CC @mingfeima for review

@shiyang-weng shiyang-weng marked this pull request as ready for review September 29, 2025 07:22
Contributor

@jerryzh168 jerryzh168 left a comment


Do you need to change the quant flow code to produce this op?

I'd recommend doing this by defining a new observer; use this API:

def test_observer_callback(self):
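A rough sketch of the bookkeeping such an observer would do, kept as a plain nn.Module because the exact pt2e observer base class and the test_observer_callback API are not shown here; the amax tracking and the tensor-valued scale are the parts relevant to this PR, and all names below are illustrative.

import torch

class Fp8AmaxObserver(torch.nn.Module):
    def __init__(self, dtype=torch.float8_e4m3fn):
        super().__init__()
        self.dtype = dtype
        self.register_buffer("amax", torch.zeros(()))

    def forward(self, x):
        # Observers pass activations through unchanged while recording statistics.
        self.amax.copy_(torch.maximum(self.amax, x.detach().abs().max()))
        return x

    def calculate_scale(self):
        # Tensor-valued scale, as the fp8 q/dq ops require.
        return (self.amax / torch.finfo(self.dtype).max).clamp(min=1e-12)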

@jerryzh168 jerryzh168 merged commit a52a64a into pytorch:main Oct 8, 2025
18 checks passed
jainapurva pushed a commit that referenced this pull request Oct 9, 2025
* quantize_affine_float8/dequantize_affine_float8 not decomposed on inductor

* remove redundant unittest.skipIf

* fix rebase issue

* change dispatch key to a flag decomposed

* support scaled_mm on inductor

* fix rebase issue

* support dequant promotion for fp8

* add ut

* remove redundant codes

* fix lint

* resolve conflict

* change to use qlinear

* add ut

* fix lint

* support fp8 quant_lift_up

* add reshape into _VIEW_METHOD_OPS

* add quant_input_check

* fix lint

* refine ut

* remove fp8 dynamic quant ut

* fix output_scale issue

* add float8_e4m3fn to dtype_list

* refine code

* refine code

* fix bugs

* add comment

* merge main

* change to use non-decomposed q/dq

* fix lint

* add version check

* change version

* fix attention bug; update ut

* add liftup oplist
Labels: CLA Signed, topic: not user facing