[CPU] add Float8OpaqueTensor for dynamic float8 act float8 weight #3075
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6e1c2a2 with merge base 4013764.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
CC @mingfeima for review. Thanks.
Hi @mingfeima @jerryzh168 @andrewor14, could you please review this PR? Thanks.
test/quantization/quantize_/workflows/float8/test_float8_opaque_tensor.py
Hi @mingfeima @jerryzh168 @andrewor14, though this PR depends on #3100, could you please review this PR? Thanks.
    float8_dtype=torch.float8_e4m3fn,
    block_size=block_size,
)
data = _quantize_affine_float8(hp_tensor, scale, torch.float8_e4m3fn)
Do you need to use `_quantize_affine_float8_non_decomposed` (ao/torchao/quantization/quant_primitives.py, line 2425 at c96f2dd) instead?
Thanks. Since we are not using Inductor for fusion the way PT2E does, `_quantize_affine_float8` should be OK here.
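For context, a minimal sketch of what the decomposed float8 quantization path computes, assuming a precomputed `scale` that broadcasts against the input; the function name and arithmetic here are illustrative, not the exact torchao implementation:

```python
import torch

# Minimal sketch: scale into range, clamp to the float8 dtype's finite
# bounds, then cast. The non-decomposed variant wraps the same math in a
# single custom op so Inductor does not break it apart during compilation.
def quantize_affine_float8_sketch(
    hp_tensor: torch.Tensor,
    scale: torch.Tensor,
    float8_dtype: torch.dtype = torch.float8_e4m3fn,
) -> torch.Tensor:
    finfo = torch.finfo(float8_dtype)
    scaled = hp_tensor.to(torch.float32) / scale
    return scaled.clamp(min=finfo.min, max=finfo.max).to(float8_dtype)
```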
torchao/float8/inference.py (Outdated)
    return processed_granularity

def _normalize_granularity_opaque_tensor(
Why can't this reuse the other `normalize_granularity_tensor`?
Thanks. Updated.
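To illustrate what the shared helper normalizes, a hedged standalone sketch: it accepts a single granularity or an (activation, weight) pair and always returns a 2-tuple. The function name is hypothetical; the real logic lives in torchao/float8/inference.py:

```python
from typing import Optional, Sequence, Tuple, Union

from torchao.quantization.granularity import PerRow, PerTensor

Granularity = Union[PerTensor, PerRow]

def normalize_granularity_sketch(
    granularity: Optional[Union[Granularity, Sequence[Granularity]]],
) -> Tuple[Granularity, Granularity]:
    if granularity is None:
        # Default both activation and weight to per-tensor scales.
        return (PerTensor(), PerTensor())
    if isinstance(granularity, (PerTensor, PerRow)):
        # A single granularity applies to both activation and weight.
        return (granularity, granularity)
    act, wgt = granularity  # already an (activation, weight) pair
    return (act, wgt)
```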
torchao/float8/types.py (Outdated)
# Define FP8Granularity type alias to break circular import dependencies
FP8Granularity = Union["PerTensor", "PerRow"]
FP8GranularityCPU = Union["PerTensor", "PerRow", "PerGroup"]
I feel we can reuse and extend `FP8Granularity`, and assert that only a subset of the options is supported on GPU right now.
Thanks. Updated.
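A hedged sketch of the suggestion after the update: one `FP8Granularity` alias covering all variants, with the supported subset asserted per device at the call site. The checker function below is hypothetical:

```python
from typing import Union

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

# One alias for all float8 granularities; PerGroup is currently CPU-only.
FP8Granularity = Union[PerTensor, PerRow, PerGroup]

def assert_granularity_supported(g: FP8Granularity, device_type: str) -> None:
    # Hypothetical check: GPUs support only the per-tensor/per-row subset.
    if device_type != "cpu" and isinstance(g, PerGroup):
        raise ValueError("PerGroup float8 granularity is only supported on CPU")
```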
torchao/quantization/quant_api.py (Outdated)
    block_size = get_block_size(x.shape, activation_granularity)
else:
    group_size = activation_granularity.group_size
    block_size = (*([1] * (len(x.shape) - 1)), group_size)
Why is this not included in `get_block_size`?
Updated. Thanks.
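After the update, the PerGroup case is handled inside the shared helper. A hedged standalone sketch of the block-size computation (the real `get_block_size` in torchao covers more granularities than shown here):

```python
from typing import Tuple

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

def get_block_size_sketch(shape: Tuple[int, ...], granularity) -> Tuple[int, ...]:
    if isinstance(granularity, PerTensor):
        return shape  # one scale for the whole tensor
    if isinstance(granularity, PerRow):
        return (*([1] * (len(shape) - 1)), shape[-1])  # one scale per row
    if isinstance(granularity, PerGroup):
        # One scale per group of `group_size` elements along the last dim.
        return (*([1] * (len(shape) - 1)), granularity.group_size)
    raise ValueError(f"Unsupported granularity: {granularity}")
```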
torchao/quantization/quant_api.py (Outdated)
-   _check_hardware_support(granularity)
+   is_cpu = weight.device.type == "cpu"
+   if not is_cpu:
+       _check_hardware_support(granularity)
Can you move this to version 1? Then version 2 can probably do this check in the tensor itself.
Sure. Thanks.
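A hedged sketch of the gating being discussed, with stub helpers so it stands alone; the actual branching lives in torchao/quantization/quant_api.py, and the version-2 tensor subclass would validate in its own constructor:

```python
import torch

def _check_hardware_support_stub(granularity) -> None:
    # Stand-in for the real _check_hardware_support.
    pass

def quantize_weight_sketch(weight: torch.Tensor, granularity, version: int):
    if version == 1:
        # Legacy path keeps the module-level hardware check.
        _check_hardware_support_stub(granularity)
        return weight  # would construct the version-1 quantized tensor here
    # Version 2: the tensor subclass (e.g. Float8OpaqueTensor on CPU)
    # validates granularity and device support when it is constructed.
    return weight
```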
torchao/quantization/quant_api.py (Outdated)
    if not is_cpu and not _fp8_mm_compat(weight):
        # TODO(future PR): this should really throw an exception instead of silently
        # not doing what the user asked
        return weight

-   if isinstance(weight_granularity, PerRow):
+   if not is_cpu and isinstance(weight_granularity, PerRow):
        assert weight.dtype == torch.bfloat16, (
            "PerRow quantization only works for bfloat16 precision input weight"
        )
Also, these checks: I feel we can move them to the version 1 branch for now and deprecate them later; we can add the checks to the tensors for version 2.
Sure. Thanks.
Moving this to the version=1 branch causes CI failures, so I will keep them as-is. Maybe it can be improved later. Thanks.
Summary
We split the original big PR #2505 into the following smaller ones:
Test plan