
Conversation

Collaborator

@Xia-Weiwen Xia-Weiwen commented Sep 26, 2025

Summary
We split the original large PR #2505 into the following smaller ones:

Test plan

pytest -sv test/quantization/quantize_/workflows/float8/test_float8_opaque_tensor.py


pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6e1c2a2 with merge base 4013764:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 26, 2025
@Xia-Weiwen Xia-Weiwen added the topic: new feature Use this tag if this PR adds a new feature label Sep 26, 2025
@Xia-Weiwen
Collaborator Author

CC @mingfeima for review. Thanks.

@Xia-Weiwen
Collaborator Author

Hi @mingfeima @jerryzh168 @andrewor14, could you please review this PR? Thanks.

@Xia-Weiwen Xia-Weiwen marked this pull request as draft September 30, 2025 01:28
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review September 30, 2025 01:35
@Xia-Weiwen
Collaborator Author

Hi @mingfeima @jerryzh168 @andrewor14, though this PR depends on #3100, could you please review it? Thanks.

float8_dtype=torch.float8_e4m3fn,
block_size=block_size,
)
data = _quantize_affine_float8(hp_tensor, scale, torch.float8_e4m3fn)
Contributor


Do you need to use `_quantize_affine_float8_non_decomposed` here?

Collaborator Author


Thanks. Since we are not using Inductor for fusion as PT2E does, the decomposed version should be OK here.
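
For context, a minimal sketch of what a decomposed float8 affine quantization step does (illustrative names only; the real _quantize_affine_float8 in torchao may differ in clamping and broadcasting details):

import torch

def quantize_affine_float8_sketch(hp_tensor, scale, float8_dtype=torch.float8_e4m3fn):
    # Divide by the scale, clamp to the float8 representable range, then cast.
    finfo = torch.finfo(float8_dtype)
    scaled = (hp_tensor.float() / scale.float()).clamp(finfo.min, finfo.max)
    return scaled.to(float8_dtype)

Because each step is an ordinary aten op rather than a custom fused kernel, the "non-decomposed" variant mainly matters when Inductor is expected to pattern-match and fuse the sequence.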

return processed_granularity


def _normalize_granularity_opaque_tensor(
Contributor


Why can't this reuse the other `normalize_granularity_tensor`?

Collaborator Author


Thanks. Updated.


# Define FP8Granularity type alias to break circular import dependencies
FP8Granularity = Union["PerTensor", "PerRow"]
FP8GranularityCPU = Union["PerTensor", "PerRow", "PerGroup"]
Contributor

@jerryzh168 jerryzh168 Oct 6, 2025


I feel we can reuse and extend FP8Granularity, and assert that only a subset of the options is supported on GPU for now.

Collaborator Author


Thanks. Updated.
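
A minimal sketch of the suggested direction, assuming the PerTensor/PerRow/PerGroup classes from torchao.quantization.granularity (the helper name here is hypothetical):

from typing import Union

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

# One shared alias; PerGroup is currently only usable on CPU.
FP8Granularity = Union[PerTensor, PerRow, PerGroup]

def _assert_granularity_supported(granularity, device_type):
    # Hypothetical check: GPU supports only a subset of the options for now.
    if device_type != "cpu" and isinstance(granularity, PerGroup):
        raise ValueError(
            f"PerGroup float8 quantization is CPU-only, got device {device_type}"
        )

This keeps a single alias to import everywhere and pushes the per-device restriction into a runtime check instead of a second type.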

block_size = get_block_size(x.shape, activation_granularity)
else:
group_size = activation_granularity.group_size
block_size = (*([1] * (len(x.shape) - 1)), group_size)
Contributor


Why is this not included in `get_block_size`?

Collaborator Author


Updated. Thanks.
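
For illustration, the PerGroup case folded into a get_block_size-style helper might look like this (a sketch under the assumption that block sizes are per-dimension tuples, not the exact torchao implementation):

from torchao.quantization.granularity import PerGroup, PerRow, PerTensor

def get_block_size_sketch(shape, granularity):
    if isinstance(granularity, PerTensor):
        return shape  # one block covering the whole tensor
    if isinstance(granularity, PerRow):
        return (*([1] * (len(shape) - 1)), shape[-1])  # one block per row
    if isinstance(granularity, PerGroup):
        # groups of `group_size` elements along the last dimension
        return (*([1] * (len(shape) - 1)), granularity.group_size)
    raise ValueError(f"Unsupported granularity: {granularity}")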

_check_hardware_support(granularity)
is_cpu = weight.device.type == "cpu"
if not is_cpu:
_check_hardware_support(granularity)
Contributor


Can you move this to the version 1 branch? Then version 2 can probably do this check in the tensor itself.

Collaborator Author


Sure. Thanks.

Comment on lines 1848 to 1856
if not is_cpu and not _fp8_mm_compat(weight):
# TODO(future PR): this should really throw an exception instead of silently
# not doing what the user asked
return weight

if isinstance(weight_granularity, PerRow):
if not is_cpu and isinstance(weight_granularity, PerRow):
assert weight.dtype == torch.bfloat16, (
"PerRow quantization only works for bfloat16 precision input weight"
)
Contributor

@jerryzh168 jerryzh168 Oct 6, 2025


These checks as well: I feel we can move them to the version 1 branch for now and deprecate them later; for version 2 we can add the checks to the tensors.

Collaborator Author


Sure. Thanks.

Collaborator Author


Moving this to the version=1 branch causes CI failures, so I will keep them as is. Maybe it can be improved later. Thanks.
