
Conversation

namgyu-youn
Contributor

@namgyu-youn namgyu-youn commented Sep 21, 2025

Summary:
Introduce a new tensor subclass API for int8 quantization with a clearer interface.

The main change can be summarized as follows:

  • Old: Complex affine transform (AffineQuantizedTensor) with separate layout handling
  • New: Direct int8 tensor with qdata, scale, and zero_point attributes

Related Issue/PR: #3012 (comment) #2752

Test plan:
test/quantization/quantize_/workflows/int8/test_int8_tensor.py


Future plan:
Implement block-wise quantization using `block_size` parameter
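Usage sketch (illustrative only; `Int8Tensor.from_hp` and the `qdata`/`scale` attributes follow the tests in this PR, while the import path and the per-row `block_size` shown are assumptions):

import torch
# Import path is an assumption; Int8Tensor is introduced by this PR.
from torchao.quantization import Int8Tensor

# High-precision weight of a 16x32 linear layer
w = torch.randn(16, 32, dtype=torch.bfloat16)

# Per-row quantization: one scale per output channel (block spans a full row)
w_q8 = Int8Tensor.from_hp(w, block_size=[1, 32])

print(w_q8.qdata.dtype)  # torch.int8
print(w_q8.scale.shape)  # per-row scale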

pytorch-bot bot commented Sep 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3038

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 21, 2025
@jerryzh168
Contributor

Can you add a version 2 and expose this tensor through `Int8DynamicActivationInt8WeightConfig(AOBaseConfig)`? Similar to `Float8DynamicActivationFloat8WeightConfig(AOBaseConfig)`.
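For illustration, the version gating inside the config transform could look roughly like this (a sketch only, not this PR's final code; `get_weight_block_size` and the abbreviated v1 call are assumptions):

# Hypothetical fragment of the Int8DynamicActivationInt8WeightConfig transform;
# names other than config.version and Int8Tensor.from_hp are assumptions.
def _int8_quantize_weight(weight, config):
    block_size = get_weight_block_size(weight)  # e.g. [1, weight.shape[1]] for per-row
    if config.version == 1:
        # existing AffineQuantizedTensor path (abbreviated)
        return to_affine_quantized_intx(
            weight, MappingType.SYMMETRIC, block_size, torch.int8
        )
    assert config.version == 2, f"Unexpected version: {config.version}"
    # new path: plain Int8Tensor exposed through the config
    return Int8Tensor.from_hp(weight, block_size=block_size)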

args[2] if len(args) > 2 else None,
)

if isinstance(input_tensor, Int8PlainInt8Tensor):
Contributor

We also need to quantize input_tensor in this function now; please check:

if act_quant_kwargs is not None:
    input_tensor = _choose_quant_func_and_quantize_tensor(
        input_tensor, act_quant_kwargs
    )
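In other words, roughly (a sketch only; where act_quant_kwargs lives on the weight tensor is an assumption borrowed from the float8 workflow):

# Sketch of the start of the linear impl for the new tensor; not this PR's code.
def _(func, types, args, kwargs):
    input_tensor, weight_tensor, bias = (
        args[0],
        args[1],
        args[2] if len(args) > 2 else None,
    )
    act_quant_kwargs = weight_tensor.act_quant_kwargs  # assumed attribute name
    if act_quant_kwargs is not None:
        # dynamically quantize the activation before the int8 matmul
        input_tensor = _choose_quant_func_and_quantize_tensor(
            input_tensor, act_quant_kwargs
        )
    # ... the int8 x int8 (or fp x int8) matmul path follows here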

@namgyu-youn namgyu-youn changed the title Add Int8PlainInt8Tensor for clearer interface Add Int8Tensor for clearer interface Sep 23, 2025
Comment on lines 176 to 182
x_int32 = input_tensor.qdata.to(torch.int32)
w_int32 = weight_tensor.qdata.to(torch.int32).t()

result = torch.mm(x_int32.view(-1, x_int32.size(-1)), w_int32)
scale = input_tensor.scale.view(-1, 1) * weight_tensor.scale.unsqueeze(0)
result = result.to(scale.dtype) * scale
result = result.view(*input_tensor.shape[:-1], -1)
Contributor

This is not the same as

def _linear_int8_act_int8_weight_check(input_tensor, weight_tensor, bias):
    return (
        isinstance(input_tensor, AffineQuantizedTensor)
        and _aqt_is_int8_reduced_range(input_tensor)
        and isinstance(weight_tensor, AffineQuantizedTensor)
        and _aqt_is_int8(weight_tensor)
        and input_tensor.dtype == weight_tensor.dtype
        and isinstance(input_tensor._layout, PlainLayout)
        and isinstance(weight_tensor._layout, PlainLayout)
    )


def _linear_int8_act_int8_weight_impl(input_tensor, weight_tensor, bias):
    #
    # 1. do the matrix form of dot(X_i, W_j)
    #
    #
    # 2. rescale the output
    #
    # in cases with large matrices, y_dot_int32 can grow sufficiently
    # large that y_dot_int32 * a float16 scale is greater than the maximum
    # value of a float 16, (which results in a value of inf even if multiplying
    # by the other scale would bring it within the expected range)
    x_vals_int8 = input_tensor.tensor_impl.int_data
    x_scales = input_tensor.tensor_impl.scale
    w_vals_int8_t = weight_tensor.tensor_impl.int_data.contiguous().t()
    w_scales = weight_tensor.tensor_impl.scale
    tmp = x_vals_int8.reshape(-1, x_vals_int8.shape[-1])
    x_scales_dtype = x_scales.dtype
    # Cast fp16 scale to float to avoid overflow in int_scaled_matmul
    intermediate_dtype = torch.float if x_scales_dtype == torch.half else x_scales_dtype
    y_dot_scaled = int_scaled_matmul(
        tmp, w_vals_int8_t, x_scales.reshape(-1, 1).to(intermediate_dtype)
    )
    y_dot_scaled = y_dot_scaled.to(x_scales_dtype)

    y = (y_dot_scaled * w_scales).reshape(
        *x_vals_int8.shape[:-1], y_dot_scaled.shape[-1]
    )

    # can downcast only at the very end
    output_dtype = input_tensor.dtype
    y = y.to(output_dtype)
    if bias is not None:
        y += bias
    return y

?
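To make the difference concrete, here is a small, self-contained illustration of the fp16 overflow described in the comment above (the numbers are chosen arbitrarily):

import torch

# An int32 accumulator value that a moderately large K can easily produce
y_dot_int32 = torch.tensor([131072], dtype=torch.int32)
x_scale = torch.tensor([0.25], dtype=torch.float16)       # activation scale
w_scale = torch.tensor([0.0078125], dtype=torch.float16)  # weight scale

# Naive path (as in the diff above): cast the int32 result straight to the
# fp16 scale dtype. 131072 > 65504 (fp16 max), so it becomes inf before any
# rescaling can bring it back into range.
naive = y_dot_int32.to(x_scale.dtype) * x_scale * w_scale
print(naive)  # tensor([inf], dtype=torch.float16)

# Existing impl: rescale by the activation scale in float32 first, and only
# then come back down to fp16 and apply the weight scale.
y_dot_scaled = (y_dot_int32.to(torch.float32) * x_scale.float()).to(x_scale.dtype)
safe = y_dot_scaled * w_scale
print(safe)  # tensor([256.], dtype=torch.float16)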

Contributor

Can you add a test to check the kernel that's used, similar to

def test_expected_gpu_kernel_fbgemm(self):

as well?

Contributor Author

can you add a test to check the kernel that's used similar to

def test_expected_gpu_kernel_fbgemm(self):

as well?

Yes, the linked workflow should be better to prevent overhead; I will fix it.
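For reference, such a kernel-check test could look roughly like this (a sketch under assumptions: the config name and version flag follow the discussion above, and the CUDA path lowers to torch._int_mm; the exact kernel substring may differ):

import torch
from torch._inductor.utils import run_and_get_code
from torch.testing import FileCheck
from torchao.quantization import quantize_, Int8DynamicActivationInt8WeightConfig

def test_expected_gpu_kernel_int8(self):
    # config/version names assumed from this PR's discussion
    m = torch.nn.Sequential(
        torch.nn.Linear(128, 256, dtype=torch.bfloat16, device="cuda")
    )
    quantize_(m, Int8DynamicActivationInt8WeightConfig(version=2))
    m = torch.compile(m)
    x = torch.randn(4, 128, dtype=torch.bfloat16, device="cuda")

    out, code = run_and_get_code(m, x)
    # Expect the int8 matmul kernel in the generated code; the exact
    # substring ("_int_mm") is an assumption and may need adjusting.
    FileCheck().check("_int_mm").run(code[0])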

result = result.to(scale.dtype) * scale
result = result.view(*input_tensor.shape[:-1], -1)
else:
# FP × INT8 (static)
Contributor

Also, this is the code for weight-only quant I think:

def _linear_fp_act_int8_weight_impl(input_tensor, weight_tensor, bias):

Contributor Author

@namgyu-youn namgyu-youn Sep 24, 2025

Done at 9383550, thanks for pointing it out.

block_size (Optional[list[int]]): block size for quantization granularity
"""

kernel_preference: KernelPreference = KernelPreference.AUTO
Contributor

Seems like there are no multiple kernel preferences right now, right? If so, we can remove this for now.

Contributor Author

We can remove this flag, but how about adding a TODO for real kernel preference support? Keeping the current structure might be helpful for that.

Contributor

We don't have different kernel options for this one, I think.



@unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
class TestInt8Tensor(TorchAOIntegrationTestCase):
Contributor

For the tests, maybe try to follow https://github.com/pytorch/ao/blob/main/test/quantization/quantize_/workflows/int4/test_int4_marlin_sparse_tensor.py for now, and also add some tests for slicing?

def test_slice(self, granularity):
    config = Float8DynamicActivationFloat8WeightConfig(granularity=granularity)
    dtype = torch.bfloat16
    device = "cuda"
    dummy = torch.nn.Linear(256, 256, bias=False, dtype=dtype, device=device)
    dummy1 = torch.nn.Linear(256, 64, bias=False, dtype=dtype, device=device)
    dummy1.weight = torch.nn.Parameter(
        dummy.weight.narrow(0, 0, 64), requires_grad=False
    )
    dummy2 = torch.nn.Linear(128, 256, dtype=dtype, device=device)
    dummy2.weight = torch.nn.Parameter(
        dummy.weight.narrow(1, 0, 128), requires_grad=False
    )
    quantize_(dummy, config)
    weight1 = dummy.weight.clone().narrow(0, 0, 64)
    weight2 = dummy.weight.clone().narrow(1, 0, 128)
    self.assertEqual(
        weight1.qdata,
        dummy.weight.qdata.narrow(0, 0, 64),
    )
    self.assertEqual(
        weight2.qdata,
        dummy.weight.qdata.narrow(1, 0, 128),
    )
    if isinstance(granularity, PerRow):
        self.assertEqual(
            weight1.scale,
            dummy.weight.scale.narrow(0, 0, 64),
        )
        self.assertEqual(
            weight2.scale,
            dummy.weight.scale,
        )
    else:
        self.assertEqual(
            weight1.scale,
            dummy.weight.scale,
        )
        self.assertEqual(
            weight2.scale,
            dummy.weight.scale,
        )
    # check for sliced weight, before and after float8 quantization
    # does not differ too much
    input = torch.randn(2, 256, dtype=dtype, device=device)
    res_ref = dummy1(input)
    dummy.weight = torch.nn.Parameter(weight1.contiguous(), requires_grad=False)
    res = dummy(input)
    sqnr = compute_error(res, res_ref)
    self.assertTrue(sqnr > 25, f"sqnr: {sqnr}")

    input = torch.randn(2, 128, dtype=dtype, device=device)
    res_ref = dummy2(input)
    dummy.weight = torch.nn.Parameter(weight2.contiguous(), requires_grad=False)
    res = dummy(input)
    sqnr = compute_error(res, res_ref)
    self.assertTrue(sqnr > 15, f"sqnr: {sqnr}")
and
def test_slice_preserves_aliasing(self, granularity):

Contributor Author

Yes, the linked unit test is helpful for slicing (PerTensor, PerRow) tests, but I haven't implemented granularity in this PR yet to keep the PR size small. Can I address it after this PR?

Contributor

I don't think the slicing tests are specific to a granularity; you should be able to adapt them for the currently supported granularity, I think.
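For completeness, the aliasing check could be adapted along these lines (a sketch only; the config/version names and the data_ptr-based check are assumptions modeled on the float8 test):

def test_slice_preserves_aliasing(self):
    config = Int8DynamicActivationInt8WeightConfig(version=2)  # assumed name/flag
    l = torch.nn.Linear(1024, 1024, bias=False, dtype=torch.bfloat16, device="cuda")
    quantize_(l, config)
    param = l.weight
    param_data = param.data.narrow(0, 0, 512)
    # narrow() should return views: the sliced qdata/scale must share storage
    # with the original tensor rather than being copies
    self.assertEqual(param.data.qdata.data_ptr(), param_data.qdata.data_ptr())
    self.assertEqual(param.data.scale.data_ptr(), param_data.scale.data_ptr())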

raise ValueError("Expected 2D tensor and block_size length 2")

# Rounding function from high precision dtype
scale = w.abs().max(dim=-1, keepdim=True)[0] / 127.0
Contributor

looks like block_size is not used? why is that?

Contributor

You can check out

def _linear_fp_act_int8_weight_check(input_tensor, weight_tensor, bias):

for the expected granularity.

Also, this should be using these quant primitive ops:

scale, zero_point = choose_qparams_affine(
    input=preprocessed_w,
    mapping_type=MappingType.SYMMETRIC,
    block_size=block_size,
    target_dtype=target_dtype,
    quant_min=quant_min,
    quant_max=quant_max,
    eps=1e-6,
)
wq = quantize_affine(
    input=preprocessed_w,
    block_size=block_size,
    scale=scale,
    zero_point=zero_point,
    output_dtype=target_dtype,
    quant_min=quant_min,
    quant_max=quant_max,
)

The arguments can be found by tracing through the code path for int8 in

new_weight = to_affine_quantized_intx(

and

scale, zero_point = choose_qparams_affine(

This might require a bit too much context, let me know if you would like us to take over.
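In other words, `Int8Tensor.from_hp` could look roughly like this with the primitives above (a rough sketch; the constructor arguments and the quant_min/quant_max choices are assumptions, not the final implementation):

import torch
from torchao.quantization.quant_primitives import (
    MappingType,
    choose_qparams_affine,
    quantize_affine,
)

@classmethod
def from_hp(cls, w: torch.Tensor, block_size: list[int]):
    # symmetric int8: zero_point is effectively zero, scale covers the int8 range
    scale, zero_point = choose_qparams_affine(
        input=w,
        mapping_type=MappingType.SYMMETRIC,
        block_size=block_size,
        target_dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        eps=1e-6,
    )
    qdata = quantize_affine(
        input=w,
        block_size=block_size,
        scale=scale,
        zero_point=zero_point,
        output_dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
    )
    # constructor signature is an assumption
    return cls(qdata, scale, block_size, w.shape, dtype=w.dtype)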

Contributor Author

@namgyu-youn namgyu-youn Sep 29, 2025

Thanks, I surely want to take over! I drafted this PR for those updates, but will look into it today (6 hours later).

BTW, version 2 is updated at c53dad0 (version 1 is the default).

@namgyu-youn namgyu-youn marked this pull request as draft September 28, 2025 13:23
@namgyu-youn namgyu-youn marked this pull request as ready for review September 30, 2025 06:09
Contributor

@jerryzh168 jerryzh168 left a comment

please rebase, and let me know when this is ready for review again @namgyu-youn

@namgyu-youn namgyu-youn requested a review from jerryzh168 October 4, 2025 11:08
self.input_fp, weight_q8_dynamic, self.bias
)

self.assertEqual(result_dynamic.shape, reference.shape)
Contributor

nit: probably add a test for compute_error comparing floating point weight and int8+int8 weight as well
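Something like the following could cover that nit (a sketch; the fixture names mirror the existing tests and the 20 dB SQNR threshold is an assumption):

def test_weight_only_sqnr(self):
    weight_q8 = Int8Tensor.from_hp(self.weight_fp, self.block_size)

    reference = torch.nn.functional.linear(self.input_fp, self.weight_fp, self.bias)
    result = torch.nn.functional.linear(self.input_fp, weight_q8, self.bias)

    sqnr = compute_error(result, reference)
    self.assertGreater(sqnr, 20, f"sqnr: {sqnr}")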

)

def test_linear_operations(self):
"""Test fp+int8 and int8+int8 linear ops"""
Contributor

This is not int8+int8 I think? This is weight-only quant.

Comment on lines +116 to +130
def test_linear_operations(self):
    """Test fp+int8 and int8+int8 linear ops"""
    weight_q8 = Int8Tensor.from_hp(self.weight_fp, self.block_size)
    input_q8 = Int8Tensor.from_hp(self.input_fp, self.block_size)

    reference = torch.nn.functional.linear(self.input_fp, self.weight_fp, self.bias)
    result_fp = torch.nn.functional.linear(self.input_fp, weight_q8, self.bias)
    result_q8 = torch.nn.functional.linear(input_q8, weight_q8, self.bias)

    self.assertEqual(result_fp.shape, reference.shape)
    self.assertEqual(result_q8.shape, reference.shape)
    self.assertTrue(compute_error(result_fp, reference) > 10)
    self.assertTrue(compute_error(result_q8, reference) > 10)

def test_dynamic_quantization(self):
Contributor

I think you can remove these 2 tests actually, since they are already tested in test_int8_linear_variants

Comment on lines +180 to +181
self.assertEqual(weight1.qdata, dummy.weight.qdata.narrow(0, 0, 64))
self.assertEqual(weight2.qdata, dummy.weight.qdata.narrow(1, 0, 128))
Contributor

nit: add assert for scale as well?

self.assertEqual(weight1.qdata, dummy.weight.qdata.narrow(0, 0, 64))
self.assertEqual(weight2.qdata, dummy.weight.qdata.narrow(1, 0, 128))

def test_transpose(self):
Contributor

Is this used anywhere? For most of the tensors we actually don't support transpose so far; we tend to add this only when needed.

weight_q8 = Int8Tensor.from_hp(self.weight_fp, self.block_size)
selected = weight_q8.select(0, 0)

self.assertEqual(selected.shape, (3,))
Contributor

test the data as well?

Contributor

you can follow this:

)
else:
assert config.version == 2, f"Unexpected version: {config.version}"
block_size = [weight.shape[0], weight.shape[1]]
Contributor

This should be the same as L1393 I think; you can extract L1390-L1393 out of the first if branch and use that.

else:
quantized_weight = Int8Tensor.from_hp(
weight,
block_size=get_weight_block_size(weight),
Contributor

nit: can calculate block_size outside of the if/else
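i.e. roughly (a sketch only; the v1 branch shown is an abbreviation of the existing path, not verified against this PR):

# compute once, use in both branches
block_size = get_weight_block_size(weight)

if config.version == 1:
    quantized_weight = to_affine_quantized_intx(
        weight, MappingType.SYMMETRIC, block_size, torch.int8
    )
else:
    assert config.version == 2, f"Unexpected version: {config.version}"
    quantized_weight = Int8Tensor.from_hp(weight, block_size=block_size)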

elif isinstance(quant_kwargs, QuantizeTensorToInt8Kwargs):
return Int8Tensor.from_hp(
tensor,
quant_kwargs.block_size or [1, tensor.shape[-1]],
Contributor

nit: why not make block_size mandatory?

block_size (Optional[list[int]]): block size for quantization granularity
"""

block_size: Optional[list[int]] = None
Contributor

why is this optional?

"dtype",
]

def __new__(
Contributor

nit: please annotate the args with types to be clearer
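For example (types inferred from the attributes used elsewhere in this PR; the exact field list is an assumption):

def __new__(
    cls,
    qdata: torch.Tensor,
    scale: torch.Tensor,
    block_size: list[int],
    shape: torch.Size,
    dtype: torch.dtype,
):
    ...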

}
return torch.Tensor._make_wrapper_subclass(cls, shape, **kwargs)

def __init__(
Contributor

same here

self.qdata = qdata
self.scale = scale
self.block_size = block_size
self._shape = shape
Contributor

nit: we don't need to set shape here, since it will be set in torch.Tensor._make_wrapper_subclass

Comment on lines +144 to +146
# Reshape 1D scale to [N, 1] for broadcasting with [N, K] qdata
if scale.ndim == 1:
    scale = scale.unsqueeze(1)
Contributor

is this needed?

)


@implements(aten.transpose.int)
Contributor

we don't need this yet I think, we can remove for now and add later when needed

if dim == 0 and tensor.scale.ndim >= 1:
sliced_scale = aten.slice.Tensor(tensor.scale, 0, start, end, step)

sliced_shape = list(
Contributor

why not get the shape from sliced tensor directly?

Contributor

can you check

? I'm not sure if the current implementation is enough to cover all cases actually
