[High Risk]Support for immediate saving #965
base: main
Conversation
Signed-off-by: Kaihui-intel <[email protected]>
for more information, see https://pre-commit.ci
Thanks for the great work! Could you check the maximum RAM usage to see whether it has been reduced significantly, as expected?
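For reference, peak RAM can also be sampled from inside the process with the standard library (the thread below uses the external `mprof` tool instead). A minimal sketch; the helper name is ours, not part of auto-round:

```python
import resource
import sys

def peak_rss_mib() -> float:
    """Peak resident set size of this process, in MiB.

    Note: ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024

# Call after quantization finishes to read the high-water mark:
print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Unlike `mprof`, this only reports the high-water mark of the current process, so it misses memory held by subprocesses.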
… into kaihui/save_block
auto_round/compressors/base.py
Outdated
- self.is_packing_immediate = False  # whether to pack the layer immediately after tuning
+ # Whether to pack the layer immediately after tuning
+ self.is_packing_immediate = kwargs.pop("is_packing_immediate", False)
Immediate packing was set automatically before. Have you handled the case of exporting more than one format? It is better not to set this in the API. Besides, as discussed, set save_immediate to True.
Another thing to verify is the time cost of save_immediate: have you measured the total quantization time compared with the main branch?
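For the timing comparison against main, a small wall-clock helper is enough. A sketch; the commented call is an assumption standing in for the real auto-round API:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    """Print the wall-clock duration of the enclosed block."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Usage (treat the exact call signature as an assumption):
# with timed("quantize+save"):
#     compressor.quantize_and_save(output_dir)
```

Running the same block on both branches gives a like-for-like end-to-end number that includes saving, not just tuning.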
auto_round/compressors/base.py
Outdated
- self.is_packing_immediate = False  # whether to pack the layer immediately after tuning
+ # Whether to pack the layer immediately after tuning
+ self.immediate_packing = kwargs.pop("immediate_packing", False)
No need to expose this arg; setting it automatically is a better way.
auto_round/compressors/base.py
Outdated
  q_layer_input = to_device(q_layer_input, self.cache_device)
  quant_layer(layer_name, layer_input, q_layer_input, device=self.device)
+ if self.immediate_packing:
+     from auto_round.export import PACKING_LAYER_WITH_FORMAT
Wrap it in a function.
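The suggested refactor (moving the inline packing into a helper) could look roughly like the sketch below. The registry and helper here are hypothetical stand-ins, not the actual `auto_round.export.PACKING_LAYER_WITH_FORMAT` API:

```python
from typing import Callable, Dict

# Hypothetical stand-in for auto_round.export.PACKING_LAYER_WITH_FORMAT:
# a registry mapping a format key to the function that packs one layer.
PACKING_LAYER_WITH_FORMAT: Dict[str, Callable[[str], str]] = {
    "auto_round": lambda name: f"packed {name} as auto_round",
    "gptq": lambda name: f"packed {name} as gptq",
}

def immediate_pack(layer_name: str, fmt: str) -> str:
    """Pack a single tuned layer right away, dispatching on the target format."""
    for key, packer in PACKING_LAYER_WITH_FORMAT.items():
        if key in fmt:  # substring match, e.g. "auto_round:auto_awq" matches "auto_round"
            return packer(layer_name)
    raise ValueError(f"unsupported format for immediate packing: {fmt}")
```

Factoring this out keeps the tuning loop free of export imports and gives one place to handle unsupported formats.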
auto_round/compressors/base.py
Outdated
- self.is_packing_immediate = False  # whether to pack the layer immediately after tuning
+ # Whether to pack the layer immediately after tuning
+ self.immediate_packing = kwargs.pop("immediate_packing", False)
Where is the code that sets it to False for the fake format or for multiple formats?
auto-round/auto_round/compressors/base.py
Lines 1546 to 1562 in 5375be6
if not hasattr(self, "formats"):
    logger.warning("this API is deprecated, please use `quantize_and_save` instead")
else:
    # Determine if immediate packing is required
    formats = self.formats
    if (
        len(formats) == 1
        and (
            "awq" in formats[0]
            or "gptq" in formats[0]
            or "auto_round" in formats[0]
            or "gguf" in formats[0]
            or "llm_compressor" in formats[0]
        )
        and self.inplace
    ):
        self.immediate_packing = True
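The quoted check can be read as a standalone predicate. A sketch (the function and constant names are illustrative; the condition mirrors the snippet above):

```python
# Backends for which a tuned layer can be packed immediately.
PACKABLE_FORMATS = ("awq", "gptq", "auto_round", "gguf", "llm_compressor")

def should_pack_immediately(formats: list[str], inplace: bool) -> bool:
    """Immediate packing applies only when exactly one format is requested,
    that format is one of the packable backends, and tuning runs in place."""
    return (
        len(formats) == 1
        and any(f in formats[0] for f in PACKABLE_FORMATS)
        and inplace
    )
```

Fake formats and multi-format exports therefore fall through to the regular deferred packing path, which is what the review comment above is probing.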
Accuracy
#788

Memory
memory check: Qwen2.5-7B-Instruct-w4g32, RTN, auto_round
mprof peak: 16659.441 MiB -> 9200.250 MiB (~55%)

Time
quantization and saving time: opt-125m, Qwen2.5-7B-Instruct

Immediate packing and saving currently only supports formats[0].