4bit quantization for arbitrary `nn.Parameter` #1720

matthewdouglas · 2025-08-01T20:49:48Z

This PR is in the same spirit as the recently introduced feature in huggingface/peft#2638.

Several models in the Hugging Face ecosystem have MoE layers that use `nn.Parameter` directly and are therefore not compatible with the default quantization approach of replacing `nn.Linear`. Examples include, but are not limited to:

A new utility, `bitsandbytes.nn.parametrize.replace_parameter_4bit()`, is introduced. It quantizes an `nn.Parameter` and replaces it with a parametrization layer that automatically dequantizes the parameter whenever it is accessed.
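To make the mechanism concrete, here is a minimal sketch of the "quantize, replace, dequantize on access" pattern built on `torch.nn.utils.parametrize`. It is not the code from this PR: `ToyExperts`, `DequantizeOnAccess`, and `toy_replace_parameter` are hypothetical names, and the quantizer is a simple per-tensor 16-level absmax scheme with one code per byte rather than the packed blockwise NF4/FP4 formats that bitsandbytes implements. The signature and internals of the real `replace_parameter_4bit()` are not shown in this thread, so nothing below should be read as describing them.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class ToyExperts(nn.Module):
    """Hypothetical MoE-style module holding weights in a raw nn.Parameter."""

    def __init__(self, num_experts=4, dim=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_experts, dim, dim))


class DequantizeOnAccess(nn.Module):
    """Toy parametrization: maps stored integer codes back to float32.

    Assumption: uniform 16-level absmax quantization, NOT NF4/FP4.
    """

    def __init__(self, scale):
        super().__init__()
        self.register_buffer("scale", scale)

    def forward(self, codes):
        # Codes are stored in [0, 15]; shift back to [-8, 7] and rescale.
        return (codes.to(torch.float32) - 8.0) * self.scale


def toy_replace_parameter(module, param_name):
    """Quantize module.<param_name> and dequantize it lazily on access."""
    w = getattr(module, param_name).data
    scale = w.abs().max().clamp(min=1e-12) / 7.0
    codes = torch.clamp(torch.round(w / scale), -8, 7).add(8).to(torch.uint8)

    # Swap the float parameter for its integer codes (no gradients).
    setattr(module, param_name, nn.Parameter(codes, requires_grad=False))

    # unsafe=True skips the dtype/shape consistency checks, which is needed
    # because the parametrization turns uint8 codes into float32 values.
    parametrize.register_parametrization(
        module, param_name, DequantizeOnAccess(scale), unsafe=True
    )


torch.manual_seed(0)
model = ToyExperts()
reference = model.weight.detach().clone()

toy_replace_parameter(model, "weight")

stored = model.parametrizations.weight.original  # uint8 codes
restored = model.weight                          # float32, computed on access
max_error = (restored - reference).abs().max()
```

After the call, code that reads `model.weight` keeps working unchanged and receives a float tensor, while the module itself only stores the integer codes. In this toy scheme the reconstruction error is bounded by half a quantization step.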

Additional work will be done on the HF Transformers side to enable integration with options in BitsAndBytesConfig.

github-actions · 2025-08-01T20:53:27Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

winglian · 2025-08-06T16:21:37Z

Will any changes be needed in PEFT to apply LoRA adapters to quantized parameters once this lands?

BenjaminBossan · 2025-08-06T16:33:45Z

We'll have to test, but at the very least, huggingface/peft#2710 needs to be merged in PEFT for this to work properly.

cmp-nct · 2025-08-17T21:27:15Z

> We'll have to test, but at the very least, huggingface/peft#2710 needs to be merged in PEFT for this to work properly.

it's merged


matthewdouglas added this to the v0.47.0 milestone Aug 1, 2025

matthewdouglas added the Enhancement (New feature or request) label Aug 1, 2025

matthewdouglas mentioned this pull request Aug 1, 2025

WIP: Initial support for bnb 4bit on any nn.Parameter huggingface/transformers#39859

Draft

4 tasks

matthewdouglas modified the milestones: v0.47.0, v0.48.0 Aug 14, 2025

matthewdouglas added 4 commits September 3, 2025 13:07

Add parametrize util for targeting parameters outside of nn.Linear modules

4db78c9

Parametrize 4bit: replace existing prequantized weight

0ca40cc

cleanup

2ad1e62

Add caching for parametrization

50fe09d

matthewdouglas force-pushed the parametrize-4bit branch from ab478e8 to 50fe09d Compare September 3, 2025 17:08

matthewdouglas added 3 commits September 3, 2025 14:56

Add tests

60725f2

Fix tests

0d5cda7

Guard for torch < 2.5

794710d

matthewdouglas marked this pull request as ready for review September 3, 2025 20:28

Guard for torch < 2.5

62267d7

matthewdouglas mentioned this pull request Sep 5, 2025

Improve kDequantizeBlockwise kernel performance for NF4/FP4 #1747

Closed

Another test guard for torch >= 2.5

7fa6973

matthewdouglas merged commit 27549fb into main Sep 8, 2025
116 of 129 checks passed
