
Conversation

@matthewdouglas (Member) commented Aug 1, 2025

What does this PR do?

This PR adds a new option to `BitsAndBytesConfig` called `target_parameters`, in the same spirit as `target_parameters` in huggingface/peft#2638. The intent is to allow quantization of `nn.Parameter`s that are not within an `nn.Linear`, e.g. those commonly found in certain MoE model implementations.

Requires bitsandbytes-foundation/bitsandbytes#1720, which is released in bitsandbytes v0.48.0.

Example usage with a Granite MoE:

```python
import torch
from transformers import BitsAndBytesConfig, GraniteMoeForCausalLM

model = GraniteMoeForCausalLM.from_pretrained(
    "ibm-granite/granite-3.1-3b-a800m-base",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
        target_parameters=[
            "block_sparse_moe.input_linear.weight",
            "block_sparse_moe.output_linear.weight",
        ],
        llm_int8_skip_modules=["lm_head", "block_sparse_moe.router"],
    ),
)
```
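As a quick sanity check (a minimal sketch, not from the PR; the exact attribute layout after quantization is an assumption, since the 4-bit storage may sit behind a parametrization), one can compare the model's memory footprint against the bf16 numbers below and inspect a targeted parameter:

```python
# Rough footprint check: 4-bit MoE weights should land well below the
# ~6.3 GiB bf16 footprint shown in the tables below.
print(f"{model.get_memory_footprint() / 2**30:.2f} GiB")

# Inspect one targeted MoE parameter; the module path mirrors the
# target_parameters entries above (Granite MoE layout, an assumption).
p = model.model.layers[0].block_sparse_moe.input_linear.weight
print(type(p), p.dtype, p.shape)
```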

Memory Usage - BF16

| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---|---|---|---|---|
| Allocated memory | 6291 MiB | 6292 MiB | 12583 MiB | 6292 MiB |
| Active memory | 6291 MiB | 6292 MiB | 12583 MiB | 6292 MiB |
| Requested memory | 6291 MiB | 6291 MiB | 12583 MiB | 6291 MiB |

Memory Usage - Before PR

| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---|---|---|---|---|
| Allocated memory | 6019 MiB | 6027 MiB | 9935 MiB | 3916 MiB |
| Active memory | 6019 MiB | 6027 MiB | 9935 MiB | 3916 MiB |
| Requested memory | 6015 MiB | 6024 MiB | 9929 MiB | 3913 MiB |

Memory Usage - After PR

| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---|---|---|---|---|
| Allocated memory | 1894 MiB | 2054 MiB | 9424 MiB | 7530 MiB |
| Active memory | 1894 MiB | 2054 MiB | 9424 MiB | 7530 MiB |
| Requested memory | 1875 MiB | 2035 MiB | 9389 MiB | 7513 MiB |
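
For context, the Cur Usage / Peak Usage / Tot Alloc / Tot Freed columns follow the layout of `torch.cuda.memory_summary()`. A minimal sketch of how comparable numbers can be collected (the helper below is hypothetical, not from the PR):

```python
import torch

def report_cuda_memory(device="cuda:0"):
    # Hypothetical helper: dump the CUDA caching allocator statistics that
    # the tables above are excerpted from.
    print(torch.cuda.memory_summary(device=device, abbreviated=True))
    peak_mib = torch.cuda.max_memory_allocated(device) / 2**20
    print(f"Peak allocated: {peak_mib:.0f} MiB")

torch.cuda.reset_peak_memory_stats("cuda:0")
# ... load the model as in the example above ...
report_cuda_memory("cuda:0")
```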

Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. (See Slack discussion)
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber @BenjaminBossan

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member) commented:

cc @MekkCyber

@SunMarc (Member) left a comment:

Nice! It would be great to add some tests (inference / saving) with the gpt-oss model!

@matthewdouglas force-pushed the bnb-parametrize-4bit branch 2 times, most recently from 78d55b3 to 61fdac5 on August 14, 2025.