Commit 3464d83

Commit message: better
1 parent 27666a8 commit 3464d83

2 files changed, +21 -12 lines changed


src/diffusers/quantizers/bitsandbytes/utils.py

Lines changed: 20 additions & 12 deletions
@@ -1,3 +1,16 @@
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 """
 Adapted from
 https://github.com/huggingface/transformers/blob/c409cd81777fb27aadc043ed3d8339dbc020fb3b/src/transformers/integrations/bitsandbytes.py
@@ -216,18 +229,13 @@ def _replace_with_bnb_linear(
 
 def replace_with_bnb_linear(model, modules_to_not_convert=None, current_key_name=None, quantization_config=None):
     """
-    A helper function to replace all `torch.nn.Linear` modules by `bnb.nn.Linear8bit` modules from the `bitsandbytes`
-    library. This will enable running your models using mixed int8 precision as described by the paper `LLM.int8():
-    8-bit Matrix Multiplication for Transformers at Scale`. Make sure `bitsandbytes` compiled with the correct CUDA
-    version of your hardware is installed before running this function. `pip install -i https://test.pypi.org/simple/
-    bitsandbytes`.
-
-    The function will be run recursively and replace all `torch.nn.Linear` modules except for `modules_to_not_convert`
-    that should be kept as a `torch.nn.Linear` module. The replacement is done under `init_empty_weights` context
-    manager so no CPU/GPU memory is required to run this function. Int8 mixed-precision matrix decomposition works by
-    separating a matrix multiplication into two streams: (1) and systematic feature outlier stream matrix multiplied in
-    fp16 (0.01%), (2) a regular stream of int8 matrix multiplication (99.9%). With this method, int8 inference with no
-    predictive degradation is possible for very large models (>=176B parameters).
+    Helper function to replace the `nn.Linear` layers within `model` with either `bnb.nn.Linear8bit` or
+    `bnb.nn.Linear4bit` using the `bitsandbytes` library.
+
+    References:
+        * `bnb.nn.Linear8bit`: [LLM.int8(): 8-bit Matrix Multiplication for Transformers at
+          Scale](https://arxiv.org/abs/2208.07339)
+        * `bnb.nn.Linear4bit`: [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
 
     Parameters:
         model (`torch.nn.Module`):
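For context, a minimal usage sketch of the rewritten helper follows. It is not part of this commit; it assumes `bitsandbytes` and `accelerate` are installed with CUDA support, that `BitsAndBytesConfig` is importable from `diffusers`, and passes an empty `modules_to_not_convert` list to keep the sketch self-contained.

# Minimal usage sketch (assumption: not part of this commit); requires
# `bitsandbytes`, `accelerate`, and a CUDA-capable setup.
import torch.nn as nn

from diffusers import BitsAndBytesConfig
from diffusers.quantizers.bitsandbytes.utils import replace_with_bnb_linear

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

# 8-bit path: eligible `nn.Linear` layers become their bitsandbytes
# counterparts; `load_in_4bit=True` would select the 4-bit (QLoRA) path instead.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = replace_with_bnb_linear(model, modules_to_not_convert=[], quantization_config=bnb_config)
print(model)  # swapped layers are created empty; real weights are loaded afterwards

As the removed docstring noted, the replacement happens under the `init_empty_weights` context manager, so the swapped layers carry no real weights until a quantized checkpoint is loaded into them.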

src/diffusers/utils/loading_utils.py

Lines changed: 1 addition & 0 deletions
@@ -137,6 +137,7 @@ def load_video(
     return pil_images
 
 
+# Taken from `transformers`.
 def get_module_from_name(module, tensor_name: str) -> Tuple[Any, str]:
     if "." in tensor_name:
         splits = tensor_name.split(".")
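For reference, a small usage sketch of the annotated helper (not part of this commit), assuming it behaves like the `transformers` original it was taken from, i.e. it walks a dotted tensor name down to the parent module and the leaf attribute name:

# Usage sketch (assumption: mirrors the `transformers` helper this was taken from).
import torch.nn as nn

from diffusers.utils.loading_utils import get_module_from_name

model = nn.Sequential(nn.Linear(4, 4))
parent, leaf = get_module_from_name(model, "0.weight")
print(parent)  # the inner Linear(in_features=4, out_features=4) module
print(leaf)    # "weight"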
