
Conversation

@Cyrilvallez (Member) commented on Mar 25, 2025

What does this PR do?

This PR removes the now-useless _fast_init and low_cpu_mem_usage in from_pretrained, in order to simplify things even further and limit the number of code paths, ultimately making them much easier to maintain and debug. These two parameters should always be True anyway for optimized model loading.

Because a LOT of models have bad _init_weights() methods (i.e. they do not init ALL parameters), this could be an issue when loading a corrupted state dict (i.e. a state dict with missing weights, where one of the missing weights is not handled properly by _init_weights). However, this should not be an issue in general, as we don't expect too many corrupted state dicts on the Hub. Moreover, this bug is ALREADY PRESENT whenever such a model is loaded with a device_map or with low_cpu_mem_usage=True (or any option that ends up activating low_cpu_mem_usage=True). This is because doing so forces the parameters to be loaded on the meta device, so weights initialized in the __init__ of a layer or similar (which assumes the model is instantiated on CPU) end up with wrong values when moved back to CPU.
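
To make the failure mode concrete, here is a minimal standalone sketch (not from this PR, names are illustrative) of why a value assigned in __init__ does not survive a round trip through the meta device:

import torch
from torch import nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        # value assigned eagerly in __init__, as some layers in the library do
        self.scale = nn.Parameter(torch.ones(4))

# instantiating under the meta device never actually allocates the ones() above
with torch.device("meta"):
    module = Scaler()

# materializing afterwards yields uninitialized memory, so a proper _init_weights
# must re-create the value on the real device
module = module.to_empty(device="cpu")
print(module.scale)  # arbitrary values, not ones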

Nevertheless, it can be hard to debug and should not happen, so this PR already fixes some models' _init_weights. In parallel, #37070 adds a test that always detects when a model's _init_weights is missing some parameters, and I will fix more models directly there (it relies on the fact that _fast_init and low_cpu_mem_usage are already gone).
Fun fact: even our faithful Llama has a bad _init_weights!! (missing the RMSNorm) 🤯
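
For context, a complete _init_weights needs a branch for every parameter-owning leaf module. A rough sketch of what that looks like for a Llama-style model (illustrative only, not the exact diff in this PR; the 0.02 std mirrors the usual initializer_range default):

import torch.nn as nn
from transformers.models.llama.modeling_llama import LlamaRMSNorm

def init_weights(module, std=0.02):
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
    elif isinstance(module, LlamaRMSNorm):
        # this is the kind of branch that was missing: without it, the norm weight
        # stays uninitialized when the parameter is materialized from meta
        module.weight.data.fill_(1.0)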

Most of the changed files simply remove old _fast_init tests (which were skipped anyway 🙃🙃) and fix weight initialization for a few models that were blocking general CI tests.

github-actions (bot) commented:

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers.

github-actions bot marked this pull request as draft on March 25, 2025 at 13:58
@Cyrilvallez marked this pull request as ready for review on March 25, 2025 at 13:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sssshhhhhh commented:

Hi, this is causing all weights to be meta tensors when from_flax=True, for at least Whisper and BERT. This was already broken with low_cpu_mem_usage before, so not unexpected I guess.

from transformers import BertModel
model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)  # also 'openai/whisper-tiny'
# this assertion passes: the loaded weight is an empty meta tensor instead of real data
assert model.state_dict()['embeddings.word_embeddings.weight'].is_meta

I'm only using this to convert from JAX, so it's not a big deal to use an older version. Sorry if you're already aware of this issue.

@Cyrilvallez (Member, Author) commented:

Hey @sssshhhhhh! We were actually not aware of it; it slipped under the radar as from_flax/from_tf are extremely rarely used! So rarely that apparently nobody ever reported that it was broken with a device_map (which used to implicitly activate low_cpu_mem_usage) 😳 I must say I'm quite surprised by this, as it has been in the codebase for quite a long time.

However, we will very soon start deprecating Flax and TF. Since loading with from_flax/from_tf relies on the model architecture from the underlying library, this means we will also stop supporting the from_flax/from_tf flags in from_pretrained. As a result, I don't think loading with these flags will be fixed in the current state of the library (main).

So I do think the easiest path is indeed to use an older version to convert the checkpoints to PyTorch if needed, then re-save them. Would that be an acceptable way to proceed for your use case, or would it cause too much friction/discomfort?
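
For reference, that one-off conversion could look roughly like this (the version pin and output path are illustrative; any release from before this change should do):

# pip install "transformers==4.49.0"
from transformers import BertModel

model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)
model.save_pretrained("./bert-base-uncased-pt")  # re-save as PyTorch weights
# later, on a current transformers version, load the converted checkpoint normally:
# model = BertModel.from_pretrained("./bert-base-uncased-pt")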

@farzadab commented on Apr 14, 2025:

@Cyrilvallez I spent a lot of time trying to figure this out but I'm still left with no good solution.

Changing the behaviour of _init_weights as you suggested cannot solve this issue, because it assumes each inner module can be initialized separately (since it's called via model.apply), and that's not what I want. I want to be able to load sub-models (e.g. language_model and audio_tower) from checkpoints (e.g. the HF Hub).

The best I can think of is to override _load_pretrained_model, then somehow find the checkpoints for the sub-models and call language_model._load_pretrained_model on them. What makes this extremely hard is replicating the checkpoint-finding behaviour of .from_pretrained, since from_pretrained is not very modular (1300 lines).

@Cyrilvallez (Member, Author) commented:

Hey! Yes, repos are expected to contain all their weights, so things would be much simpler if you added all the weights (i.e. the weights of the sub-models) to your repo directly. However, if you don't want to do that, I believe _init_weights can still be used with something along the lines of:

def _init_weights(self, module):
    if module is self.language_model:
        # swap the meta-initialized sub-model for one loaded from its own checkpoint
        self.language_model = module.from_pretrained(...)
    elif module in self.language_model.modules():
        # already handled by the sub-model's own loading, nothing to do
        pass
    ....

but passing specific args (i.e. the same args as the outer call) to that inner from_pretrained will require a bit more hacking.

from_pretrained should be far fewer lines now as well; we simplified it a lot 🤗 Are you sure you're looking at main?

Hope this solves your issue 🤗

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* Remove low_cpu_mem_usage and _fast_init

* Update deepspeed.py

* Update modeling_utils.py

* remove the first 2 tests everywhere

* Update test_modeling_common.py

* remove what was remaining about fast_init

* fix logic and simplify

* mismatched keys logic update

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix 2 models init_weights

* extend to others

* remove grad

* Update modeling_fsmt.py

* init weights in tests

* style

* Update test_modeling_fsmt.py

* more old models

* fix more init_weights

* copies

* fix

* style

* Update modeling_lxmert.py

* fix inits

* more and more

* more

* should finalize

* style

* Update modeling_dinov2_with_registers.py

* fix

* Update modeling_encoder_decoder.py

* fix

* style

* Update modeling_lxmert.py

* post rebase cleanup

* Update modeling_informer.py

* back to start for device

* fix

* add test to detect all failing cases correctly

* Update test_modeling_common.py

* fix

* fix

* sam

* style

* Update modeling_maskformer_swin.py

* CIs

* CIs

* remove test - will add it on separate PR

* fix

* fix

* Update modeling_sam.py

* CIs

* CIs

* CIs

* convnext

* suggestions

* CIs

* fix copies after merge

---------

Co-authored-by: Yih-Dar <[email protected]>
soghomon-b pushed a commit to soghomon-b/transformers that referenced this pull request Aug 24, 2025