
sgugger (Collaborator) commented Jun 26, 2023

What does this PR do?

This PR finishes the work done in a previous PR and completely cleans up _keys_to_ignore_on_save, _keys_to_ignore_on_load_missing and _keys_to_ignore_on_load_unexpected. Those class attributes were used in three situations:

  1. Not saving the tied weights. This came from the (wrong) assumption that torch would take twice the space for tied weights (which it doesn't), and it also created bugs where untied weights were not saved (unless a hack was added, like for RoBERTa models). It is not necessary since PyTorch doesn't take more space for tied weights and safetensors properly removes them (via _tied_weights_keys).

  2. Ignoring non-saved non-persistent buffers. This can be done automatically in the code of modeling_utils, as non-persistent buffers are named buffers of the model that do not appear in the state dict, so they are easy to detect (see the sketch after this list).

  3. Ignoring known unexpected weights coming from another architecture (like the pooler). This isn't necessary anymore since we no longer issue a warning in this case.
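
As a rough illustration of point 2, here is a minimal sketch (toy module and illustrative names, not transformers code) of how non-persistent buffers can be detected by comparing named buffers against the state dict:

```python
import torch
from torch import nn


class ToyEmbeddings(nn.Module):
    def __init__(self, max_positions=512):
        super().__init__()
        # Persistent buffer: included in the state dict when saving.
        self.register_buffer("persistent_ids", torch.arange(max_positions))
        # Non-persistent buffer: recreated at init time, never saved.
        self.register_buffer("position_ids", torch.arange(max_positions), persistent=False)


model = ToyEmbeddings()
state_dict_keys = set(model.state_dict().keys())
# Non-persistent buffers are exactly the named buffers missing from the state dict.
non_persistent = {name for name, _ in model.named_buffers() if name not in state_dict_keys}
print(non_persistent)  # {'position_ids'}
```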

HuggingFaceDocBuilderDev commented Jun 26, 2023

The documentation is not available anymore as the PR was closed or merged.


  # If this weight is going to tip up over the maximal size, we split.
- if last_block_size + weight_size > max_shard_size:
+ if last_block_size + weight_size > max_shard_size and len(sharded_state_dicts[-1]) > 0:
sgugger (Collaborator, Author) commented:
This change is necessary to make sure we save something in the first shard. With the removal of position_ids from the tensors saved, some tests of checkpoint sharding with BERT started failing.
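
For context, a simplified sketch of the sharding loop (not the exact modeling_utils implementation) showing why the extra len(sharded_state_dicts[-1]) > 0 guard keeps the first shard from ending up empty when a single weight exceeds max_shard_size:

```python
def shard_checkpoint_sketch(state_dict, max_shard_size):
    """Simplified sketch of the sharding logic discussed above."""
    sharded_state_dicts = [{}]
    last_block_size = 0
    for key, weight in state_dict.items():
        weight_size = weight.numel() * weight.element_size()
        # If this weight is going to tip up over the maximal size, we split,
        # but only if the current shard already holds something; otherwise a
        # weight bigger than max_shard_size would leave an empty shard behind.
        if last_block_size + weight_size > max_shard_size and len(sharded_state_dicts[-1]) > 0:
            sharded_state_dicts.append({})
            last_block_size = 0
        sharded_state_dicts[-1][key] = weight
        last_block_size += weight_size
    return sharded_state_dicts
```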

LysandreJik (Member) commented:
I think it's worth adding a comment mentioning why we're doing this!

model.tie_weights()
ptrs = collections.defaultdict(list)
for name, tensor in model.state_dict().items():
    id_tensor = id_tensor_storage(tensor) if tensor.device != torch.device("meta") else id(tensor)
sgugger (Collaborator, Author) commented:
Accelerate detects tied weights by comparing tensor IDs, but that doesn't work for all models (deformable_detr for instance). So we use the same storage-based test as elsewhere, except when the tensor is on the meta device (where that test fails), in which case we default to id.
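
A sketch of the idea, with a simplified storage key standing in for transformers' id_tensor_storage helper (the exact tuple that helper returns is not reproduced here):

```python
import collections

import torch


def find_shared_tensor_groups(model):
    """Group state-dict keys that point to the same underlying storage (sketch)."""
    ptrs = collections.defaultdict(list)
    for name, tensor in model.state_dict().items():
        if tensor.device == torch.device("meta"):
            # Meta tensors have no real storage, so storage-based detection fails;
            # fall back to the Python object id, as in the diff above.
            key = id(tensor)
        else:
            # Simplified stand-in for id_tensor_storage: device + storage pointer.
            key = (tensor.device, tensor.untyped_storage().data_ptr())
        ptrs[key].append(name)
    return [names for names in ptrs.values() if len(names) > 1]
```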

self.assertTrue(torch.allclose(output, expected_tensor, atol=1e-4))

# XXX: this might be a candidate for common tests if we have many of those
def test_lm_head_ignore_keys(self):
sgugger (Collaborator, Author) commented:
This was testing the hack added to remove the weights from _keys_to_ignore_on_save when they were untied.

LysandreJik (Member) commented:
Nice to see it go

f"The shared pointers are incorrect, found different pointers for keys {shared_names}",
)

def test_load_save_without_tied_weights(self):
sgugger (Collaborator, Author) commented:
This new test checks that when weights are untied, they are properly saved and we complain if they are missing from the checkpoint.
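
Roughly, such a test can look like the following sketch (illustrative names, not the exact test added in this PR), using output_loading_info to assert that nothing is reported missing:

```python
import tempfile


def check_load_save_without_tied_weights(model_class, config):
    # Untie the embeddings so every weight must be physically present in the checkpoint.
    config.tie_word_embeddings = False
    model = model_class(config)
    with tempfile.TemporaryDirectory() as tmp_dir:
        model.save_pretrained(tmp_dir)
        reloaded, infos = model_class.from_pretrained(tmp_dir, output_loading_info=True)
        # With untied weights, nothing should be dropped at save time or missing at load time.
        assert infos["missing_keys"] == []
```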


- def test_tied_model_weights_key_ignore(self):
+ def test_model_weights_reload_no_missing_tied_weights(self):
      config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
sgugger (Collaborator, Author) commented:
The previous test was mostly focused on checking that tied weights were in the _keys_to_ignore_on_load_missing class variable, but we don't put them there anymore, so the test is adapted accordingly.

LysandreJik (Member) left a comment:
Looks good to me! Thanks for this change!



amyeroberts (Contributor) left a comment:
Really nice tidy up - thanks for working on this and updating!

  self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
- self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))
+ self.register_buffer(
+     "position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)), persistent=True
+ )
amyeroberts (Contributor) commented:
Just double-checking that this should be persistent=True. Assuming yes, given the other buffers here, but most other models seem to have it as False.

sgugger (Collaborator, Author) commented:
CLAP has the persistent=False->persistent=True substitution in its Copied from statement, that's why. I didn't want to break it accidentally, so I didn't touch that statement.


  # Initialize weights and apply final processing
  self.post_init()
- self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))
amyeroberts (Contributor) commented:
Does it still work with the removal of this buffer?

sgugger (Collaborator, Author) commented:
Ah, it was a duplicate (not shown by the diff). You can scroll to line 452 below to see it defined again with persistent=False.

manav-glean commented:
@sgugger with this change, a few of our trained models that used the older format can no longer load properly unless we set strict=False, because they contain an embeddings.position_ids key that no longer exists. I wonder if there is a way to land this change so that it stays backwards compatible with older model files as well. I see a few different issues have popped up as a result of this change, and a lot of them just required loading and re-saving the model files, but that is sometimes difficult to do at scale.
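
One possible workaround for checkpoints produced before this change, sketched below (helper name and key suffix are illustrative): strip the stale buffer keys such as embeddings.position_ids before loading strictly, instead of falling back to strict=False.

```python
import torch


def load_legacy_state_dict(model, checkpoint_path):
    """Load an older checkpoint after dropping buffer keys that no longer exist (sketch)."""
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # Remove keys such as "embeddings.position_ids" that newer code no longer expects.
    cleaned = {k: v for k, v in state_dict.items() if not k.endswith("position_ids")}
    return model.load_state_dict(cleaned)
```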
