Allow loading pretrained sharded PyTorch checkpoints into Flax models #18170
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
sgugger left a comment
LGTM, thanks for your PR! You'll need to import this constant from the utils submodule however :-)
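For reference, a minimal sketch of the import in question, assuming WEIGHTS_INDEX_NAME lives alongside the other weight-file constants in the utils submodule:

```python
# Sketch of the import sgugger is asking for inside modeling_flax_utils.py;
# the exact import block it joins may differ.
from .utils import WEIGHTS_INDEX_NAME
```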
Oops! Thanks, just added that import.
Now you'll need to run

Done!
sgugger left a comment
@ArthurZucker could you also have a quick look?
ArthurZucker left a comment
Thanks for contributing! I would be more in favour of finalising #18026, or you can merge my branch.
Overall we should always test both locally and on the hub 😄
```python
elif from_pt and os.path.isfile(os.path.join(pretrained_model_name_or_path, WEIGHTS_INDEX_NAME)):
    # Load from a sharded PyTorch checkpoint
    archive_file = os.path.join(pretrained_model_name_or_path, WEIGHTS_INDEX_NAME)
    is_sharded = True
```
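For context, WEIGHTS_INDEX_NAME points at the JSON index file of a sharded checkpoint. A minimal sketch of how such an index can be resolved into a single state dict, assuming the standard `weight_map` layout; the helper name is illustrative, not the actual library internals:

```python
import json
import os

import torch


def load_sharded_pytorch_state_dict(checkpoint_dir, index_name="pytorch_model.bin.index.json"):
    """Merge all shards referenced by a checkpoint index into one state dict."""
    # The index maps each parameter name to the shard file that stores it.
    with open(os.path.join(checkpoint_dir, index_name)) as f:
        weight_map = json.load(f)["weight_map"]
    state_dict = {}
    for shard_file in sorted(set(weight_map.values())):
        state_dict.update(torch.load(os.path.join(checkpoint_dir, shard_file), map_location="cpu"))
    return state_dict
```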
LGTM, just wondering if you could add a small test?
You can use hf-internal-testing/tiny-random-bert-sharded.
Also, I opened #18026, which is really similar and adds:
```python
@is_pt_flax_cross_test
def test_from_sharded_pt(self):
    model = FlaxBertModel.from_pretrained("hf-internal-testing/tiny-random-bert-sharded", from_pt=True)
    ref_model = FlaxBertModel.from_pretrained("ArthurZ/tiny-random-bert-flax-only")
    for p1, p2 in zip(flatten_dict(model.params).values(), flatten_dict(ref_model.params).values()):
        assert np.allclose(np.array(p1), np.array(p2))
```

Was not really aware that the conversion would be that straightforward, let me have a look.
```python
if from_pt and os.path.isfile(os.path.join(pretrained_model_name_or_path, WEIGHTS_NAME)):
    # Load from a PyTorch checkpoint
    archive_file = os.path.join(pretrained_model_name_or_path, WEIGHTS_NAME)
elif from_pt and os.path.isfile(os.path.join(pretrained_model_name_or_path, WEIGHTS_INDEX_NAME)):
```
BTW, this will only work if the WEIGHTS_INDEX_NAME file is present locally; it does not cover loading from the Hub.
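A minimal sketch of what Hub support could look like, using huggingface_hub.hf_hub_download; the helper name is illustrative:

```python
import json

from huggingface_hub import hf_hub_download


def download_sharded_checkpoint(repo_id, index_name="pytorch_model.bin.index.json"):
    # Resolve the index first, then fetch every shard it references;
    # hf_hub_download returns local cache paths.
    index_path = hf_hub_download(repo_id, index_name)
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    return [hf_hub_download(repo_id, shard) for shard in sorted(set(weight_map.values()))]
```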
Yeah, let's just finalize yours. What's left to do?
Maybe just fixing the tests, and making sure that the tests are actually good. Should be quite straightforward.
@Sea-Snell we just need to fix test_from_sharded_pt, which is failing because the models used for comparison are not the same! Simply using the same model should do the trick: upload a new model with the same config, but shard it with save_pretrained, setting max_shard_size to 150KB.
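A sketch of that fix, assuming a tiny random BERT config; the config values and repo name are illustrative:

```python
from transformers import BertConfig, BertModel

# Build a tiny model, then shard it on save: with a small max_shard_size the
# checkpoint is split across several .bin files plus a
# pytorch_model.bin.index.json index.
config = BertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=64)
model = BertModel(config)
model.save_pretrained("tiny-random-bert-sharded", max_shard_size="150KB")
# model.push_to_hub("tiny-random-bert-sharded")  # then upload it for the test
```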
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Motivation: Sharded PyTorch checkpoints cannot currently be loaded into Flax models; this may be desirable in some cases (e.g. "google/ul2").
Changes: I added a few lines to modeling_flax_utils.py to support this behavior. The added code exactly matches how sharded checkpoints are loaded in modeling_utils.py for PyTorch models. @patrickvonplaten, @patil-suraj
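After this change, loading a locally sharded PyTorch checkpoint into a Flax model should be as simple as the following (the directory name is illustrative):

```python
from transformers import FlaxBertModel

# from_pt=True triggers the PyTorch -> Flax conversion; with this PR it also
# covers directories containing a sharded pytorch_model.bin.index.json.
model = FlaxBertModel.from_pretrained("./tiny-random-bert-sharded", from_pt=True)
```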