fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 #26024
Conversation
Thank you @kai01ai for quickly fixing this issue, LGTM! 🤗
cc @amyeroberts

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for fixing!
```python
# Update new_num_tokens with the actual size of new_embeddings
if pad_to_multiple_of is not None:
    if is_deepspeed_zero3_enabled():
        import deepspeed

        with deepspeed.zero.GatheredParameters(new_embeddings.weight, modifier_rank=None):
            new_num_tokens = new_embeddings.weight.shape[0]
    else:
        new_num_tokens = new_embeddings.weight.shape[0]
```
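The gather is needed because ZeRO-3 shards every parameter: outside a deepspeed.zero.GatheredParameters context, a rank only sees an empty placeholder tensor. A minimal sketch of that behavior, assuming `engine` was returned by deepspeed.initialize(...) with a ZeRO stage-3 config (the distributed setup itself is omitted):

```python
import deepspeed
import torch.nn as nn

# Assumption: `engine` comes from deepspeed.initialize(...) with a ZeRO
# stage-3 config, so the model's parameters are already partitioned.
embedding: nn.Embedding = engine.module.get_input_embeddings()

print(embedding.weight.shape[0])  # 0 -- locally the weight is an empty shard placeholder

# GatheredParameters temporarily reassembles the full parameter on this rank;
# modifier_rank=None requests read-only access (no rank's modifications are kept).
with deepspeed.zero.GatheredParameters(embedding.weight, modifier_rank=None):
    print(embedding.weight.shape[0])  # the true vocabulary size, e.g. 50304
```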
@kai01ai, I think this code block can be deleted, passing new_num_tokens as-is from this function's scope to _get_resized_lm_head() on line 1453. Since the new_embeddings variable is created from new_num_tokens in _get_resized_embeddings(), it shouldn't need to be reassigned from new_embeddings.weight.shape[0]. Or use new_num_tokens = new_embeddings.num_embeddings instead (see the sketch after this comment). Is there any case where new_num_tokens gets changed?
By the way, is it appropriate for me to review here? I just wanted to add to the conversation, but referencing code forced me to create a review.
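For context on the num_embeddings suggestion: nn.Embedding records the requested size as a plain Python attribute in __init__, so reading it never touches the (possibly partitioned) weight tensor. A quick illustration outside of DeepSpeed:

```python
import torch.nn as nn

emb = nn.Embedding(50304, 768)

# num_embeddings is a plain int set once in __init__; reading it does not
# touch the weight tensor, so it would stay correct even if ZeRO-3 replaced
# the weight with an empty local shard.
assert emb.num_embeddings == 50304
assert emb.weight.shape == (50304, 768)  # only holds outside ZeRO-3 partitioning
```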
new_num_tokens can be changed inside _get_resized_embeddings because of pad_to_multiple_of, which is why we can't simply pass it through. However, if new_embeddings.num_embeddings works both with and without DeepSpeed, then sure! Would you like to open a PR for this?

Totally fine for you to review like this, it's the preferred method for code discussions!
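To make the pad_to_multiple_of point concrete, here is a sketch of the ceiling-rounding involved (the exact expression inside _get_resized_embeddings may differ):

```python
def round_up_to_multiple(n: int, multiple: int) -> int:
    # Ceiling-round n to the nearest multiple of `multiple`.
    return ((n + multiple - 1) // multiple) * multiple

# Example: the GPT-2 vocab size padded to a hardware-friendly shape.
requested = 50257
actual = round_up_to_multiple(requested, 64)
print(actual)  # 50304
```

Because the embedding is built with the padded size, the caller's original new_num_tokens (50257 here) is stale afterwards; the resized embedding itself is the source of truth for the lm_head size.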
What does this PR do?
Fixes #25977
After resizing the input embeddings, the value of new_embeddings.weight.shape[0] is used as the new size for resizing the lm_head. However, when DeepSpeed ZeRO-3 is enabled the embedding weight is partitioned across ranks, so the local tensor is empty and this value becomes 0, which shrinks the lm_head to zero rows. This PR addresses the issue by updating new_num_tokens explicitly, gathering the parameter first when ZeRO-3 is enabled.
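The fix boils down to reading the row count in a partition-aware way. A sketch of that pattern as a standalone helper (the helper name and the import fallback are mine, not transformers API):

```python
import torch

try:
    from transformers.integrations import is_deepspeed_zero3_enabled
except ImportError:
    # Older transformers versions exposed this helper from transformers.deepspeed.
    from transformers.deepspeed import is_deepspeed_zero3_enabled


def true_num_rows(param: torch.nn.Parameter) -> int:
    """Return param.shape[0], gathering the full tensor first under ZeRO-3."""
    if is_deepspeed_zero3_enabled():
        import deepspeed

        # The parameter is gathered while the return expression is evaluated,
        # then re-partitioned when the context exits.
        with deepspeed.zero.GatheredParameters(param, modifier_rank=None):
            return param.shape[0]
    return param.shape[0]
```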
Who can review?
@ArthurZucker, @pacman100