Conversation

@kai01ai kai01ai commented Sep 7, 2023

What does this PR do?

Fixes #25977

After resizing the input embeddings, new_embeddings.weight.shape[0] is used as the new size when resizing the lm_head. However, when DeepSpeed ZeRO-3 is enabled, the weight is partitioned and this value becomes 0. This PR fixes the issue by updating new_num_tokens explicitly.
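To make the failure mode concrete, here is a minimal sketch (the helper name true_vocab_size is illustrative, not part of this PR) of why the raw shape read is unreliable under ZeRO-3 and how gathering the parameter recovers the real size:

import deepspeed
import torch.nn as nn

def true_vocab_size(embedding: nn.Embedding) -> int:
    # Under ZeRO-3 the weight tensor is partitioned across ranks, so
    # embedding.weight.shape[0] can read 0 on any single rank.
    # GatheredParameters temporarily reassembles the full tensor;
    # modifier_rank=None means no rank modifies the gathered weight.
    with deepspeed.zero.GatheredParameters(embedding.weight, modifier_rank=None):
        return embedding.weight.shape[0]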

Who can review?

@ArthurZucker, @pacman100

@pacman100 pacman100 left a comment


Thank you @kai01ai for quickly fixing this issue, LGTM! 🤗

@pacman100

cc @amyeroberts

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts amyeroberts left a comment


Thanks for fixing!

@amyeroberts amyeroberts merged commit df04959 into huggingface:main Sep 7, 2023
LysandreJik pushed a commit that referenced this pull request Sep 15, 2023
Comment on lines +1440 to +1449
# Update new_num_tokens with the actual size of new_embeddings
if pad_to_multiple_of is not None:
    if is_deepspeed_zero3_enabled():
        import deepspeed

        # The weight is partitioned under ZeRO-3, so gather it before reading its row count
        with deepspeed.zero.GatheredParameters(new_embeddings.weight, modifier_rank=None):
            new_num_tokens = new_embeddings.weight.shape[0]
    else:
        new_num_tokens = new_embeddings.weight.shape[0]
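For reference, the branch quoted above only runs when pad_to_multiple_of is passed; a hedged usage sketch of the call that reaches it (the checkpoint name is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokenizer.add_tokens(["<extra_0>", "<extra_1>"])
# pad_to_multiple_of rounds the resized vocabulary up to a multiple of 8, so the
# code above must recover the actual row count of new_embeddings before the
# lm_head is resized to match.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)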

@kwonmha kwonmha Sep 21, 2023


@kai01ai, I think this code block can be deleted and new_num_tokens can be passed as-is, within this function's scope, to _get_resized_lm_head() on line 1453. The new_embeddings variable is created from new_num_tokens in _get_resized_embeddings(), so it shouldn't need to be reassigned from new_embeddings.weight.shape[0].

Or use new_num_tokens = new_embeddings.num_embeddings instead (see the sketch below).

Is there any case where new_num_tokens gets changed?
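A minimal sketch of that alternative (arbitrary sizes): nn.Embedding stores num_embeddings as a plain Python int at construction time, so reading it does not require a DeepSpeed gather:

import torch.nn as nn

emb = nn.Embedding(num_embeddings=32128, embedding_dim=768)  # arbitrary sizes
# num_embeddings is an int attribute, readable even when emb.weight itself
# is partitioned by ZeRO-3.
new_num_tokens = emb.num_embeddings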

By the way, is it appropriate for me to leave a review here? I just wanted to add to the conversation, but referencing code required me to start a review.

Collaborator


new_num_tokens can be changed in _get_resized_embeddings because of pad_to_multiple_of, which is why we can't simply pass it through. However, if new_embeddings.num_embeddings works both with and without DeepSpeed, then sure! Would you like to open a PR for this?
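A rough sketch of that rounding (an assumed simplification of what _get_resized_embeddings does, not the exact source; the helper name is hypothetical):

import math

def padded_vocab_size(new_num_tokens: int, pad_to_multiple_of: int) -> int:
    # The resized embedding's row count is rounded up to a multiple of
    # pad_to_multiple_of (e.g. for tensor-core friendly shapes), so the final
    # matrix size can differ from the new_num_tokens the caller passed in.
    return math.ceil(new_num_tokens / pad_to_multiple_of) * pad_to_multiple_of

padded_vocab_size(32001, 64)  # 32064, not the requested 32001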

Totally fine for you to review like this; it's the preferred method for code discussions!

parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
@kai01ai kai01ai deleted the fix-resize-embedding branch October 28, 2023 15:57
Successfully merging this pull request may close these issues.

Assertion error when using Trainer & Deepspeed stage 3 with model.resize_token_embeddings