fix _resize_token_embeddings will set lm head size to 0 when enabled deepspeed zero3 #26024
Conversation
Thank you @kai01ai for quickly fixing this issue, LGTM! 🤗
cc @amyeroberts

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for fixing!
```python
# Update new_num_tokens with the actual size of new_embeddings
if pad_to_multiple_of is not None:
    if is_deepspeed_zero3_enabled():
        import deepspeed

        with deepspeed.zero.GatheredParameters(new_embeddings.weight, modifier_rank=None):
            new_num_tokens = new_embeddings.weight.shape[0]
    else:
        new_num_tokens = new_embeddings.weight.shape[0]
```
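The gather is needed because ZeRO-3 shards every parameter: outside a deepspeed.zero.GatheredParameters context, a rank only sees an empty placeholder tensor. A minimal sketch of that behavior, assuming `engine` was returned by deepspeed.initialize(...) with a ZeRO stage-3 config (the distributed setup itself is omitted):

```python
import deepspeed
import torch.nn as nn

# Assumption: `engine` comes from deepspeed.initialize(...) with a ZeRO
# stage-3 config, so the model's parameters are already partitioned.
embedding: nn.Embedding = engine.module.get_input_embeddings()

print(embedding.weight.shape[0])  # 0 -- locally the weight is an empty shard placeholder

# GatheredParameters temporarily reassembles the full parameter on this rank;
# modifier_rank=None requests read-only access (no rank's modifications are kept).
with deepspeed.zero.GatheredParameters(embedding.weight, modifier_rank=None):
    print(embedding.weight.shape[0])  # the true vocabulary size, e.g. 50304
```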
@kai01ai, I think this code block can be deleted, passing new_num_tokens as-is from this function's scope to _get_resized_lm_head() on line 1453. Since the new_embeddings variable is created from new_num_tokens in _get_resized_embeddings(), it shouldn't need to be reassigned from new_embeddings.weight.shape[0]. Or use new_num_tokens = new_embeddings.num_embeddings instead (see the sketch after this comment). Is there any case where new_num_tokens gets changed?
By the way, is it appropriate for me to review here? I just wanted to add to the conversation, but referencing code forced me to create a review.
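For context on the num_embeddings suggestion: nn.Embedding records the requested size as a plain Python attribute in __init__, so reading it never touches the (possibly partitioned) weight tensor. A quick illustration outside of DeepSpeed:

```python
import torch.nn as nn

emb = nn.Embedding(50304, 768)

# num_embeddings is a plain int set once in __init__; reading it does not
# touch the weight tensor, so it would stay correct even if ZeRO-3 replaced
# the weight with an empty local shard.
assert emb.num_embeddings == 50304
assert emb.weight.shape == (50304, 768)  # only holds outside ZeRO-3 partitioning
```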
new_num_tokens can be changed inside _get_resized_embeddings because of pad_to_multiple_of, which is why we can't simply pass it through. However, if new_embeddings.num_embeddings works both with and without DeepSpeed, then sure! Would you like to open a PR for this?

Totally fine for you to review like this, it's the preferred method for code discussions!
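To make the pad_to_multiple_of point concrete, here is a sketch of the ceiling-rounding involved (the exact expression inside _get_resized_embeddings may differ):

```python
def round_up_to_multiple(n: int, multiple: int) -> int:
    # Ceiling-round n to the nearest multiple of `multiple`.
    return ((n + multiple - 1) // multiple) * multiple

# Example: the GPT-2 vocab size padded to a hardware-friendly shape.
requested = 50257
actual = round_up_to_multiple(requested, 64)
print(actual)  # 50304
```

Because the embedding is built with the padded size, the caller's original new_num_tokens (50257 here) is stale afterwards; the resized embedding itself is the source of truth for the lm_head size.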
What does this PR do?
Fixes #25977
After resizing the input embeddings, the value of new_embeddings.weight.shape[0] is used as the new size for resizing the lm_head. However, when DeepSpeed ZeRO-3 is enabled the embedding weight is partitioned across ranks, so the local tensor is empty and this value becomes 0, which shrinks the lm_head to zero rows. This PR addresses the issue by updating new_num_tokens explicitly, gathering the parameter first when ZeRO-3 is enabled.
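The fix boils down to reading the row count in a partition-aware way. A sketch of that pattern as a standalone helper (the helper name and the import fallback are mine, not transformers API):

```python
import torch

try:
    from transformers.integrations import is_deepspeed_zero3_enabled
except ImportError:
    # Older transformers versions exposed this helper from transformers.deepspeed.
    from transformers.deepspeed import is_deepspeed_zero3_enabled


def true_num_rows(param: torch.nn.Parameter) -> int:
    """Return param.shape[0], gathering the full tensor first under ZeRO-3."""
    if is_deepspeed_zero3_enabled():
        import deepspeed

        # The parameter is gathered while the return expression is evaluated,
        # then re-partitioned when the context exits.
        with deepspeed.zero.GatheredParameters(param, modifier_rank=None):
            return param.shape[0]
    return param.shape[0]
```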
Who can review?
@ArthurZucker, @pacman100