Modular phi #34361
Conversation
Awesome PR! 🥳 CC @Cyrilvallez
Force-pushed from 48303e6 to d942994.
Rebased to fix the failing CIs.
Hey~
As a first review, let's try to lean more on the strengths of inheritance.
If you look at PhiAttention, a lot of it can be trimmed out because it already exists in Gemma, for example. That way you only add what's required and use del to remove what doesn't apply (for example, dense is o_proj I think).
PhiForCausalLM and the other heads should inherit from Gemma rather than Llama, as they can be 1-to-1 the same this way!
Congrats on the PR 🤗
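(For context, here is a minimal sketch of the inheritance-plus-del pattern the review describes, assuming the modular converter semantics in transformers; the layer shapes are simplified and the exact final modular_phi.py may differ.)

```python
from torch import nn
from transformers.models.gemma.modeling_gemma import GemmaAttention


class PhiAttention(GemmaAttention):
    def __init__(self, config, layer_idx: int):
        super().__init__(config, layer_idx)
        # Phi calls its output projection `dense`, so add a Phi-named layer
        # and delete the inherited Gemma attribute it replaces.
        self.dense = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
        del self.o_proj
```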
Thanks for the PR! 🤗 I left some comments to use more inheritance; you could try to go even further if you can!
This cannot use Llama here, as it causes self.classifier to become self.score, which may break weight initialization of pretrained models. You could try to find another model with the same names in the init 🤗
I found LlamaForTokenClassification to be the closest to PhiForTokenClassification. It's copied from MptForTokenClassification, but subclassing Mpt directly causes issues with build_phi_alibi_tensor (i.e. build_mpt_alibi_tensor).
I've addressed the issue by deleting self.score and adding self.classifier (essentially renaming it) on top of LlamaForTokenClassification.
@Cyrilvallez let me know if changes are required.
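(In code, the rename described above amounts to something like the sketch below; the layer shape follows LlamaForTokenClassification, and this is illustrative rather than the exact final modular code.)

```python
from torch import nn
from transformers.models.llama.modeling_llama import LlamaForTokenClassification


class PhiForTokenClassification(LlamaForTokenClassification):
    def __init__(self, config):
        super().__init__(config)
        # Phi checkpoints store the head under `classifier`, not `score`,
        # so rename the layer to keep pretrained weight loading intact.
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        del self.score
```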
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from d942994 to 12b3901.
Feel free to ping us again once ready! 🤗
@ArthurZucker Oh, I think that's all. Please suggest any changes if required.
Nice! Just some final details to take care of 🤗 Sorry for the delay, the whole team was at a big offsite in Martinique; we just came back.
All this code should not be needed; a single pass should be enough, as this is identical to GemmaForCausalLM.
The forward method can't be removed completely, as it is required at least for the documentation. Changed it to call super().forward(...). @Cyrilvallez
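(The resulting shape is roughly the thin wrapper below, a sketch assuming Gemma as the parent; the documentation decorators and exact signature are omitted.)

```python
from transformers.models.gemma.modeling_gemma import GemmaForCausalLM


class PhiForCausalLM(GemmaForCausalLM):
    def forward(self, **super_kwargs):
        # Kept only so Phi-specific docstrings have a method to attach to;
        # all real logic lives in the inherited Gemma forward.
        return super().forward(**super_kwargs)
```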
Force-pushed from d67d5b8 to 89199ac.
Rebased. @Cyrilvallez
Hey! Sorry for the delay!! A few last small comments to cut even more code, then we're good to go! 🤗
This can be removed
Hi @Cyrilvallez, I'm concerned that removing this will result in the lm_head having bias=False, as it does in Gemma, and I suspect that would affect model initialization, or at least the output. I don't think we can neglect the bias, so I kept the lm_head with bias=True in PhiForCausalLM.
Let me know if I'm correct.
Oh yes, sorry, I did not notice the bias! Then of course we need to keep it. You can still remove the self.model = ... and the post_init() lines, though.
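(Put together, the override would look roughly like this sketch, assuming Gemma as the parent, whose __init__ already sets self.model and calls post_init().)

```python
from torch import nn
from transformers.models.gemma.modeling_gemma import GemmaForCausalLM


class PhiForCausalLM(GemmaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        # Phi checkpoints ship an lm_head bias, unlike Gemma, so only this
        # layer is overridden; self.model and post_init() come from Gemma.
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=True)
```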
Force-pushed from 77d1fd3 to 67ebbaf.
All right, LGTM! Great job, thanks!
If #34858 gets merged before this one, you'll just need to make sure to modify accordingly, but it will be extremely straightforward!
🤗
Force-pushed from 67ebbaf to 9bf4eae.
Done! @Cyrilvallez
Nice! Do you want to include the recent changes from #35235? 🤗
```python
class PhiForCausalLM(PhiPreTrainedModel, GenerationMixin):
    _tied_weights_keys = ["lm_head.weight"]
    _tp_plan = {"lm_head": "colwise_rep"}
```
For TP, the config needs a TP plan as well!
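(For reference, recent transformers configs describe tensor parallelism via a base_model_tp_plan dict on the config class. The sketch below is a hypothetical plan; the entries are assumptions and must match Phi's actual module names.)

```python
# Hypothetical TP plan, for illustration only; verify every key against
# the real module names in modeling_phi.py before relying on it.
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.dense": "rowwise",
    "layers.*.mlp.fc1": "colwise",
    "layers.*.mlp.fc2": "rowwise",
}
```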
```python
                config.hidden_size // self.num_heads, eps=config.layer_norm_eps, elementwise_affine=True
            )

        self.rotary_emb = PhiRotaryEmbedding(config=self.config)
```
should be removed!
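(For context, a minimal sketch of why the per-attention rotary_emb is redundant under the current transformers convention; the names here are illustrative, not the final Phi code. The model owns a single rotary embedding and passes its cos/sin output into every decoder layer.)

```python
import torch
from torch import nn


class PhiModelSketch(nn.Module):
    """Illustrative only: rotary embeddings computed once at the model level."""

    def __init__(self, rotary_emb: nn.Module, layers: nn.ModuleList):
        super().__init__()
        self.rotary_emb = rotary_emb  # single shared instance
        self.layers = layers

    def forward(self, hidden_states: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
        # cos/sin are computed once and threaded through every layer, so
        # each attention module no longer needs its own rotary_emb.
        position_embeddings = self.rotary_emb(hidden_states, position_ids)
        for layer in self.layers:
            hidden_states = layer(hidden_states, position_embeddings=position_embeddings)
        return hidden_states
```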
What does this PR do?
Adds Modular Phi (#33916).
Who can review?
@ArthurZucker @LysandreJik
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.