
Conversation

@h3110Fr13nd (Contributor)

What does this PR do?

Adds Modular Phi #33916

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @LysandreJik
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@ArthurZucker (Collaborator)

Awesome PR! 🥳 CC @Cyrilvallez

@h3110Fr13nd (Contributor, Author)

Rebased to address the failing CIs.

@ArthurZucker (Collaborator) left a comment

Hey~
As a first review, let's try to lean more on the strength of inheritance.
If you look at PhiAttention, a lot of it can be skimmed out because it already exists in Gemma, for example. That way you only add what's required and use del to remove what isn't needed (for example, dense is o_proj, I think).

PhiForCausalLM and all the heads should inherit from Gemma rather than Llama, as they can be 1-to-1 the same this way!
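
Roughly something like this (a hypothetical sketch on my side; I'm assuming the Gemma attribute names, and any inherited code referencing o_proj would need adapting too):

from transformers.models.gemma.modeling_gemma import GemmaAttention

class PhiAttention(GemmaAttention):
    def __init__(self, config, layer_idx=None):
        super().__init__(config, layer_idx)
        # Phi checkpoints name the output projection `dense`; expose the
        # inherited projection under that name and drop the Gemma one so
        # the converted weights line up.
        self.dense = self.o_proj
        del self.o_proj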

Congrats on the PR 🤗

@Cyrilvallez (Member) left a comment

Thanks for the PR! 🤗 I left some comments about using more inheritance; you could try to go even further if you can!

Comment on lines 1010 to 886
@Cyrilvallez (Member)

This cannot use Llama here, as it causes self.classifier to become self.score, which may break loading the weights of pretrained models. You could try to find another model with the same names in the init 🤗

@h3110Fr13nd (Contributor, Author) commented Nov 15, 2024

I found LlamaForTokenClassification to be the closest to PhiForTokenClassification. The original was copied from MptForTokenClassification, but subclassing that causes issues with build_phi_alibi_tensor (i.e. build_mpt_alibi_tensor).
I've addressed this by inheriting from LlamaForTokenClassification, deleting self.score, and adding self.classifier (basically a rename).
@Cyrilvallez let me know if changes are required.
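
For reference, the pattern looks roughly like this (a hypothetical sketch; the generated modeling code must also reference self.classifier instead of self.score in forward):

from torch import nn
from transformers.models.llama.modeling_llama import LlamaForTokenClassification

class PhiForTokenClassification(LlamaForTokenClassification):
    def __init__(self, config):
        super().__init__(config)
        # Phi checkpoints store this head as `classifier`, not Llama's `score`.
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        del self.score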

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator)

Feel free to ping us again once ready! 🤗

@h3110Fr13nd (Contributor, Author)

> Feel free to ping us again once ready! 🤗

@ArthurZucker Oh, I think that's all. Let me know if any changes are required.

@Cyrilvallez (Member) left a comment

Nice! Just some final details to take care of 🤗 Sorry for the delay, the whole team was in Martinique for a big offsite; we just came back.

Comment on lines 873 to 943
@Cyrilvallez (Member)

All this code should not be needed; a single pass statement should be enough, as this is identical to GemmaForCausalLM.
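
That is, the modular file would just need (a hypothetical sketch):

from transformers.models.gemma.modeling_gemma import GemmaForCausalLM

class PhiForCausalLM(GemmaForCausalLM):
    pass  # identical to Gemma; the converter expands the full class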

@h3110Fr13nd (Contributor, Author)

We can't remove the forward method completely, as it is required at least for the documentation. Changed it to call super().forward(...). @Cyrilvallez
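
i.e. roughly this (a hypothetical sketch of the change; the kwargs-forwarding signature is an assumption):

from transformers.models.gemma.modeling_gemma import GemmaForCausalLM

class PhiForCausalLM(GemmaForCausalLM):
    def forward(self, **super_kwargs):
        r"""
        (docstring kept here so the generated documentation still has an anchor)
        """
        # Delegate entirely to the Gemma implementation.
        return super().forward(**super_kwargs)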

@h3110Fr13nd force-pushed the modular/phi branch 2 times, most recently from d67d5b8 to 89199ac on November 25, 2024 21:03
@h3110Fr13nd (Contributor, Author)

Rebased. @Cyrilvallez

@Cyrilvallez self-requested a review on November 27, 2024 14:02
@Cyrilvallez (Member) left a comment

Hey! Sorry for the delay!! A few last small comments to cut even more code, then we're good to go! 🤗

Comment on lines 892 to 896
@Cyrilvallez (Member)

This can be removed

@h3110Fr13nd (Contributor, Author)

Hi @Cyrilvallez, I'm concerned that removing this would leave lm_head with bias=False, as it is in Gemma, and I suspect that would affect the model initialization, or at least the output. I don't think we can neglect the bias, so I kept lm_head with bias=True in PhiForCausalLM.
Let me know whether I'm correct.

@Cyrilvallez (Member)

Oh yes, sorry, I did not notice the bias! Then of course we need to keep it. You can still remove the self.model = ... and the post_init() lines, though.
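
That is, something like this (a hypothetical sketch, assuming the modular converter expands the parent __init__ in the generated modeling file):

from torch import nn
from transformers.models.gemma.modeling_gemma import GemmaForCausalLM

class PhiForCausalLM(GemmaForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        # Only the Phi-specific head needs restating (bias=True, unlike Gemma);
        # self.model and post_init() come from the parent __init__.
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=True)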

@Cyrilvallez (Member) left a comment

All right, LGTM! Great job, thanks!
If #34858 gets merged before this one, you'll just need to make sure to modify accordingly, but it will be extremely straightforward!

@h3110Fr13nd (Contributor, Author)

🤗

@h3110Fr13nd (Contributor, Author) commented Dec 11, 2024

> All right, LGTM! Great job, thanks!
> If #34858 gets merged before this one, you'll just need to make sure to modify accordingly, but it will be extremely straightforward!

Done! @Cyrilvallez

@ArthurZucker (Collaborator) left a comment

Nice! Do you want to include the recent changes from #35235? 🤗


class PhiForCausalLM(PhiPreTrainedModel, GenerationMixin):
    _tied_weights_keys = ["lm_head.weight"]
    _tp_plan = {"lm_head": "colwise_rep"}
@ArthurZucker (Collaborator)

For TP, the config needs a TP plan as well!
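
Something like this on the config side (a hypothetical sketch; I'm assuming Phi's projection names and the base_model_tp_plan convention):

from transformers import PretrainedConfig

class PhiConfig(PretrainedConfig):
    # Hypothetical plan: split q/k/v column-wise, the `dense` output
    # projection row-wise, and the fc1/fc2 MLP pair column-/row-wise.
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.dense": "rowwise",
        "layers.*.mlp.fc1": "colwise",
        "layers.*.mlp.fc2": "rowwise",
    }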

    config.hidden_size // self.num_heads, eps=config.layer_norm_eps, elementwise_affine=True
)

self.rotary_emb = PhiRotaryEmbedding(config=self.config)
@ArthurZucker (Collaborator)

should be removed!
