Skip to content

Conversation

@zucchini-nlp
Copy link
Member

@zucchini-nlp zucchini-nlp commented Jan 20, 2025

What does this PR do?

Unblocks me on #35314 (comment). This PR raises error whenever one wants to use head_mask with SDPA/FA2, instead of raising a warning and redirecting to eager.

Note that we didn't change vision models like ViT in this PR, those models will simply fallback to eager and have no problems because they are not generative (no cache/no attn mask)

See the linked comment for reasons why we need this

Copy link
Contributor

@gante gante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

(I think raising an exception is indeed the right thing to do, as we were not respecting the head_mask at all)

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually happy to have mostly given that you are refactoring as well! TBH I don't mind ignoring head mask if the doc explicitly mentions that only eager uses it. Might be less work for us, less warning and aligned with kwargs that can be passed to attn integration functions

@zucchini-nlp
Copy link
Member Author

zucchini-nlp commented Feb 13, 2025

@ArthurZucker yeah, just removing also works since no-one seems to use it anyways. I updated each model's docs with small note about head-mask and the models now silently ignore the head-mask

Maybe we can simply remove it from valid args in v5.0?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM maybe a red light on the PR name!

@zucchini-nlp zucchini-nlp merged commit 955e61b into huggingface:main May 15, 2025
20 checks passed
@zucchini-nlp zucchini-nlp changed the title Remove head mask in generative models 🔴 Remove head mask in generative models May 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants