
Conversation

@vasqu
Contributor

@vasqu vasqu commented Oct 20, 2025

CLIP used the old mask APIs, leading to confused usage:

  • A causal mask (the usual triu mask)
  • A padding mask (encoder mask, i.e. only accounting for padding)
  • The sum of both == final mask --> a causal mask with padding

This only works for interfaces that support 4D masks, which disabled FA usage in general.

This PR changes this to the new API, which handles padding automatically. We additionally have to pass the is_causal kwarg to dynamically switch between modality types (text == causal, image == full). This is only enabled through recent PRs (fa #39707, sdpa #41692).

Closes #41673
Fixes #41668
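For illustration, here is a minimal pure-Python sketch of the old scheme described above (not the actual transformers code; names, shapes, and the batch/head dims are simplified away): a causal triu mask and a padding mask are built separately and added into one final additive mask, which only works for interfaces that accept a full 4D mask:

```python
NEG_INF = float("-inf")

def causal_mask(seq_len):
    # additive causal (triu) mask: 0.0 where query q may attend key k (k <= q),
    # -inf where attention is disallowed
    return [[0.0 if k <= q else NEG_INF for k in range(seq_len)]
            for q in range(seq_len)]

def padding_mask(attention_mask):
    # expand a per-token 0/1 padding mask into additive [seq_q, seq_k] form:
    # padded key positions get -inf for every query
    seq_len = len(attention_mask)
    return [[0.0 if attention_mask[k] else NEG_INF for k in range(seq_len)]
            for _ in range(seq_len)]

def combined_mask(attention_mask):
    # the old CLIP approach: add the two masks into one final additive mask
    seq_len = len(attention_mask)
    c, p = causal_mask(seq_len), padding_mask(attention_mask)
    return [[c[q][k] + p[q][k] for k in range(seq_len)]
            for q in range(seq_len)]
```

With the new API, no such precomputed 4D mask is needed: padding is handled from the 1D attention mask, and causality is just the is_causal flag, which keeps FA paths usable.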

@vasqu
Contributor Author

vasqu commented Oct 20, 2025

cc @yonigozlan when you come across models like these in the vision refactors

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp zucchini-nlp left a comment


Didn't see that we're changing only the text model. LGTM, as long as the slow tests are passing

@vasqu
Contributor Author

vasqu commented Oct 20, 2025

run-slow: clip

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/clip']
quantizations: [] ...

@vasqu
Contributor Author

vasqu commented Oct 20, 2025

@molbap @zucchini-nlp I changed a few things to align the kwargs with our modern practices, i.e. see 764e63f

This makes the kwargs easy to type properly; otherwise we would probably need to type them as FA kwargs 🤔
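As a generic sketch of that typing pattern (hypothetical names for illustration, not the exact CLIP code): a TypedDict groups the attention kwargs so they can be typed in one place rather than as loose FA kwargs:

```python
from typing import TypedDict

class AttentionKwargs(TypedDict, total=False):
    # hypothetical kwarg names for illustration only
    is_causal: bool
    dropout: float

def attention_forward(hidden_states, **kwargs):
    # on Python 3.11+ the signature could instead be typed as
    # `**kwargs: Unpack[AttentionKwargs]` for full type-checker support
    is_causal = kwargs.get("is_causal", True)
    dropout = kwargs.get("dropout", 0.0)
    return hidden_states, is_causal, dropout
```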

@vasqu vasqu changed the title [Clip] Fix masking and enable flash attention on all model types 🚨 [Clip] Fix masking and enable flash attention on all model types Oct 20, 2025
Contributor

@molbap molbap left a comment


Much better with the typing, thanks!

@vasqu
Contributor Author

vasqu commented Oct 20, 2025

CI has issues today, will probably check again tomorrow (and propagate the changes to metaclip 2 + mlcd) and merge then

@vasqu vasqu added the Vision label Oct 20, 2025
@vasqu
Contributor Author

vasqu commented Oct 20, 2025

run-slow: clip, metaclip_2, mlcd, llava

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/clip', 'models/llava', 'models/metaclip_2', 'models/mlcd']
quantizations: [] ...

@vasqu
Contributor Author

vasqu commented Oct 20, 2025

run-slow: clip, metaclip_2, mlcd, llava

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/clip', 'models/llava', 'models/metaclip_2', 'models/mlcd']
quantizations: [] ...

@vasqu
Contributor Author

vasqu commented Oct 20, 2025

Yeah, the CI is not having a good day :D Locally all the relevant tests passed, especially the integration tests; checking tomorrow

@vasqu
Contributor Author

vasqu commented Oct 21, 2025

run-slow: clip, metaclip_2, mlcd, llava

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/clip', 'models/llava', 'models/metaclip_2', 'models/mlcd']
quantizations: [] ...

@vasqu
Contributor Author

vasqu commented Oct 21, 2025

run-slow: clip, metaclip_2, mlcd, llava

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/clip', 'models/llava', 'models/metaclip_2', 'models/mlcd']
quantizations: [] ...

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: clip, metaclip_2, mlcd

@vasqu
Contributor Author

vasqu commented Oct 21, 2025

run-slow: clip, metaclip_2, mlcd, llava

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/clip', 'models/llava', 'models/metaclip_2', 'models/mlcd']
quantizations: [] ...

Collaborator

@ArthurZucker ArthurZucker left a comment


thanks

- attn_output = attn_output.reshape(batch_size, seq_length, embed_dim).contiguous()
+ attn_output = attn_output.reshape(batch_size, seq_length, -1).contiguous()
Collaborator


Usually we use -1 for the batch size, as text can be ragged, but not an issue
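For context, reshape infers a single -1 dimension from the total element count, so the choice is only about which size to leave implicit. A small sketch of that inference rule, assuming the usual reshape semantics (illustrative helper, not a transformers API):

```python
def infer_dim(shape, numel):
    # emulate how reshape infers at most one -1 dimension from the others
    known = 1
    neg_idx = None
    for i, d in enumerate(shape):
        if d == -1:
            neg_idx = i
        else:
            known *= d
    out = list(shape)
    if neg_idx is not None:
        out[neg_idx] = numel // known
    return tuple(out)
```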

@vasqu vasqu merged commit 7a833d1 into huggingface:main Oct 24, 2025
19 checks passed
@vasqu vasqu deleted the fix-clip-fa branch October 24, 2025 18:44
i3hz pushed a commit to i3hz/transformers that referenced this pull request Oct 30, 2025
…uggingface#41750)

* fix

* make kwargs fully passed and adjust with outputs xxx

* propagate metaclip 2

* propagate mlcd and fix test

* style

* fix repo consistency, need to add ignore rules as those are building blocks

* style

* oops

* fix mlcd

Successfully merging this pull request may close these issues:

CLIP incompatible with Flash Attention 3

5 participants