Delete deprecations with end-cycle in v4.xx and v5.0 #41681
Conversation
Love this so much 🥲, thanks for taking the time to do this @zucchini-nlp!
My only reservation is for `max_size`; I'll try to scan the Hub to see how bad the damage would be right now, and open PRs where necessary.
src/transformers/models/conditional_detr/image_processing_conditional_detr.py
src/transformers/models/deformable_detr/image_processing_deformable_detr.py
src/transformers/models/maskformer/image_processing_maskformer.py
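For readers skimming the thread: the `max_size` shim at issue looks roughly like the following in the DETR-family image processors. This is a hedged sketch, not the actual diff; the function name and kwargs are illustrative.

import warnings


def preprocess(images, size=None, max_size=None):
    # Pre-removal behaviour: the deprecated kwarg is still accepted, warned
    # about, and folded into the new-style `size` dict.
    if max_size is not None:
        warnings.warn(
            "The `max_size` parameter is deprecated; use size={'longest_edge': ...} instead.",
            FutureWarning,
        )
        size = {"longest_edge": max_size}
    return images, size

Once the shim is deleted, `preprocess(images, max_size=800)` fails with a TypeError like any unknown keyword argument, which is why checkpoints on the Hub that still pass `max_size` would need PRs first.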
Thanks a lot for taking care of that!
General comment:
For every model that had something like

logger.warning_once(
    "MimiModel is using MimiSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
    'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)

because of being copied from Gemma, I think we should keep a warning. I've seen your answer above about that, but that only applies when `output_attentions` is set in the config, not in the forward, no? Therefore, until those implementations rely on `ALL_ATTENTION_FUNCTIONS`, we still have to handle the warning here to ensure we don't silently skip outputting attentions.
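For context, a minimal sketch of the fallback pattern that warning guards (illustrative names, not the transformers source): SDPA never materializes the attention matrix, so a request for attentions has to be detected and rerouted to the manual path, which is exactly what would be silently skipped without the warning.

import warnings

import torch.nn.functional as F


def sdpa_attention(q, k, v, output_attentions=False):
    if output_attentions:
        # SDPA cannot return attention weights, so warn and fall back to the
        # manual ("eager") implementation that materializes them.
        warnings.warn(
            "SDPA does not support `output_attentions=True`; falling back to "
            "the manual attention implementation."
        )
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        attn_weights = scores.softmax(dim=-1)
        return attn_weights @ v, attn_weights
    return F.scaled_dot_product_attention(q, k, v), None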
Review from my side:
core models (meaning ones that do not rely on modular):
- moshi
- mimi
- clap
- seamless_m4t
- unispeech
- unispeech_sat
- wav2vec2
- wav2vec2_bert
- whisper
the rest are impacted by modular directly (see the sketch after this list):
- hubert (wav2vec2)
- kyutai_speech_to_text (moshi)
- sew (wav2vec2)
- sew_d (wav2vec2)
- wavlm (wav2vec2)
- data2vec_audio (wav2vec2)
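A rough illustration of why those models inherit the cleanup automatically, assuming the repo's modular convention (a `modular_*.py` file that subclasses another model, from which `modeling_*.py` is regenerated). The file below is a sketch, not the real hubert source:

# modular_hubert.py (sketch): hubert derives its modeling code from wav2vec2,
# so deleting a deprecation in wav2vec2 and regenerating the modeling file
# propagates the removal here with no extra edits.
from transformers.models.wav2vec2.modeling_wav2vec2 import Wav2Vec2Model


class HubertModel(Wav2Vec2Model):
    pass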
Ah, after looking I see that it's not going to raise any warning unless we use `output_attentions`. Also it will nudge us to incorporate `ALL_ATTENTION_FUNCTIONS`.
Yep, totally agree!
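To make the `ALL_ATTENTION_FUNCTIONS` nudge concrete, a self-contained sketch of registry-based dispatch (the registry name and `attend` helper here are illustrative, not the transformers API): once models look their attention function up in a shared mapping, the single registered "sdpa" entry is the only place the fallback warning has to live.

import torch.nn.functional as F


def eager_attention(q, k, v):
    weights = (q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5).softmax(dim=-1)
    return weights @ v, weights


# One shared registry; per-model SDPA classes (and their copies of the
# warning) become unnecessary.
ATTENTION_FUNCTIONS = {
    "eager": eager_attention,
    "sdpa": lambda q, k, v: (F.scaled_dot_product_attention(q, k, v), None),
}


def attend(q, k, v, attn_implementation="sdpa"):
    return ATTENTION_FUNCTIONS[attn_implementation](q, k, v)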
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.FloatTensor] = None,
position_ids: Optional[torch.LongTensor] = None,
The model doesn't use positions; I believe this was simply copied from other models without checking.
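If the model indeed never consumes positions, the implied fix is just dropping the dead parameter. A hypothetical sketch of the cleaned-up signature (the class name is made up):

from typing import Optional

import torch


class Model(torch.nn.Module):
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.FloatTensor] = None,
    ):
        # `position_ids` removed from the signature: nothing in the module
        # ever read it.
        ...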
Hub down? 😢
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge)
run-slow: aimv2, altclip, beit, blip_2, bloom, bridgetower, clap, conditional_detr, data2vec, deepseek_v2, deformable_detr, detr, grounding_dino
Okay, now it is done, I think. @yonigozlan @eustlb, feel free to take another look; I will also request a core maintainer's review.
What does this PR do?
Deletes deprecations for the v5 release. cc @yonigozlan and @eustlb to verify that we can really delete them, since it is mostly deprecated stuff from processors.
For core maintainers: the fallback warning is handled centrally in `sdpa_attention`, but for old models we have to manually duplicate the warning message.