
Conversation

@younesbelkada
Contributor

Related to huggingface/transformers#25265

Users can easily benefit from flash attention; this PR adds a new argument to SFTTrainer to take care of that, documents it properly, and raises errors when relevant.

This leads to some interesting speedups and memory savings that I will detail after running some experiments, hence I am keeping this PR as a draft for now.

This is only available if you use PyTorch nightlies and set packing=True, since SDPA + flash attention does not support padding.
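As a rough usage sketch (the use_flash_attention flag below is a hypothetical placeholder for the argument this PR adds; the final name may differ), enabling it could look like this:

```python
# Sketch only: `use_flash_attention` is a placeholder for the argument this PR
# introduces. Requires a PyTorch nightly build and packing=True, since the SDPA
# flash-attention kernel cannot handle padded inputs.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    packing=True,              # packs samples so no padding tokens are needed
    use_flash_attention=True,  # hypothetical flag added by this PR
)
trainer.train()
```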

cc @lvwerra @fxmarty @vwxyzjn

@younesbelkada younesbelkada changed the title from [SFTTrainer] Flash attention support for SFTTrainer to [skip ci] [SFTTrainer] Flash attention support for SFTTrainer on Aug 17, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@fxmarty

fxmarty commented Aug 17, 2023

Are trl users using padding?

@younesbelkada
Contributor Author

Adding packing=True should result in no padding in the input texts, as it concatenates text chunks until max_seq_length is reached.
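To make the packing behaviour concrete, here is a toy illustration (not trl's actual implementation) of how packing concatenates tokenized chunks into fixed-length blocks so that no padding is ever required:

```python
# Toy packing illustration: concatenate token streams, then slice them into
# fixed-length blocks of max_seq_length, so every block is full and unpadded.
def pack(token_streams, max_seq_length):
    buffer, packed = [], []
    for tokens in token_streams:
        buffer.extend(tokens)
        while len(buffer) >= max_seq_length:
            packed.append(buffer[:max_seq_length])
            buffer = buffer[max_seq_length:]
    return packed  # any leftover tokens in `buffer` are dropped here

examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
print(pack(examples, max_seq_length=4))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```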

)
from transformers.trainer_callback import TrainerCallback
from transformers.trainer_utils import EvalPrediction
from transformers.utils import ContextManagers
Member

I made this 😂

Contributor Author

hahahah nice!

@younesbelkada younesbelkada marked this pull request as ready for review August 18, 2023 11:19
@younesbelkada younesbelkada changed the title from [skip ci] [SFTTrainer] Flash attention support for SFTTrainer to [SFTTrainer] Flash attention support for SFTTrainer on Aug 18, 2023
@younesbelkada younesbelkada requested a review from lvwerra August 18, 2023 11:19
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@younesbelkada
Contributor Author

Closing for now as huggingface/transformers#25598 might be merged

@qgallouedec qgallouedec deleted the flash-attn-sft branch October 7, 2024 15:46