
Conversation

@younesbelkada
Contributor

Related to huggingface/transformers#25265

Users can easily benefit from flash attention; this PR adds a new argument to SFTTrainer to take care of that, documents it properly, and raises errors when relevant.

This leads to some interesting speedups and memory savings that I will detail after running some experiments, hence I am keeping this PR as a draft for now.

This is only available if you use PyTorch nightlies and set packing=True, since SDPA + flash attention does not support padding.
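As a rough usage sketch (the use_flash_attention flag below is a hypothetical placeholder for the argument this PR adds; the final name may differ), enabling it could look like this:

```python
# Sketch only: `use_flash_attention` is a placeholder for the argument this PR
# introduces. Requires a PyTorch nightly build and packing=True, since the SDPA
# flash-attention kernel cannot handle padded inputs.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    packing=True,              # packs samples so no padding tokens are needed
    use_flash_attention=True,  # hypothetical flag added by this PR
)
trainer.train()
```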

cc @lvwerra @fxmarty @vwxyzjn

@younesbelkada younesbelkada changed the title from [SFTTrainer] Flash attention support for SFTTrainer to [skip ci] [SFTTrainer] Flash attention support for SFTTrainer on Aug 17, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@fxmarty

fxmarty commented Aug 17, 2023

Are trl users using padding?

@younesbelkada
Contributor Author

Adding packing=True should result in no padding in the input texts, as it concatenates text chunks until max_seq_length is reached.
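To make the packing behaviour concrete, here is a toy illustration (not trl's actual implementation) of how packing concatenates tokenized chunks into fixed-length blocks so that no padding is ever required:

```python
# Toy packing illustration: concatenate token streams, then slice them into
# fixed-length blocks of max_seq_length, so every block is full and unpadded.
def pack(token_streams, max_seq_length):
    buffer, packed = [], []
    for tokens in token_streams:
        buffer.extend(tokens)
        while len(buffer) >= max_seq_length:
            packed.append(buffer[:max_seq_length])
            buffer = buffer[max_seq_length:]
    return packed  # any leftover tokens in `buffer` are dropped here

examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
print(pack(examples, max_seq_length=4))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```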

)
from transformers.trainer_callback import TrainerCallback
from transformers.trainer_utils import EvalPrediction
from transformers.utils import ContextManagers
Member

I made this 😂

Contributor Author

hahahah nice!

@younesbelkada younesbelkada marked this pull request as ready for review August 18, 2023 11:19
@younesbelkada younesbelkada changed the title from [skip ci] [SFTTrainer] Flash attention support for SFTTrainer to [SFTTrainer] Flash attention support for SFTTrainer on Aug 18, 2023
@younesbelkada younesbelkada requested a review from lvwerra August 18, 2023 11:19
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@younesbelkada
Contributor Author

Closing for now as huggingface/transformers#25598 might be merged

@qgallouedec qgallouedec deleted the flash-attn-sft branch October 7, 2024 15:46