remove ShardedDDP as it is deprecated #25702
Conversation
@sgugger sorry for bothering you, but would you mind taking a look at this PR?

cc @muellerzr and @pacman100 who will take over the Trainer.
Force-pushed from b99de26 to e069269.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Force-pushed from b797639 to a56008a.
Rebased my commits onto master HEAD to fix the merge conflict.
Thank you @statelesshz for removing the FairScale support as it is deprecated. The changes LGTM except for the TPU-related bits which Zach can review. Left a comment.
Force-pushed from 2447a3b to 959cd5d.
Thanks! This is looking great! Just a small note in terms of gradient scaling. We need to either keep this for now in this PR and merge, or coordinate a PR in accelerate with the right logic before this merges. I don't particularly have a leaning one way or another :)
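For reference, the gradient scaling in question is the fp16 loss-scaling step. Below is a minimal sketch of that logic in plain PyTorch (illustrative only; this is not the actual Trainer or accelerate code path, and the toy model and data are made up):

```python
import torch
from torch import nn

# Toy setup so the loop actually runs (requires a CUDA device).
device = "cuda"
model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(8, 16, device=device)
    y = torch.randn(8, 1, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)           # unscales gradients; skips the step on inf/nan
    scaler.update()                  # adjust the scale factor for the next iteration
```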
@muellerzr Could you please take a second look at this PR?
Thanks! This all looks great to me. Do you by chance have access to a multi-GPU system to run and test the sharded DDP changes yourself? Probably also good to have @pacman100 give it one more look as well :)
I've tested this PR using 4xA100-80G on FastChat with the following scripts.
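(The test scripts themselves are not captured in this thread. Purely as an illustration of the kind of run involved, a Trainer configuration exercising the FSDP path that replaces ShardedDDP might look like the sketch below; the output path and layer class name are hypothetical.)

```python
# Hypothetical sketch only; not the actual test scripts referenced above.
# Configures transformers' Trainer to use PyTorch FSDP (the replacement for
# the removed ShardedDDP path). Launch with e.g. `torchrun --nproc_per_node=4 train.py`.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/tmp/fsdp-test",                             # hypothetical path
    per_device_train_batch_size=4,
    fsdp="full_shard auto_wrap",                             # shard params, grads, and optimizer state
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",  # hypothetical block class for auto-wrapping
)
```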
@pacman100 Could you please take a look once again? :-) I think it's ready to be merged.
Force-pushed from 665a8ab to 1554e7a.
Resolved merge conflicts by rebasing onto the main branch.
Force-pushed from 15facf5 to aceb42c.
Rebased my commits onto master HEAD and resolved merge conflicts.
Hi there. This PR is approved and the tests are green :D
This PR is approved and the tests are green. @muellerzr Could you help merge it?
Thank you @statelesshz!
Let me rebase quickly and merge if the tests are green.
Force-pushed from aceb42c to 140bd1d.
Great, LGTM! Thanks @statelesshz
What does this PR do?
As mentioned previously, fairscale's ShardedDDP is deprecated, and PyTorch FSDP is the recommended way to scale training to large models. Now it's time to say goodbye to this library 👋.
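As a pointer for readers migrating, here is a minimal sketch of the PyTorch FSDP API that supersedes ShardedDDP (toy model; assumes the script is launched with torchrun so the process-group environment variables are set):

```python
import os
import torch
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launch with `torchrun --nproc_per_node=<num_gpus> this_script.py`.
torch.distributed.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 512, device="cuda")
loss = model(x).sum()  # toy forward pass
loss.backward()        # FSDP all-gathers and reshards parameters as needed
optimizer.step()
torch.distributed.destroy_process_group()
```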
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@muellerzr Good day. Could you please review this PR? Thanks 😄