remove ShardedDDP as it is deprecated #25702
Conversation
@sgugger sorry for bothering you, but would you mind taking a look at this PR?

cc @muellerzr and @pacman100 who will take over the Trainer.
Force-pushed from b99de26 to e069269.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Force-pushed from b797639 to a56008a.
Rebased my commits onto master HEAD to fix the merge conflict.
Thank you @statelesshz for removing the FairScale support as it is deprecated. The changes LGTM except for the TPU-related bits which Zach can review. Left a comment.
Force-pushed from 2447a3b to 959cd5d.
Thanks! This is looking great! Just a small note in terms of gradient scaling. We need to either keep this for now in this PR and merge, or coordinate a PR in accelerate with the right logic before this merges. I don't particularly have a leaning one way or another :)
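For reference, the gradient scaling in question is the fp16 loss-scaling step. Below is a minimal sketch of that logic in plain PyTorch (illustrative only; this is not the actual Trainer or accelerate code path, and the toy model and data are made up):

```python
import torch
from torch import nn

# Toy setup so the loop actually runs (requires a CUDA device).
device = "cuda"
model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(8, 16, device=device)
    y = torch.randn(8, 1, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)           # unscales gradients; skips the step on inf/nan
    scaler.update()                  # adjust the scale factor for the next iteration
```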
@muellerzr Could you please take a second look at this PR?
Thanks! This all looks great to me. Do you by chance have access to a multi-GPU system to run and test the sharded DDP changes yourself? Probably also good to have @pacman100 give it one more look as well :)
I've tested this PR using 4xA100-80G on FastChat with the following scripts.
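(The test scripts themselves are not captured in this thread. Purely as an illustration of the kind of run involved, a Trainer configuration exercising the FSDP path that replaces ShardedDDP might look like the sketch below; the output path and layer class name are hypothetical.)

```python
# Hypothetical sketch only; not the actual test scripts referenced above.
# Configures transformers' Trainer to use PyTorch FSDP (the replacement for
# the removed ShardedDDP path). Launch with e.g. `torchrun --nproc_per_node=4 train.py`.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/tmp/fsdp-test",                             # hypothetical path
    per_device_train_batch_size=4,
    fsdp="full_shard auto_wrap",                             # shard params, grads, and optimizer state
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",  # hypothetical block class for auto-wrapping
)
```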
@pacman100 Could you please take a look once again? :-) I think it's ready to be merged.
Force-pushed from 665a8ab to 1554e7a.
Resolved merge conflicts by rebasing onto the main branch.
Force-pushed from 15facf5 to aceb42c.
Rebased my commits onto master HEAD and resolved merge conflicts.
Hi there. This PR is approved and the tests are green :D
This PR is approved and the tests are green. @muellerzr Could you help merge it?
Thank you @statelesshz!
Let me rebase quickly and merge if the tests are green.
Force-pushed from aceb42c to 140bd1d.
Great, LGTM! Thanks @statelesshz
What does this PR do?
As mentioned previously, fairscale's ShardedDDP is deprecated, and PyTorch FSDP is the recommended way to scale training to large models. Now it's time to say goodbye to this library 👋.
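As a pointer for readers migrating, here is a minimal sketch of the PyTorch FSDP API that supersedes ShardedDDP (toy model; assumes the script is launched with torchrun so the process-group environment variables are set):

```python
import os
import torch
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launch with `torchrun --nproc_per_node=<num_gpus> this_script.py`.
torch.distributed.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 512, device="cuda")
loss = model(x).sum()  # toy forward pass
loss.backward()        # FSDP all-gathers and reshards parameters as needed
optimizer.step()
torch.distributed.destroy_process_group()
```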
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@muellerzr Good day. Could you please review this PR? Thanks 😄