Add ZenFlow code for Stage 3 #7516
base: master
Hi @tohtana @sfc-gh-truwase @Antlera, when you have some time, could you please take a look at this PR? Thanks!
db2dfac to 133290e
@JoshWoo2003 - could you please resolve merge conflicts?
d550814 to 47b10d8
Sorry for the very late reply! I’ve resolved the merge conflicts and updated the affinity setting as suggested.
Hi @JoshWoo2003, the affinity part looks good to me. Thanks for the change! Can you also fix the formatting? Thanks!
- Introduced a new file: zenflow/engine_stage3.py to implement ZenFlow-specific Stage 3 logic.
- Modified zero/stage3.py to ensure compatibility with Zenflow's execution flow.
- Updated zero/parameter_offload.py to support the integration of ZenFlow with ZeRO-Stage 3.
Signed-off-by: Yusen Wu <[email protected]>

- Add ZenFlowSelectiveAdamW_stage3 to support ZeRO Stage 3
- Update unit tests for ZeRO-Stage 3 with ZenFlow
Signed-off-by: Yusen Wu <[email protected]>

Signed-off-by: Yusen Wu <[email protected]>

- Add default value (`zenflow=False`) in DeepSpeedZeROOffload.__init__
- Prevents TypeError when instantiating optimizer without zenflow
Signed-off-by: Yusen Wu <[email protected]>

- Resolved merge conflicts with upstream changes
- Unified ZenFlow affinity behavior for Stage 3 with Stage 1 and Stage 2
Signed-off-by: Yusen Wu <[email protected]>
Co-authored-by: Ma, Guokai <[email protected]>
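
For readers unfamiliar with the `zenflow=False` fix in the commits above, here is a minimal sketch of why a default value avoids the TypeError. `OffloadNoDefault`, `OffloadWithDefault`, and `try_construct` are hypothetical stand-ins written for illustration. They are not the real `DeepSpeedZeROOffload`, whose constructor takes other arguments not shown here.

```python
class OffloadNoDefault:
    """Hypothetical stand-in: `zenflow` is a required argument."""

    def __init__(self, module, zenflow):
        self.module = module
        self.zenflow = zenflow


class OffloadWithDefault:
    """Hypothetical stand-in: `zenflow` defaults to False."""

    def __init__(self, module, zenflow=False):
        self.module = module
        self.zenflow = zenflow


def try_construct(cls, *args, **kwargs):
    """Return the new instance, or the TypeError raised while building it."""
    try:
        return cls(*args, **kwargs)
    except TypeError as exc:
        return exc


# A caller that predates ZenFlow passes no `zenflow` argument.
legacy_result = try_construct(OffloadNoDefault, "model")
fixed_result = try_construct(OffloadWithDefault, "model")

print(type(legacy_result).__name__)  # the missing argument raises TypeError
print(fixed_result.zenflow)          # the default keeps ZenFlow disabled
```

With the default in place, existing call sites that never mention `zenflow` keep working, and only ZenFlow-aware callers need to pass `zenflow=True`.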
4f4e752 to 26cc5ec
Thanks for the review, @delock! The formatting issues were due to my branch being behind the base. I’ve rebased onto upstream/master, and the latest push should fix them. Please take another look when you have a chance. Thanks! @loadams @sfc-gh-truwase @tohtana @Antlera
This PR completes the ZenFlow integration for DeepSpeed ZeRO Stage 3.
Highlights:
Note: Integration with ZeRO Stages 1 and 2 was introduced in #7391.