
Conversation

ver217 (Contributor) commented Nov 30, 2021

Optimize pipeline communication: use P2POp instead of broadcast, and use asynchronous operations when synchronizing data across stages.
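For context, a minimal sketch of what P2POp-based, asynchronous stage-to-stage communication can look like with `torch.distributed` (the function name, shapes, and rank arguments below are illustrative assumptions, not the PR's actual implementation):

```python
import torch
import torch.distributed as dist

def exchange_between_stages(send_tensor: torch.Tensor,
                            recv_shape: torch.Size,
                            prev_rank: int,
                            next_rank: int) -> torch.Tensor:
    """Send a tensor to the next pipeline stage and receive one from the
    previous stage in a single batched, asynchronous P2P call.

    Assumes dist.init_process_group() has already been called with a
    P2P-capable backend (e.g. NCCL) and that both peers post matching ops.
    """
    recv_buffer = torch.empty(recv_shape,
                              dtype=send_tensor.dtype,
                              device=send_tensor.device)
    ops = [
        dist.P2POp(dist.isend, send_tensor, next_rank),   # send to next stage
        dist.P2POp(dist.irecv, recv_buffer, prev_rank),   # receive from previous stage
    ]
    # batch_isend_irecv launches the ops and returns async work handles,
    # so only the two peers involved communicate (unlike a broadcast to a group).
    requests = dist.batch_isend_irecv(ops)
    for req in requests:
        req.wait()
    return recv_buffer
```

Because the returned handles are asynchronous, the `wait()` calls can be deferred so that communication overlaps with computation on each stage.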

ver217 and others added 7 commits November 15, 2021 16:43
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <[email protected]>

Co-authored-by: 1SAA <[email protected]>
Co-authored-by: ver217 <[email protected]>
ver217 requested a review from FrankLeeeee on December 2, 2021 00:55
FrankLeeeee merged commit 9e72bc1 into hpcaitech:develop/experiments on Dec 4, 2021
ver217 deleted the feature/pipeline branch on December 6, 2021 04:22
FrankLeeeee added a commit that referenced this pull request Dec 9, 2021
* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <[email protected]>

Co-authored-by: 1SAA <[email protected]>
Co-authored-by: ver217 <[email protected]>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <[email protected]>
Co-authored-by: 1SAA <[email protected]>
Co-authored-by: binmakeswell <[email protected]>
FrankLeeeee added a commit that referenced this pull request Dec 9, 2021
* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <[email protected]>

* Split conv2d, class token, positional embedding in 2d, Fix random number in ddp
Fix convergence in cifar10, Imagenet1000

* Integrate 1d tensor parallel in Colossal-AI (#39)

* fixed 1D and 2D convergence (#38)

* optimized 2D operations

* fixed 1D ViT convergence problem

* Feature/ddp (#49)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <[email protected]>

Co-authored-by: 1SAA <[email protected]>
Co-authored-by: ver217 <[email protected]>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* support torch ddp

* fix loss accumulation

* add log for ddp

* change seed

* modify timing hook

Co-authored-by: Frank Lee <[email protected]>
Co-authored-by: 1SAA <[email protected]>
Co-authored-by: binmakeswell <[email protected]>

* Feature/pipeline (#40)

* remove redundancy func in setup (#19) (#20)

* use env to control the language of doc (#24) (#25)

* Support TP-compatible Torch AMP and Update trainer API (#27)

* Add gradient accumulation, fix lr scheduler

* fix FP16 optimizer and adapted torch amp with tensor parallel (#18)

* fixed bugs in compatibility between torch amp and tensor parallel and performed some minor fixes

* fixed trainer

* Revert "fixed trainer"

This reverts commit 2e0b0b7.

* improved consistency between trainer, engine and schedule (#23)

Co-authored-by: 1SAA <[email protected]>

Co-authored-by: 1SAA <[email protected]>
Co-authored-by: ver217 <[email protected]>

* add an example of ViT-B/16 and remove w_norm clipping in LAMB (#29)

* add explanation for ViT example (#35) (#36)

* optimize communication of pipeline parallel

* fix grad clip for pipeline

Co-authored-by: Frank Lee <[email protected]>
Co-authored-by: 1SAA <[email protected]>
Co-authored-by: binmakeswell <[email protected]>

* optimized 3d layer to fix slow computation ; tested imagenet performance with 3d; reworked lr_scheduler config definition; fixed launch args; fixed some printing issues; simplified apis of 3d layers (#51)

* Update 2.5d layer code to get a similar accuracy on imagenet-1k dataset

* update api for better usability (#58)

update api for better usability

Co-authored-by: 1SAA <[email protected]>
Co-authored-by: ver217 <[email protected]>
Co-authored-by: puck_WCR <[email protected]>
Co-authored-by: binmakeswell <[email protected]>
Co-authored-by: アマデウス <[email protected]>
Co-authored-by: BoxiangW <[email protected]>
