[https://nvbugs/5154414][fix] Balanced layer to PP rank assignment #3827
Conversation
Can you use `tensor_split` (ref) here for simplicity? I think it should do the same thing.
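The `tensor_split` the reviewer mentions is presumably `torch.tensor_split`, which splits a sequence into `sections` nearly equal contiguous chunks, giving the first `n % sections` chunks one extra element. A pure-Python sketch of those chunk semantics, applied to layer indices (the helper name is mine, not from the PR):

```python
def layer_ranges(num_layers: int, pp_size: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into pp_size contiguous
    chunks the way torch.tensor_split would: the first
    num_layers % pp_size chunks get one extra layer."""
    base, extra = divmod(num_layers, pp_size)
    ranges, start = [], 0
    for rank in range(pp_size):
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

# Deepseek-V3 case from this PR: 61 layers, pp_size = 8
print([len(r) for r in layer_ranges(61, 8)])  # [8, 8, 8, 8, 8, 7, 7, 7]
```

This yields the same chunk sizes as the balanced assignment the PR implements, which supports the reviewer's point that `tensor_split` could replace the hand-rolled logic.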
Signed-off-by: Anurag Mukkara <[email protected]>
Description
For some model and PP-size combinations, `num_hidden_layers % pp_size != 0`. This PR creates a balanced assignment of layers to PP ranks in such cases, with a few ranks assigned just one extra layer.

For example, Deepseek-V3 has 61 layers; with pp_size = 8, 61 % 8 = 5, so the first 5 ranks get 8 layers each and the last 3 ranks get 7 layers each.

Before this change, the first 7 ranks got 7 layers each and the last rank got 12 layers, causing an OOM on the last rank.
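The balanced scheme described above can be sketched as a per-rank layer count: each rank gets `num_hidden_layers // pp_size` layers, and the first `num_hidden_layers % pp_size` ranks get one extra. This is a hypothetical helper illustrating the assignment, not the PR's actual code:

```python
def layers_per_rank(num_hidden_layers: int, pp_size: int) -> list[int]:
    """Balanced layer-to-PP-rank assignment: the first
    num_hidden_layers % pp_size ranks each take one extra layer,
    so no rank differs from another by more than one layer."""
    base, extra = divmod(num_hidden_layers, pp_size)
    return [base + 1 if rank < extra else base for rank in range(pp_size)]

# Deepseek-V3 example from the description: 61 layers, pp_size = 8
print(layers_per_rank(61, 8))  # [8, 8, 8, 8, 8, 7, 7, 7]
```

Contrast with the old behavior, where the remainder (12 - 7 = 5 extra layers) all piled onto the last rank and pushed it out of memory.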