[Core][TPU] Support TPU Data Parallelism #27365
Conversation
Code Review
This pull request introduces a get_dynamic_token_budget method to the Scheduler class, providing a hook for subclasses to implement dynamic scheduling constraints. This is a good extension point. The new method is then used in the schedule method to constrain the number of tokens for both running and waiting requests.
My main feedback is to simplify the newly added logic. The three lines added in two places to apply the dynamic budget can be condensed into a single line. This will make the code more concise and avoid duplication, improving maintainability.
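For illustration, here is a minimal sketch of the hook and the condensed one-line form the reviewer suggests. All names other than `get_dynamic_token_budget` and `schedule` are hypothetical, and this is not the PR's actual diff:

```python
# Sketch only; names other than get_dynamic_token_budget are illustrative.


class Scheduler:
    def get_dynamic_token_budget(self) -> int:
        """Hook for subclasses to impose a per-step token budget.

        The base implementation returns an effectively unlimited
        budget, so applying it below is a no-op by default.
        """
        return 2**63 - 1

    def schedule(self) -> None:
        num_new_tokens = 8192  # placeholder for the real budget logic

        # Condensed form: one line instead of three, applied the same
        # way for both running and waiting requests.
        num_new_tokens = min(num_new_tokens, self.get_dynamic_token_budget())


class TPUDPScheduler(Scheduler):
    def get_dynamic_token_budget(self) -> int:
        # Hypothetical override: cap tokens per step, e.g. to keep
        # data-parallel ranks in lockstep.
        return 4096
```

Returning a large sentinel value rather than `None` from the base hook is what lets each call site collapse to a single `min()`.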
💡 Codex Review
Here are some automated review suggestions for this pull request.
Force-pushed from f18cbdf to 79c018b
Signed-off-by: wenxindongwork <[email protected]>
Force-pushed from cfbeef6 to bf135a0
yaochengji left a comment:
LGTM!
@njhill could you please take a look at the small change?
Signed-off-by: wenxindongwork <[email protected]>
This pull request introduces a small but important update to the data parallel initialization logic in vllm/entrypoints/llm.py. The change ensures that the warning and error for unsupported single-process data parallel usage are not triggered when running on TPU platforms. TPU DP support was added in vllm-project/tpu-inference#865.
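As a rough sketch of the kind of guard described, assuming the helper name and message are illustrative rather than the PR's actual code (`current_platform.is_tpu()` is vLLM's existing platform check):

```python
from vllm.platforms import current_platform


def _check_single_process_dp(data_parallel_size: int) -> None:
    # Hypothetical helper: skip the single-process DP warning/error
    # on TPU, where DP is handled by vllm-project/tpu-inference.
    if data_parallel_size > 1 and not current_platform.is_tpu():
        raise ValueError(
            "Single-process data parallelism is not supported on this "
            "platform; launch one process per DP rank instead.")
```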