[Core] Enable Scaling Down for Multi-Host TPU Replicas #43470
Merged
57 commits
50129eb Multi-host replica deletion logic initial commit (ryanaoleary)
f532cf1 Multi-host replica deletion logic (ryanaoleary)
8a1d4b9 Fix broken tests (ryanaoleary)
f1704b7 Fix KubeRay spelling (ryanaoleary)
544a80b Make replica_index optional and added KeyError check (ryanaoleary)
2d3d990 Remove self from tags initialization (ryanaoleary)
f2c227b Update python/ray/autoscaler/batching_node_provider.py (ryanaoleary)
7d76c40 Update python/ray/autoscaler/_private/kuberay/node_provider.py (ryanaoleary)
62f25b2 Update python/ray/autoscaler/_private/kuberay/node_provider.py (ryanaoleary)
de6f7f2 Move replicas_to_delete and change to set (ryanaoleary)
e6a7372 Add assert for self.replicas_to_nodes (ryanaoleary)
1f1bccc Move replicas_to_node initialization to constructor (ryanaoleary)
2d969c8 Fixed comments (ryanaoleary)
89c6c81 Merge branch 'master' into autoscaling-changes (ryanaoleary)
1286434 Fixed comments (ryanaoleary)
60930a4 Add back in changes lost in bad rebase (ryanaoleary)
95c3f16 Change to iterate through worker_ids (ryanaoleary)
73e9517 Fix replicas_to_nodes instantiation (ryanaoleary)
3062e27 Merge branch 'master' into autoscaling-changes (ryanaoleary)
143e03a Fix errors causing autoscaler test to fail (ryanaoleary)
1394935 Update python/ray/autoscaler/_private/kuberay/node_provider.py (ryanaoleary)
dd22594 Fixed comments (ryanaoleary)
473b7cb Update python/ray/autoscaler/_private/autoscaler.py (ryanaoleary)
fa083aa Fix replicas_to_nodes creation (ryanaoleary)
81df1df Add log statement and make naming consistent (ryanaoleary)
c219c22 Change all instances of replica ID to index (ryanaoleary)
112a448 Remove NumOfHosts check from node provider (ryanaoleary)
64c5d4b Update python/ray/autoscaler/batching_node_provider.py (ryanaoleary)
eab706f Update python/ray/autoscaler/batching_node_provider.py (ryanaoleary)
96272bb Clean up replicas to nodes and fixed var names (ryanaoleary)
c70bdbb Merge branch 'master' into autoscaling-changes (ryanaoleary)
fa4e285 Change from remove to pop (ryanaoleary)
9635250 Add None check to replicas_to_delete (ryanaoleary)
0504175 Add more testing for scale down behavior (ryanaoleary)
fd27aca Add more error checking (ryanaoleary)
47e46ac Add node_tags None check (ryanaoleary)
1ae9ec9 Merge branch 'master' into autoscaling-changes (ryanaoleary)
a06581c Change get_node_tags and fix terminate node logic (ryanaoleary)
08b8eb9 Merge branch 'master' into autoscaling-changes (ryanaoleary)
c5048d6 Merge branch 'master' into autoscaling-changes (ryanaoleary)
33d7f0a Move multi-host scale down logic entirely to batching_node_provider (ryanaoleary)
45c5b23 Add logger info statement for multi-host scaling (ryanaoleary)
184bc2b Add back in replica index tag (ryanaoleary)
ef7e025 BatchingNodeProvider autoscaling test (ryanaoleary)
72cfedf Merge branch 'master' into autoscaling-changes (ryanaoleary)
776f8df Remove newline (ryanaoleary)
5659499 Merge branch 'ray-project:master' into autoscaling-changes (ryanaoleary)
3f168c5 Merge branch 'master' into autoscaling-changes (ryanaoleary)
76f092d Mock BatchNodeProvider directly (ryanaoleary)
f1037d1 Update podlist2 with actual TPU pod yaml (ryanaoleary)
302b69a Fix test comments (ryanaoleary)
14e8f79 Fix batching node provider log statement (ryanaoleary)
aa28221 Remove unused index increment (ryanaoleary)
875767d Merge branch 'master' into autoscaling-changes (ryanaoleary)
2aa5a31 Add type annotation and fix nits (ryanaoleary)
b67be14 Merge branch 'master' into autoscaling-changes (ryanaoleary)
de1fa8a Merge branch 'master' into autoscaling-changes (ryanaoleary)
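The commit messages hint at the general approach. Each worker node carries a replica index tag, a `replicas_to_nodes` mapping groups the worker nodes of each replica, and a `replicas_to_delete` set records replicas that are being scaled down, so terminating one host of a multi-host TPU replica removes the whole replica rather than leaving a partial slice. The following is a minimal sketch of that idea under stated assumptions. The class `MultiHostScaleDownSketch`, its `terminate_node` method, and the `"replica-index"` tag key are hypothetical; they are not Ray's `BatchingNodeProvider` API, and the real implementation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical tag key; Ray's actual tag name may differ.
REPLICA_INDEX_TAG = "replica-index"


class MultiHostScaleDownSketch:
    """Hypothetical illustration: a multi-host replica is deleted as one unit."""

    def __init__(self, node_tags: Dict[str, Dict[str, str]]):
        # node_tags maps node ID -> tag dict (assumed input shape).
        self.node_tags = node_tags
        # Replica index -> IDs of every worker node in that replica.
        self.replicas_to_nodes: Dict[str, List[str]] = defaultdict(list)
        for node_id, tags in node_tags.items():
            replica_index = tags.get(REPLICA_INDEX_TAG)
            if replica_index is not None:
                self.replicas_to_nodes[replica_index].append(node_id)
        # Replicas already scheduled for deletion; a set avoids duplicates.
        self.replicas_to_delete: Set[str] = set()

    def terminate_node(self, node_id: str) -> List[str]:
        """Return the node IDs that should be deleted when `node_id` is terminated."""
        tags: Optional[Dict[str, str]] = self.node_tags.get(node_id)
        replica_index = tags.get(REPLICA_INDEX_TAG) if tags else None
        if replica_index is None:
            # Single-host worker (or unknown node): delete only this node.
            return [node_id]
        if replica_index in self.replicas_to_delete:
            # Another host of this replica already triggered its deletion.
            return []
        self.replicas_to_delete.add(replica_index)
        # pop() stops tracking the replica and yields all of its hosts at once.
        return self.replicas_to_nodes.pop(replica_index, [node_id])


# Usage: two hosts share one replica; a third worker is single-host.
provider = MultiHostScaleDownSketch({
    "worker-0": {REPLICA_INDEX_TAG: "tpu-group-0"},
    "worker-1": {REPLICA_INDEX_TAG: "tpu-group-0"},
    "worker-2": {},
})
print(provider.terminate_node("worker-1"))  # ['worker-0', 'worker-1']
print(provider.terminate_node("worker-0"))  # []
print(provider.terminate_node("worker-2"))  # ['worker-2']
```

Deleting the whole replica when any one host is selected reflects the idea that a multi-host TPU slice is only usable when all of its hosts are present; tracking replicas in a set means later termination requests for sibling hosts become no-ops instead of double deletions.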