[WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown #19045
Conversation
Test build #81103 has finished for PR 19045 at commit
Are you still working on this? @holdenk
I'll bump it if anyone has a chance to review, but I think we'll see how #19267 (comment) plays out first.
So it seems like the YARN changes are only going to happen in Hadoop 3+, so this might make sense regardless of what happens in #19267 (since folks like K8s or whoever can send the message as desired).
Chatted with some K8s folks and I'll revive this PR with that in mind.
Test build #95306 has finished for PR 19045 at commit
cc @ifilonenko: it's super WIP, but since you joined me on the stream where I was working on reviving this, I thought it would be good to get your early comments (especially if you have any suggestions around making effective integration tests for this).
Test build #95837 has finished for PR 19045 at commit
Test build #95838 has finished for PR 19045 at commit
Kubernetes integration test starting |
Kubernetes integration test status failure |
FYI... the k8s integration test failure was caused by this: I have a fix ready to go, but am still wondering why this suddenly popped up. :(
Thanks @shaneknapp :)
Test build #105181 has finished for PR 19045 at commit
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #105184 has finished for PR 19045 at commit
Test build #105183 has finished for PR 19045 at commit
Kubernetes integration test starting |
Kubernetes integration test status failure |
I've created a new pull request and design doc with some feedback and general updates, since Spark has shifted a lot. We can continue the discussion in #26440.
…emption support

This PR is based on an existing/previous PR - #19045

### What changes were proposed in this pull request?

This change adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know we aren't going to be removed immediately but instead will be removed soon. This concept fits nicely in K8s and also with spot instances on AWS / preemptible instances, all of which can give us notice that our host is going away. For now we simply stop scheduling jobs; in the future we could perform some kind of migration of data during scale-down, or at least stop accepting new blocks to cache.

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With the move toward preemptible multi-tenancy, serverless environments, and spot instances, better handling of node scale-down is required.

### Does this PR introduce any user-facing change?

There is no API change; however, an additional configuration flag is added to enable/disable this behaviour.

### How was this patch tested?

New integration tests in the Spark K8s integration testing. Extension of the AppClientSuite to test decommissioning separately from K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Signed-off-by: Holden Karau <[email protected]>
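As a concrete illustration of the enable/disable flag mentioned above, here is a minimal sketch of setting it through SparkConf. The configuration key shown is an assumption made for this example; the thread does not state the flag's actual name, so check the merged change for the authoritative key.

```scala
import org.apache.spark.SparkConf

// Sketch only: "spark.worker.decommission.enabled" is an assumed key name,
// used purely to illustrate the "additional configuration flag" the commit
// message mentions. Everything else is standard SparkConf usage.
object DecommissionConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("decommission-demo")
      .set("spark.worker.decommission.enabled", "true") // assumed flag name
    println(conf.toDebugString) // prints the flag alongside any other settings
  }
}
```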
What changes were proposed in this pull request?
Design document: https://docs.google.com/document/d/1bC2sxHoF3XbAvUHQebpylAktH6B3PSTVAGIOCYj0Mbg/edit?usp=sharing
Keep track of nodes which are going to be shut down, to prevent scheduling tasks on them. The PR is designed with spot instances in mind, where there is some notice (depending on the cloud vendor) that the node will be shut down.
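To make the bookkeeping concrete, here is a small, hypothetical sketch of tracking hosts that have been flagged for shutdown and excluding them from scheduling. The class and method names are illustrative placeholders, not the actual scheduler internals touched by this PR.

```scala
import scala.collection.mutable

// Illustrative sketch, not Spark's TaskScheduler: remember which hosts have
// announced that they are going away and refuse to hand them new tasks.
class DecommissionTracker {
  private val decommissioningHosts = mutable.Set.empty[String]

  /** Record a shutdown notice (e.g. a spot-instance termination warning) for a host. */
  def markDecommissioning(host: String): Unit = synchronized {
    decommissioningHosts += host
  }

  /** A scheduler would consult this before placing a task on a host. */
  def isSchedulable(host: String): Boolean = synchronized {
    !decommissioningHosts.contains(host)
  }
}
```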
Since Kubernetes has a first-class notion of pod shutdown and grace periods, decommissioning support is available on Kubernetes. For other deployments, it is left to the environment to notify the worker(s) of decommissioning via SIGPWR.
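For the non-Kubernetes path, a SIGPWR handler on the worker could look roughly like the sketch below. This is an assumption-heavy illustration: sun.misc.Signal is JVM- and OS-dependent (SIGPWR exists on Linux but not on macOS), and the callback is a placeholder rather than the PR's actual code.

```scala
import sun.misc.{Signal, SignalHandler}

// Hypothetical sketch: on SIGPWR, flag this process as decommissioning and
// invoke a caller-supplied callback that would tell the master/scheduler to
// stop placing new tasks here. Names and wiring are placeholders.
object DecommissionSignal {
  @volatile var decommissioned: Boolean = false

  def install(onDecommission: () => Unit): Unit = {
    Signal.handle(new Signal("PWR"), new SignalHandler {
      override def handle(sig: Signal): Unit = {
        decommissioned = true
        onDecommission()
      }
    })
  }
}
```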
SPARK-20628 is a sub-task of SPARK-20624, with follow-up tasks to perform migration of data and re-launching of tasks. SPARK-20628 is distinct from other mechanisms where Spark itself has control of executor decommissioning; however, the later follow-up tasks in SPARK-20624 should be usable across voluntary and involuntary termination (e.g. #19041 could provide a good mechanism for doing data copy during involuntary termination).
How was this patch tested?
Extension of AppClientSuite to cover decommissioning, and addition of an explicit worker decommissioning suite.
Areas of future work: