[WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown #19045
Conversation
Test build #81103 has finished for PR 19045 at commit
Are you still working on this? @holdenk
I'll bump it if anyone has a chance to review, but I think we'll see how #19267 (comment) plays out first.
So it seems like the YARN changes are only going to happen in Hadoop 3+, so this might make sense regardless of what happens in #19267 (since folks like K8s or whoever can send the message as desired).
Chatted with some K8s folks and I'll revive this PR with that in mind.
Test build #95306 has finished for PR 19045 at commit
cc @ifilonenko: it's super WIP, but since you joined me on the stream where I was working on reviving this, I thought it would be good to get your early comments (especially if you have any suggestions around making effective integration tests for this).
Test build #95837 has finished for PR 19045 at commit
Test build #95838 has finished for PR 19045 at commit
Kubernetes integration test starting |
Kubernetes integration test status failure |
FYI... the k8s integration test failure was caused by this: I have a fix ready to go, but am still wondering why this suddenly popped up. :(
Thanks @shaneknapp :)
Test build #105181 has finished for PR 19045 at commit
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #105184 has finished for PR 19045 at commit
Test build #105183 has finished for PR 19045 at commit
Kubernetes integration test starting |
Kubernetes integration test status failure |
I've created a new pull request and design doc with some feedback and general updates, since Spark has shifted a lot. We can continue the discussion in #26440.
…emption support

This PR is based on an existing/previous PR - #19045

### What changes were proposed in this pull request?

This change adds a decommissioning state that we can enter when the cloud provider/scheduler lets us know we aren't going to be removed immediately but instead will be removed soon. This concept fits nicely in K8s and also with spot instances on AWS / preemptible instances, all of which can give us notice that our host is going away. For now we simply stop scheduling jobs; in the future we could perform some kind of migration of data during scale-down, or at least stop accepting new blocks to cache.

There is a design document at https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing

### Why are the changes needed?

With the move toward preemptible multi-tenancy, serverless environments, and spot instances, better handling of node scale-down is required.

### Does this PR introduce any user-facing change?

There is no API change; however, an additional configuration flag is added to enable/disable this behaviour.

### How was this patch tested?

New integration tests in the Spark K8s integration testing. Extension of the AppClientSuite to test decommissioning separately from K8s.

Closes #26440 from holdenk/SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r4.

Lead-authored-by: Holden Karau <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Signed-off-by: Holden Karau <[email protected]>
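As a concrete illustration of the enable/disable flag mentioned above, here is a minimal sketch of setting it through SparkConf. The configuration key shown is an assumption made for this example; the thread does not state the flag's actual name, so check the merged change for the authoritative key.

```scala
import org.apache.spark.SparkConf

// Sketch only: "spark.worker.decommission.enabled" is an assumed key name,
// used purely to illustrate the "additional configuration flag" the commit
// message mentions. Everything else is standard SparkConf usage.
object DecommissionConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("decommission-demo")
      .set("spark.worker.decommission.enabled", "true") // assumed flag name
    println(conf.toDebugString) // prints the flag alongside any other settings
  }
}
```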
What changes were proposed in this pull request?
Design document: https://docs.google.com/document/d/1bC2sxHoF3XbAvUHQebpylAktH6B3PSTVAGIOCYj0Mbg/edit?usp=sharing
Keep track of nodes which are going to be shut down, to prevent scheduling tasks on them. The PR is designed with spot instances in mind, where there is some notice (depending on the cloud vendor) that the node will be shut down.
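To make the bookkeeping concrete, here is a small, hypothetical sketch of tracking hosts that have been flagged for shutdown and excluding them from scheduling. The class and method names are illustrative placeholders, not the actual scheduler internals touched by this PR.

```scala
import scala.collection.mutable

// Illustrative sketch, not Spark's TaskScheduler: remember which hosts have
// announced that they are going away and refuse to hand them new tasks.
class DecommissionTracker {
  private val decommissioningHosts = mutable.Set.empty[String]

  /** Record a shutdown notice (e.g. a spot-instance termination warning) for a host. */
  def markDecommissioning(host: String): Unit = synchronized {
    decommissioningHosts += host
  }

  /** A scheduler would consult this before placing a task on a host. */
  def isSchedulable(host: String): Boolean = synchronized {
    !decommissioningHosts.contains(host)
  }
}
```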
Since Kubernetes has a first-class notion of pod shutdown and grace periods, decommissioning support is available on Kubernetes. For other deployments, it is left to the environment to notify the worker(s) of decommissioning via SIGPWR.
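For the non-Kubernetes path, a SIGPWR handler on the worker could look roughly like the sketch below. This is an assumption-heavy illustration: sun.misc.Signal is JVM- and OS-dependent (SIGPWR exists on Linux but not on macOS), and the callback is a placeholder rather than the PR's actual code.

```scala
import sun.misc.{Signal, SignalHandler}

// Hypothetical sketch: on SIGPWR, flag this process as decommissioning and
// invoke a caller-supplied callback that would tell the master/scheduler to
// stop placing new tasks here. Names and wiring are placeholders.
object DecommissionSignal {
  @volatile var decommissioned: Boolean = false

  def install(onDecommission: () => Unit): Unit = {
    Signal.handle(new Signal("PWR"), new SignalHandler {
      override def handle(sig: Signal): Unit = {
        decommissioned = true
        onDecommission()
      }
    })
  }
}
```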
SPARK-20628 is a sub-task of SPARK-20624, with follow-up tasks to perform migration of data and re-launching of tasks. SPARK-20628 is distinct from other mechanisms where Spark itself has control of executor decommissioning; however, the later follow-up tasks in SPARK-20624 should be usable across voluntary and involuntary termination (e.g. #19041 could provide a good mechanism for doing data copy during involuntary termination).
How was this patch tested?
Extension of AppClientSuite to cover decommissioning, and addition of an explicit worker decommissioning suite.
Areas of future work: