Log the driver exit status explicitly #282

lins05 · 2017-05-18T15:39:08Z

Close #276.

How is the patch tested?

Manually add logical errors to the SparkPi example and check the output.

2017-05-18 15:33:11 INFO  Client:54 - Application spark-pi-1495121556913 failed with exit code 1. You may want to check the driver pod logs.

Close apache-spark-on-k8s#276

ash211 · 2017-05-18T15:51:43Z

...re/src/main/scala/org/apache/spark/deploy/kubernetes/submit/v1/LoggingPodStatusWatcher.scala

  private def status: String = pod.map(_.getStatus().getContainerStatuses().toString())
    .getOrElse("unknown")

+  private var driverPodExitCode: Int = 0


Make this an Option[Int] and only set it when we get an exit code. Otherwise, if something goes wrong, clients of LoggingPodStatusWatcher might think the exit code is 0 when it's not.
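A minimal sketch of the `Option[Int]` suggestion follows. It assumes a simplified tracker that is told about container terminations through a hypothetical `onContainerTerminated` callback. The real LoggingPodStatusWatcher receives Kubernetes pod events instead. The class name `ExitCodeTracker` and the success and "no exit code" messages are illustrative only. The failure message mirrors the log line from the test run above.

```scala
// Hypothetical sketch: ExitCodeTracker and onContainerTerminated are illustrative
// names, not part of the patch. The real watcher gets exit codes from pod events.
class ExitCodeTracker {
  // None until a terminated container actually reports an exit code, so callers
  // cannot mistake "no exit code yet" for success (exit code 0).
  @volatile private var driverPodExitCode: Option[Int] = None

  def onContainerTerminated(exitCode: Int): Unit = {
    driverPodExitCode = Some(exitCode)
  }

  def getDriverPodExitCode: Option[Int] = driverPodExitCode

  // Builds a final summary line; only the failure wording comes from the PR's log output.
  def summary(appId: String): String = driverPodExitCode match {
    case Some(0)    => s"Application $appId finished successfully."
    case Some(code) => s"Application $appId failed with exit code $code. You may want to check the driver pod logs."
    case None       => s"Application $appId ended without a reported exit code."
  }
}
```

With an `Option`, a caller has to handle the `None` case explicitly. A sentinel default of `0` would otherwise be indistinguishable from a real successful exit.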

ash211 · 2017-05-18T15:52:55Z

...re/src/main/scala/org/apache/spark/deploy/kubernetes/submit/v1/LoggingPodStatusWatcher.scala

    }.mkString("")
  }
+
+  def getDriverPodExitCode: Int = {


mccheah · 2017-05-18T19:43:45Z

We shouldn't be making changes to V1 submission. Any changes should be done to V2 submission instead.

Incidentally, this reminds me that V2 submission currently always runs in fire-and-forget mode; continuous monitoring mode needs to be redone there.
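The difference between the two modes can be sketched as follows. Under the assumptions here, fire-and-forget returns as soon as the driver pod is submitted. Monitoring mode blocks until a watcher sees the pod reach a terminal phase. `PodCompletionWatcher`, `onPhaseChange`, and `Submission.run` are hypothetical names, and the way phase updates reach the watcher is assumed. `Succeeded` and `Failed` are the terminal Kubernetes pod phases.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical watcher: records the latest pod phase and releases a latch
// once the pod reaches a terminal phase.
class PodCompletionWatcher {
  private val done = new CountDownLatch(1)
  @volatile private var lastPhase: String = "Unknown"

  def onPhaseChange(phase: String): Unit = {
    lastPhase = phase
    if (phase == "Succeeded" || phase == "Failed") done.countDown()
  }

  def phase: String = lastPhase

  // Blocks until a terminal phase is seen or the timeout elapses; true if completed.
  def awaitCompletion(timeout: Long, unit: TimeUnit): Boolean = done.await(timeout, unit)
}

// Hypothetical submission entry point contrasting the two modes.
object Submission {
  def run(waitForCompletion: Boolean, watcher: PodCompletionWatcher, timeoutSeconds: Long = 10): String = {
    // ... creating the driver pod and registering the watcher is omitted ...
    if (!waitForCompletion) "submitted" // fire-and-forget: return immediately
    else if (watcher.awaitCompletion(timeoutSeconds, TimeUnit.SECONDS)) watcher.phase // monitoring mode
    else "timed out"
  }
}
```

A latch keeps the waiting logic independent of how events are delivered. Whatever thread receives pod updates only needs to call `onPhaseChange`.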

ash211 · 2017-05-18T21:22:24Z

@mccheah can we make changes to both? I don't think V2 is ready for people to move over to yet, though it's getting closer.

mccheah · 2017-05-18T21:23:37Z

I think the logging pod status is the last feature that isn't in V2 but is in V1. It would be good to add that to V2 and then remove the V1 code path entirely.

ash211 · 2017-05-18T21:26:51Z

@mccheah Before deleting V1, I'd ideally like confirmation from someone other than us that they've run V2 and it worked.

mccheah · 2017-05-18T21:28:23Z

Except that would require us, or someone building a custom version of Spark, to change the code path itself. SparkSubmit is currently coded to use the V1 main class, but we would need it to switch to the V2 main class, and we don't want to make the code path configurable.

mccheah · 2017-05-19T00:26:09Z

@lins05 @ash211 I essentially rebuilt this with two things in mind: we need the logging watcher in V2 in general, and when we add it, we'll want to include this exit-code logic in V2 as well. See #283.

erikerlandson · 2017-05-19T13:56:25Z

@ash211 @mccheah What is involved in building a V2 version? Is it a matter of checking out the right branch, compiling, and spinning corresponding images?

mccheah · 2017-05-19T18:09:26Z

One has to change this line in SparkSubmit to point to the V2 submission client instead. Then, the driver image must correspond to what we have in this Dockerfile. Finally, all of the parameters for submission need to correspond to what's used in V2, which is currently largely undocumented. When we transition to V2 we will document everything that is required.

mccheah · 2017-05-19T18:15:01Z

I created #285 to discuss the actual transition. I'll work on a PR that switches the code paths and the documentation.

ash211 · 2017-05-22T23:18:02Z

Included in #283


Log the driver exit status explicitly.

df2fc00

Close apache-spark-on-k8s#276

ash211 reviewed May 18, 2017


mccheah mentioned this pull request May 19, 2017

Monitor pod status in submission v2. #283

Merged

Merge branch 'branch-2.1-kubernetes' into k8s-improve-driver-exit-output

21be5d3

ash211 closed this May 22, 2017

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019

Merge pull request apache-spark-on-k8s#282 from palantir/os/resync-ap…

a17d07c

…ache [NOSQUASH] Resync Apache

Log the driver exit status explicitly #282


4 participants
