
Conversation

@tnachen (Contributor) commented Dec 16, 2015

SPARK_HOME is now causing a problem with Mesos cluster mode, since the spark-submit script was recently changed so that, when running the spark-class scripts, SPARK_HOME takes precedence if it is defined.

We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
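For context, here is a minimal Scala sketch of the intended behavior (illustrative only; the `clientEnv` name and values below are assumptions, not the code in this PR):

```scala
// Illustrative sketch only, not the actual patch; `clientEnv` and its contents
// are made-up stand-ins for the environment variables a client would forward
// in Mesos cluster mode.
val clientEnv: Map[String, String] =
  Map("SPARK_HOME" -> "/opt/spark", "SPARK_LOCAL_IP" -> "10.0.0.1")

// Drop SPARK_HOME before forwarding, so the Mesos side falls back to the
// spark.executor.home setting mentioned above rather than the submitter's
// local path.
val forwardedEnv = clientEnv.filter { case (key, _) => key != "SPARK_HOME" }
```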

@tnachen (Contributor, Author) commented Dec 16, 2015

@dragos PTAL

@dragos (Contributor) commented Dec 16, 2015

There seems to be a race condition :) @skyluc opened #10329, but that change is in SparkSubmit. I wonder which one we should take. We tested #10329 locally and it passed; this one will take a while to re-test.

@marmbrus (Contributor):

I would lean towards this patch since it only affects Mesos and not standalone mode.

@dragos (Contributor) commented Dec 16, 2015

Agreed.

@skyluc commented Dec 16, 2015

Code LGTM. Unfortunately, I can't try it for another couple of hours.

@tnachen (Contributor, Author) commented Dec 16, 2015

Yes, I would also prefer to only make changes on the Mesos side and not risk any regression in standalone mode.

Contributor:

I would add (SPARK-12345) here, but I'll fix this myself on merge.

@andrewor14 (Contributor):

LGTM, merging into master and 1.6. Just FYI, I might revert this patch in master because I believe #10329 is a better fix in the long run, but for now let's just unblock the release.

asfgit pushed a commit that referenced this pull request Dec 16, 2015
…h Mesos cluster mode.

SPARK_HOME is now causing a problem with Mesos cluster mode, since the spark-submit script was recently changed so that, when running the spark-class scripts, SPARK_HOME takes precedence if it is defined.

We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.

Author: Timothy Chen <[email protected]>

Closes #10332 from tnachen/scheduler_ui.

(cherry picked from commit ad8c1f0)
Signed-off-by: Andrew Or <[email protected]>
@asfgit asfgit closed this in ad8c1f0 Dec 16, 2015
@SparkQA commented Dec 16, 2015

Test build #47830 has finished for PR 10332 at commit baea28f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor:

Unfortunately there is a subtle error here, and this is a no-op. And nobody ran this code, it seems.

Here's what happens: `environmentVariables` is a map, not a sequence, so `filter` works on key/value pairs, and a pair will never be equal to a string. The correct call would have been `filterKeys`.

Unfortunately this went in RC3 without fixing the bug. It is harmless otherwise, but highlights the fact that there are no easy fixes or safe changes. :-/
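
To make the no-op concrete, here is a small self-contained Scala snippet (assumed names, not the Spark source) reproducing the behavior described above:

```scala
// Self-contained illustration of the bug described above; names and values
// are made up.
val environmentVariables = Map("SPARK_HOME" -> "/opt/spark", "FOO" -> "bar")

// On a Map, `filter` receives (key, value) pairs; a Tuple2 is never equal to
// a String, so the predicate is always true and nothing is removed.
val broken = environmentVariables.filter(_ != "SPARK_HOME")
// broken still contains SPARK_HOME

// Filtering on the key (filterKeys, or a pattern match on the pair) actually
// removes the entry.
val fixed = environmentVariables.filterKeys(_ != "SPARK_HOME")
// fixed no longer contains SPARK_HOME
```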

Contributor:

That's really the problem, I think we should fix this.

Contributor (Author):

Hmm, interesting. I wonder if I ran something different from what my code has, since I was able to see it not being passed through.
Thanks for retesting this; I think having automated tests is going to be crucial to prevent mistakes like this that I'm making :(

Reply:

Yea, odd. I ran this against DCOS and didn't see the error.

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 17, 2015
… server

Fix problem with apache#10332; this one should fix cluster mode on Mesos.

Author: Iulian Dragos <[email protected]>

Closes apache#10359 from dragos/issue/fix-spark-12345-one-more-time.
asfgit pushed a commit that referenced this pull request Dec 17, 2015
… server

Fix problem with #10332; this one should fix cluster mode on Mesos.

Author: Iulian Dragos <[email protected]>

Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.

(cherry picked from commit 8184568)
Signed-off-by: Kousuke Saruta <[email protected]>
@andrewor14 (Contributor):

OK I'm going to go ahead and revert this patch since it doesn't work...

@andrewor14 (Contributor):

Oh wait, it looks like #10359, which fixes this, is already merged.

@dragos (Contributor) commented Dec 17, 2015

Yeah, I was just about to comment on that, @andrewor14.

@andrewor14 (Contributor):

Note: I'm reverting this patch in master only since #10329, the better alternative, is merged there.
This patch continues to exist in branch-1.6.
