Skip to content

Conversation

@andrewor14
Copy link
Contributor

The bulk of this PR is comprised of tests. All changes in functionality are made in SparkSubmit.scala (~20 lines).

SPARK-3319. There is currently a divergence in behavior when the user passes in additional jars through --jars and through setting spark.jars in the default properties file. The former will happily resolve the paths (e.g. convert my.jar to file:/absolute/path/to/my.jar), while the latter does not. We should resolve paths consistently in both cases. This also applies to the following pairs of command line arguments and Spark configs:

  • --jars ~ spark.jars
  • --files ~ spark.files / spark.yarn.dist.files
  • --archives ~ spark.yarn.dist.archives
  • --py-files ~ spark.submit.pyFiles

SPARK-3338. This PR also fixes the following bug: if the user sets spark.submit.pyFiles in his/her properties file, it does not actually get picked up even if --py-files is not set. This is simply because the config is overridden by an empty string.

@SparkQA
Copy link

SparkQA commented Sep 1, 2014

QA tests have started for PR 2232 at commit 05e03d6.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 2, 2014

QA tests have finished for PR 2232 at commit 05e03d6.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 andrewor14 changed the title [SPARK-3319 / 3338] Resolve Spark submit config paths [SPARK-3319] [SPARK-3338] Resolve Spark submit config paths Sep 3, 2014
@andrewor14
Copy link
Contributor Author

@sryza can you look at this?

@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have started for PR 2232 at commit f0fae64.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have finished for PR 2232 at commit f0fae64.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sryza
Copy link
Contributor

sryza commented Sep 17, 2014

Could this change behavior in cases where the spark.yarn.dist.files is configured with no scheme? Without this change, it would interpret no scheme to mean that it's on HDFS, and with it, it would interpret it to mean that it's on the local FS?

@andrewor14
Copy link
Contributor Author

Yes, this changes the behavior if the user sets spark.yarn.dist.*. Note that the whole point of this PR is to fix the divergence in path resolution behavior in the following two cases: (1) if the user specifies --files, and (2) if the user specifies the corresponding config, which in the case of yarn-client is spark.yarn.dist.files. Currently, we resolve the paths only for (1) but not for (2).

However I believe these configs are actually introduced before Spark submit. Do you think we should leave the yarn ones out in this PR to keep backward compatibility?

@sryza
Copy link
Contributor

sryza commented Sep 18, 2014

Hmm. My feeling is that it's better to be consistent here and consider the old behavior a bug than to maintain compatibility than to support a cornerish case for a property that's rarely used. Just wanted to bring it up. @pwendell do you have an opinion on this?

Either way, for each of these properties, we should document whether they can take HDFS paths and the default behavior when no scheme is given. Also, are there any strange corner cases we can run into if a file exists at the same path on the local FS and HDFS?

Last, do we need to also resolve URIs for spark.yarn.jar?

@andrewor14
Copy link
Contributor Author

Yes, probably. This list is by no means complete. For spark.yarn.dist.* however, it seems that there is some prior discussion on keeping it unresolved (i.e. relative paths become HDFS paths) here: #969. @tgravescs and @vanzin may have objections on this.

@tgravescs
Copy link
Contributor

From the other pr we need to keep backwards compatibility for the env variables, the configs should be fine. I believe this pr matches the behavior described below. Please correct me if I'm wrong as I haven't looked at the code in detail.

The behavior we want is and should have been the behavior as of pr969:

  • --archives/--files via sparksubmit defaults to use file:// if not specified, for both yarn-client and yarn-cluster
  • spark.yarn.dist.archives/spark.yarn.dist.files defaults to use file:// if not specified, for both yarn-client and yarn-cluster
  • env variable SPARK_YARN_DIST_ARCHIVES/SPARK_YARN_DIST_FILES set in yarn-client then it should default to hdfs://
  • --files/--archives specified from spark-class in yarn-cluster mode then it should default to hdfs://

@andrewor14
Copy link
Contributor Author

Hi @tgravescs, I believe every point of the behavior you listed is correct and preserved in this PR, since it only affects spark.yarn.dist.* and these are resolved to file:// if relative paths are provided. Then it seems OK to keep the spark.yarn.dist.* in the list of things to resolve in SparkSubmit here.

Andrew Or added 2 commits October 30, 2014 12:29
…paths

Conflicts:
	core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
@andrewor14
Copy link
Contributor Author

Alright, I just resolved the merge conflict and added spark.yarn.jar. @sryza @tgravescs I believe I addressed all your comments. Anything else?

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22549 has started for PR 2232 at commit fff2869.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22549 has finished for PR 2232 at commit fff2869.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22549/
Test PASSed.

@sryza
Copy link
Contributor

sryza commented Oct 30, 2014

This looks good to me.

@andrewor14
Copy link
Contributor Author

Alright merging this.

@asfgit asfgit closed this in 24c5129 Oct 30, 2014
@andrewor14 andrewor14 deleted the resolve-config-paths branch October 30, 2014 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants