Conversation

@dbtsai
Member

@dbtsai dbtsai commented Jun 5, 2014

No description provided.

@AmplabJenkins

Build triggered.

@vanzin
Contributor

vanzin commented Jun 5, 2014

#560 has what I believe is a better way of handling this.

@AmplabJenkins

Build triggered.

@dbtsai
Member Author

dbtsai commented Jun 5, 2014

@chesterxgchen

#560 Agreed, it's a more thorough way to handle this issue. In your code, it seems that the Spark jar setting is moved into conf: SparkConf under the CONF_SPARK_JAR key. But that will make it difficult for users to set up, since Client.scala also has to be changed. Simple question: with your change, how can users submit a job with their own Spark jar by passing CONF_SPARK_JAR correctly?

def sparkJar(conf: SparkConf) = {
  if (conf.contains(CONF_SPARK_JAR)) {
    conf.get(CONF_SPARK_JAR)
  } else if (System.getenv(ENV_SPARK_JAR) != null) {
    logWarning(
      s"$ENV_SPARK_JAR detected in the system environment. This variable has been deprecated " +
        s"in favor of the $CONF_SPARK_JAR configuration variable.")
    System.getenv(ENV_SPARK_JAR)
  } else {
    SparkContext.jarOfClass(this.getClass).head
  }
}

@vanzin
Contributor

vanzin commented Jun 6, 2014

There's no need to change Client.scala with my change; all you need to do is set "spark.yarn.jar" somewhere (JVM system property, spark-defaults.conf, or in the app's code by modifying the SparkConf instance) and it will be picked up by the Yarn code.
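Since SparkConf initializes itself from JVM system properties, the system-property route can be sketched without Spark on the classpath. In this illustrative snippet a plain Map built from sys.props stands in for SparkConf; the key name "spark.yarn.jar" is the one named above, everything else is a stand-in:

```scala
// Illustrative sketch only: a Map built from JVM system properties stands
// in for SparkConf, which picks up "spark.*" system properties the same way.
object SparkJarFromSysPropsSketch {
  val CONF_SPARK_JAR = "spark.yarn.jar"

  // Mimics SparkConf loading "spark.*" entries from system properties.
  def loadConf(): Map[String, String] =
    sys.props.toMap.filter { case (k, _) => k.startsWith("spark.") }

  // The Yarn client code would then read the jar location from the conf.
  def sparkJar(conf: Map[String, String], default: String): String =
    conf.getOrElse(CONF_SPARK_JAR, default)
}
```

So passing -Dspark.yarn.jar=... on the launcher command line, or calling System.setProperty before the conf is created, makes the jar location visible to the client without touching Client.scala.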

@AmplabJenkins

Build started.

@dbtsai
Member Author

dbtsai commented Jun 6, 2014

The app's code only runs in the application master in yarn-cluster mode, so how can the Yarn client know which jar to submit to the distributed cache if we set it in the app's SparkConf?

@vanzin
Contributor

vanzin commented Jun 6, 2014

Ok, in cluster mode you can't use SparkConf.set(), but the other two options work fine. You can't do System.setProperty() in cluster mode to achieve that either, so even with your patch you'd have to pass -DSPARK_JAR=foo on the command line for it to work in yarn-cluster mode.

@AmplabJenkins

Build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15489/

@dbtsai
Member Author

dbtsai commented Jun 6, 2014

We launch Spark jobs inside our Tomcat server, and we use the Client.scala API directly. With my patch, I can set up the Spark jar using System.setProperty() before running:

  val sparkConf = new SparkConf
  val args = getArgsFromConf(conf)
  new Client(new ClientArguments(args, sparkConf), hadoopConfig, sparkConf).run

Do you mean that with your work, I can set the jar location in the sparkConf, which will then be passed into the new Client?

Can we have the following in the sparkJar method?

def sparkJar(conf: SparkConf) = {
  if (conf.contains(CONF_SPARK_JAR)) {
    conf.get(CONF_SPARK_JAR)
  } else if (System.getProperty(ENV_SPARK_JAR) != null) {
    logWarning(
      s"$ENV_SPARK_JAR detected in the system properties. This variable has been deprecated " +
        s"in favor of the $CONF_SPARK_JAR configuration variable.")
    System.getProperty(ENV_SPARK_JAR)
  } else if (System.getenv(ENV_SPARK_JAR) != null) {
    logWarning(
      s"$ENV_SPARK_JAR detected in the system environment. This variable has been deprecated " +
        s"in favor of the $CONF_SPARK_JAR configuration variable.")
    System.getenv(ENV_SPARK_JAR)
  } else {
    SparkContext.jarOfClass(this.getClass).head
  }
}
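The four-level lookup order this version encodes (explicit conf entry, then JVM system property, then environment variable, then the jar containing the current class) can be sketched self-contained. In this illustrative sketch, which is not the actual patch, a Map stands in for SparkConf and the environment lookup is injected as a function so the precedence can be exercised without Spark:

```scala
// Illustrative sketch of the proposed four-level precedence; not the patch.
object SparkJarPrecedenceSketch {
  val CONF_SPARK_JAR = "spark.yarn.jar" // stand-ins for the constants above
  val ENV_SPARK_JAR  = "SPARK_JAR"

  // getenv is injected so the environment branch is testable.
  def sparkJar(conf: Map[String, String],
               getenv: String => Option[String],
               jarOfClass: => String): String =
    conf.get(CONF_SPARK_JAR)                 // 1. explicit conf entry
      .orElse(sys.props.get(ENV_SPARK_JAR))  // 2. JVM system property
      .orElse(getenv(ENV_SPARK_JAR))         // 3. environment variable
      .getOrElse(jarOfClass)                 // 4. jar containing this class
}
```

An earlier source always wins, so a conf entry shadows the deprecated system-property and environment-variable routes.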

@vanzin
Contributor

vanzin commented Jun 6, 2014

I mean you can set system properties the same way. SparkConf initializes its configuration from system properties, so my patch covers not only your case, but also others (like using a spark-defaults.conf file for spark-submit users).

@dbtsai
Member Author

dbtsai commented Jun 6, 2014

Got you. Looking forward to having your patch merged. Thanks.


@dbtsai
Member Author

dbtsai commented Jul 11, 2014

#560 is merged. Closing this PR.

@dbtsai dbtsai closed this Jul 11, 2014
wangyum pushed a commit that referenced this pull request May 26, 2023
* [CARMEL-6055] Backport code-gen code for SortMergeJoin

PR list:
 
* Cheng Su 2021/12/30, 12:12 PM [SPARK-37726][SQL] Add spill size metrics for sort merge join
* Cheng Su 2021/11/19, 12:36 PM [SPARK-37370][SQL] Add SQL configs to control newly added join code-gen in 3.3
* Cheng Su 2021/11/17, 9:48 PM [SPARK-37316][SQL] Add code-gen for existence sort merge join
* Cheng Su 2021/11/17, 10:44 AM [SPARK-37341][SQL] Avoid unnecessary buffer and copy in full outer sort merge join
* Cheng Su 2021/11/15, 7:34 PM [SPARK-35352][SQL] Add code-gen for full outer sort merge join
* Cheng Su 2021/11/3, 11:18 AM [SPARK-32567][SQL] Add code-gen for full outer shuffled hash join
* Cheng Su 2021/6/2, 2:01 PM [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
* Cheng Su 2021/5/27, 12:59 PM [SPARK-35351][SQL][FOLLOWUP] Avoid using `loaded` variable for LEFT ANTI SMJ code-gen
* Cheng Su 2021/5/18, 3:56 PM [SPARK-35351][SQL] Add code-gen for left anti sort merge join
* Cheng Su 2021/5/17, 1:49 AM [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it
* Cheng Su 2021/5/13, 8:52 PM [SPARK-35350][SQL] Add code-gen for left semi sort merge join
* Cheng Su 2021/5/12, 10:10 PM [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
* Cheng Su 2021/5/11, 10:21 AM [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type

* [CARMEL-6055] Backport code-gen code for SortMergeJoin

* Backport pr for BroadcastNestedLoopJoin

* Fix UT

* Fix UT

* Backport [CARMEL-3719] OOM in sort merge join

* Backport [CARMEL-3719] OOM in sort merge join

* fix ut

* Enable code-gen for full outer sort merge join by default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
