Conversation

@dbtsai
Member

@dbtsai dbtsai commented Jun 5, 2014

No description provided.

@AmplabJenkins

Build triggered.

@vanzin
Contributor

vanzin commented Jun 5, 2014

#560 has what I believe is a better way of handling this.

@AmplabJenkins

Build triggered.

@dbtsai
Member Author

dbtsai commented Jun 5, 2014

@chesterxgchen

#560 Agreed, it's a more thorough way to handle this issue. In your code, it seems that the Spark jar setting is moved into conf: SparkConf under the CONF_SPARK_JAR key. But that will make it difficult for users to set up, since Client.scala also has to be changed. Simple question: with your change, how can users submit a job with their own Spark jar by passing CONF_SPARK_JAR correctly?

def sparkJar(conf: SparkConf) = {
  if (conf.contains(CONF_SPARK_JAR)) {
    conf.get(CONF_SPARK_JAR)
  } else if (System.getenv(ENV_SPARK_JAR) != null) {
    logWarning(
      s"$ENV_SPARK_JAR detected in the system environment. This variable has been deprecated " +
        s"in favor of the $CONF_SPARK_JAR configuration variable.")
    System.getenv(ENV_SPARK_JAR)
  } else {
    SparkContext.jarOfClass(this.getClass).head
  }
}

@vanzin
Contributor

vanzin commented Jun 6, 2014

There's no need to change Client.scala with my change; all you need to do is set "spark.yarn.jar" somewhere (JVM system property, spark-defaults.conf, or in the app's code by modifying the SparkConf instance) and it will be picked up by the Yarn code.
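Since SparkConf initializes itself from JVM system properties, the system-property route can be sketched without Spark on the classpath. In this illustrative snippet a plain Map built from sys.props stands in for SparkConf; the key name "spark.yarn.jar" is the one named above, everything else is a stand-in:

```scala
// Illustrative sketch only: a Map built from JVM system properties stands
// in for SparkConf, which picks up "spark.*" system properties the same way.
object SparkJarFromSysPropsSketch {
  val CONF_SPARK_JAR = "spark.yarn.jar"

  // Mimics SparkConf loading "spark.*" entries from system properties.
  def loadConf(): Map[String, String] =
    sys.props.toMap.filter { case (k, _) => k.startsWith("spark.") }

  // The Yarn client code would then read the jar location from the conf.
  def sparkJar(conf: Map[String, String], default: String): String =
    conf.getOrElse(CONF_SPARK_JAR, default)
}
```

So passing -Dspark.yarn.jar=... on the launcher command line, or calling System.setProperty before the conf is created, makes the jar location visible to the client without touching Client.scala.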

@AmplabJenkins

Build started.

@dbtsai
Member Author

dbtsai commented Jun 6, 2014

The app's code only runs in the application master in yarn-cluster mode, so how can the Yarn client know which jar to submit to the distributed cache if we set it in the app's SparkConf?

@vanzin
Contributor

vanzin commented Jun 6, 2014

Ok, in cluster mode you can't use SparkConf.set(), but the other two options work fine. You can't do System.setProperty() in cluster mode to achieve that either, so even with your patch you'd have to pass -DSPARK_JAR=foo on the command line for it to work in yarn-cluster mode.

@AmplabJenkins

Build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15489/

@dbtsai
Member Author

dbtsai commented Jun 6, 2014

We launch Spark jobs inside our Tomcat server, and we use the Client.scala API directly. With my patch, I can set up the Spark jar using System.setProperty() before running:

  val sparkConf = new SparkConf
  val args = getArgsFromConf(conf)
  new Client(new ClientArguments(args, sparkConf), hadoopConfig, sparkConf).run

Do you mean that with your work, I can set the jar location in the sparkConf, which will then be passed into the new Client?

Can we have the following in the sparkJar method?

def sparkJar(conf: SparkConf) = {
  if (conf.contains(CONF_SPARK_JAR)) {
    conf.get(CONF_SPARK_JAR)
  } else if (System.getProperty(ENV_SPARK_JAR) != null) {
    logWarning(
      s"$ENV_SPARK_JAR detected in the system properties. This variable has been deprecated " +
        s"in favor of the $CONF_SPARK_JAR configuration variable.")
    System.getProperty(ENV_SPARK_JAR)
  } else if (System.getenv(ENV_SPARK_JAR) != null) {
    logWarning(
      s"$ENV_SPARK_JAR detected in the system environment. This variable has been deprecated " +
        s"in favor of the $CONF_SPARK_JAR configuration variable.")
    System.getenv(ENV_SPARK_JAR)
  } else {
    SparkContext.jarOfClass(this.getClass).head
  }
}
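The four-level lookup order this version encodes (explicit conf entry, then JVM system property, then environment variable, then the jar containing the current class) can be sketched self-contained. In this illustrative sketch, which is not the actual patch, a Map stands in for SparkConf and the environment lookup is injected as a function so the precedence can be exercised without Spark:

```scala
// Illustrative sketch of the proposed four-level precedence; not the patch.
object SparkJarPrecedenceSketch {
  val CONF_SPARK_JAR = "spark.yarn.jar" // stand-ins for the constants above
  val ENV_SPARK_JAR  = "SPARK_JAR"

  // getenv is injected so the environment branch is testable.
  def sparkJar(conf: Map[String, String],
               getenv: String => Option[String],
               jarOfClass: => String): String =
    conf.get(CONF_SPARK_JAR)                 // 1. explicit conf entry
      .orElse(sys.props.get(ENV_SPARK_JAR))  // 2. JVM system property
      .orElse(getenv(ENV_SPARK_JAR))         // 3. environment variable
      .getOrElse(jarOfClass)                 // 4. jar containing this class
}
```

An earlier source always wins, so a conf entry shadows the deprecated system-property and environment-variable routes.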

@vanzin
Contributor

vanzin commented Jun 6, 2014

I mean you can set system properties the same way. SparkConf initializes its configuration from system properties, so my patch covers not only your case, but also others (like using a spark-defaults.conf file for spark-submit users).

@dbtsai
Member Author

dbtsai commented Jun 6, 2014

Got you. Looking forward to having your patch merged. Thanks.


@dbtsai
Member Author

dbtsai commented Jul 11, 2014

#560 is merged. Closing this PR.

@dbtsai dbtsai closed this Jul 11, 2014
wangyum pushed a commit that referenced this pull request May 26, 2023
* [CARMEL-6055] Backport code-gen code for SortMergeJoin

PR list:
 
* Cheng Su 2021/12/30, 12:12 PM [SPARK-37726][SQL] Add spill size metrics for sort merge join
* Cheng Su 2021/11/19, 12:36 PM [SPARK-37370][SQL] Add SQL configs to control newly added join code-gen in 3.3
* Cheng Su 2021/11/17, 9:48 PM [SPARK-37316][SQL] Add code-gen for existence sort merge join
* Cheng Su 2021/11/17, 10:44 AM [SPARK-37341][SQL] Avoid unnecessary buffer and copy in full outer sort merge join
* Cheng Su 2021/11/15, 7:34 PM [SPARK-35352][SQL] Add code-gen for full outer sort merge join
* Cheng Su 2021/11/3, 11:18 AM [SPARK-32567][SQL] Add code-gen for full outer shuffled hash join
* Cheng Su 2021/6/2, 2:01 PM [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
* Cheng Su 2021/5/27, 12:59 PM [SPARK-35351][SQL][FOLLOWUP] Avoid using `loaded` variable for LEFT ANTI SMJ code-gen
* Cheng Su 2021/5/18, 3:56 PM [SPARK-35351][SQL] Add code-gen for left anti sort merge join
* Cheng Su 2021/5/17, 1:49 AM [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it
* Cheng Su 2021/5/13, 8:52 PM [SPARK-35350][SQL] Add code-gen for left semi sort merge join
* Cheng Su 2021/5/12, 10:10 PM [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
* Cheng Su 2021/5/11, 10:21 AM [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type

* [CARMEL-6055] Backport code-gen code for SortMergeJoin

* Backport pr for BroadcastNestedLoopJoin

* Fix UT

* Fix UT

* Backport [CARMEL-3719] OOM in sort merge join

* Backport [CARMEL-3719] OOM in sort merge join

* fix ut

* Enable code-gen for full outer sort merge join by default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
