Conversation

@sun-rui (Contributor) commented Oct 20, 2015

Add a new Spark conf option "spark.sparkr.r.driver.command" to specify the executable for an R script in client mode.

The existing Spark conf option "spark.sparkr.r.command" is used to specify the executable for an R script in cluster mode, for both driver and workers. See also the [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).

BTW, the [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.

For your information, PySpark has two environment variables serving a similar purpose:
PYSPARK_PYTHON        Python binary executable to use for PySpark in both driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON Python binary executable to use for PySpark in the driver only (default is PYSPARK_PYTHON).
PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.
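A minimal sketch of the intended selection logic (illustrative only, reading system properties; the actual patch may resolve the configuration differently):

```scala
// Sketch: resolve the R executable for RRunner.
// "spark.sparkr.r.command" is the default for both driver and workers;
// in client mode, the new "spark.sparkr.r.driver.command" overrides it when set.
val rCommand = {
  var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
  if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
    cmd = sys.props.getOrElse("spark.sparkr.r.driver.command", cmd)
  }
  cmd
}
```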

@tgravescs (Contributor)

Thanks for working on this. The changes look fine to me. Obviously we'll need someone more familiar with the R stuff to review too.

Is there any documentation for the R configs that should be updated for this?

@SparkQA commented Oct 20, 2015

Test build #43974 has finished for PR 9179 at commit 3de695c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen (Contributor)

+1 on a documentation update, otherwise this won't be discoverable. I'll defer to a reviewer with more R experience for final sign-off and merge.

@shivaram (Contributor)

Yeah this can go under the SparkR section at http://spark.apache.org/docs/latest/configuration.html#sparkr

Contributor (review comment):

Can this actually happen? I thought we only used RRunner when launching non-shell jobs (i.e., when executing scripts).

Contributor Author (review comment):

Yes, we only use RRunner when launching non-shell jobs (i.e., when executing scripts). But R scripts can be executed in either client mode or cluster mode. In client mode, the R script is executed on the local host, while in cluster mode it is executed on a selected worker node.

@felixcheung (Member)

Does SPARKR_DRIVER_R map to spark.sparkr.r.driver.command?

@sun-rui (Contributor Author) commented Oct 21, 2015

@felixcheung, no, SPARKR_DRIVER_R is for the R shell executable on the local host, while "spark.sparkr.r.driver.command" is intended for the Rscript script engine on the local host.

@sun-rui (Contributor Author) commented Oct 21, 2015

There seems to be some inconsistency among the configurations: some of them are Spark conf options, while another is an environment variable. I don't have a strong opinion on this, and will try to keep them as-is for backward compatibility.

@sun-rui (Contributor Author) commented Oct 21, 2015

Will update the doc once this new conf option is agreed upon.

@sun-rui (Contributor Author) commented Oct 21, 2015

@felixcheung, you are correct: the value of PYSPARK_PYTHON is passed to workers. In SparkR, "spark.sparkr.r.command" is a Spark conf option and will be passed to workers. They are passed to workers in different ways.
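To illustrate the contrast (a sketch, not the exact Spark code; it only shows where each value is read from):

```scala
import org.apache.spark.SparkConf

// PySpark propagates the worker executable via an environment variable:
val pythonExec = sys.env.getOrElse("PYSPARK_PYTHON", "python")

// SparkR propagates it via a Spark conf property, which travels to the
// executors together with the rest of the SparkConf:
val rWorkerExec = new SparkConf().get("spark.sparkr.r.command", "Rscript")
```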

@sun-rui (Contributor Author) commented Oct 21, 2015

Added a safeguard for the case where 'spark.submit.deployMode' may be missing. This is for consistency with SPARK-10711.
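A minimal sketch of that safeguard (Scala; the exact patch may differ):

```scala
// Default to "client" when the property is absent, instead of failing with
// a NoSuchElementException on a direct lookup:
val deployMode = sys.props.getOrElse("spark.submit.deployMode", "client")
```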

@felixcheung (Member)

Got it. I don't know what the plan is for env vars vs. Spark conf, but maybe we want consistency here.

@SparkQA commented Oct 21, 2015

Test build #44030 has finished for PR 9179 at commit 817cc58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram (Contributor)

FWIW Spark config variables are preferred to environment variables if we are adding something new.

@shivaram (Contributor)

@sun-rui Code change looks pretty good. Could we also make the docs change in this same PR, or were you planning on a separate PR?

@sun-rui (Contributor Author) commented Oct 21, 2015

@shivaram, I will update the documentation.

@sun-rui (Contributor Author) commented Oct 22, 2015

@shivaram, documentation updated.

@shivaram (Contributor)

LGTM. Thanks @sun-rui -- Will merge after Jenkins passes

@SparkQA commented Oct 22, 2015

Test build #44130 has finished for PR 9179 at commit bfd42e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui (Contributor Author) commented Oct 22, 2015

@shivaram, some thoughts on the naming consistency of options.
For PySpark, the naming convention is spark.python.xxx, for example, spark.python.profile.
And we have a documented SparkR config option: spark.r.numRBackendThreads.

In this PR, the names of the two options to be documented are:
"spark.sparkr.r.driver.command" and "spark.sparkr.r.command"

I am thinking of renaming them to
"spark.r.driver.rscript" and "spark.r.rscript" for better naming consistency.

There is a backward compatibility issue with renaming "spark.sparkr.r.command" to "spark.r.rscript", but since it was not documented before, I think the cost won't be too high.

If you have no strong preference on this, you can merge this PR.
If you agree, I can update this PR.

@shivaram (Contributor)

Hmm, let's use spark.r.driver.command and spark.r.command. It's better to use command than rscript. For the backwards-compatibility thing, can we just support both in the code but only document spark.r.command?
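A sketch of that backward-compatible lookup (illustrative; key names as agreed above, and the exact merged code may differ):

```scala
// Base value from the deprecated, undocumented key for backward compatibility:
var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
// The documented key overrides it:
cmd = sys.props.getOrElse("spark.r.command", cmd)
// In client mode, the driver-specific key takes precedence:
if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
  cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
}
```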

@felixcheung (Member)

+1 on spark.r.driver.command and spark.r.command

@sun-rui (Contributor Author) commented Oct 24, 2015

@shivaram, good idea, agree.

@SparkQA commented Oct 24, 2015

Test build #44277 has finished for PR 9179 at commit 4f44138.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram (Contributor)

Thanks @sun-rui -- LGTM. Merging this

asfgit pushed a commit that referenced this pull request Oct 24, 2015
Add a new Spark conf option "spark.sparkr.r.driver.command" to specify the executable for an R script in client mode.

Author: Sun Rui <[email protected]>

Closes #9179 from sun-rui/SPARK-10971.

(cherry picked from commit 2462dbc)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit closed this in 2462dbc Oct 24, 2015
@msannell (Contributor)

I believe I've noticed a problem with this fix. You fixed core/src/main/scala/org/apache/spark/deploy/RRunner.scala to handle "spark.r.command" (with nice support for the deprecated "spark.sparkr.r.command"), but you didn't change core/src/main/scala/org/apache/spark/api/r/RRDD.scala, which still uses only "spark.sparkr.r.command". This should also be changed.
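For illustration, a hedged sketch of such a change (assuming the worker-side code can reach the conf via `SparkEnv`, as elsewhere in Spark; the actual follow-up fix may differ):

```scala
import org.apache.spark.SparkEnv

// Worker-side resolution in RRDD: prefer the new documented key, falling
// back to the deprecated one so existing setups keep working.
val sparkConf = SparkEnv.get.conf
var rCommand = sparkConf.get("spark.sparkr.r.command", "Rscript")
rCommand = sparkConf.get("spark.r.command", rCommand)
```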

@sun-rui (Contributor Author) commented Oct 30, 2015

@msannell, good catch. Thanks! I will fix it.
