Conversation

@msannell (Contributor) commented Jun 1, 2015

This is a simple change to add a new environment variable
"spark.sparkr.r.command" that specifies the command that SparkR will
use when creating an R engine process. If this is not specified,
"Rscript" will be used by default.

I did not add any documentation, since I couldn't find any place where
environment variables (such as "spark.sparkr.use.daemon") are
documented.

I also did not add a unit test. The only test that would work
generally would be one starting SparkR with
sparkR.init(sparkEnvir=list(spark.sparkr.r.command="Rscript")), just
using the default value. I think that this is a low-risk change.
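
For illustration, a minimal sketch of the non-default case (the master setting and the interpreter path below are made-up examples, not from this patch):

    library(SparkR)
    # Start SparkR with a custom R runtime for its worker processes;
    # when spark.sparkr.r.command is unset, "Rscript" is the default.
    sc <- sparkR.init(
      master = "local[2]",
      sparkEnvir = list(spark.sparkr.r.command = "/opt/R-3.2/bin/Rscript")
    )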

Likely committers: @shivaram

@shivaram (Contributor) commented Jun 1, 2015

Jenkins, ok to test

@SparkQA commented Jun 1, 2015

Test build #33916 has finished for PR 6557 at commit 7eac142.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram (Contributor) commented Jun 1, 2015

Thanks @msannell for the change.

@davies -- In Python we use an environment variable, PYSPARK_PYTHON, to do this. Any reason we should use an environment variable instead of an option?

@davies (Contributor) commented Jun 1, 2015

SparkConf is the consistent way to manage configuration; we have been moving away from environment variables since Spark 1.0, but still keep compatibility with the old environment variables.

Sometimes environment variables are easier to use than SparkConf; for example, we can switch the version of Python in a single line:

PYSPARK_PYTHON=pypy pypy xxx.py

@JoshRosen may know more about PYSPARK_PYTHON.

@SparkQA commented Jun 1, 2015

Test build #866 has finished for PR 6557 at commit 7eac142.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram (Contributor) commented Jun 1, 2015

Jenkins, retest this please

@shivaram (Contributor) commented Jun 1, 2015

So we already have SPARKR_DRIVER_R, which we use to build commands in spark-submit on the driver side. I think we should unify these, but I need to check whether we can use an env variable everywhere or the config variable in both places.

@msannell (Contributor, Author) commented Jun 1, 2015

In my work, I haven't used spark-submit: I start R (or TERR), and
access SparkR from there.

Note that SPARKR_DRIVER_R defaults to "R", whereas my proposed
variable would default to "Rscript", which is a slightly different
application. I agree that it would be nice to find a way to unify
these.
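
To make the two knobs concrete, a hedged sketch from the R side (summarizing this thread's description of the defaults, not verified against the code):

    # Per this thread, there are two separate settings with different defaults:
    #   SPARKR_DRIVER_R        -- env var read by spark-submit to pick the
    #                             driver-side R binary; defaults to "R".
    #   spark.sparkr.r.command -- config consulted when SparkR spawns worker
    #                             R processes; defaults to "Rscript".
    # A session started directly from R only exercises the second setting:
    library(SparkR)
    sc <- sparkR.init(sparkEnvir = list(spark.sparkr.r.command = "Rscript"))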

@SparkQA commented Jun 1, 2015

Test build #33925 has finished for PR 6557 at commit 7eac142.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor) commented

@davies @shivaram any updates on this?

@davies (Contributor) commented Jun 29, 2015

LGTM. We can document this later.

@andrewor14 (Contributor) commented

OK merging into master.

@asfgit closed this in 4a9e03f on Jun 30, 2015
@quasiben commented

Was this officially documented?

@felixcheung (Member) commented

This was updated by #9179.

@quasiben commented

Thanks!
