
Commit 9695f45

Sun Rui authored and Shivaram Venkataraman committed
[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.
Add a new Spark conf option "spark.sparkr.r.driver.command" to specify the executable for an R script in client modes. The existing Spark conf option "spark.sparkr.r.command" specifies the executable for an R script in cluster modes, for both driver and workers. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).

Note that the [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.

For reference, PySpark has two environment variables serving a similar purpose:

- PYSPARK_PYTHON: Python binary executable to use for PySpark in both driver and workers (default is `python`).
- PYSPARK_DRIVER_PYTHON: Python binary executable to use for PySpark in the driver only (default is PYSPARK_PYTHON).

PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.

Author: Sun Rui <[email protected]>

Closes #9179 from sun-rui/SPARK-10971.

(cherry picked from commit 2462dbc)

Signed-off-by: Shivaram Venkataraman <[email protected]>
1 parent 03d3ad4 commit 9695f45

File tree

2 files changed: +28 additions, −1 deletion
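As a usage sketch of the options this commit documents (the Rscript paths below are hypothetical, for illustration only), the preferred option names can be set in `conf/spark-defaults.conf`:

```
# conf/spark-defaults.conf (hypothetical paths)
spark.r.command          /usr/lib/R/bin/Rscript   # driver and workers, cluster modes
spark.r.driver.command   /opt/R/bin/Rscript       # driver only, client modes
```

The same options can also be passed per job, e.g. `spark-submit --conf spark.r.driver.command=/opt/R/bin/Rscript my_script.R`.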

core/src/main/scala/org/apache/spark/deploy/RRunner.scala (10 additions, 1 deletion)

@@ -39,7 +39,16 @@ object RRunner {

     // Time to wait for SparkR backend to initialize in seconds
     val backendTimeout = sys.env.getOrElse("SPARKR_BACKEND_TIMEOUT", "120").toInt
-    val rCommand = "Rscript"
+    val rCommand = {
+      // "spark.sparkr.r.command" is deprecated and replaced by "spark.r.command",
+      // but kept here for backward compatibility.
+      var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
+      cmd = sys.props.getOrElse("spark.r.command", cmd)
+      if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
+        cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
+      }
+      cmd
+    }

     // Check if the file path exists.
     // If not, change directory to current working directory for YARN cluster mode
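The precedence in the hunk above can be summarized as: deprecated `spark.sparkr.r.command`, then `spark.r.command`, then (in client deploy mode only) `spark.r.driver.command`. A minimal sketch of that resolution order, with an explicit `props` map standing in for `sys.props` (this is an illustration, not the actual RRunner code):

```scala
// Sketch of the R-executable resolution order introduced by this commit.
// `props` stands in for sys.props; property names are taken from the diff above.
object RCommandResolution {
  def resolve(props: Map[String, String]): String = {
    // Deprecated name, kept for backward compatibility; "Rscript" is the default.
    var cmd = props.getOrElse("spark.sparkr.r.command", "Rscript")
    // The preferred name overrides the deprecated one.
    cmd = props.getOrElse("spark.r.command", cmd)
    // The driver-only override applies in client deploy mode only.
    if (props.getOrElse("spark.submit.deployMode", "client") == "client") {
      cmd = props.getOrElse("spark.r.driver.command", cmd)
    }
    cmd
  }
}
```

Note that a `spark.r.driver.command` set in cluster mode is silently ignored, matching the "Ignored in cluster modes" wording added to the docs below.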

docs/configuration.md (18 additions, 0 deletions)

@@ -1564,6 +1564,20 @@ Apart from these, the following properties are also available, and may be useful
     Number of threads used by RBackend to handle RPC calls from SparkR package.
   </td>
 </tr>
+<tr>
+  <td><code>spark.r.command</code></td>
+  <td>Rscript</td>
+  <td>
+    Executable for executing R scripts in cluster modes for both driver and workers.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.r.driver.command</code></td>
+  <td>spark.r.command</td>
+  <td>
+    Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+  </td>
+</tr>
 </table>

 #### Cluster Managers

@@ -1603,6 +1617,10 @@ The following variables can be set in `spark-env.sh`:
 <tr>
   <td><code>PYSPARK_DRIVER_PYTHON</code></td>
   <td>Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON).</td>
 </tr>
+<tr>
+  <td><code>SPARKR_DRIVER_R</code></td>
+  <td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+</tr>
 <tr>
   <td><code>SPARK_LOCAL_IP</code></td>
   <td>IP address of the machine to bind to.</td>
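The `SPARKR_DRIVER_R` variable documented above is set like any other `spark-env.sh` entry; the path below is hypothetical, for illustration only:

```
# conf/spark-env.sh (hypothetical path)
# R binary used to launch the SparkR shell on the local host (default: R)
export SPARKR_DRIVER_R=/opt/R/bin/R
```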
