
Commit 9cfa9a5

Sun Rui authored and shivaram committed
[SPARK-6812] [SPARKR] filter() on DataFrame does not work as expected.
According to the R manual (https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html), "if a function .First is found on the search path, it is executed as .First(). Finally, function .First.sys() in the base package is run. This calls require to attach the default packages specified by options("defaultPackages")."

In .First() in profile/shell.R, we load the SparkR package, which means SparkR is attached before the default packages. Each newly attached package is placed ahead of the previously attached ones on the search path, so any name exported by both a default package and SparkR resolves to the default package's version and masks SparkR's. This is why filter() in SparkR is masked by filter() in stats, which is usually in the default package list.

We need to make sure SparkR is loaded after the default packages. The solution is to append SparkR to the default packages instead of loading it in .First().

BTW, I'd like to discuss our policy on resolving name conflicts. Previously, we renamed Scala API methods whenever they conflicted with names in base or other commonly used packages. From a long-term perspective, however, this is bad for API stability, because we can't predict name conflicts: what if a name added to the base package in the future conflicts with an existing SparkR API? A better policy is to keep API names the same as Scala's without worrying about name conflicts. Users should load SparkR as the last package so that all of its API names take effect, and they can explicitly use :: to refer to names hidden in other packages. If we agree on this, I can submit a JIRA issue to change back some renamed API methods, for example, DataFrame.sortDF().

Author: Sun Rui <[email protected]>

Closes #5938 from sun-rui/SPARK-6812 and squashes the following commits:

b569145 [Sun Rui] [SPARK-6812][SparkR] filter() on DataFrame does not work as expected.
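The masking behaviour described above can be reproduced without Spark. This is a minimal sketch that uses two plain environments, hypothetically named `fakeStats` and `fakeSparkR`, as stand-ins for the real stats and SparkR packages; the manual `attach()` calls only simulate the order in which `.First()` and `.First.sys()` attach things at startup. It shows that whichever is attached last wins, that `utils::find()` reports the lookup order, and that the masked function remains reachable explicitly (for real packages, via `::`).

```r
# Hypothetical stand-ins for two packages that both export filter().
# Plain environments are used so the sketch runs without Spark installed.
pkgStats <- new.env()
pkgStats$filter <- function(...) "stats-like filter"
pkgSparkR <- new.env()
pkgSparkR$filter <- function(...) "SparkR-like filter"

# Old shell.R behaviour (simulated): SparkR attached first in .First(),
# default packages attached afterwards, each landing at search position 2.
attach(pkgSparkR, name = "fakeSparkR")
attach(pkgStats, name = "fakeStats")
old_winner <- filter()            # resolves to the stats-like version
detach("fakeStats")
detach("fakeSparkR")

# Fixed behaviour (simulated): default packages first, SparkR appended last.
attach(pkgStats, name = "fakeStats")
attach(pkgSparkR, name = "fakeSparkR")
new_winner <- filter()            # resolves to the SparkR-like version

# find() lists every search-path entry defining filter, in lookup order.
where <- utils::find("filter")

# The masked definition is still reachable by naming its source explicitly.
masked <- get("filter", envir = as.environment("fakeStats"))()

# For real packages, pkg::name does this; e.g. a centred 3-point moving
# average using the genuine stats::filter(), regardless of what masks it.
ma <- stats::filter(1:5, rep(1/3, 3))

detach("fakeSparkR")
detach("fakeStats")
```

In the real SparkR shell after this change, running `find("filter")` should list `package:SparkR` ahead of `package:stats`, mirroring the second half of the sketch.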
1 parent 773aa25 commit 9cfa9a5

1 file changed (+6 -4 lines changed)

R/pkg/inst/profile/shell.R

```diff
@@ -20,11 +20,13 @@
 .libPaths(c(file.path(home, "R", "lib"), .libPaths()))
 Sys.setenv(NOAWT=1)
 
-library(utils)
-library(SparkR)
-sc <- sparkR.init(Sys.getenv("MASTER", unset = ""))
+# Make sure SparkR package is the last loaded one
+old <- getOption("defaultPackages")
+options(defaultPackages = c(old, "SparkR"))
+
+sc <- SparkR::sparkR.init(Sys.getenv("MASTER", unset = ""))
 assign("sc", sc, envir=.GlobalEnv)
-sqlCtx <- sparkRSQL.init(sc)
+sqlCtx <- SparkR::sparkRSQL.init(sc)
 assign("sqlCtx", sqlCtx, envir=.GlobalEnv)
 cat("\n Welcome to SparkR!")
 cat("\n Spark context is available as sc, SQL context is available as sqlCtx\n")
```
