[SPARK-20585][SPARKR] R generic hint support #17851

zero323 · 2017-05-04T00:35:40Z

What changes were proposed in this pull request?

Adds support for generic hints on SparkDataFrame

How was this patch tested?

Unit tests, check-cran.sh

SparkQA · 2017-05-04T00:40:41Z

Test build #76435 has finished for PR 17851 at commit 261e5a6.

This patch fails R style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-04T01:20:38Z

Test build #76436 has finished for PR 17851 at commit ee52b53.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-05-04T04:20:02Z

R/pkg/R/DataFrame.R

+#'
+#' @param x a SparkDataFrame.
+#' @param name a name of the hint.
+#' @param ... additional argument(s) passed to the method.


in this case the ... is actually meaningful, so I'd suggest documenting it, e.g. similar to Scala, "(optional) properties"

Scala is even more cryptic here. I adjusted it a bit, but I think we can revisit this once we have some practical examples.

hmm, yes ;)

felixcheung · 2017-05-04T04:22:39Z

R/pkg/R/DataFrame.R

+#' @param x a SparkDataFrame.
+#' @param name a name of the hint.
+#' @param ... additional argument(s) passed to the method.
+#'


nit: remove empty line

felixcheung · 2017-05-04T04:23:29Z

R/pkg/R/DataFrame.R

+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")


you recreated createDataFrame(mtcars) here, do you mean to use df from the line before?

or do you want to show these are two different datasets? maybe it's worthwhile to comment on that

I wanted to use different datasets to avoid aliasing and the trivially equal warning. It works but it is confusing (note: equi-join syntax, same as in Scala or Python, would be great, and it shouldn't be that hard to add). Once we merge alias this shouldn't be an issue. Since we don't run it, I can add aliases now.

alias is going to 2.3, this is going to 2.2 - I think we leave this for now and can improve this in master later

Also with alias it will be quite dense:

#' @examples
#' \dontrun{
#' # Set aliases to avoid ambiguity
#' df <- alias(createDataFrame(mtcars), "cars")
#' avg_mpg <- alias(mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg"), "avg_mpg")
#'
#' head(join(
#'   df, hint(avg_mpg, "broadcast"),
#'   column("cars.cyl") == column("avg_mpg.cyl")
#' ))
#' }

right - I think the example makes sense now but it might not be very obvious - for example,

createDataFrame(mtcars)
createDataFrame(mtcars)

vs

df <- createDataFrame(mtcars)
df

The difference is quite subtle unless you know what Spark is doing differently here. This is why I suggested pointing out the need to have distinct "copies" of the data.

felixcheung · 2017-05-04T04:25:55Z

R/pkg/R/DataFrame.R

+
+#' hint
+#'
+#' Specifies execution plan hint on the current SparkDataFrame.


the R programming model is a bit different - I think it is better to point out that the original SparkDataFrame is not actually changed. Instead, say ".... hint and return a new SparkDataFrame"

ouch sorry I didn't mean to put ".... hint and return a new SparkDataFrame"
but "Specifies execution plan hint and return a new SparkDataFrame"
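
To make the "return a new SparkDataFrame" point concrete, here is a minimal sketch in plain R. It does not use SparkR: `ToyFrame` and its `hints` slot are hypothetical stand-ins invented for illustration, and only the generic signature `function(x, name, ...)` is taken from the diff in this PR. It is meant to show the copy semantics being discussed, not how the actual method is implemented.

```r
library(methods)

# Hypothetical stand-in for SparkDataFrame; not part of SparkR.
setClass("ToyFrame", representation(hints = "list"))

# Same generic signature as the one added in R/pkg/R/generics.R.
setGeneric("hint", function(x, name, ...) { standardGeneric("hint") })

setMethod("hint", signature(x = "ToyFrame", name = "character"),
          function(x, name, ...) {
            # Collect the optional parameters passed through `...`.
            parameters <- list(...)
            # Build and return a new object; `x` itself is not modified.
            new("ToyFrame",
                hints = c(x@hints,
                          list(list(name = name, parameters = parameters))))
          })

df <- new("ToyFrame", hints = list())
hinted <- hint(df, "broadcast")

length(df@hints)      # 0: the original object is unchanged
length(hinted@hints)  # 1: the hint lives on the returned object
```

Because the method returns a fresh object, the caller has to use the result (for example, pass `hint(df2, "broadcast")` directly to `join`), which is why the description wording matters.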

felixcheung · 2017-05-04T04:28:16Z

R/pkg/inst/tests/testthat/test_sparkSQL.R

+  execution_plan_hint <- capture.output(
+    explain(join(df1, hint(df2, "broadcast"), df1$id == df2$id))
+  )
+  expect_true(any(grepl("BroadcastHashJoin", execution_plan_hint)))


felixcheung · 2017-05-04T04:28:56Z

R/pkg/R/generics.R

+#' @rdname hint
+#' @export
+setGeneric("hint", function(x, name, ...) { standardGeneric("hint") })
+


this should move after groupBy, I think

zero323 · 2017-05-04T06:12:13Z

@felixcheung Do you think this makes o.a.s.sql.functions.broadcast obsolete? I have a WIP on this, but it is a tricky one. There is an internal, non-generic broadcast with a different signature, so we'd have to either adjust it or use a different name (`broadcast_df`, `broadcast_table`?).

SparkQA · 2017-05-04T06:43:53Z

Test build #76442 has finished for PR 17851 at commit 1183441.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.


felixcheung · 2017-05-04T07:43:03Z

do you mean this?

we can talk about what to do a bit later.

felixcheung

LGTM. Waiting on Jenkins.

felixcheung · 2017-05-04T07:53:49Z

it looks like AppVeyor has been stuck for about 22 hrs..

SparkQA · 2017-05-04T08:21:42Z

Test build #76449 has finished for PR 17851 at commit e6c6d82.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Adds support for generic hints on `SparkDataFrame`

Unit tests, `check-cran.sh`

Author: zero323 <[email protected]>

Closes #17851 from zero323/SPARK-20585.

(cherry picked from commit 9c36aa2)
Signed-off-by: Felix Cheung <[email protected]>

felixcheung · 2017-05-04T08:44:22Z

merged to master/2.2

zero323 · 2017-05-04T08:45:55Z

Thanks.

rxin · 2017-05-05T18:48:09Z

@felixcheung was this merged only in master but not branch-2.2?

felixcheung · 2017-05-06T01:00:26Z

This is branch-2.2? 3f5c548. It missed rc2 by a few hours, though.

zero323 added 2 commits May 4, 2017 01:54

Initial implementation

e21d51e

Add since note

261e5a6

Fix style

ee52b53

felixcheung requested changes May 4, 2017


zero323 added 4 commits May 4, 2017 07:34

Put hint generic in the right place

a1e9233

Remove empty line

633b038

Adjust hint description

3ed4d76

Adjust ... description

1183441

felixcheung requested changes May 4, 2017


Adujst description

e6c6d82

felixcheung approved these changes May 4, 2017


asfgit closed this in 9c36aa2 May 4, 2017

zero323 deleted the SPARK-20585 branch May 8, 2017 09:07
