Skip to content

Conversation

@mfliu
Copy link
Contributor

@mfliu mfliu commented Oct 8, 2015

I was having issues with collect() and orderBy() in Spark 1.5.0 so I used the DataFrame.R file and test_sparkSQL.R file from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi" and added corresponding test cases in the test for join() and merge() in test_sparkSQL.R file.
Pull request because I filed this JIRA bug report:
https://issues.apache.org/jira/browse/SPARK-10981

@shivaram
Copy link
Contributor

shivaram commented Oct 8, 2015

Thanks for PR @mfliu - Could you format the PR title as [SPARKR] [SPARK-10981] SparkR Join improvements ? More instructions are at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PreparingtoContributeCodeChanges

@shivaram
Copy link
Contributor

shivaram commented Oct 8, 2015

Jenkins, ok to test

@shivaram
Copy link
Contributor

shivaram commented Oct 8, 2015

cc @sun-rui

@mfliu mfliu changed the title SparkR joins. Used DataFrame.R and test_sparkSQL.R from Spark 1.5.1 [SPARKR] [SPARK-10981] SparkR Join improvements Oct 8, 2015
@mfliu
Copy link
Contributor Author

mfliu commented Oct 8, 2015

The errors in the log file don't seem to be related to the changes I made? They are primarily in PythonRDD.scala:
java.net.SocketException: Socket is closed

org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:203)
at org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)

Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:139)

Errors seem to be from PythonRDD.scala, but I made no changes to that file, and I'm not sure how changing R code affects that interface?

Can someone else take a look? @shivaram, @sun-rui

@shivaram
Copy link
Contributor

shivaram commented Oct 8, 2015

@mfliu I think thats just a flaky test, but irrespective of that your changes should be against the current master branch. Right now it looks like there are a lot more lines in the diff because of the change being against 1.5.1 ? For fixing this in branch-1.5, we will first merge with master and then during merge we can backport it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is undoing a recent PR, could you check?

@mfliu
Copy link
Contributor Author

mfliu commented Oct 9, 2015

@felixcheung Yes, you are correct. The arrange function was different. I pulled again and changed those files and it is working on my machine. Can you test again?

@SparkQA
Copy link

SparkQA commented Oct 9, 2015

Test build #43468 has finished for PR 9029 at commit d4a1ed3.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

expect_equal(count(joined7, 3)) was changed to expect_equal(count(joined7), 3)
@mfliu
Copy link
Contributor Author

mfliu commented Oct 9, 2015

Sorry, had a typo in one of my unit tests. It now passes the run-tests.sh on my machine. Can you test again?

@SparkQA
Copy link

SparkQA commented Oct 9, 2015

Test build #43469 has finished for PR 9029 at commit efa072c.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 9, 2015

Test build #43473 has finished for PR 9029 at commit 216be37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we should change right_outer to rightouter ? It'll break code that used to work with previous versions ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change was made based on the comment on the JIRA report:
https://issues.apache.org/jira/browse/SPARK-10981

In the PR, please:

  1. Support all join types defined in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala (You can remove the "_" char from the currently supported join types in SparkR)
  2. Add test cases for missing join types including "leftsemi"

Perhaps I misunderstood?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 breaking API changes, IMO we really need to come up on some policy on that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add "right_outer" and "left_outer" along with "rightouter" and "leftouter"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, API compatibility is a concern. So we can make R code consistent with the scala version at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala#L21

That is, replace the "_" char in the join type string with empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so lets support both right_outer and rightouter. That way we don't break backwards compatibility. One simple way to do this as @sun-rui said is to just replace all "_"s in the join string with "" using gsub or something like that.

@SparkQA
Copy link

SparkQA commented Oct 13, 2015

Test build #43664 has finished for PR 9029 at commit 9603722.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 13, 2015

Test build #43666 has finished for PR 9029 at commit a67965a.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

@mfliu Just FYI, you can check the lint-r tests locally by running the script dev/lint-r

@SparkQA
Copy link

SparkQA commented Oct 13, 2015

Test build #43668 has finished for PR 9029 at commit d4eff64.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

Jenkins, retest this please

Merge remote-tracking branch 'upstream/master'
@SparkQA
Copy link

SparkQA commented Oct 13, 2015

Test build #43671 has finished for PR 9029 at commit d4eff64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 13, 2015

Test build #43673 has finished for PR 9029 at commit 8813b1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

looks good

@sun-rui
Copy link
Contributor

sun-rui commented Oct 14, 2015

LGTM

@shivaram
Copy link
Contributor

Thanks @mfliu - LGTM. Merging this

asfgit pushed a commit that referenced this pull request Oct 14, 2015
I was having issues with collect() and orderBy() in Spark 1.5.0 so I used the DataFrame.R file and test_sparkSQL.R file from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi" and added corresponding test cases in the test for join() and merge() in test_sparkSQL.R file.
Pull request because I filed this JIRA bug report:
https://issues.apache.org/jira/browse/SPARK-10981

Author: Monica Liu <[email protected]>

Closes #9029 from mfliu/master.

(cherry picked from commit 8b32885)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in 8b32885 Oct 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants