[SPARK-10720][SQL][JAVA] Add a java wrapper to create a dataframe from a local list of java beans #8879
Conversation
Test build #42893 has finished for PR 8879 at commit
Test build #42937 has finished for PR 8879 at commit
I think that's spurious. Jenkins, retest this please.
Test build #42943 has finished for PR 8879 at commit
Can any of this be refactored to share implementation with the RDD version? There's some non-trivial duplication here.
Perhaps. I could take the code that works on iterators of Rows and move it into a shared function they could both use (and have it take the bean info as a parameter, since in local mode we don't need to construct the class by name; it already exists). I could also just parallelize it and call the RDD implementation, but I figured it would be better to use the LocalRelation as we did with the List of Rows.
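To make the shared-function idea concrete, here is a rough sketch in Java (the code being discussed actually lives in Scala inside SQLContext) of the kind of helper being described: introspect the bean class once and convert an iterator of beans into Rows, so both the RDD-based and local-list paths could reuse the conversion. The class name and method name are invented for illustration, and it omits the Catalyst type conversion the real implementation would perform.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Hypothetical helper (names invented for illustration): turn an iterator of
// Java beans into Rows using the bean's getters, so the RDD-based and
// local-list createDataFrame paths could share the conversion step.
public final class BeanRowConverter {
  private BeanRowConverter() {}

  public static <T> List<Row> beansToRows(Iterator<T> beans, Class<T> beanClass) throws Exception {
    // Introspect the bean once instead of per element; stop at Object to skip getClass().
    PropertyDescriptor[] props =
        Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
    List<Row> rows = new ArrayList<>();
    while (beans.hasNext()) {
      T bean = beans.next();
      Object[] values = new Object[props.length];
      for (int i = 0; i < props.length; i++) {
        // Call each getter reflectively; the real code also maps values to Catalyst types.
        values[i] = props[i].getReadMethod().invoke(bean);
      }
      rows.add(RowFactory.create(values));
    }
    return rows;
  }
}
```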
Up to your judgment since you know the details better, of course. If it's not worth the overhead to factor this out, then leave it.
I'll take another look at it with fresh eyes tomorrow.
…is the correct place for this (the other functions in SQLContext are all related to the singleton), but it seemed like the right place; I could put it in a Utils object or similar if this isn't the best place for it.
@srowen so I refactored the shared code, but for serialization reasons I put it in the companion object. Let me know if this looks good to you :)
Test build #43037 has finished for PR 8879 at commit
Yeah, looks good to me. Let me hold it open a day or so to see if anyone has comments.
@srowen just pinging since it's been a few days and no extra comments have come in :)
Fair enough, yeah, I'm back online merging things now. Time for it.
Merged to master
Similar to SPARK-10630, it would be nice if Java users didn't have to parallelize their data explicitly (a step Scala users can already skip). The issue came up in http://stackoverflow.com/questions/32613413/apache-spark-machine-learning-cant-get-estimator-example-to-work
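For context, a minimal sketch of the Java-side usage this change enables, assuming the new overload mirrors the existing RDD-based createDataFrame(rdd, beanClass); the LabeledText bean and its fields are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class LocalBeanExample {
  // A simple Java bean; the class and its fields are made up for this example.
  public static class LabeledText implements java.io.Serializable {
    private String text;
    private double label;
    public LabeledText() {}
    public LabeledText(String text, double label) { this.text = text; this.label = label; }
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
    public double getLabel() { return label; }
    public void setLabel(double label) { this.label = label; }
  }

  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[*]", "local-bean-example");
    SQLContext sqlContext = new SQLContext(jsc);

    List<LabeledText> data = Arrays.asList(
        new LabeledText("spark", 1.0),
        new LabeledText("hadoop", 0.0));

    // Previously, Java users had to go through an RDD first:
    DataFrame viaRdd = sqlContext.createDataFrame(jsc.parallelize(data), LabeledText.class);

    // With the wrapper added in this PR, the local list can be passed directly:
    DataFrame direct = sqlContext.createDataFrame(data, LabeledText.class);

    direct.show();
    jsc.stop();
  }
}
```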