[SPARK-10720][SQL][JAVA] Add a java wrapper to create a dataframe from a local list of java beans #8879
Conversation
Test build #42893 has finished for PR 8879 at commit
Test build #42937 has finished for PR 8879 at commit
I think that's spurious. Jenkins, retest this please.
Test build #42943 has finished for PR 8879 at commit
Can any of this be refactored to share implementation with the RDD version? There's some non-trivial duplication here.
Perhaps. I could take the code that works on iterators of Rows and move it into a shared function they could both use (and have it take the bean info as a parameter, since in local mode we don't need to construct the class by name; it already exists). I could also just parallelize it and call the RDD implementation, but I figured it would be better to use the LocalRelation as we did with the List of Rows.
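To make the shared-function idea concrete, here is a rough sketch in Java (the code being discussed actually lives in Scala inside SQLContext) of the kind of helper being described: introspect the bean class once and convert an iterator of beans into Rows, so both the RDD-based and local-list paths could reuse the conversion. The class name and method name are invented for illustration, and it omits the Catalyst type conversion the real implementation would perform.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Hypothetical helper (names invented for illustration): turn an iterator of
// Java beans into Rows using the bean's getters, so the RDD-based and
// local-list createDataFrame paths could share the conversion step.
public final class BeanRowConverter {
  private BeanRowConverter() {}

  public static <T> List<Row> beansToRows(Iterator<T> beans, Class<T> beanClass) throws Exception {
    // Introspect the bean once instead of per element; stop at Object to skip getClass().
    PropertyDescriptor[] props =
        Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
    List<Row> rows = new ArrayList<>();
    while (beans.hasNext()) {
      T bean = beans.next();
      Object[] values = new Object[props.length];
      for (int i = 0; i < props.length; i++) {
        // Call each getter reflectively; the real code also maps values to Catalyst types.
        values[i] = props[i].getReadMethod().invoke(bean);
      }
      rows.add(RowFactory.create(values));
    }
    return rows;
  }
}
```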
Up to your judgment since you know the details better, of course. If it's not worth the overhead to factor this out, then leave it.
I'll take another look at it with fresh eyes tomorrow.
…is the correct place for this (the other functions in SQLContext are all related to the singleton), but it seemed like the right place; I could put it in a Utils object or similar if this isn't the best place for it.
@srowen so I refactored the shared code, but for serialization reasons I put it in the companion object. Let me know if this looks good to you :)
Test build #43037 has finished for PR 8879 at commit
Yeah, looks good to me. Let me hold it open a day or so to see if anyone has comments.
@srowen just pinging since it's been a few days and no extra comments have come in :)
Fair enough, yeah, I'm back online merging things now. Time for it.
Merged to master
Similar to SPARK-10630, it would be nice if Java users didn't have to parallelize their data explicitly (a step Scala users can already skip). The issue came up in http://stackoverflow.com/questions/32613413/apache-spark-machine-learning-cant-get-estimator-example-to-work
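For context, a minimal sketch of the Java-side usage this change enables, assuming the new overload mirrors the existing RDD-based createDataFrame(rdd, beanClass); the LabeledText bean and its fields are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class LocalBeanExample {
  // A simple Java bean; the class and its fields are made up for this example.
  public static class LabeledText implements java.io.Serializable {
    private String text;
    private double label;
    public LabeledText() {}
    public LabeledText(String text, double label) { this.text = text; this.label = label; }
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
    public double getLabel() { return label; }
    public void setLabel(double label) { this.label = label; }
  }

  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[*]", "local-bean-example");
    SQLContext sqlContext = new SQLContext(jsc);

    List<LabeledText> data = Arrays.asList(
        new LabeledText("spark", 1.0),
        new LabeledText("hadoop", 0.0));

    // Previously, Java users had to go through an RDD first:
    DataFrame viaRdd = sqlContext.createDataFrame(jsc.parallelize(data), LabeledText.class);

    // With the wrapper added in this PR, the local list can be passed directly:
    DataFrame direct = sqlContext.createDataFrame(data, LabeledText.class);

    direct.show();
    jsc.stop();
  }
}
```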