[SPARK-13233][SQL][WIP] Python Dataset #11117
Test build #50927 has finished for PR 11117 at commit
Test build #51258 has finished for PR 11117 at commit
Test build #51262 has finished for PR 11117 at commit
Test build #51297 has finished for PR 11117 at commit
    ).compute(inputIterator, context.partitionId(), context)

    if (outputIsPickled) {
      outputIterator.map(bytes => InternalRow(bytes))
To avoid copying the bytes, I create safe rows here. However, according to #10511, operators should always produce unsafe rows. The Python UDF operator (BatchPythonEvaluation) also produces safe rows, which may have the same problem. Should we bring back the requireUnsafeRow mechanism? In some cases, like this one, converting to unsafe rows is expensive and may not bring much benefit.
cc @davies
BatchPythonEvaluation will produce UnsafeRow.
Oh sorry, I missed the unsafe projection at the very end. Then we can probably add an unsafe projection here too.
Test build #51300 has finished for PR 11117 at commit
Test build #51301 has finished for PR 11117 at commit
Test build #51308 has finished for PR 11117 at commit
retest this please
Test build #51354 has finished for PR 11117 at commit
Test build #51433 has finished for PR 11117 at commit
Test build #51436 has finished for PR 11117 at commit
Test build #51438 has finished for PR 11117 at commit
Test build #51518 has finished for PR 11117 at commit
Test build #51526 has finished for PR 11117 at commit
Test build #51534 has finished for PR 11117 at commit
retest this please
Test build #51559 has finished for PR 11117 at commit
Test build #51588 has finished for PR 11117 at commit
Test build #51610 has finished for PR 11117 at commit
retest this please
Test build #51617 has finished for PR 11117 at commit
Test build #51620 has finished for PR 11117 at commit
Test build #51638 has finished for PR 11117 at commit
Test build #51652 has finished for PR 11117 at commit
retest this please
Test build #51680 has finished for PR 11117 at commit
retest this please
Test build #51720 has finished for PR 11117 at commit
Draft prototype; submitting this PR to test it via Jenkins.
TODO:
- map, mapPartitions, etc. conflict with the existing methods (which just forward to RDD). We should remove the old ones, but that would break some other code, so for now we keep them and use different names like mapPartitions2 for the new ones.
- mapPartitions is the fundamental function, which is enough for a prototype. I'll add map, flatMap, etc. later based on it.
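
To illustrate why mapPartitions can serve as the single primitive, here is a minimal sketch in plain Python with no Spark dependency. The class `ToyDataset` and its method names are hypothetical and exist only for this example; they are not the API added by this PR.

```python
from itertools import chain


class ToyDataset:
    """Hypothetical in-memory stand-in for a partitioned dataset (not a Spark class)."""

    def __init__(self, partitions):
        # Materialize each partition as a list of elements.
        self._partitions = [list(p) for p in partitions]

    def map_partitions(self, func):
        # The primitive: func receives an iterator over one partition
        # and returns an iterable of output elements for that partition.
        return ToyDataset(func(iter(p)) for p in self._partitions)

    def map(self, func):
        # Built on map_partitions: apply func to each element.
        return self.map_partitions(lambda it: (func(x) for x in it))

    def flat_map(self, func):
        # Built on map_partitions: func returns an iterable per element,
        # and the results are flattened within each partition.
        return self.map_partitions(
            lambda it: chain.from_iterable(func(x) for x in it))

    def collect(self):
        # Concatenate all partitions in order.
        return [x for p in self._partitions for x in p]


ds = ToyDataset([[1, 2], [3]])
doubled = ds.map(lambda x: x * 2).collect()
repeated = ds.flat_map(lambda x: [x] * x).collect()
```

In this sketch both `map` and `flat_map` are one-line wrappers around `map_partitions`, which is the sense in which the per-partition function is "fundamental": only it needs to know how partitions are stored or executed.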