- 
                Notifications
    You must be signed in to change notification settings 
- Fork 28.9k
[SPARK-12289][SQL] Support UnsafeRow in TakeOrderedAndProject/Limit #10330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Test build #47820 has finished for PR 10330 at commit  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InterpretedProjection can be replaced by UnsafeProjection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still have a dumb question. When calling the eval of each of the specified expressions, how can we know they can process unsafe rows? Why does the planner insert unsafe->safe conversion in the original design of TakeOrderedAndProject?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is just because not all expressions support unsafe before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now all the expressions can support unsafe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. If there are expressions still not supporting unsafe, we should make it support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you!
| Test build #47892 has finished for PR 10330 at commit  
 | 
| Test build #47890 has finished for PR 10330 at commit  
 | 
| Test build #47901 has finished for PR 10330 at commit  
 | 
| Test build #47916 has finished for PR 10330 at commit  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the projectList is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added few lines to check if we need do extra unsafe projection for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you have done when projectList is None is exactly same with ConvertToUnsafe right? How about we change this to if (projectList.isDefined) true else child.outputsUnsafeRows, then our framework can insert ConvertToUnsafe if it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I've updated it.
| Test build #48296 has finished for PR 10330 at commit  
 | 
| @cloud-fan any other comments? | 
| Test build #48323 has finished for PR 10330 at commit  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need copy here? We have already copied the rows when getting data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I was thinking that it is needed to copy the returned row because it is the same object. But after I checked GenerateUnsafeProjection, looks like it will create new row every time. I've updated it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it will be a problem without this copy(). HiveCompatibilitySuite will be failed.
[info]   key    value
[info]   !== HIVE - 5 row(s) ==   == CATALYST - 5 row(s) ==
[info]   !0 val_0                 4 val_4
[info]   !0 val_0                 4 val_4
[info]   !0 val_0                 4 val_4
[info]   !2 val_2                 4 val_4
[info]    4 val_4                 4 val_4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-checked GenerateUnsafeProjection, it will return the same unsafe row. So we should use another copy() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry I misread the code, the copy is needed even we already copied before takeOrdered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other cases that we will have the conversion? Or, we can create a dummy operator that only accepts safe rows. So, we can still test the logic of adding conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a dummy node for it. Thanks.
| overall LGTM | 
| Test build #48324 has finished for PR 10330 at commit  
 | 
| Test build #48331 has finished for PR 10330 at commit  
 | 
| Test build #48409 has finished for PR 10330 at commit  
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easy that just process UnsafeRow and output UnsafeRow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I need to close this again.....
…ut UnsafeRow It's confusing that some operator output UnsafeRow but some not, easy to make mistake. This PR change to only output UnsafeRow for all the operators (SparkPlan), removed the rule to insert Unsafe/Safe conversions. For those that can't output UnsafeRow directly, added UnsafeProjection into them. Closes apache#10330 cc JoshRosen rxin Author: Davies Liu <[email protected]> Closes apache#10511 from davies/unsafe_row.
JIRA: https://issues.apache.org/jira/browse/SPARK-12289
This change is needed for #10283. Without this, JavaDataFrameSuite will be failed when we support UnsafeRow in LocalTableScan.
Support in Limit is added first. TakeOrderedAndProject will be added later.