-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19716][SQL] support by-name resolution for struct type elements in array #17398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not serialize an unresolved encoder, so I make it transient here
|
Test build #75096 has finished for PR 17398 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't have any encodeDecodeTest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encodeDecodeTest is a round trip test so that the type and schema match exactly. I added an end-to-end test in DatasetSuite
|
Test build #75216 has finished for PR 17398 at commit
|
| (s"""${classOf[Builder[_, _]].getName} $builderValue = $getBuilderVar; | ||
| $builderValue.sizeHint($dataLength);""", | ||
| ( | ||
| s""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below are style-only changes
|
Test build #75317 has finished for PR 17398 at commit
|
|
LGTM. Merging to master. Thanks! |
…erializable ## What changes were proposed in this pull request? In #17398 we introduced `UnresolvedMapObjects` as a placeholder of `MapObjects`. Unfortunately `UnresolvedMapObjects` is not serializable as its `function` may reference Scala `Type` which is not serializable. Ideally this is fine, as we will never serialize and send unresolved expressions to executors. However users may accidentally do this, e.g. mistakenly reference an encoder instance when implementing `Aggregator`, we should fix it so that it's just a performance issue(more network traffic) and should not fail the query. ## How was this patch tested? N/A Author: Wenchen Fan <[email protected]> Closes #17639 from cloud-fan/minor.
What changes were proposed in this pull request?
Previously when we construct deserializer expression for array type, we will first cast the corresponding field to expected array type and then apply
MapObjects.However, by doing that, we lose the opportunity to do by-name resolution for struct type inside array type. In this PR, I introduce a
UnresolvedMapObjectsto hold the lambda function and the input array expression. Then during analysis, after the input array expression is resolved, we get the actual array element type and apply by-name resolution. Then we don't need to addCastfor array type when constructing the deserializer expression, as the element type is determined later at analyzer.How was this patch tested?
new regression test