Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Previously when we construct deserializer expression for array type, we will first cast the corresponding field to expected array type and then apply MapObjects.

However, by doing that, we lose the opportunity to do by-name resolution for struct type inside array type. In this PR, I introduce a UnresolvedMapObjects to hold the lambda function and the input array expression. Then during analysis, after the input array expression is resolved, we get the actual array element type and apply by-name resolution. Then we don't need to add Cast for array type when constructing the deserializer expression, as the element type is determined later at analyzer.

How was this patch tested?

new regression test

@cloud-fan
Copy link
Contributor Author

cc @rxin @marmbrus @liancheng

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not serialize an unresolved encoder, so I make it transient here

@SparkQA
Copy link

SparkQA commented Mar 23, 2017

Test build #75096 has finished for PR 17398 at commit 2497b17.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedMapObjects(function: Expression => Expression, child: Expression)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we don't have any encodeDecodeTest?

Copy link
Contributor Author

@cloud-fan cloud-fan Mar 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encodeDecodeTest is a round trip test so that the type and schema match exactly. I added an end-to-end test in DatasetSuite

@SparkQA
Copy link

SparkQA commented Mar 25, 2017

Test build #75216 has finished for PR 17398 at commit 14800a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • test(\"as seq of case class - reorder fields by name\")

(s"""${classOf[Builder[_, _]].getName} $builderValue = $getBuilderVar;
$builderValue.sizeHint($dataLength);""",
(
s"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below are style-only changes

@SparkQA
Copy link

SparkQA commented Mar 28, 2017

Test build #75317 has finished for PR 17398 at commit a13ab67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

LGTM. Merging to master. Thanks!

@asfgit asfgit closed this in 295747e Apr 4, 2017
asfgit pushed a commit that referenced this pull request Apr 16, 2017
…erializable

## What changes were proposed in this pull request?

In #17398 we introduced `UnresolvedMapObjects` as a placeholder of `MapObjects`. Unfortunately `UnresolvedMapObjects` is not serializable as its `function` may reference Scala `Type` which is not serializable.

Ideally this is fine, as we will never serialize and send unresolved expressions to executors. However users may accidentally do this, e.g. mistakenly reference an encoder instance when implementing `Aggregator`, we should fix it so that it's just a performance issue(more network traffic) and should not fail the query.

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #17639 from cloud-fan/minor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants