Conversation

@mgaido91
Contributor

What changes were proposed in this pull request?

In SPARK-24865 `AnalysisBarrier` was removed and, to improve resolution speed, the `analyzed` flag was (re-)introduced so that only plans which are not yet analyzed get processed. This should not happen when performing attribute deduplication: in that case we also need to transform plans which were already analyzed, otherwise we may fail to rewrite some attributes and end up with invalid plans.
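For illustration, the problematic pattern is a temp view whose analyzed plan contains a subquery, joined back against one of its base tables: the join triggers attribute deduplication on a plan that was already analyzed. A minimal sketch of such a reproduction (the table, view, and column names here are illustrative, not the PR's exact test):

```scala
import org.apache.spark.sql.SparkSession

object Spark26057Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26057-sketch")
      .getOrCreate()
    import spark.implicits._

    Seq(("1-1", 6)).toDF("id", "n").createOrReplaceTempView("a")
    Seq("1-1").toDF("id").createOrReplaceTempView("b")

    // The view's plan contains a correlated subquery and is analyzed here.
    spark.sql(
      """SELECT a.id, n AS m
        |FROM a
        |WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
        |""".stripMargin).createOrReplaceTempView("v")

    // Joining the view back to `a` makes both sides carry attributes with the
    // same ids, so the analyzer must deduplicate them. Before this fix, the
    // already-analyzed view plan was skipped during that rewrite, which could
    // leave conflicting attribute references and an invalid plan.
    spark.sql(
      """SELECT a.id, n, m
        |FROM a JOIN v ON v.id = a.id
        |""".stripMargin).show()

    spark.stop()
  }
}
```

With the fix, the join resolves and the query runs; without it, analysis can fail with a resolution error on the duplicated attribute references.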

How was this patch tested?

added UT

Please review http://spark.apache.org/contributing.html before opening a pull request.

@mgaido91
Contributor Author

cc @cloud-fan @gatorsmile @rxin

@mgaido91 mgaido91 changed the title [SPARK-26054][SQL] Transform also analyzed plans when dedup references [SPARK-26057][SQL] Transform also analyzed plans when dedup references Nov 14, 2018
@SparkQA

SparkQA commented Nov 14, 2018

Test build #98829 has finished for PR 23035 at commit 62a895f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

test("SPARK-26057: attribute deduplication on already analyzed plans") {
withTempView("cc", "p", "c") {
Contributor

If we don't care about naming, how about `a`, `b`, `c` instead of `cc`, `p`, `c`?

| WHERE c.id = cc.id AND c.layout = cc.layout AND c.ts > p.ts)
|GROUP BY cc.id, cc.layout
""".stripMargin).createOrReplaceTempView("pcc")
val res = spark.sql(
Contributor

Good catch on the problem! Do you think it's possible to simplify the test? I think we just need a temp view with a subquery, and then use it in a join.

Contributor Author

Yes, I simplified it as much as I could; I hope it is fine now. Thanks.

@SparkQA

SparkQA commented Nov 15, 2018

Test build #98861 has finished for PR 23035 at commit 98d91a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/2.4!

asfgit pushed a commit that referenced this pull request Nov 15, 2018

Closes #23035 from mgaido91/SPARK-26057.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b46f75a)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in b46f75a Nov 15, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019

Closes apache#23035 from mgaido91/SPARK-26057.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019

Closes apache#23035 from mgaido91/SPARK-26057.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b46f75a)
Signed-off-by: Wenchen Fan <[email protected]>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019

Closes apache#23035 from mgaido91/SPARK-26057.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b46f75a)
Signed-off-by: Wenchen Fan <[email protected]>