Skip to content

Conversation

maropu
Copy link
Member

@maropu maropu commented Aug 29, 2016

What changes were proposed in this pull request?

Partial aggregations are generated in EnsureRequirements, but the planner fails to
check if partial aggregation satisfies sort requirements.
For the following query:

val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
spark.sql("select max(b) from t2 group by a").explain(true)

Now, the SortAggregator won't insert Sort operator before partial aggregation, this will break sort-based partial aggregation.

== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- LocalTableScan [a#5, b#6]

Actually, a correct plan is:

== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- *Sort [a#5 ASC], false, 0
            +- LocalTableScan [a#5, b#6]

How was this patch tested?

Added tests in PlannerSuite.

@hvanhovell
Copy link
Contributor

cc @clockfly

@SparkQA
Copy link

SparkQA commented Aug 29, 2016

Test build #64587 has finished for PR 14865 at commit a0cc986.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

the fix LGTM

@liancheng
Copy link
Contributor

@maropu Discussed with @clockfly and @cloud-fan offline. @cloud-fan proposed a simpler alternative of #10896. Please refer to this comment for details.

This PR still LGTM and I'm merging it since master is broken right now. Thanks for fixing it!

@asfgit asfgit closed this in 94922d7 Aug 30, 2016
@maropu
Copy link
Member Author

maropu commented Aug 30, 2016

@liancheng okay, I'll check the discussion. Thanks!

@maropu maropu deleted the SPARK-17289 branch July 5, 2017 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants