[SPARK-38578][SQL] AdaptiveSparkPlanExec should ensure user-specified ordering #35924

ulysses-you · 2022-03-21T13:25:47Z

What changes were proposed in this pull request?

Ensure the output ordering satisfies requiredOrdering
Override outputOrdering in AdaptiveSparkPlanExec

Why are the changes needed?

AdaptiveSparkPlanExec should ensure that its output ordering satisfies requiredOrdering, so we leverage EnsureRequirements to add a sort if needed.

FileFormatWriter checks the input physical plan and adds an implicit sort on the dynamic partition columns or bucket columns if the plan's ordering does not already satisfy them. With AQE this check always fails, because AdaptiveSparkPlanExec does not report any outputOrdering.

This causes a redundant sort when the user has already specified a sort that satisfies the required ordering (on the dynamic partition and bucket columns).
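The redundant-sort argument can be made concrete with a minimal Scala sketch of the writer-side check. Every name here (SortOrder, Plan, Scan, Sort, AdaptiveWrapper, satisfies, planWrite) is a hypothetical stand-in, not a Spark class; we assume the check is a plain prefix match on column names, whereas the real FileFormatWriter compares expressions rather than names.

```scala
// Hypothetical, simplified model of the writer's ordering check; NOT Spark's real classes.
final case class SortOrder(column: String)

sealed trait Plan { def outputOrdering: Seq[SortOrder] }

final case class Scan(table: String) extends Plan {
  def outputOrdering: Seq[SortOrder] = Nil // a plain scan reports no ordering
}

final case class Sort(order: Seq[SortOrder], child: Plan) extends Plan {
  def outputOrdering: Seq[SortOrder] = order
}

// Stand-in for AdaptiveSparkPlanExec: before the change it reports no ordering,
// after the change it reports the ordering of the plan it wraps.
final case class AdaptiveWrapper(inner: Plan, reportsOrdering: Boolean) extends Plan {
  def outputOrdering: Seq[SortOrder] =
    if (reportsOrdering) inner.outputOrdering else Nil
}

// Assumed satisfaction rule: the required ordering must be a prefix of the actual one.
def satisfies(actual: Seq[SortOrder], required: Seq[SortOrder]): Boolean =
  actual.startsWith(required)

// Mimics the writer's decision: wrap the child in a sort only if its declared
// ordering does not already satisfy the required ordering.
def planWrite(child: Plan, required: Seq[SortOrder]): Plan =
  if (satisfies(child.outputOrdering, required)) child else Sort(required, child)

val required   = Seq(SortOrder("p"))                    // dynamic partition column p
val userSorted = Sort(Seq(SortOrder("p")), Scan("t2")) // ... ORDER BY p

val before = planWrite(AdaptiveWrapper(userSorted, reportsOrdering = false), required)
val after  = planWrite(AdaptiveWrapper(userSorted, reportsOrdering = true), required)

println(before) // wrapped in a second, redundant Sort
println(after)  // used as-is
```

In the "before" case the user's ORDER BY p is hidden behind a wrapper that declares nothing, so the writer sorts again; once the wrapper reports an ordering, the existing sort is reused.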

Does this PR introduce any user-facing change?

No, this is only a performance improvement.

How was this patch tested?

Added a test.

CREATE TABLE t1 (c int) USING PARQUET PARTITIONED BY(p string);
CREATE TABLE t2 USING PARQUET AS SELECT 1 c, 'a' p;
INSERT INTO TABLE t1 PARTITION(p) SELECT c, p FROM t2 ORDER BY p;

Before:

After:

ulysses-you · 2022-03-21T13:28:12Z

cc @maryannxue @cloud-fan

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala

cloud-fan · 2022-03-22T12:55:23Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

do we need to override outputPartitioning as well?

I think we can, but there are no requirements on outputPartitioning.

cloud-fan · 2022-03-22T12:57:42Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ValidateRequirements.scala

Seems we can remove this now?

OptimizeSkewedJoin still uses this. I considered unifying them, but it seems OptimizeSkewedJoin does not affect the required output ordering.

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

cloud-fan · 2022-03-22T15:13:14Z

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

We should refine the test name if we don't really test table insertion.

yeah, refined it

cloud-fan · 2022-03-23T03:01:46Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala

Suggested change

// User-specified repartition is only effective when it's the root node, or under

// User-specified sort is only effective when it's the root node, or under

ulysses-you · 2022-03-23T07:41:42Z

The failed test is unrelated:

ReportSinkMetricsSuite.test ReportSinkMetrics
org.scalatest.exceptions.TestFailedException: Expected null, but got {"metrics-1"="value-1", "metrics-2"="value-2"}

cloud-fan · 2022-03-23T07:42:18Z

Can we rerun the tests?

ulysses-you · 2022-03-23T07:56:06Z

rebased since that flaky test has been fixed

cloud-fan · 2022-03-23T13:39:58Z

On second thought, what would happen if we just did a one-line fix, override def outputOrdering: Seq[SortOrder] = requiredOrdering? AQE optimization may remove the user-specified sort, but that doesn't matter, because file writing will add the necessary sort.

ulysses-you · 2022-03-24T01:32:41Z

file writing will add the necessary sort.

It does not. FileFormatWriter only checks whether the ordering of the initial SparkPlan satisfies the required ordering. If AQE changes the ordering at runtime, FileFormatWriter cannot be aware of the change. So if we override outputOrdering, we must make sure AQE won't change it.
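The point that the writer decides once, against the declared ordering, can be illustrated with another hypothetical Scala sketch. All names are illustrative, not Spark APIs, and dropSort is a made-up re-optimization standing in for cloud-fan's remark that AQE may remove the user-specified sort. The sketch shows that declaring requiredOrdering is only safe if the wrapper also enforces it on the plan AQE finally produces.

```scala
// Hypothetical, simplified model; NOT Spark's real AdaptiveSparkPlanExec.
final case class SortOrder(column: String)

sealed trait Plan { def outputOrdering: Seq[SortOrder] }
final case class Scan(table: String) extends Plan {
  def outputOrdering: Seq[SortOrder] = Nil
}
final case class Sort(order: Seq[SortOrder], child: Plan) extends Plan {
  def outputOrdering: Seq[SortOrder] = order
}

// Assumed satisfaction rule: required ordering is a prefix of the actual ordering.
def satisfies(actual: Seq[SortOrder], required: Seq[SortOrder]): Boolean =
  actual.startsWith(required)

final class AdaptiveWrapper(initial: Plan, required: Seq[SortOrder], enforce: Boolean) {
  // Declared up front, as in the proposed one-line override.
  def outputOrdering: Seq[SortOrder] = required

  // Runtime re-optimization may change the plan; with `enforce`, a sort is
  // added back whenever the re-optimized plan no longer satisfies `required`.
  def finalPlan(reoptimize: Plan => Plan): Plan = {
    val reoptimized = reoptimize(initial)
    if (enforce && !satisfies(reoptimized.outputOrdering, required)) Sort(required, reoptimized)
    else reoptimized
  }
}

val required   = Seq(SortOrder("p"))
val userSorted = Sort(required, Scan("t2"))

// Writer-side decision, made once against the declared ordering.
val writerAddsSort =
  !satisfies(new AdaptiveWrapper(userSorted, required, enforce = false).outputOrdering, required)

// Hypothetical re-optimization that removes the user-specified sort.
val dropSort: Plan => Plan = {
  case Sort(_, child) => child
  case other          => other
}

val unguarded = new AdaptiveWrapper(userSorted, required, enforce = false).finalPlan(dropSort)
val guarded   = new AdaptiveWrapper(userSorted, required, enforce = true).finalPlan(dropSort)

println(writerAddsSort)           // false: the writer trusts the declared ordering
println(unguarded.outputOrdering) // List(): rows reach the writer unsorted
println(guarded.outputOrdering)   // List(SortOrder(p)): the guard restored the ordering
```

The guarded variant corresponds to this PR's description (ensure requiredOrdering, adding a sort if needed); the unguarded one is the plain override, which lets the writer skip its sort even though the final plan is no longer sorted.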

ulysses-you · 2022-03-24T09:27:57Z

closed, in favor of #34568

github-actions bot added the SQL label Mar 21, 2022

singhpk234 reviewed Mar 22, 2022

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala (outdated)

ulysses-you force-pushed the aqe-ordering branch from 65121f8 to 821e694 March 22, 2022 10:41

cloud-fan reviewed Mar 22, 2022

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala (outdated)

cloud-fan reviewed Mar 22, 2022

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (outdated)

cloud-fan reviewed Mar 22, 2022

ulysses-you changed the title from "[SPARK-38578][SQL] Avoid unnecessary sort in FileFormatWriter if user has specified sort in AQE" to "[SPARK-38578][SQL] AdaptiveSparkPlanExec should ensure user-specified ordering" Mar 23, 2022

cloud-fan reviewed Mar 23, 2022

cloud-fan approved these changes Mar 23, 2022

ulysses-you added 4 commits March 23, 2022 15:54

Avoid unnecessary sort in FileFormatWriter if user has specified sort

8bd70c9

code comments

ef1e713

test name

3859727

address comment

602c33c

ulysses-you force-pushed the aqe-ordering branch from dfa553c to 602c33c March 23, 2022 07:55

ulysses-you closed this Mar 24, 2022

ulysses-you deleted the aqe-ordering branch March 24, 2022 09:28

ulysses-you mentioned this pull request Nov 9, 2022

[SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache #38558

Closed


3 participants