-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-38578][SQL] AdaptiveSparkPlanExec should ensure user-specified ordering #35924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala
Outdated
Show resolved
Hide resolved
65121f8 to
821e694
Compare
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to override outputPartitioning as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can, but there is no requirements about outputPartitioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we can remove this now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OptimizeSkewedJoin still use this. I considered unify them, but seems OptimizeSkewedJoin does not affect the required output ordering.
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refine the test name if we don't really test table insertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, refined it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // User-specified repartition is only effective when it's the root node, or under | |
| // User-specified sort is only effective when it's the root node, or under |
|
The failed test is irrelevant |
|
Can we rerun the tests? |
dfa553c to
602c33c
Compare
|
rebased since that flaky test has been fixed |
|
After a second thought, what will happen if we just do a one-line fix |
it does not. FileFormatWriter just checks the |
|
closed, in favor of #34568 |
What changes were proposed in this pull request?
requiredOrderingoutputOrderinginAdaptiveSparkPlanExecWhy are the changes needed?
AdaptiveSparkPlanExecshould ensure the output ordering is therequiredOrdering, so we leverage theEnsureRequirementsto add sort if need.FileFormatWriter will check and add an implicit sort for dynamic partition columns or bucket columns according to the input physical plan. The check became always failure since AQE AdaptiveSparkPlanExec has no outputOrdering.
That casues a redundant sort if user has specified a sort which satisfies the required ordering (dynamic partition and bucket columns).
Does this PR introduce any user-facing change?
no, improve performance
How was this patch tested?
add test
Before:

After:
