-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-33494][SQL][AQE] Do not use local shuffle reader for repartition #30432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test starting |
|
Test build #131369 has finished for PR 30432 at commit
|
|
Kubernetes integration test status success |
|
@cloud-fan |
|
I believe this is more of a refactoring than a fix, @JkSelf . |
| * Returns whether the shuffle partition number can be changed. | ||
| */ | ||
| def canChangeNumPartitions: Boolean | ||
| def canChangeNumPartitions: Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like these two methods are redundant if we have partitioningFlexibilty already.
I know that removing these methods would cause more code changes on the caller side, but for the integrity and simplicity of the interface, it's probably worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more like a util method as implementations don't need to implement it.
We already did. We don't coalesce shuffles if |
It does fix an issue, see the added test. |
The |
|
@JkSelf code-wise, you are right, as it becomes |
|
I see. It makes sense to me. Thanks for your detailed explanations. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #131545 has finished for PR 30432 at commit
|
|
retest this please. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #131556 has finished for PR 30432 at commit
|
|
Test build #131551 has finished for PR 30432 at commit
|
|
Test build #131549 has finished for PR 30432 at commit
|
|
Test build #131595 has finished for PR 30432 at commit
|
| } | ||
|
|
||
| // Describes where the shuffle operator comes from. | ||
| object ShuffleOrigin extends Enumeration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use seal trait/abstract class, so we type won't have a trailing .Value?
|
Test build #131627 has finished for PR 30432 at commit
|
|
retest this please |
|
Test build #131636 has finished for PR 30432 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
LGTM Thanks! Merged to master |
This PR updates `ShuffleExchangeExec` to carry more information about how much we can change the partitioning. For `repartition(col)`, we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader. Similar to `repartition(number, col)`, we should respect the user-specified partitioning. No a new test Closes apache#30432 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
This PR updates `ShuffleExchangeExec` to carry more information about how much we can change the partitioning. For `repartition(col)`, we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader. Similar to `repartition(number, col)`, we should respect the user-specified partitioning. No a new test Closes apache#30432 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
This PR updates `ShuffleExchangeExec` to carry more information about how much we can change the partitioning. For `repartition(col)`, we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader. Similar to `repartition(number, col)`, we should respect the user-specified partitioning. No a new test Closes apache#30432 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
This PR updates `ShuffleExchangeExec` to carry more information about how much we can change the partitioning. For `repartition(col)`, we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader. Similar to `repartition(number, col)`, we should respect the user-specified partitioning. No a new test Closes apache#30432 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
This PR updates `ShuffleExchangeExec` to carry more information about how much we can change the partitioning. For `repartition(col)`, we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader. Similar to `repartition(number, col)`, we should respect the user-specified partitioning. No a new test Closes apache#30432 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
This PR updates `ShuffleExchangeExec` to carry more information about how much we can change the partitioning. For `repartition(col)`, we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader. Similar to `repartition(number, col)`, we should respect the user-specified partitioning. No a new test Closes apache#30432 from cloud-fan/aqe. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR updates
ShuffleExchangeExecto carry more information about how much we can change the partitioning. Forrepartition(col), we should preserve the user-specified partitioning and don't apply the AQE local shuffle reader.Why are the changes needed?
Similar to
repartition(number, col), we should respect the user-specified partitioning.Does this PR introduce any user-facing change?
No
How was this patch tested?
a new test