[SPARK-43025][SQL] Eliminate Union if filters have the same child plan #40661
ping @cloud-fan
I think @peter-toth did something similar before. Can you share some ideas, @peter-toth?
I guess @peter-toth did a similar thing for scalar subqueries, but this one tries to fix non-scalar subqueries.
Sorry, I haven't had time to fully review the PR (maybe next week), but at first sight it seems to copy some functions (e.g. This PR combines UNION ALL legs if they return disjoint sets of rows from the same source node. I think this makes sense in those cases when there are overlapping scans in the legs (despite the disjoint filters), and by "overlapping" I mean that the scans use some common set of files. BTW,
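To illustrate why disjointness of the filters matters, here is a small sketch that models tables as plain Scala collections rather than Spark APIs (`UnionVsOr`, `rows`, `unionAll` and `orFilter` are hypothetical names). UNION ALL keeps duplicates, so concatenating two filtered scans gives the same rows as one scan filtered with `p1 || p2` only when no row satisfies both predicates.

```scala
// Toy model: a "table" is a Seq of rows and a "scan" iterates over it.
object UnionVsOr {
  type Row = (Int, String)

  // Hypothetical source table: (id, category).
  val rows: Seq[Row] = Seq((1, "a"), (2, "b"), (3, "c"), (4, "a"))

  // UNION ALL of two filtered legs: the source is scanned once per leg,
  // and a row matching both predicates appears twice in the result.
  def unionAll(p1: Row => Boolean, p2: Row => Boolean): Seq[Row] =
    rows.filter(p1) ++ rows.filter(p2)

  // A single scan with the two predicates connected by OR.
  def orFilter(p1: Row => Boolean, p2: Row => Boolean): Seq[Row] =
    rows.filter(r => p1(r) || p2(r))
}
```

With disjoint predicates such as `_._2 == "a"` and `_._2 == "b"`, both functions return the same rows (possibly in a different order). With overlapping predicates such as `_._1 <= 2` and `_._2 == "a"`, the row `(1, "a")` is returned twice by `unionAll` but only once by `orFilter`.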
@peter-toth Thank you for taking a first look. The partitioning/bucketing column filter doesn't seem to improve anything. I will optimize it further.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
In user scenarios, there are many SQL queries that union multiple subqueries, each with its own filter. For example:
q1
In fact, we can simplify this SQL as
q2
In fact, we can simplify this SQL as
This PR optimizes `Union` operators whose children include at least two `Filter`s, by:
1. Eliminating the `Union` operator if all of its children are `Filter`s over the same child plan. The predicates are merged into one single predicate by connecting the `Filter` conditions with `Or`.
2. Combining `Filter` children into one `Filter` if they all have the same child plan, again by connecting their predicates with `Or`.

Why are the changes needed?
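The two cases above can be sketched on a toy plan tree. The case classes below are hypothetical stand-ins, not Catalyst's logical plan classes, and `rewrite` is not the PR's actual rule. Because Spark's logical `Union` has UNION ALL semantics, the sketch only merges legs when a caller-supplied `disjoint` check accepts their predicates; that check is an assumption here, and how the PR itself establishes disjointness is not shown.

```scala
import scala.collection.mutable

// Hypothetical toy plan tree; NOT Spark's Catalyst classes.
object EliminateUnionSketch {
  sealed trait Expr
  final case class Pred(sql: String) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Filter(cond: Expr, child: Plan) extends Plan
  final case class Union(children: Seq[Plan]) extends Plan

  // `disjoint` is an assumed, caller-supplied check that no row can satisfy
  // two of the given conditions at once; merging UNION ALL legs without it
  // would drop duplicate rows.
  def rewrite(plan: Plan, disjoint: Seq[Expr] => Boolean): Plan = plan match {
    case Union(children) =>
      // Conditions of all Filter legs that sit directly on top of `c`.
      def condsOver(c: Plan): Seq[Expr] =
        children.collect { case Filter(cond, `c`) => cond }
      def mergeable(c: Plan): Boolean = {
        val conds = condsOver(c)
        conds.size > 1 && disjoint(conds)
      }
      val emitted = mutable.Set.empty[Plan]
      // Replace each group of mergeable Filter legs with one Filter whose
      // condition ORs the group's conditions, placed at the group's first position.
      val newChildren: Seq[Plan] = children.flatMap {
        case Filter(_, c) if mergeable(c) =>
          if (emitted.add(c)) Seq[Plan](Filter(condsOver(c).reduceLeft[Expr](Or(_, _)), c))
          else Seq.empty[Plan]
        case other => Seq(other)
      }
      // Case 1: only one leg remains, so the Union itself disappears.
      // Case 2: other legs remain, so a smaller Union is kept.
      if (newChildren.size == 1) newChildren.head else Union(newChildren)
    case other => other
  }
}
```

For example, `Union(Seq(Filter(Pred("a = 1"), Scan("t")), Filter(Pred("a = 2"), Scan("t"))))` becomes `Filter(Or(Pred("a = 1"), Pred("a = 2")), Scan("t"))` when `disjoint` accepts the two predicates, so `t` is read once instead of twice. If a third leg filters a different plan, the two `t` legs are still merged but a `Union` with two children remains.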
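Simplify the SQL plan and improve performance: when the filtered legs share the same child plan, that child is evaluated once instead of once per `Union` leg.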
Does this PR introduce any user-facing change?
No.
This is a new optimization that only changes the internal implementation.
How was this patch tested?
New test cases.
Micro-benchmark for q1 and q2:
Before this PR
After this PR