-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Move nested union optimization from plan builder to logical optimizer #7695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e8e6a32 to
79e20c6
Compare
4c42dc4 to
770bf79
Compare
a3a2484 to
b926721
Compare
b926721 to
7259e9f
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @maruschin -- this is looking quite close. The basic idea and testing looks very solid. I think several of the CI failures on this PR are due to #7701, so merging / rebasing with master will help.
I also double checked this branch gets the same plan as main
❯ explain select * from (select 1 UNION ALL (select 2 UNION ALL select 3));
+---------------+----------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------+
| logical_plan | Union |
| | Projection: Int64(1) AS Int64(1) |
| | EmptyRelation |
| | Projection: Int64(2) AS Int64(1) |
| | EmptyRelation |
| | Projection: Int64(3) AS Int64(1) |
| | EmptyRelation |
| physical_plan | UnionExec |
| | ProjectionExec: expr=[1 as Int64(1)] |
| | EmptyExec: produce_one_row=true |
| | ProjectionExec: expr=[2 as Int64(1)] |
| | EmptyExec: produce_one_row=true |
| | ProjectionExec: expr=[3 as Int64(1)] |
| | EmptyExec: produce_one_row=true |
| | |
+---------------+----------------------------------------+
2 rows in set. Query took 0.008 seconds.I found a few sqllogictests seem to fail, specifically
cargo test --test sqllogictests -- timestamp
Running "timestamps.slt"
External error: query result mismatch:
[SQL] SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00+00:45') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00+00:30') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00+00:15') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00-00:15') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00-00:30') as ts_irregular_offset
UNION ALL
SELECT date_trunc('hour', TIMESTAMPTZ '2000-01-01T00:00:00-00:45') as ts_irregular_offset
[Diff] (-expected|+actual)
- 1999-12-31T23:00:00Z
- 1999-12-31T23:00:00Z
- 1999-12-31T23:00:00Z
- 2000-01-01T00:00:00Z
- 2000-01-01T00:00:00Z
- 2000-01-01T00:00:00Z
+ 1999-12-31T23:00:00
+ 1999-12-31T23:00:00
+ 1999-12-31T23:00:00
+ 2000-01-01T00:00:00
+ 2000-01-01T00:00:00
+ 2000-01-01T00:00:00
at test_files/timestamps.slt:1471Which looks to me like the timestamp logic got lost somehow 🤔
c4a11b7 to
cdad0be
Compare
dfd4dab to
4e116ea
Compare
d8024ab to
e874a08
Compare
6258d57 to
e59e54c
Compare
e59e54c to
65f09fa
Compare
65f09fa to
bd6c4d4
Compare
| plan: &LogicalPlan, | ||
| _config: &dyn OptimizerConfig, | ||
| ) -> Result<Option<LogicalPlan>> { | ||
| // TODO: Add optimization for nested distinct unions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at one point (maybe as a TODO) we probably should also remove unions that have only a single child since they are basically a no-op/pass-through. Not sure if this should be an separate optimizer rule or if this should be done in this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the EliminateOneUnion rule, also added in this PR, handles the one union case
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @maruschin -- this is a really nice piece of work. It is well tested and the code is very clean.
Thank you @crepererum and @jackwener for the reviews
| plan: &LogicalPlan, | ||
| _config: &dyn OptimizerConfig, | ||
| ) -> Result<Option<LogicalPlan>> { | ||
| // TODO: Add optimization for nested distinct unions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the EliminateOneUnion rule, also added in this PR, handles the one union case
|
I took the liberty of merging this branch up from main and fixing clippy as I had it all checked out already I also wrote some end to end tests in #7787 |
|
Thanks again @maruschin |
…apache#7695) * Add naive implementation of eliminate_nested_union * Remove union optimization from LogicalPlanBuilder::union * Fix propagate_union_children_different_schema test * Add implementation of eliminate_one_union * Simplified eliminate_nested_union test * Fix * clippy --------- Co-authored-by: Evgeny Maruschenko <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #7481.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?