-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add support for utilization complex aggregate exprs inside group by #8107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for utilization complex aggregate exprs inside group by #8107
Conversation
|
@alamb, this should solve the test that fails on your end. Can you please confirm? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @mustafasrepo -- I am a little confused about the algorithm (I left some questions)
@alamb, this should solve the test that fails on your end. Can you please confirm?
I tried @ozankabak and unfortunately it does not solve our issue. I will work tomorrow on getting a datafusion only reproducer for our issue and add it to the ticket.
| let projection_mapping = self.implicit_projection_mapping(&exprs); | ||
| let projected_eqs = | ||
| self.project(&projection_mapping, projection_mapping.output_schema()); | ||
| if let Some(projected_reqs) = projection_mapping.project_lex_reqs(reqs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure I follow this logic -- does it say that any expression that can be projected maintains the ordering?
What about non monotonic expressions like abs(x) ? If the input is orderered by x
-2
-1
0
1
2
The output will not be ordered by abs(xx)
2
1
0
1
2
(the same applies to date functions like extract(day from ts)
Perhaps we need to check FuncMonotonicity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure I follow this logic -- does it say that any expression that can be projected maintains the ordering?
No, it just projects everything as if they are evaluated (in terms of ordering properties). For instance if ts is not ordered. We know that date_bin(ts) is not ordered. Similarly abs(x) is not ordered no matter x is ordered or not. update_ordering considers the properties of functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was trying to say that in general, for ProjectionExec and any node that computes expressions, I would expect that we would have to use FuncMonotonicity to determine if an expression of x is ordered by expr(x)
Maybe that would be a good thing to do as a follow on PR
Thanks, it will help us pinpoint exactly what is wrong |
…synnada-ai/arrow-datafusion into bug_fix/complex_aggregate_exprs
|
By the way, I want to stress that this PR is beneficial even if it doesn't solve the issue in #8043. This PR enables us to run some of the aggregate queries in ordered or partially ordered mode. As can be seen from the test result
Also after this PR I would expect that following query result mentioned in the #8043. would turn into (pointed Then enforce sorting wouldn't remove the |
|
I marked this PR as draft. The reason is that, I have realized that we may not need to have implicit projection pass to have this support. If so, having implicit projection is a bit, cumbersome. I will try to implement alternative algorithm. |
# Conflicts: # datafusion/core/src/physical_optimizer/enforce_distribution.rs # datafusion/core/src/physical_optimizer/replace_with_order_preserving_variants.rs # datafusion/physical-expr/src/equivalence.rs # datafusion/sqllogictest/test_files/groupby.slt
|
This is complete in PR8281 |
Which issue does this PR close?
Closes #8043. Part of #8064
Rationale for this change
Group by cannot understand ordering of the complex expression currently.
Assume that we know that column
tsis ordered. When some writes a query in the formGROUP BY (ts)we can understand thattsis ordered, then runAggregateExecinSortedmode. However, when some writesdate_bin(ts), we cannot understanddate_bin(ts)is indeed ordered giventsis ordered (Please note that datafusion has this mechanism. However, in group by we do not use this mechanism).This PR solves this problem.
The design is as follows, we apply an implicit projection before checking
Then existing code runs as before. This implicit projection enables us to treat complex expression as columns, where their result are calculated(Please note that this calculation is done only in terms of ordering properties. There is no computation).
Assume that existing ordering is
ts ASC. Also assume that some writesGROUP BY (date_bin(ts))in the query.Implicit projection do following projection:
ts -> col(ts), date_bin(ts)-> col(date_bin(ts)).This projection transforms existing ordering from
ts ASCto[ts ASC], [date_bin(ts) ASC]. Then existing algorithms works on this projected form.Please note that: Changes in the
replace_order_preserving_variants.rsfile only comes from schema changes. Previously some of the plans were invalid (such column should be at index 1, however it is at index 2). However, they were silent. Because these plans never executed. With this new algorithm, algorithm considers existing schema during projection also. Hence these silent bugs aroused. I also fixed them as part of this PR.What changes are included in this PR?
Are these changes tested?
Yes new tests are added
Are there any user-facing changes?