-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion does not avg(distinct ..):
> select avg(distinct a), avg(distinct b) from values(1, 2) t(a, b);
Execution error: avg(DISTINCT) aggregations are not availableSomewhat confusingly DataFusion does appear to support a single distinct avg
> select avg(distinct a) from values(1, 2) t(a, b);
+-------------------+
| avg(DISTINCT t.a) |
+-------------------+
| 1.0 |
+-------------------+
1 row(s) fetched.
Elapsed 0.015 seconds.
But it does this by rewriting effectively rewriting the query select avg(distinct a) to select avg(a) from (select distinct a ...) in the SingleDistinctToGroupBy rule
Describe the solution you'd like
Support multile distinct avg aggregates:
datafusion/datafusion/functions-aggregate/src/average.rs
Lines 116 to 118 in 2d30334
| if acc_args.is_distinct { | |
| return exec_err!("avg(DISTINCT) aggregations are not available"); | |
| } |
Describe alternatives you've considered
Basically we could follow the model of COUNT(distinct ...) aggregates:
In this case, there is a separate accumulator type that is implemented for count distinct
Additional context
Add any other context or screenshots about the feature request here.