-
Notifications
You must be signed in to change notification settings - Fork 1.7k
avg(distinct) support for decimal types
#17560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… and remove unused sum_distinct module
|
I'm also interested in improving decimal support, I have a PR that includes support for the new arrow types (decimal 32/64), this PR seems easier to merge first and then I'll rebase my PR to including this improved support. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Jefffrey -- this makes sense to me
| self.target_precision, | ||
| self.target_scale, | ||
| )?; | ||
| // `distinct_count` returns `u64`, but `avg` expects `i256` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this strange but after poking around I see there isn't a direct conversion from u64 --> i256:
let count: i256 = self.sum_accumulator.distinct_count().into();error[E0277]: the trait bound `arrow::datatypes::i256: std::convert::From<usize>` is not satisfied
--> datafusion/functions-aggregate-common/src/aggregate/avg_distinct/decimal.rs:105:73
|
105 | let count: i256 = self.sum_accumulator.distinct_count().into();
| ^^^^ the trait `std::convert::From<usize>` is not implemented for `arrow::datatypes::i256`
|
| fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<FieldRef>> { | ||
| if args.is_distinct { | ||
| // Copied from datafusion_functions_aggregate::sum::Sum::state_fields | ||
| // Decimal accumulator actually uses a different precision during accumulation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* chore: mv `DistinctSumAccumulator` to common * feat: add avg distinct support for float64 type * chore: fmt * refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module * feat: add avg distinct support for float64 type * feat: add avg distinct support for decimal * feat: more test for avg distinct in rust api * Remove DataFrame API tests for avg(distinct) * Remove proto test * Fix merge errors * Refactoring * Minor cleanup * Decimal slt tests for avg(distinct) * Fix state_fields for decimal distinct avg --------- Co-authored-by: YuNing Chen <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Dmitrii Blaginin <[email protected]>
* chore: mv `DistinctSumAccumulator` to common * feat: add avg distinct support for float64 type * chore: fmt * refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module * feat: add avg distinct support for float64 type * feat: add avg distinct support for decimal * feat: more test for avg distinct in rust api * Remove DataFrame API tests for avg(distinct) * Remove proto test * Fix merge errors * Refactoring * Minor cleanup * Decimal slt tests for avg(distinct) * Fix state_fields for decimal distinct avg --------- Co-authored-by: Jeffrey Vo <[email protected]> Co-authored-by: YuNing Chen <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Dmitrii Blaginin <[email protected]>
Which issue does this PR close?
avg(distinct)support #2408Rationale for this change
This builds on top of the previous PR #15414 with some additional fixes. See previous PR for support on float types: #17255
What changes are included in this PR?
From original PR:
DecimalDistinctAvgAccumulatorimplementationAdditional changes made by me:
single_distinct_to_groupbydoesn't muddy the resultsstate_fieldsforAvgto account forDecimalDistinctAvgAccumulatorhaving a different data type compared to eventual return typeAre these changes tested?
Added slt tests.
Are there any user-facing changes?
No