Skip to content

Conversation

@Jefffrey
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

This builds on top of the previous PR #15414 with some additional fixes. See previous PR for support on float types: #17255

What changes are included in this PR?

From original PR:

  • DecimalDistinctAvgAccumulator implementation

Additional changes made by me:

  • Redid SLT tests to ensure single_distinct_to_groupby doesn't muddy the results
  • Fixed state_fields for Avg to account for DecimalDistinctAvgAccumulator having a different data type compared to eventual return type

Are these changes tested?

Added slt tests.

Are there any user-facing changes?

No

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Sep 14, 2025
@AdamGS
Copy link
Contributor

AdamGS commented Sep 15, 2025

I'm also interested in improving decimal support, I have a PR that includes support for the new arrow types (decimal 32/64), this PR seems easier to merge first and then I'll rebase my PR to including this improved support.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jefffrey -- this makes sense to me

self.target_precision,
self.target_scale,
)?;
// `distinct_count` returns `u64`, but `avg` expects `i256`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this strange but after poking around I see there isn't a direct conversion from u64 --> i256:

                let count: i256 = self.sum_accumulator.distinct_count().into();
error[E0277]: the trait bound `arrow::datatypes::i256: std::convert::From<usize>` is not satisfied
   --> datafusion/functions-aggregate-common/src/aggregate/avg_distinct/decimal.rs:105:73
    |
105 |                 let count: i256 = self.sum_accumulator.distinct_count().into();
    |                                                                         ^^^^ the trait `std::convert::From<usize>` is not implemented for `arrow::datatypes::i256`
    |

fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<FieldRef>> {
if args.is_distinct {
// Copied from datafusion_functions_aggregate::sum::Sum::state_fields
// Decimal accumulator actually uses a different precision during accumulation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Jefffrey Jefffrey added this pull request to the merge queue Sep 18, 2025
Merged via the queue into apache:main with commit e2a5b57 Sep 18, 2025
28 checks passed
@Jefffrey Jefffrey deleted the pr_15414 branch September 18, 2025 00:14
@AdamGS AdamGS mentioned this pull request Sep 30, 2025
22 tasks
AdamGS pushed a commit to AdamGS/arrow-datafusion that referenced this pull request Oct 2, 2025
* chore: mv `DistinctSumAccumulator` to common

* feat: add avg distinct support for float64 type

* chore: fmt

* refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module

* feat: add avg distinct support for float64 type

* feat: add avg distinct support for decimal

* feat: more test for avg distinct in rust api

* Remove DataFrame API tests for avg(distinct)

* Remove proto test

* Fix merge errors

* Refactoring

* Minor cleanup

* Decimal slt tests for avg(distinct)

* Fix state_fields for decimal distinct avg

---------

Co-authored-by: YuNing Chen <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
alamb added a commit that referenced this pull request Oct 3, 2025
* chore: mv `DistinctSumAccumulator` to common

* feat: add avg distinct support for float64 type

* chore: fmt

* refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module

* feat: add avg distinct support for float64 type

* feat: add avg distinct support for decimal

* feat: more test for avg distinct in rust api

* Remove DataFrame API tests for avg(distinct)

* Remove proto test

* Fix merge errors

* Refactoring

* Minor cleanup

* Decimal slt tests for avg(distinct)

* Fix state_fields for decimal distinct avg

---------

Co-authored-by: Jeffrey Vo <[email protected]>
Co-authored-by: YuNing Chen <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

avg(distinct) support

5 participants