Conversation

codetyri0n
Contributor

@codetyri0n codetyri0n commented Oct 1, 2025

Which issue does this PR close?

Rationale for this change

  • This PR migrates the avg function from Comet to the datafusion-spark crate.

What changes are included in this PR?

  • The code is largely unchanged, with minor tweaks where needed. A few tests were added to the corresponding .slt file.

Are these changes tested?

  • Yes

Are there any user-facing changes?

  • No

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Oct 1, 2025
@codetyri0n
Contributor Author

CC: @andygrove

Self {
    name: name.into(),
    signature: Signature::user_defined(Immutable),
    input_data_type: data_type,
Contributor

Can the input data type vary? It seems to be only Float64 right now; will there be more options in the future? The same question applies to the return data type.

Contributor

@Jefffrey Jefffrey Oct 4, 2025

I still think it is confusing to require the input and result data types as inputs here, considering that the input type should be controlled only by signature/coerce_types(), and the result data type should be the same as return_type(), which apparently uses avg_return_type().
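To make the reviewer's suggestion concrete, here is a minimal, self-contained Rust sketch in which the function struct stores only its name and derives types on demand: `coerce_types` decides which inputs are accepted, and `return_type` computes the output type from the coerced arguments. It deliberately does not use DataFusion's real traits; `DataType`, `SparkAvgSketch`, and both methods are hypothetical stand-ins. Since the thread notes that only Float64 is handled today, the sketch assumes accepted numeric inputs are coerced to Float64.

```rust
// Hypothetical stand-in for Arrow's DataType; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

// Hypothetical Spark-style avg that stores no input/result data types.
struct SparkAvgSketch {
    name: String,
}

impl SparkAvgSketch {
    fn new(name: impl Into<String>) -> Self {
        Self { name: name.into() }
    }

    fn name(&self) -> &str {
        &self.name
    }

    // Input types are decided here: accepted numeric inputs are coerced to Float64.
    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
        match arg_types {
            [DataType::Int64] | [DataType::Float64] => Ok(vec![DataType::Float64]),
            other => Err(format!("{}: unsupported argument types {:?}", self.name, other)),
        }
    }

    // The result type is derived from the coerced arguments, not stored in a field.
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String> {
        match arg_types {
            [DataType::Float64] => Ok(DataType::Float64),
            other => Err(format!("{}: unexpected argument types {:?}", self.name, other)),
        }
    }
}
```

With this shape the constructor takes only a name, so a caller cannot pass an input type that disagrees with what `coerce_types` would produce.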

Contributor Author

It is unclear to me why we do this. I think we can either add a patch to make it more direct or merge it with DataFusion's avg, whichever route is decided upon in the future.

@codetyri0n codetyri0n changed the title from "Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add a few tests." to "Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add tests." Oct 2, 2025
@codetyri0n codetyri0n requested a review from andygrove October 3, 2025 13:28
Contributor

@Jefffrey Jefffrey left a comment

I do wonder whether it is possible to merge this code with DataFusion's avg, perhaps using generics to control the count type and a bool flag for ANSI mode in the future, to reduce duplication. Or would it not be worth the effort, or are there more differences than just those two?
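A rough sketch of the generics idea, assuming the two implementations differ only in the count type (presumably signed for Spark, unsigned for DataFusion) and an ANSI flag. `CountType` and `AvgAccumulator` are hypothetical names, not DataFusion or Comet APIs. A checked `i64` sum stands in for the real sum type so the flag has something to govern; the policy shown (ANSI mode errors on overflow, non-ANSI mode yields NULL) is illustrative only and not a statement of Spark's exact semantics for every input type.

```rust
// Hypothetical abstraction over the accumulator's count type.
trait CountType: Copy {
    const ZERO: Self;
    // Adds one, returning None on overflow.
    fn checked_inc(self) -> Option<Self>;
    fn to_f64(self) -> f64;
}

impl CountType for i64 {
    const ZERO: Self = 0;
    fn checked_inc(self) -> Option<Self> { self.checked_add(1) }
    fn to_f64(self) -> f64 { self as f64 }
}

impl CountType for u64 {
    const ZERO: Self = 0;
    fn checked_inc(self) -> Option<Self> { self.checked_add(1) }
    fn to_f64(self) -> f64 { self as f64 }
}

// Hypothetical generic avg accumulator over i64 inputs.
// `C` selects the count type; `ANSI` selects the overflow policy at compile time.
struct AvgAccumulator<C: CountType, const ANSI: bool> {
    sum: Option<i64>, // None once the sum has overflowed in non-ANSI mode
    count: C,
}

impl<C: CountType, const ANSI: bool> AvgAccumulator<C, ANSI> {
    fn new() -> Self {
        Self { sum: Some(0), count: C::ZERO }
    }

    fn update(&mut self, value: i64) -> Result<(), String> {
        // Add to the running sum first, applying the overflow policy.
        match self.sum.map(|s| s.checked_add(value)) {
            Some(Some(s)) => self.sum = Some(s),
            Some(None) if ANSI => return Err("arithmetic overflow in avg".to_string()),
            // Non-ANSI: an overflowed (or already overflowed) sum becomes NULL.
            _ => self.sum = None,
        }
        self.count = self.count.checked_inc().ok_or("count overflow")?;
        Ok(())
    }

    // Returns None (NULL) for empty input or an overflowed sum.
    fn evaluate(&self) -> Option<f64> {
        if self.count.to_f64() == 0.0 {
            return None;
        }
        self.sum.map(|s| s as f64 / self.count.to_f64())
    }
}
```

Because `ANSI` is a const generic, `AvgAccumulator<i64, true>` and `AvgAccumulator<u64, false>` are monomorphized as separate types, so one body of code could serve both variants.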

@alamb
Contributor

alamb commented Oct 4, 2025

> I do wonder if it is possible to merge this code with DataFusion avg, perhaps using generics to control the count type and bool flag for ansi mode in the future, to reduce duplication? Or would it be not worth the effort or are there more differences than just those two?

I suggest that initially we accept there is a second avg, and as we consolidate more of this functionality in datafusion-spark, we can do things like consolidate the implementations. More background here: #15914 (comment)

Perhaps it would be good to file a ticket to track the idea of consolidation.

@codetyri0n codetyri0n requested a review from Jefffrey October 5, 2025 19:42
Contributor

@Jefffrey Jefffrey left a comment

I suppose we'll just do any refactoring later.

@alamb
Contributor

alamb commented Oct 7, 2025

Thank you @Jefffrey and @codetyri0n

@alamb alamb added this pull request to the merge queue Oct 7, 2025
Merged via the queue into apache:main with commit f8e988f Oct 7, 2025
28 checks passed