Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add tests. #17871

codetyri0n · 2025-10-01T20:55:08Z

Which issue does this PR close?

Part of [EPIC] Complete datafusion-spark Spark Compatible Functions #15914.

Rationale for this change

This PR migrates the avg function from Comet to the datafusion-spark crate.

What changes are included in this PR?

The code is largely the same as in Comet, with minor tweaks. A few tests were added to the corresponding .slt file.

Are these changes tested?

Yes

Are there any user-facing changes?

No

codetyri0n · 2025-10-01T20:55:53Z

CC: @andygrove

datafusion/sqllogictest/test_files/spark/aggregate/avg.slt


Jefffrey · 2025-10-02T02:42:37Z

datafusion/spark/src/function/aggregate/avg.rs

+        Self {
+            name: name.into(),
+            signature: Signature::user_defined(Immutable),
+            input_data_type: data_type,


Can the input data type vary? It seems to be only Float64 right now; will there be more options in the future? Same question for the return data type.

Jefffrey · 2025-10-04

I still think it is confusing to require the input and result data types as inputs here, considering that the input type should be controlled by signature/coerce_types() only, and the result data type should be the same as return_type(), which apparently uses avg_return_type().

codetyri0n · 2025-10-05

It is unclear to me why we do this. I think we can either add a patch to be more direct or merge it with the DataFusion avg, whichever route is decided upon in the future.
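To make the suggestion in this thread concrete, here is a minimal, standard-library-only sketch of a function that stores only its name and derives the input and result types from `coerce_types` and `return_type`. Everything in it is a hypothetical stand-in: the `DataType` enum, the `SparkAvg` struct, and both free functions are simplified illustrations, not the DataFusion API or the code in this PR, and the choice to coerce integers to Float64 is an assumption made only for the example.

```rust
// Hypothetical, simplified stand-ins; this is NOT the DataFusion API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Decide what the single argument of `avg` is coerced to.
/// Assumption for this sketch: numeric inputs are coerced to Float64.
fn coerce_types(arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
    match arg_types {
        [DataType::Int32 | DataType::Int64 | DataType::Float64] => Ok(vec![DataType::Float64]),
        [other] => Err(format!("avg does not support {other:?}")),
        _ => Err(format!("avg expects 1 argument, got {}", arg_types.len())),
    }
}

/// Derive the result type from the coerced input type.
fn return_type(coerced: &[DataType]) -> Result<DataType, String> {
    match coerced {
        [DataType::Float64] => Ok(DataType::Float64),
        other => Err(format!("unexpected coerced types {other:?}")),
    }
}

/// The function stores only its name; no data types are passed to `new`.
struct SparkAvg {
    name: String,
}

impl SparkAvg {
    fn new(name: impl Into<String>) -> Self {
        Self { name: name.into() }
    }
}

fn main() {
    let avg = SparkAvg::new("avg");
    // Types are computed on demand instead of being stored on the struct.
    let coerced = coerce_types(&[DataType::Int32]).unwrap();
    let ret = return_type(&coerced).unwrap();
    println!("{}: coerced input {:?}, result {:?}", avg.name, coerced, ret);
    // Unsupported inputs are rejected at coercion time.
    println!("Utf8 rejected: {}", coerce_types(&[DataType::Utf8]).is_err());
}
```

With this shape there is a single source of truth for each type, so the constructor cannot be handed a type that disagrees with what coercion would produce.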

Jefffrey

I do wonder if it is possible to merge this code with the DataFusion avg, perhaps using generics to control the count type and a bool flag for ANSI mode in the future, to reduce duplication. Or would it not be worth the effort, or are there more differences than just those two?
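As a rough illustration of the idea above, the following standard-library-only sketch shows one accumulator that is generic over the count type and carries a bool for ANSI mode. All names here (`CountType`, `AvgAccumulator`, `ansi_mode`) are hypothetical, and `i64` and `u64` are chosen only for demonstration. What ANSI mode actually changes for avg is not stated in this thread; the sketch assumes, purely as a placeholder, that a non-finite result becomes an error when the flag is set.

```rust
/// Hypothetical abstraction over the integer type used for the row count.
trait CountType: Copy {
    const ZERO: Self;
    fn increment(self) -> Self;
    fn as_f64(self) -> f64;
}

impl CountType for i64 {
    const ZERO: Self = 0;
    fn increment(self) -> Self { self + 1 }
    fn as_f64(self) -> f64 { self as f64 }
}

impl CountType for u64 {
    const ZERO: Self = 0;
    fn increment(self) -> Self { self + 1 }
    fn as_f64(self) -> f64 { self as f64 }
}

/// One accumulator shared by both variants: `C` selects the count type and
/// `ansi_mode` selects the error behaviour.
struct AvgAccumulator<C: CountType> {
    sum: f64,
    count: C,
    ansi_mode: bool,
}

impl<C: CountType> AvgAccumulator<C> {
    fn new(ansi_mode: bool) -> Self {
        Self { sum: 0.0, count: C::ZERO, ansi_mode }
    }

    /// Nulls (None) are skipped and do not affect the count.
    fn update(&mut self, value: Option<f64>) {
        if let Some(v) = value {
            self.sum += v;
            self.count = self.count.increment();
        }
    }

    /// Returns None when no non-null rows were seen.
    fn evaluate(&self) -> Result<Option<f64>, String> {
        let n = self.count.as_f64();
        if n == 0.0 {
            return Ok(None);
        }
        let avg = self.sum / n;
        // Placeholder ANSI behaviour (an assumption of this sketch).
        if self.ansi_mode && !avg.is_finite() {
            return Err("non-finite average in ANSI mode".to_string());
        }
        Ok(Some(avg))
    }
}

fn main() {
    let mut acc = AvgAccumulator::<i64>::new(false);
    for v in [Some(1.0), None, Some(3.0)] {
        acc.update(v);
    }
    println!("{:?}", acc.evaluate());
}
```

Whether this is worth doing depends on how many other differences exist between the two implementations, which is the open question raised in the comment.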


Jefffrey · 2025-10-04T01:57:22Z

datafusion/spark/src/function/aggregate/avg.rs

+        Self {
+            name: name.into(),
+            signature: Signature::user_defined(Immutable),
+            input_data_type: data_type,


I still think it is confusing to require the input and result data types as inputs here, considering that the input type should be controlled by signature/coerce_types() only, and the result data type should be the same as return_type(), which apparently uses avg_return_type().

alamb · 2025-10-04T10:46:42Z

> I do wonder if it is possible to merge this code with the DataFusion avg, perhaps using generics to control the count type and a bool flag for ANSI mode in the future, to reduce duplication. Or would it not be worth the effort, or are there more differences than just those two?

I suggest that we initially accept there is a second avg, and as we consolidate more of this functionality in datafusion-spark, we can do things like consolidate the implementations. More background here: #15914 (comment)

Perhaps it would be good to file a ticket to track the idea of consolidation

Jefffrey

I suppose we'll just do any refactoring later

alamb · 2025-10-07T18:02:58Z

Thank you @Jefffrey and @codetyri0n

Chore: Migrate avg from comet to datafusion-spark and add a few tests.

b1b9e6b

github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Oct 1, 2025

andygrove reviewed Oct 1, 2025

datafusion/sqllogictest/test_files/spark/aggregate/avg.slt (resolved)

Jefffrey reviewed Oct 2, 2025


CI Fix: Apply cargo format.

054fd13

codetyri0n changed the title ~~Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add a few tests.~~ Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add tests. Oct 2, 2025

sriram added 3 commits October 3, 2025 00:28

CI Fix: Add coerce types function.

277bcd1

Add group by tests to the suite.

0bea9c2

Add doc highlighting differences with Spark.

f1f87cd

codetyri0n requested a review from andygrove October 3, 2025 13:28

Jefffrey reviewed Oct 4, 2025


CI assertion error fixes and improved docs.

0701476

codetyri0n requested a review from Jefffrey October 5, 2025 19:42

Merge branch 'main' into agg_avg

b0a51e5

Jefffrey approved these changes Oct 7, 2025


alamb added this pull request to the merge queue Oct 7, 2025

Merged via the queue into apache:main with commit f8e988f Oct 7, 2025
28 checks passed
