-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Feat: [datafusion-spark] Migrate avg from comet to datafusion-spark and add tests. #17871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CC : @andygrove |
Self { | ||
name: name.into(), | ||
signature: Signature::user_defined(Immutable), | ||
input_data_type: data_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can input data type vary? Seems to be only Float64 right now, will there be more options in the future? Same for return data type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it is confusing to require the input & result data types as inputs here; considering input type should be controlled by signature/coerce_types()
only, and result data type should be same as return_type()
which apparently uses avg_return_type()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear to me why we do this - I think we can either add a patch to be more direct or merge it with Datafusion avg, whichever route is decided upon in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if it is possible to merge this code with DataFusion avg, perhaps using generics to control the count type and bool flag for ansi mode in the future, to reduce duplication? Or would it be not worth the effort or are there more differences than just those two?
Self { | ||
name: name.into(), | ||
signature: Signature::user_defined(Immutable), | ||
input_data_type: data_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it is confusing to require the input & result data types as inputs here; considering input type should be controlled by signature/coerce_types()
only, and result data type should be same as return_type()
which apparently uses avg_return_type()
I suggest that initially we accept there is a second Perhaps it would be good to file a ticket to track the idea of consolidation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we'll just do any refactoring later
Thank you @Jefffrey and @codetyri0n |
Which issue does this PR close?
datafusion-spark
Spark Compatible Functions #15914.Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?