-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Implement equals for stateful functions #16781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query.
kosiew
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a suggestion and a question.
timsaucer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks very reasonable, but there is a lot of boiler plate code. I'm guessing there wasn't an easier way to add some derived traits for PartialEq and Hash for these structs instead. I'll try to set aside some time to go through each of these, but it's going to take a bit.
It would work for some, but not all. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @findepi and @kosiew and @timsaucer -- I think this PR makes sense to me and I like how explicit it is.
It took a while to get through 😅

I think it could be reduced in size somewhat by adding aliases to the default implementations.
My biggest concern is that in the future when we add new functions / modify existing ones we will forget to add a corresponding equals and hash implementations
While I also don't like the amount of almost-identical copy/paste code in this PR I don't really have any better suggestion and thus I think this PR is an improvement over main
To try and mitigage the issues, I suggest:
- Add a comment to all the implementations added in this PR like
// Implement equals because there are fields other than name and signatureand// implement hash because there are fields other than name and signature - Maybe update the comments in and
datafusion/datafusion/expr/src/udf.rs
Lines 697 to 706 in 4c877f6
/// Return true if this scalar UDF is equal to the other. /// /// Allows customizing the equality of scalar UDFs. /// Must be consistent with [`Self::hash_value`] and follow the same rules as [`Eq`]: /// /// - reflexive: `a.equals(a)`; /// - symmetric: `a.equals(b)` implies `b.equals(a)`; /// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`. /// /// By default, compares [`Self::name`] and [`Self::signature`]. to say when the functions should be overridden (for example "If your implementation has fields other than signature and name, such as aliases, you likely should implement equals and hash as well`)datafusion/datafusion/expr/src/udf.rs
Lines 711 to 716 in 4c877f6
/// Returns a hash value for this scalar UDF. /// /// Allows customizing the hash code of scalar UDFs. Similarly to [`Hash`] and [`Eq`], /// if [`Self::equals`] returns true for two UDFs, their `hash_value`s must be the same. /// /// By default, hashes [`Self::name`] and [`Self::signature`].
i'd rather see stuff like aliases and documentation not be part of ScalardUDFImpl at all. But yeah, i can add aliases there.
This is absolutely NOT improved by this PR. The problem remains unsolved.
I don't think it helps. |
remove these which compare signature, aliases, as the default handles these now
|
@alamb @timsaucer @kosiew would you like to take a look at new code pushed here since the time you last reviewed? |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the commits since I last reviewed and they look good to me -- thank you @findepi
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
* Implement equals for stateful functions Default implementation of `ScalarUDFImpl::equals`, `AggregateUDFImpl::equals` and `WindowUDFImpl::equals` is correct for stateless functions and those which only state is the `Signature`, which is most of the functions. This implements `equals` and `hash_value` for functions that have state other than `Signature` object. This fixes correctness issues which could occur when such stateful functions are used together in one query. * downgrade for MSRV * Improve doc * Update default UDF:: equals to compare aliases too * Update default UDF:: equals to compare type too (‼️ ) * remove now-obsoleted UDF equals/hash customizations remove these which compare signature, aliases, as the default handles these now * remove equals impl which compares name, signature -- default does that * cleanup imports (cherry picked from commit afd8235)
Default implementation of
ScalarUDFImpl::equals,AggregateUDFImpl::equalsandWindowUDFImpl::equalsis correct for stateless functions and those which only state is theSignature, which is most of the functions.This implements
equalsandhash_valuefor functions that have state other thanSignatureobject.This fixes correctness issues which could occur when such stateful functions are used together in one query.
ScalarUDFImplEquality Handling with Pointer-Based Default and Customizable Logic #16681