-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Derive UDAF equality from Eq, Hash #17067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The UDF comparison is expected to be reflexive. Require `Eq` for any uses of `udf_equals_hash` short-cut.
The wrapper implements PartialEq, Eq, Hash by forwarding to UDF impl equals and hash_value functions.
Reduce boilerplate in cases where implementation of
`AggregateUDFImpl::{equals,hash_value}` can be derived using standard
`Eq` and `Hash` traits.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes make sense to me and can be merged. Thanks @findepi
| /// corresponding methods on the UDF trait. | ||
| #[derive(Clone)] | ||
| #[allow(private_bounds)] // This is so that UdfEq can only be used with allowed pointer types (e.g. Arc), without allowing misuse. | ||
| pub struct UdfEq<Ptr: UdfPointer>(Ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a lot like pub struct PtrEq<Ptr: PointerType>(Ptr);
What is the reason we need both (and basically how would users know which to choose between)?
Maybe we can also add a comment with the answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As their names are supposed to suggest --
PtrEq is a general-purpose Arc wrapper bringing pointer-based Eq
UdfEq is wrapper around Arc<dyn UDF> bringing UDF::equals-based Eq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the names are good -- I just suggest that it would be easier to understand at a quick read if there are comments explaining the connection.
I don't feel this is required for merging this PR -- we can do it as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW In theory we shouldn't need this type.
Should be enough to implement PartialEq for dyn WindowUDFImpl + '_ and let the rest be derived.
However, the compiler errs out likes this:
error[E0507]: cannot move out of a shared reference
--> datafusion/expr/src/udwf.rs:479:5
|
477 | #[derive(Debug, PartialEq, Hash)]
| --------- in this derive macro expansion
478 | struct AliasedWindowUDFImpl {
479 | inner: Arc<dyn WindowUDFImpl>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ move occurs because value has type `std::sync::Arc<dyn udwf::WindowUDFImpl>`, which does not implement the `Copy` trait
|
help: clone the value to increment its reference count
|
479 | inner: Arc<dyn WindowUDFImpl>.clone(),
The suggestion fix is particularily funny, but the problem is real. It seems to be tracked by rust-lang/rust#31740 and the solution people seem be using is to have a newtype around the pointer... Exactly what we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timsaucer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good, but if you like I can put in a follow on PR to pass the hash call across the FFI boundary.
| impl PartialEq for ForeignAggregateUDF { | ||
| fn eq(&self, other: &Self) -> bool { | ||
| // FFI_AggregateUDF cannot be compared, so identity equality is the best we can do. | ||
| std::ptr::eq(self, other) | ||
| } | ||
| } | ||
| impl Eq for ForeignAggregateUDF {} | ||
| impl Hash for ForeignAggregateUDF { | ||
| fn hash<H: Hasher>(&self, state: &mut H) { | ||
| std::ptr::hash(self, state) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is okay to merge in, but I wonder if it would be better to add a hash function to the FFI and call it on the other side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely makes sense to me, but I am not well-equipped to address that. Thanks for offering to improve that as a follow-on.
Reduce boilerplate in cases where implementation of
AggregateUDFImpl::{equals,hash_value}can be derived using standardEqandHashtraits.This is code complexity reduction. Follows #17057
While valuable on its own, this also prepares for more automatic derivation of UDF equals/hash and/or removal of default implementations (which currently are error-prone) -- #16677
AggregateUDFImpl) equality from PartialEq, Hash #16866