Skip to content

Conversation

@findepi
Copy link
Member

@findepi findepi commented Aug 7, 2025

Reduce boilerplate in cases where implementation of AggregateUDFImpl::{equals,hash_value} can be derived using standard Eq and Hash traits.

This is code complexity reduction. Follows #17057

While valuable on its own, this also prepares for more automatic derivation of UDF equals/hash and/or removal of default implementations (which currently are error-prone) -- #16677

findepi added 3 commits August 7, 2025 11:13
The UDF comparison is expected to be reflexive. Require `Eq` for any
uses of `udf_equals_hash` short-cut.
The wrapper implements PartialEq, Eq, Hash by forwarding to UDF impl
equals and hash_value functions.
Reduce boilerplate in cases where implementation of
`AggregateUDFImpl::{equals,hash_value}` can be derived using standard
`Eq` and `Hash` traits.
@findepi findepi requested review from alamb and timsaucer August 7, 2025 09:38
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate proto Related to proto crate functions Changes to functions implementation ffi Changes to the ffi crate labels Aug 7, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes make sense to me and can be merged. Thanks @findepi

/// corresponding methods on the UDF trait.
#[derive(Clone)]
#[allow(private_bounds)] // This is so that UdfEq can only be used with allowed pointer types (e.g. Arc), without allowing misuse.
pub struct UdfEq<Ptr: UdfPointer>(Ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a lot like pub struct PtrEq<Ptr: PointerType>(Ptr);

What is the reason we need both (and basically how would users know which to choose between)?

Maybe we can also add a comment with the answer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As their names are supposed to suggest --
PtrEq is a general-purpose Arc wrapper bringing pointer-based Eq
UdfEq is wrapper around Arc<dyn UDF> bringing UDF::equals-based Eq

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the names are good -- I just suggest that it would be easier to understand at a quick read if there are comments explaining the connection.

I don't feel this is required for merging this PR -- we can do it as a follow on PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW In theory we shouldn't need this type.
Should be enough to implement PartialEq for dyn WindowUDFImpl + '_ and let the rest be derived.
However, the compiler errs out likes this:

error[E0507]: cannot move out of a shared reference
   --> datafusion/expr/src/udwf.rs:479:5
    |
477 | #[derive(Debug, PartialEq, Hash)]
    |                 --------- in this derive macro expansion
478 | struct AliasedWindowUDFImpl {
479 |     inner: Arc<dyn WindowUDFImpl>,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ move occurs because value has type `std::sync::Arc<dyn udwf::WindowUDFImpl>`, which does not implement the `Copy` trait
    |
help: clone the value to increment its reference count
    |
479 |     inner: Arc<dyn WindowUDFImpl>.clone(),

The suggestion fix is particularily funny, but the problem is real. It seems to be tracked by rust-lang/rust#31740 and the solution people seem be using is to have a newtype around the pointer... Exactly what we have.

Copy link
Member Author

@findepi findepi Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good, but if you like I can put in a follow on PR to pass the hash call across the FFI boundary.

Comment on lines +388 to +400
impl PartialEq for ForeignAggregateUDF {
fn eq(&self, other: &Self) -> bool {
// FFI_AggregateUDF cannot be compared, so identity equality is the best we can do.
std::ptr::eq(self, other)
}
}
impl Eq for ForeignAggregateUDF {}
impl Hash for ForeignAggregateUDF {
fn hash<H: Hasher>(&self, state: &mut H) {
std::ptr::hash(self, state)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay to merge in, but I wonder if it would be better to add a hash function to the FFI and call it on the other side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely makes sense to me, but I am not well-equipped to address that. Thanks for offering to improve that as a follow-on.

@findepi findepi merged commit ac3a573 into apache:main Aug 8, 2025
27 checks passed
@findepi findepi deleted the findepi/udaf branch August 8, 2025 05:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Derive UDAF (AggregateUDFImpl) equality from PartialEq, Hash

3 participants