Implement trait based API for defining AggregateUDF #8710

@alamb

Similarly to #8568

Is your feature request related to a problem or challenge?

The current way a user implements an AggregateUDF is awkward and very hard to extend in backwards-compatible ways:

Users must wade through several Arc<dyn ...> typedefs to figure out how to provide the type signature and implementation.

impl AggregateUDF {
    /// Create a new AggregateUDF
    pub fn new(
        name: &str,
        signature: &Signature,
        return_type: &Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>, DataFusionError> + Send + Sync>,
        accumulator: &Arc<dyn Fn(&DataType) -> Result<Box<dyn Accumulator>, DataFusionError> + Send + Sync>,
        state_type: &Arc<dyn Fn(&DataType) -> Result<Arc<Vec<DataType>>, DataFusionError> + Send + Sync>
    ) -> AggregateUDF {
        ...
    }
}
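To make the awkwardness concrete, here is a rough sketch of what a caller ends up writing against a closure-based constructor of this shape. It does not use the real DataFusion crate: `DataType`, `Signature`, `Accumulator`, `AggregateUDF`, the three type aliases, and the `make_sum_udaf` helper are simplified stand-ins defined locally for illustration, and errors are reduced to `String`.

```rust
use std::sync::Arc;

// Simplified stand-ins for the DataFusion types (assumptions for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
}

#[derive(Debug, Clone)]
pub struct Signature {
    pub arg_types: Vec<DataType>,
}

pub trait Accumulator {
    fn update(&mut self, value: f64);
    fn evaluate(&self) -> f64;
}

// The three closure typedefs a user has to discover and satisfy.
pub type ReturnTypeFn = Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>, String> + Send + Sync>;
pub type AccumulatorFn = Arc<dyn Fn(&DataType) -> Result<Box<dyn Accumulator>, String> + Send + Sync>;
pub type StateTypeFn = Arc<dyn Fn(&DataType) -> Result<Arc<Vec<DataType>>, String> + Send + Sync>;

pub struct AggregateUDF {
    pub name: String,
    pub signature: Signature,
    pub return_type: ReturnTypeFn,
    pub accumulator: AccumulatorFn,
    pub state_type: StateTypeFn,
}

impl AggregateUDF {
    pub fn new(
        name: &str,
        signature: &Signature,
        return_type: &ReturnTypeFn,
        accumulator: &AccumulatorFn,
        state_type: &StateTypeFn,
    ) -> AggregateUDF {
        AggregateUDF {
            name: name.to_string(),
            signature: signature.clone(),
            return_type: Arc::clone(return_type),
            accumulator: Arc::clone(accumulator),
            state_type: Arc::clone(state_type),
        }
    }
}

// A trivial running-sum accumulator.
pub struct Sum(f64);

impl Accumulator for Sum {
    fn update(&mut self, value: f64) {
        self.0 += value;
    }
    fn evaluate(&self) -> f64 {
        self.0
    }
}

// Every piece of behavior is a separately wrapped closure; adding a new
// capability later means adding another constructor argument.
pub fn make_sum_udaf() -> AggregateUDF {
    let return_type: ReturnTypeFn =
        Arc::new(|_args: &[DataType]| -> Result<Arc<DataType>, String> {
            Ok(Arc::new(DataType::Float64))
        });
    let accumulator: AccumulatorFn =
        Arc::new(|_dt: &DataType| -> Result<Box<dyn Accumulator>, String> {
            Ok(Box::new(Sum(0.0)) as Box<dyn Accumulator>)
        });
    let state_type: StateTypeFn =
        Arc::new(|_dt: &DataType| -> Result<Arc<Vec<DataType>>, String> {
            Ok(Arc::new(vec![DataType::Float64]))
        });
    let signature = Signature { arg_types: vec![DataType::Float64] };
    AggregateUDF::new("my_sum", &signature, &return_type, &accumulator, &state_type)
}
```

Note that any new capability would have to become a sixth positional argument to `new`, which breaks every existing caller.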

Describe the solution you'd like

Follow the pattern in #8578

  1. Create an AggregateUDFImpl trait, and AggregateUDF::new_from_impl that creates an AggregateUDF from the impl
  2. Add an example in datafusion-examples/examples/advanced_udaf.rs of using this API

I am not sure why this API is implemented the way it is (other than for consistency with ScalarUDF). As a user, I would expect to be able to use a trait object like this:

struct MyUDF {
    ..
}

impl AggregateUDFImpl for MyUDF {
    fn name(&self) -> &str { ... }
    fn return_type(&self) -> &DataType { ... }
    ...
}
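A slightly fuller sketch of how the trait-based shape could look is given below. It is only one possible design, not the final API: the method set of `AggregateUDFImpl`, the `aliases` default method, the `String` error type, and the stand-in `DataType` and `Accumulator` definitions are all assumptions made for illustration. In particular, `return_type` here takes the argument types and returns a `Result`, mirroring the closure signature in the current constructor rather than the shorter form above.

```rust
use std::sync::Arc;

// Simplified stand-ins for the DataFusion types (assumptions for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
}

pub trait Accumulator {
    fn update(&mut self, value: f64);
    fn evaluate(&self) -> f64;
}

/// Hypothetical shape of the proposed trait.
pub trait AggregateUDFImpl: Send + Sync {
    fn name(&self) -> &str;
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String>;
    fn accumulator(&self, arg_type: &DataType) -> Result<Box<dyn Accumulator>, String>;
    fn state_type(&self, return_type: &DataType) -> Result<Vec<DataType>, String>;

    /// A method added later with a default body: existing implementations
    /// keep compiling, which is the backwards-compatibility benefit.
    fn aliases(&self) -> &[String] {
        &[]
    }
}

/// The wrapper stores the implementation as a trait object and delegates to it.
pub struct AggregateUDF {
    inner: Arc<dyn AggregateUDFImpl>,
}

impl AggregateUDF {
    pub fn new_from_impl<F>(fun: F) -> AggregateUDF
    where
        F: AggregateUDFImpl + 'static,
    {
        AggregateUDF { inner: Arc::new(fun) }
    }

    pub fn name(&self) -> &str {
        self.inner.name()
    }

    pub fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, String> {
        self.inner.return_type(arg_types)
    }

    pub fn accumulator(&self, arg_type: &DataType) -> Result<Box<dyn Accumulator>, String> {
        self.inner.accumulator(arg_type)
    }

    pub fn aliases(&self) -> &[String] {
        self.inner.aliases()
    }
}

// A trivial running-sum accumulator.
pub struct Sum(f64);

impl Accumulator for Sum {
    fn update(&mut self, value: f64) {
        self.0 += value;
    }
    fn evaluate(&self) -> f64 {
        self.0
    }
}

// The user-defined aggregate: one struct, one impl block, no closure typedefs.
pub struct MyUDF;

impl AggregateUDFImpl for MyUDF {
    fn name(&self) -> &str {
        "my_sum"
    }
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType, String> {
        Ok(DataType::Float64)
    }
    fn accumulator(&self, _arg_type: &DataType) -> Result<Box<dyn Accumulator>, String> {
        Ok(Box::new(Sum(0.0)))
    }
    fn state_type(&self, _return_type: &DataType) -> Result<Vec<DataType>, String> {
        Ok(vec![DataType::Float64])
    }
}
```

With this shape, `AggregateUDF::new_from_impl(MyUDF)` is the whole registration step, and `MyUDF` picks up the default `aliases` without any change to its impl block.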

Describe alternatives you've considered

No response

Additional context
