Skip to content

Circular relationship when determining state fields for AggregateUDF #10785

@viirya

Description

@viirya

Is your feature request related to a problem or challenge?

I'm trying to update Comet to latest DataFusion: apache/datafusion-comet#403. One issue I found is about AggregateUDF.

For some aggregate expressions, their implementation moves to AggregateUDF, e.g., FirstValue and LastValue.

To create a AggregateFunctionExpr for them, we need to call create_aggregate_expr by providing some arguments like input_phy_exprs. input_phy_exprs is used to determine the UDF's return return type (i.e., AggregateUDF.return_type).

To get the input physical expressions for each aggregate expression, we need to get slice of all input expressions which is the state fields of the aggregate expression. But the current design of AggregateUDF doesn't provide its state fields, but it relies on AggregateFunctionExpr to call AggregateUDF.state_fields with StateFieldsArgs.

So it is a circular relationship for someone to create a AggregateFunctionExpr for a AggregateUDF:

  1. Create a AggregateUDF
  2. Call create_aggregate_expr to create AggregateFunctionExpr, by providing input_phy_exprs
  3. In order to get input_phy_exprs, we need to know how to slice input expressions, i.e., how many state fields for the UDF. Call AggregateUDF.state_fields. AggregateUDF.state_fields requires StateFieldsArgs from AggregateFunctionExpr, so we need to create AggregateFunctionExpr first (step 2).

I think this is an issue in the current design. AggregateUDF should determine its state fields by itself instead relying on AggregateFunctionExpr which is a wrapper of it.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions