- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
Description
Is your feature request related to a problem or challenge?
I'm trying to update Comet to latest DataFusion: apache/datafusion-comet#403. One issue I found is about AggregateUDF.
For some aggregate expressions, their implementation moves to AggregateUDF, e.g., FirstValue and LastValue.
To create a AggregateFunctionExpr for them, we need to call create_aggregate_expr by providing some arguments like input_phy_exprs. input_phy_exprs is used to determine the UDF's return return type (i.e., AggregateUDF.return_type).
To get the input physical expressions for each aggregate expression, we need to get slice of all input expressions which is the state fields of the aggregate expression. But the current design of AggregateUDF doesn't provide its state fields, but it relies on AggregateFunctionExpr to call AggregateUDF.state_fields with StateFieldsArgs.
So it is a circular relationship for someone to create a AggregateFunctionExpr for a AggregateUDF:
- Create a AggregateUDF
- Call create_aggregate_exprto createAggregateFunctionExpr, by providinginput_phy_exprs
- In order to get input_phy_exprs, we need to know how to slice input expressions, i.e., how many state fields for the UDF. CallAggregateUDF.state_fields.AggregateUDF.state_fieldsrequiresStateFieldsArgsfromAggregateFunctionExpr, so we need to createAggregateFunctionExprfirst (step 2).
I think this is an issue in the current design. AggregateUDF should determine its state fields by itself instead relying on AggregateFunctionExpr which is a wrapper of it.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response