Compute ScalarFunction properties including return_type and nullable on creation #13825

@jayzhan211

Description

Is your feature request related to a problem or challenge?

Continue on the discussion from #13756 (comment)

We create a scalar function from the defined udf and the input args and keep them in Expr. We then coerce the inputs in the optimizer step with TypeCoercionRewriter and compute the nullability when we create the physical expression in create_physical_expr.

Both return_type and nullability are computed from the inputs, which are defined by ScalarUDFImpl in the logical layer. As long as we have the udf and inputs: Vec<Expr>, we can decide whether the types are valid and what implicit coercion should be applied, and we can determine the return_type and nullability from that information.

I think such properties should be decided as early as possible (when the function is created in the logical layer).

Once we have the return_type, I think we can avoid calling data_types_with_scalar_udf again in ExprSchemable::get_type, coerce_arguments_for_signature_with_scalar_udf, and create_physical_expr.

Describe the solution you'd like

Compute the return_type and nullable as early as possible.

#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct ScalarFunction {
    /// The function
    pub func: Arc<crate::ScalarUDF>,
    /// List of expressions to feed to the function as arguments
    pub args: Vec<Expr>,

    // New fields, computed when the function is created
    pub return_type: DataType,
    pub nullable: bool,
}

ScalarFunction::new_udf is used to create functions; we need to double-check whether there are other ways to create a scalar function and make sure they all go through the same computation.

Ideally we have something like

impl ScalarFunction {
    pub fn new(udf: Arc<crate::ScalarUDF>, args: Vec<Expr>) -> Result<Self> {
        // 1. Check the udf's signature against args: length check, type check,
        //    implicit coercion
        // 2. Call udf.return_type() to determine the return type (or even
        //    return a Field that includes the nullability)
        //
        // Return an error if anything goes wrong
    }
}

Describe alternatives you've considered

No response

Additional context

We create the physical expression from the logical inputs and schema, which doesn't seem correct either.

/// Create a physical expression for the UDF.
pub fn create_physical_expr(
    fun: &ScalarUDF,
    input_phy_exprs: &[Arc<dyn PhysicalExpr>],
    input_schema: &Schema,
    args: &[Expr],
    input_dfschema: &DFSchema,
) -> Result<Arc<dyn PhysicalExpr>> {
    let input_expr_types = input_phy_exprs
        .iter()
        .map(|e| e.data_type(input_schema))
        .collect::<Result<Vec<_>>>()?;

    // verify that input data types is consistent with function's `TypeSignature`
    data_types_with_scalar_udf(&input_expr_types, fun)?;

    // Since we have arg_types, we dont need args and schema.
    let return_type =
        fun.return_type_from_exprs(args, input_dfschema, &input_expr_types)?;

    Ok(Arc::new(
        ScalarFunctionExpr::new(
            fun.name(),
            Arc::new(fun.clone()),
            input_phy_exprs.to_vec(),
            return_type,
        )
        .with_nullable(fun.is_nullable(args, input_dfschema)),
    ))
}

Labels: enhancement (New feature or request)