Description
Is your feature request related to a problem or challenge?

Continuing the discussion from #13756 (comment).

We create a scalar function with the defined function `udf` and the input `args`, and keep them in `Expr`. We then coerce the inputs in the optimizer step with `TypeCoercionRewriter`, and compute the nullability when we create the physical expression in `create_physical_expr`.
Both the `return_type` and the nullability are computed from the inputs, which are defined by `ScalarUDFImpl` in the logical layer. As long as we have the `udf` and `inputs: Vec<Expr>`, we can decide whether the types are valid and what implicit coercion should be applied, and we can derive the `return_type` and nullability from that information.

I think such properties can be decided as early as possible, i.e. when the function is created in the logical layer. Once we have the `return_type`, we can avoid calling `data_types_with_scalar_udf` again in `ExprSchemable::get_type`, `coerce_arguments_for_signature_with_scalar_udf`, and `create_physical_expr`.
Describe the solution you'd like
Compute the `return_type` and `nullable` as early as possible.
```rust
#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct ScalarFunction {
    /// The function
    pub func: Arc<crate::ScalarUDF>,
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
    // New fields for ScalarFunction, computed when the function is created
    pub return_type: DataType,
    pub nullable: bool,
}
```
`ScalarFunction::new_udf` is used to create the function; we need to double check whether there is any other way to create a scalar function, and make sure they all go through the same computation.
Ideally we have something like:

```rust
impl ScalarFunction {
    pub fn new(udf: Arc<crate::ScalarUDF>, args: Vec<Expr>) -> Result<Self> {
        // 1. Check the udf's signature against `args`: length check, type
        //    check, and implicit coercion.
        // 2. Call udf.return_type() to determine the return type (or even a
        //    Field that also carries the nullability).
        // Return an error if anything goes wrong.
    }
}
```
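The constructor above can be sketched end to end with self-contained stub types. This is not DataFusion's actual API; `Udf`, the `DataType` variants, and the argument handling here are simplified stand-ins that only illustrate the proposal: validate the arguments and compute the return type once, at construction time, instead of re-deriving them later.

```rust
// Minimal stand-ins for the real types (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

// Stand-in for ScalarUDFImpl: a name, expected argument types, and a
// return-type rule derived from the argument types.
struct Udf {
    name: &'static str,
    arg_types: Vec<DataType>,
    return_type: fn(&[DataType]) -> DataType,
}

struct ScalarFunction {
    udf: Udf,
    // Stand-in for Vec<Expr>; we only track the argument types here.
    args: Vec<DataType>,
    // Computed eagerly in the constructor and cached.
    return_type: DataType,
    nullable: bool,
}

impl ScalarFunction {
    // Mirrors the proposed ScalarFunction::new: any signature mismatch is an
    // error at construction, and return_type is computed exactly once.
    fn new(udf: Udf, args: Vec<DataType>) -> Result<Self, String> {
        if args.len() != udf.arg_types.len() {
            return Err(format!(
                "{} expects {} args, got {}",
                udf.name,
                udf.arg_types.len(),
                args.len()
            ));
        }
        // A real implementation would attempt implicit coercion here; this
        // sketch only checks for an exact type match.
        if args != udf.arg_types {
            return Err(format!("{}: argument types do not match signature", udf.name));
        }
        let return_type = (udf.return_type)(&args);
        Ok(Self { udf, args, return_type, nullable: true })
    }
}

fn main() {
    let sqrt = Udf {
        name: "sqrt",
        arg_types: vec![DataType::Float64],
        return_type: |_| DataType::Float64,
    };
    let f = ScalarFunction::new(sqrt, vec![DataType::Float64]).unwrap();
    assert_eq!(f.return_type, DataType::Float64);

    // Wrong arity fails at construction, not later in the optimizer.
    let sqrt2 = Udf {
        name: "sqrt",
        arg_types: vec![DataType::Float64],
        return_type: |_| DataType::Float64,
    };
    assert!(ScalarFunction::new(sqrt2, vec![]).is_err());
}
```

With this shape, every downstream consumer reads the cached `return_type` and `nullable` fields instead of recomputing them from the schema.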
Describe alternatives you've considered
No response
Additional context
We create the physical expr with the logical inputs and schema, which doesn't seem correct either.
```rust
/// Create a physical expression for the UDF.
pub fn create_physical_expr(
    fun: &ScalarUDF,
    input_phy_exprs: &[Arc<dyn PhysicalExpr>],
    input_schema: &Schema,
    args: &[Expr],
    input_dfschema: &DFSchema,
) -> Result<Arc<dyn PhysicalExpr>> {
    let input_expr_types = input_phy_exprs
        .iter()
        .map(|e| e.data_type(input_schema))
        .collect::<Result<Vec<_>>>()?;

    // verify that input data types are consistent with the function's `TypeSignature`
    data_types_with_scalar_udf(&input_expr_types, fun)?;

    // Since we have arg_types, we don't need args and schema.
    let return_type =
        fun.return_type_from_exprs(args, input_dfschema, &input_expr_types)?;

    Ok(Arc::new(
        ScalarFunctionExpr::new(
            fun.name(),
            Arc::new(fun.clone()),
            input_phy_exprs.to_vec(),
            return_type,
        )
        .with_nullable(fun.is_nullable(args, input_dfschema)),
    ))
}
```