Add support for lambda/higher order functions #14205

@gstvg

Is your feature request related to a problem or challenge?

Some engines, such as DuckDB and ClickHouse, support lambda functions, for example:

SELECT list_filter(numbers, x -> x > 40) as greater_than_40s FROM relation
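To make the intended semantics concrete, here is a minimal sketch in plain Rust of what the query above computes per row. The `list_filter` helper below is hypothetical and works on `Vec`s rather than Arrow arrays; it is not DataFusion code.

```rust
/// Hypothetical helper (not a DataFusion API): for each row's list, keep only
/// the elements for which the lambda `pred` returns true.
fn list_filter<T: Clone>(rows: &[Vec<T>], pred: impl Fn(&T) -> bool) -> Vec<Vec<T>> {
    rows.iter()
        .map(|row| row.iter().filter(|x| pred(*x)).cloned().collect())
        .collect()
}
```

With this helper, `list_filter(&rows, |x| *x > 40)` plays the role of `list_filter(numbers, x -> x > 40)`, applied to every row of `relation`.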

sqlparser-rs already supports this syntax.

From #12206 (comment) by @jayzhan211

Describe the solution you'd like

One of

  1. Add Expr::Lambda{arg_names: Vec<String>, expr: Expr} variant
  2. Change ScalarFunction.args:
struct ScalarFunction {
    args: Vec<ScalarFunctionArgument>
    ... omitted
}

enum ScalarFunctionArgument {
    Expr(Expr),
    Lambda{ arg_names: Vec<String>, body: Expr }
}
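As a rough illustration of option 2, the sketch below builds the logical expression for `list_filter(numbers, x -> x > 40)`. The `Expr` enum here is a toy stand-in with only a few variants, and `list_filter_call` and `lambda_params` are hypothetical helpers; none of this is the real DataFusion `Expr`.

```rust
// Toy stand-in for the logical `Expr`; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    ScalarFunction(ScalarFunction),
}

#[derive(Debug, Clone, PartialEq)]
struct ScalarFunction {
    name: String,
    args: Vec<ScalarFunctionArgument>,
}

#[derive(Debug, Clone, PartialEq)]
enum ScalarFunctionArgument {
    Expr(Expr),
    Lambda { arg_names: Vec<String>, body: Expr },
}

/// Hypothetical helper: the plan for `list_filter(numbers, x -> x > 40)`.
fn list_filter_call() -> Expr {
    let body = Expr::Gt(
        Box::new(Expr::Column("x".to_string())),
        Box::new(Expr::Literal(40)),
    );
    Expr::ScalarFunction(ScalarFunction {
        name: "list_filter".to_string(),
        args: vec![
            ScalarFunctionArgument::Expr(Expr::Column("numbers".to_string())),
            ScalarFunctionArgument::Lambda { arg_names: vec!["x".to_string()], body },
        ],
    })
}

/// Hypothetical helper: collects the parameter names of every lambda argument.
fn lambda_params(func: &ScalarFunction) -> Vec<&str> {
    func.args
        .iter()
        .filter_map(|arg| match arg {
            ScalarFunctionArgument::Lambda { arg_names, .. } => {
                Some(arg_names.iter().map(String::as_str))
            }
            ScalarFunctionArgument::Expr(_) => None,
        })
        .flatten()
        .collect()
}
```

Keeping lambdas out of `Expr` itself means a lambda can only appear as a function argument, which the type system then enforces.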

Create a LambdaPhysicalExpr that holds the lambda's physical expression and, in PhysicalExpr::evaluate, returns either ScalarValue::Null or an error (similar to NoOp and UnKnownColumn).
Add ScalarUDFImpl::lambdas_schema to return the parameter types of every lambda argument the function receives, so that an input schema can be built and used to generate the LambdaPhysicalExpr.
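A minimal sketch of the LambdaPhysicalExpr idea, taking the "return an error" branch. The `PhysicalExpr` trait below is a toy stand-in that evaluates over a slice of `i64`, and `GreaterThan` is a hypothetical lambda body; the real trait works on record batches and has more methods.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Toy stand-in for DataFusion's `PhysicalExpr` (assumption: one i64 input column).
trait PhysicalExpr: Debug {
    fn evaluate(&self, input: &[i64]) -> Result<Vec<bool>, String>;
}

/// Hypothetical lambda body: `x > threshold`.
#[derive(Debug)]
struct GreaterThan {
    threshold: i64,
}

impl PhysicalExpr for GreaterThan {
    fn evaluate(&self, input: &[i64]) -> Result<Vec<bool>, String> {
        Ok(input.iter().map(|x| *x > self.threshold).collect())
    }
}

/// Wraps the lambda body so it can sit among a function's child expressions.
#[derive(Debug)]
struct LambdaPhysicalExpr {
    arg_names: Vec<String>,
    body: Arc<dyn PhysicalExpr>,
}

impl LambdaPhysicalExpr {
    /// The higher-order function calls this to reach the body it should run.
    fn body(&self) -> &Arc<dyn PhysicalExpr> {
        &self.body
    }
}

impl PhysicalExpr for LambdaPhysicalExpr {
    /// Evaluating the wrapper as an ordinary argument is treated as a bug.
    fn evaluate(&self, _input: &[i64]) -> Result<Vec<bool>, String> {
        Err(format!(
            "lambda ({}) cannot be evaluated directly",
            self.arg_names.join(", ")
        ))
    }
}
```

The higher-order function is expected to call `body()` and evaluate it against a batch built from the lambda parameters, which is where the schema from `lambdas_schema` would come in.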

And then, one of:

  1. Make ScalarFunctionArgs generic over the argument type, with ColumnarValue as the default, and add a new ScalarUDF method invoke_with_lambda_args / invoke_higher_order_with_args that receives ScalarFunctionArgs<ColumnarValueOrLambda> instead of ScalarFunctionArgs. Its default implementation calls invoke_with_args and returns an error if any argument is a lambda. A lambda argument is created whenever a child PhysicalExpr is a LambdaPhysicalExpr.
enum ColumnarValueOrLambda<'a> {
    Value(ColumnarValue),
    // A borrowed trait object needs an explicit lifetime on the enum
    Lambda(&'a dyn PhysicalExpr)
}

struct ScalarFunctionArgs<T = ColumnarValue> {
    args: Vec<T>,
    ... omitted
}
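The default implementation described in option 1 could look roughly like the sketch below. All types are toy stand-ins: `ColumnarValue` holds plain `i64` data, `PhysicalExpr` is an empty marker trait, the trait is named `ScalarUdf` to avoid suggesting it is the real one, and `FirstArg` and `DummyLambda` are hypothetical.

```rust
use std::fmt::Debug;

// Toy stand-ins; the real DataFusion types carry Arrow data and more fields.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Scalar(i64),
    Array(Vec<i64>),
}

trait PhysicalExpr: Debug {}

enum ColumnarValueOrLambda<'a> {
    Value(ColumnarValue),
    Lambda(&'a dyn PhysicalExpr),
}

struct ScalarFunctionArgs<T = ColumnarValue> {
    args: Vec<T>,
}

trait ScalarUdf {
    /// Existing entry point: only sees plain values.
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String>;

    /// New entry point: by default, reject lambdas and delegate to `invoke_with_args`.
    fn invoke_with_lambda_args(
        &self,
        args: ScalarFunctionArgs<ColumnarValueOrLambda<'_>>,
    ) -> Result<ColumnarValue, String> {
        let mut values = Vec::with_capacity(args.args.len());
        for arg in args.args {
            match arg {
                ColumnarValueOrLambda::Value(v) => values.push(v),
                ColumnarValueOrLambda::Lambda(_) => {
                    return Err("this function does not accept lambda arguments".to_string())
                }
            }
        }
        self.invoke_with_args(ScalarFunctionArgs { args: values })
    }
}

/// Hypothetical ordinary UDF that just returns its first argument.
struct FirstArg;

impl ScalarUdf for FirstArg {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String> {
        args.args
            .into_iter()
            .next()
            .ok_or_else(|| "expected one argument".to_string())
    }
}

/// Hypothetical placeholder for a lambda's physical expression.
#[derive(Debug)]
struct DummyLambda;

impl PhysicalExpr for DummyLambda {}
```

Existing UDFs such as `FirstArg` keep compiling unchanged, while a higher-order function would override `invoke_with_lambda_args` to handle the `Lambda` variant itself.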
  1. Add physical_exprs: Vec<Arc<dyn PhysicalExpr>> to ScalarFunctionArgs, so lambda expressions can be extracted from it
  2. Add a LambdaScalarUDF trait (duplication with ScalarUDF, more work and more code to maintain, less flexible than 1, one more trait to document and for users to reason about)
  3. Change ScalarFunctionArgs.args to Vec<ColumnarValueOrLambda> instead of Vec<ColumnarValue> (a lot of breakage, including in the public API)
  4. Add a Lambda variant to ColumnarValue (even more breakage than 2, and a lambda doesn't quite fit the concept of a columnar value)

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)