Error "entered unreachable code: NamedStructField should be rewritten in OperatorToFunction" after upgrade to 37 #10181

@alamb

NOTE -- Here is an example of how to make Expr::NamedStructField work in 37.1.0: #10183

Is your feature request related to a problem or challenge?

In 37.0.0, many of the built-in functions were migrated to UDFs, as described in #8045. The migration is completed in 38.0.0.

One part of this change is that certain Exprs must now be rewritten into the appropriate functions, most notably get_field, which extracts a field from a Struct.

Among other things, this allows people to customize how an Expr behaves (see #7845 (comment) or the discussion in Slack), for example to return NULLs for rows that don't pass in maps.

The rewrite happens automatically as part of the logical planner (in the Analyzer pass).

However, if you bypass those passes, the rewrite will not happen.

Yeah you need to use the FunctionRewriter here (with the relevant rewriter registered) https://github.com/apache/arrow-datafusion/blob/0573f78c7e7a4d94c3204cee464b3860479e0afb/datafusion/optimizer/src/analyzer/function_rewrite.rs#L33
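
To make the failure mode concrete, here is a minimal, self-contained sketch of the idea. It is a toy model only: the `Expr` enum, `rewrite_field_access`, `plan_physical`, and `example_filter` are hypothetical stand-ins written for illustration and are not DataFusion's actual types or API. It shows why an expression that skips the rewrite step is rejected by a later stage, while the same expression passes once the field access has been turned into a `get_field` call.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    // Field access as written by the user, e.g. col("props").field("a")
    NamedStructField { base: Box<Expr>, name: String },
    // A call to a named function, e.g. get_field(props, 'a')
    ScalarFunction { name: String, args: Vec<Expr> },
    Eq(Box<Expr>, Box<Expr>),
}

// Stand-in for the analyzer's rewrite: replaces every NamedStructField
// with an equivalent get_field(...) call, recursing into child expressions.
fn rewrite_field_access(expr: Expr) -> Expr {
    match expr {
        Expr::NamedStructField { base, name } => Expr::ScalarFunction {
            name: "get_field".to_string(),
            args: vec![rewrite_field_access(*base), Expr::Literal(name)],
        },
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(rewrite_field_access).collect(),
        },
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(rewrite_field_access(*l)),
            Box::new(rewrite_field_access(*r)),
        ),
        other => other,
    }
}

// Stand-in for physical planning: it only understands rewritten expressions
// and rejects any field access that was left in place.
fn plan_physical(expr: &Expr) -> Result<String, String> {
    match expr {
        Expr::Column(c) => Ok(c.clone()),
        Expr::Literal(v) => Ok(format!("'{}'", v)),
        Expr::NamedStructField { .. } => {
            Err("NamedStructField should be rewritten".to_string())
        }
        Expr::ScalarFunction { name, args } => {
            let planned: Result<Vec<String>, String> =
                args.iter().map(plan_physical).collect();
            Ok(format!("{}({})", name, planned?.join(", ")))
        }
        Expr::Eq(l, r) => Ok(format!("{} = {}", plan_physical(l)?, plan_physical(r)?)),
    }
}

// Builds the unrewritten form of the filter: props.a = '2021-02-02'
fn example_filter() -> Expr {
    Expr::Eq(
        Box::new(Expr::NamedStructField {
            base: Box::new(Expr::Column("props".to_string())),
            name: "a".to_string(),
        }),
        Box::new(Expr::Literal("2021-02-02".to_string())),
    )
}
```

In this sketch, calling `plan_physical(&example_filter())` directly returns an error, while `plan_physical(&rewrite_field_access(example_filter()))` succeeds. The real fix follows the same shape: make sure the relevant rewrite runs before the expression reaches the physical planner.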

Example

An example from Discord (link):

    let schema = Schema::new(vec![
        Field::new("id", DataType::Utf8, true),
        Field::new(
            "props",
            DataType::Struct(Fields::from(vec![Field::new("a", DataType::Utf8, true)])),
            true,
        ),
    ]);

    println!("schema {:?}", schema);

    let df_schema = DFSchema::try_from(schema.clone()).unwrap();

    let plan = table_scan(Some("props_test"), &schema, None)?
        .filter(col("props").field("a").eq(lit("2021-02-02")))?
        .build()?;
    println!("logical plan {:?}", plan);
    let phys = DefaultPhysicalPlanner::default().create_physical_expr(
        &plan.expressions()[0],
        &df_schema,
        &SessionContext::new().state(),
    )?;
    println!("phys {:?}", phys);
    Ok(())

This returns the error "NamedStructField should be rewritten in OperatorToFunction".

Describe the solution you'd like

No response

Describe alternatives you've considered

One potential workaround is to call get_field directly rather than Expr::field.

So instead of:

    let plan = table_scan(Some("props_test"), &schema, None)?
        .filter(col("props").field("a").eq(lit("2021-02-02")))?
        .build()?;

call it like this:

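    let plan = table_scan(Some("props_test"), &schema, None)?
        // get_field takes the struct expression and the field name as separate arguments;
        // depending on the DataFusion version the name may need to be wrapped as lit("a")
        .filter(get_field(col("props"), "a").eq(lit("2021-02-02")))?
        .build()?;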

Additional context

@ion-elgreco is seeing the same issue in Delta-rs: #9904 (comment)

I tried it with 37.1.0 in delta-rs, but we still get this error: internal error: entered unreachable code: NamedStructField should be rewritten in OperatorToFunction, wasn't this regression fixed?

@westonpace brings it up in Discord: link

Another report in Discord: link
