Description
NOTE -- Here is an example of how to make Expr::NamedStructField work in 37.1.0: #10183
Is your feature request related to a problem or challenge?
In 37.0.0, many of the built-in functions were migrated to UDFs, as described in #8045. The migration is completed in 38.0.0.
One part of this change is that certain Exprs must now be rewritten into the appropriate functions, most notably get_field, which extracts a field from a Struct.
Among other things, this allows people to customize how an Expr behaves (see #7845 (comment) and a discussion in Slack), for example to return NULLs for rows that don't pass in maps.
The rewrite happens automatically as part of the logical planner (in the Analyzer pass).
However, if you bypass those passes, it will not happen.
Yeah you need to use the FunctionRewriter here (with the relevant rewriter registered) https://github.com/apache/arrow-datafusion/blob/0573f78c7e7a4d94c3204cee464b3860479e0afb/datafusion/optimizer/src/analyzer/function_rewrite.rs#L33
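To make the failure mode concrete, here is a minimal, self-contained sketch of the mechanism described above. It is a toy model, not DataFusion's API: the `Expr` enum, `rewrite`, and `create_physical_expr` below are hypothetical stand-ins that only mimic the shape of the real types. It shows why an expression that skips the rewrite pass is rejected at physical planning, while the same expression planned after the rewrite succeeds.

```rust
// Toy model only: these types and functions are hypothetical stand-ins,
// NOT DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    // Sugar produced by something like `col("props").field("a")`.
    NamedStructField { base: Box<Expr>, name: String },
    // What the sugar must become before physical planning.
    ScalarFunction { name: String, args: Vec<Expr> },
    Eq(Box<Expr>, Box<Expr>),
}

// Analyzer-style pass: recursively replace field access with a `get_field` call.
fn rewrite(expr: Expr) -> Expr {
    match expr {
        Expr::NamedStructField { base, name } => Expr::ScalarFunction {
            name: "get_field".to_string(),
            args: vec![rewrite(*base), Expr::Literal(name)],
        },
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(rewrite).collect(),
        },
        Expr::Eq(l, r) => Expr::Eq(Box::new(rewrite(*l)), Box::new(rewrite(*r))),
        other => other,
    }
}

// Stand-in for physical planning: rejects any field access that was not rewritten.
fn create_physical_expr(expr: &Expr) -> Result<String, String> {
    match expr {
        Expr::Column(c) => Ok(c.clone()),
        Expr::Literal(v) => Ok(format!("'{v}'")),
        Expr::NamedStructField { .. } => {
            Err("NamedStructField should be rewritten in OperatorToFunction".to_string())
        }
        Expr::ScalarFunction { name, args } => {
            let args = args
                .iter()
                .map(create_physical_expr)
                .collect::<Result<Vec<_>, _>>()?;
            Ok(format!("{name}({})", args.join(", ")))
        }
        Expr::Eq(l, r) => Ok(format!(
            "{} = {}",
            create_physical_expr(l)?,
            create_physical_expr(r)?
        )),
    }
}

fn main() {
    let filter = Expr::Eq(
        Box::new(Expr::NamedStructField {
            base: Box::new(Expr::Column("props".into())),
            name: "a".into(),
        }),
        Box::new(Expr::Literal("2021-02-02".into())),
    );

    // Planning the raw expression (rewrite pass bypassed) fails.
    assert!(create_physical_expr(&filter).is_err());

    // Planning after the rewrite pass succeeds.
    let planned = create_physical_expr(&rewrite(filter)).unwrap();
    println!("{planned}"); // prints: get_field(props, 'a') = '2021-02-02'
}
```

In the real library the equivalent of `rewrite` is applied by the Analyzer via the function rewriter linked above, so code that builds a physical expression directly would need to run that rewrite itself first.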
Example
An example from Discord (link):
let schema = Schema::new(vec![
    Field::new("id", DataType::Utf8, true),
    Field::new(
        "props",
        DataType::Struct(Fields::from(vec![Field::new("a", DataType::Utf8, true)])),
        true,
    ),
]);
println!("schema {:?}", schema);
let df_schema = DFSchema::try_from(schema.clone()).unwrap();
let plan = table_scan(Some("props_test"), &schema, None)?
    .filter(col("props").field("a").eq(lit("2021-02-02")))?
    .build()?;
println!("logical plan {:?}", plan);
let phys = DefaultPhysicalPlanner::default().create_physical_expr(
    &plan.expressions()[0],
    &df_schema,
    &SessionContext::new().state(),
)?;
println!("phys {:?}", phys);
Ok(())

This returns an error "NamedStructField should be rewritten in OperatorToFunction".
Describe the solution you'd like
No response
Describe alternatives you've considered
One potential workaround is to call get_field directly rather than Expr::field.
So instead of

let plan = table_scan(Some("props_test"), &schema, None)?
    .filter(col("props").field("a").eq(lit("2021-02-02")))?
    .build()?;

call it like

let plan = table_scan(Some("props_test"), &schema, None)?
    .filter(get_field(col("props"), "a").eq(lit("2021-02-02")))?
    .build()?;

Additional context
@ion-elgreco is seeing the same issue in Delta-rs: #9904 (comment)
I tried it with 37.1.0 in delta-rs, but we still get this error: internal error: entered unreachable code: NamedStructField should be rewritten in OperatorToFunction, wasn't this regression fixed?
@westonpace brings it up in Discord: link
Another report in Discord: link