-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge?
This bug, released in DataFusion 42.0.0 ,
Added a new check in the DefaultPhysicalPlanner that the schema of the output plan is the same as the input plan
datafusion/datafusion/core/src/physical_planner.rs
Lines 660 to 662 in 818ce3f
| if physical_input_schema != physical_input_schema_from_logical { | |
| return internal_err!("Physical input schema should be the same as the one converted from logical input schema."); | |
| } |
While @jayzhan211 's heroic efforts has this passing in all the DataFusion tests, it turned out this check failed on many downstream implementations:
- [EPIC] Schema metadata handling / bugs #12733 lists issues we hit during our upgrade in InfluxDB 3.0
- chore: upgrade to datafusion 43 delta-io/delta-rs#2886 @ion-elgreco and @rtyler and @Xuanwo while updating delta.rs to DataFusion 42.0.0
Downstream in InfluxDB 3.0 we turned the check into a warning in our fork to unblock our upgrade
We even made a patch release to try and get the delta-rs upgrade working:
But it is still failing when I write this (see delta-io/delta-rs#2886 (comment))
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.
Describe the solution you'd like
Note there is at least one open outstanding bug: #13010
I would like some way to disable this check to unblock upgrades in downstream crates.
Describe alternatives you've considered
I propose we add a new config value that lets downstream crates opt in / out of this check, similarly to datafusion.optimizer.skip_failed_rules (see Config Docs)
Something like:
datafusion.execution.validate_schema: If true, theDefaultPhysicalPlannerwill error if the input plan's schema does not exactly match the output plan.
Additional context
No response