Skip to content

Option to disable Physical input schema should be the same as the one converted from logical schema error #13065

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

This bug, released in DataFusion 42.0.0 ,

Added a new check in the DefaultPhysicalPlanner that the schema of the output plan is the same as the input plan

if physical_input_schema != physical_input_schema_from_logical {
return internal_err!("Physical input schema should be the same as the one converted from logical input schema.");
}

While @jayzhan211 's heroic efforts has this passing in all the DataFusion tests, it turned out this check failed on many downstream implementations:

Downstream in InfluxDB 3.0 we turned the check into a warning in our fork to unblock our upgrade

We even made a patch release to try and get the delta-rs upgrade working:

But it is still failing when I write this (see delta-io/delta-rs#2886 (comment))

Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "sold", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "price_float", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "items_in_bucket", data_type: List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "deleted", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "__delta_rs_update_predicate", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, Some(Bare { table: "target" }), None, None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.

Describe the solution you'd like

Note there is at least one open outstanding bug: #13010

I would like some way to disable this check to unblock upgrades in downstream crates.

Describe alternatives you've considered

I propose we add a new config value that lets downstream crates opt in / out of this check, similarly to datafusion.optimizer.skip_failed_rules (see Config Docs)

Something like:

  • datafusion.execution.validate_schema: If true, the DefaultPhysicalPlanner will error if the input plan's schema does not exactly match the output plan.

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions