Substrait plan read relation baseSchema does not include the struct with type information #12244

@richtia


Describe the bug

DataFusion produces Substrait plans in which the read relation's baseSchema lists only the column names and omits the struct that carries the type information:

                    "baseSchema": {
                      "names": [
                        "ps_partkey",
                        "ps_suppkey",
                        "ps_availqty",
                        "ps_supplycost",
                        "ps_comment"
                      ]
                    },

To be valid, it should look more like this:

                      "baseSchema": {
                        "names": ["PS_PARTKEY", "PS_SUPPKEY", "PS_AVAILQTY", "PS_SUPPLYCOST", "PS_COMMENT"],
                        "struct": {
                          "types": [{
                            "i64": {
                              "nullability": "NULLABILITY_REQUIRED"
                            }
                          }, {
                            "i64": {
                              "nullability": "NULLABILITY_REQUIRED"
                            }
                          }, {
                            "i64": {
                              "nullability": "NULLABILITY_REQUIRED"
                            }
                          }, {
                            "decimal": {
                              "scale": 2,
                              "precision": 15,
                              "nullability": "NULLABILITY_REQUIRED"
                            }
                          }, {
                            "string": {
                              "nullability": "NULLABILITY_REQUIRED"
                            }
                          }],
                          "nullability": "NULLABILITY_REQUIRED"
                        }
                      },
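
The difference between the two snippets can be checked mechanically. Below is a minimal sketch, assuming the baseSchema has already been loaded as a JSON-decoded Python dict. The helper name check_named_struct is hypothetical (it is not part of DataFusion or Substrait), and the name/type count comparison only holds for flat schemas without nested struct columns.

```python
def check_named_struct(base_schema):
    """Return a list of problems found in a JSON-decoded NamedStruct."""
    problems = []
    names = base_schema.get("names", [])
    struct = base_schema.get("struct")
    if struct is None:
        # This is the case reported in this issue: names only, no types.
        problems.append("missing 'struct' with type information")
        return problems
    types = struct.get("types", [])
    # Assumption: flat schema, so there is exactly one type per name.
    if len(types) != len(names):
        problems.append(f"{len(names)} names but {len(types)} types")
    return problems


# Hand-written samples modeled on the snippets above (shortened to two columns).
broken = {"names": ["ps_partkey", "ps_suppkey"]}
fixed = {
    "names": ["ps_partkey", "ps_suppkey"],
    "struct": {
        "types": [
            {"i64": {"nullability": "NULLABILITY_REQUIRED"}},
            {"i64": {"nullability": "NULLABILITY_REQUIRED"}},
        ],
        "nullability": "NULLABILITY_REQUIRED",
    },
}

print(check_named_struct(broken))
print(check_named_struct(fixed))
```

The first call reports the missing struct; the second returns an empty list.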

To Reproduce

Generate any Substrait plan that includes a read relation. The plan output omits the struct field with type information from the baseSchema.

base_schema is a NamedStruct
https://substrait.io/relations/logical_relations/#__tabbed_1_1

https://substrait.io/types/named_structs/
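
To locate every affected read relation in a larger plan, a recursive walk over the JSON form of the plan may help. This is a rough sketch, assuming the plan is available as a JSON-decoded dict; find_untyped_base_schemas is a hypothetical helper, and the sample plan is a hand-written fragment for illustration, not real DataFusion output.

```python
def find_untyped_base_schemas(node, path="plan"):
    """Yield the JSON path of every baseSchema that lacks a 'struct' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "baseSchema" and isinstance(value, dict) and "struct" not in value:
                yield child
            # Keep descending: read relations can be nested anywhere in the tree.
            yield from find_untyped_base_schemas(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_untyped_base_schemas(item, f"{path}[{i}]")


# Hand-written fragment shaped like a plan with a single read relation.
sample_plan = {
    "relations": [
        {
            "root": {
                "input": {
                    "read": {
                        "baseSchema": {"names": ["ps_partkey"]},
                        "namedTable": {"names": ["partsupp"]},
                    }
                }
            }
        }
    ]
}

for location in find_untyped_base_schemas(sample_plan):
    print(location)
```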

Expected behavior

No response

Additional context

You can also validate plans by running them through the substrait-validator:

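import substrait_validator as sv
from datafusion import SessionContext
from datafusion import substrait as ss

# Placeholder query (not from the original report): substitute your own SQL
# and register the tables it reads on ctx before serializing.
sql_query = "SELECT ps_partkey FROM partsupp"

ctx = SessionContext()
# Serialize the SQL query to a Substrait plan, then encode it as protobuf bytes.
substrait_plan = ss.serde.serialize_to_plan(sql_query, ctx)
substrait_plan_bytes = substrait_plan.encode()

# Validate the encoded plan using the default validator configuration.
config = sv.Config()
sv.check_plan_valid(substrait_plan_bytes, config)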
