Make DfSchema wrap SchemaRef #4680

@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently DataFusion has two ways to represent a schema, Schema and DFSchema. The former is the representation used by Arrow and by most components. DFSchema appears to "enhance" an Arrow schema with the notion of a qualifier.

I'm not entirely sure of the history of this split, but to the uninitiated it is confusing and frustrating. It also results in a non-trivial amount of schema munging logic to convert to and from the two representations.

Describe the solution you'd like

I would like to change DfSchema to be

struct DfSchema {
    schema: SchemaRef,
    fields: Vec<DfFieldMetadata>
}

struct DfFieldMetadata {
    qualifier: Option<String>,
}

We could then make DfSchema automatically deref to SchemaRef, or at the very least implement AsRef<SchemaRef>, avoiding a lot of code that ends up looking like

let schema: Schema = self.plan.schema().as_ref().into();
Arc::new(schema)
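To make the Deref / AsRef idea concrete, here is a minimal sketch. It assumes simplified stand-ins for arrow's `Field`, `Schema` and `SchemaRef` so that it compiles without the arrow crate, and the `arrow_schema` helper is a hypothetical name used only for illustration. It is one possible shape for the proposal, not the actual DataFusion implementation.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Simplified stand-ins for arrow's `Field`, `Schema` and `SchemaRef`
// (assumptions made so the sketch compiles on its own).
struct Field {
    name: String,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    fn fields(&self) -> &[Field] {
        &self.fields
    }
}

type SchemaRef = Arc<Schema>;

struct DfFieldMetadata {
    qualifier: Option<String>,
}

// The layout proposed in this issue: the arrow schema plus per-field metadata.
struct DfSchema {
    schema: SchemaRef,
    fields: Vec<DfFieldMetadata>,
}

// Lets a `&DfSchema` be used wherever a `&SchemaRef` is expected, and
// exposes the methods of `Schema` through auto-deref.
impl Deref for DfSchema {
    type Target = SchemaRef;

    fn deref(&self) -> &SchemaRef {
        &self.schema
    }
}

impl AsRef<SchemaRef> for DfSchema {
    fn as_ref(&self) -> &SchemaRef {
        &self.schema
    }
}

// Hypothetical helper: obtaining the arrow schema becomes a pointer clone
// instead of a conversion followed by `Arc::new`.
fn arrow_schema(df: &DfSchema) -> SchemaRef {
    let schema_ref: &SchemaRef = df.as_ref();
    Arc::clone(schema_ref)
}
```

With this shape, the conversion shown above would reduce to something like `arrow_schema(&df_schema)`, which shares the existing allocation rather than rebuilding the schema.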

Components wishing to combine the information can easily zip the two together. We could even assist this by adding

struct DfField<'a> {
    field: &'a Field,
    metadata: &'a DfFieldMetadata
}

impl DfSchema {
    // Pairs each arrow field with its DataFusion-specific metadata
    fn df_fields(&self) -> impl Iterator<Item = DfField<'_>> + '_ {
        self.schema
            .fields()
            .iter()
            .zip(&self.fields)
            .map(|(field, metadata)| DfField { field, metadata })
    }
}
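As a usage sketch of the zipped view, the example below builds qualified column names from `df_fields`. It again assumes simplified stand-ins for arrow's `Field` and `Schema`, and `qualified_name` is a hypothetical helper that is not part of the proposal.

```rust
use std::sync::Arc;

// Simplified stand-ins for the arrow types (assumptions for this sketch).
struct Field {
    name: String,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    fn fields(&self) -> &[Field] {
        &self.fields
    }
}

type SchemaRef = Arc<Schema>;

struct DfFieldMetadata {
    qualifier: Option<String>,
}

struct DfSchema {
    schema: SchemaRef,
    fields: Vec<DfFieldMetadata>,
}

// Borrowed view combining an arrow field with its metadata.
struct DfField<'a> {
    field: &'a Field,
    metadata: &'a DfFieldMetadata,
}

impl DfField<'_> {
    // Hypothetical helper: "qualifier.name" when a qualifier is present,
    // otherwise just the field name.
    fn qualified_name(&self) -> String {
        match &self.metadata.qualifier {
            Some(q) => format!("{}.{}", q, self.field.name),
            None => self.field.name.clone(),
        }
    }
}

impl DfSchema {
    // Pairs each arrow field with the metadata at the same index.
    fn df_fields(&self) -> impl Iterator<Item = DfField<'_>> + '_ {
        self.schema
            .fields()
            .iter()
            .zip(&self.fields)
            .map(|(field, metadata)| DfField { field, metadata })
    }
}
```

Note that `zip` stops at the shorter of the two sequences, so a constructor for `DfSchema` would presumably need to check that the metadata vector has one entry per arrow field.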

Describe alternatives you've considered

We could choose not to do this.

Labels: enhancement (New feature or request)