Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently DataFusion has two ways to represent a schema: Schema and DFSchema. The former is the representation used by arrow and by most components. DFSchema appears to "enhance" an arrow schema with the notion of a qualifier.
I'm not entirely sure of the history of this split, but to the uninitiated it is confusing and frustrating. It also results in a non-trivial amount of schema munging logic to convert between the two representations.
Describe the solution you'd like
I would like to change DfSchema to be:
struct DfSchema {
    schema: SchemaRef,
    fields: Vec<DfFieldMetadata>,
}
struct DfFieldMetadata {
    qualifier: Option<String>,
}
We could then make DfSchema automatically deref to SchemaRef, or at the very least implement AsRef<SchemaRef>, avoiding a lot of code that ends up looking like
let schema: Schema = self.plan.schema().as_ref().into();
Arc::new(schema)
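As a rough sketch of what the Deref / AsRef idea could look like: the Field, Schema and SchemaRef types below are minimal stand-ins for the arrow types so the snippet compiles on its own, and DfSchema::new, qualifier and arrow_schema_of are hypothetical names used only for illustration, not existing DataFusion APIs.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Stand-ins for arrow's `Field`, `Schema` and `SchemaRef` (assumption:
// the real types would be used instead of these).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    name: String,
}

impl Field {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

#[derive(Debug, PartialEq)]
pub struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    pub fn new(fields: Vec<Field>) -> Self {
        Self { fields }
    }

    pub fn fields(&self) -> &[Field] {
        &self.fields
    }
}

pub type SchemaRef = Arc<Schema>;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct DfFieldMetadata {
    pub qualifier: Option<String>,
}

pub struct DfSchema {
    schema: SchemaRef,
    fields: Vec<DfFieldMetadata>,
}

impl DfSchema {
    // Hypothetical constructor: one metadata entry per arrow field.
    pub fn new(schema: SchemaRef, fields: Vec<DfFieldMetadata>) -> Self {
        assert_eq!(schema.fields().len(), fields.len());
        Self { schema, fields }
    }

    // Hypothetical accessor for the qualifier of the i-th field.
    pub fn qualifier(&self, i: usize) -> Option<&str> {
        self.fields.get(i)?.qualifier.as_deref()
    }
}

// Lets a `&DfSchema` be used wherever a `&SchemaRef` is expected.
impl Deref for DfSchema {
    type Target = SchemaRef;

    fn deref(&self) -> &SchemaRef {
        &self.schema
    }
}

impl AsRef<SchemaRef> for DfSchema {
    fn as_ref(&self) -> &SchemaRef {
        &self.schema
    }
}

// Hypothetical helper: getting the arrow schema is a cheap `Arc` clone,
// with no conversion and no new allocation of the schema itself.
pub fn arrow_schema_of(df: &DfSchema) -> SchemaRef {
    let schema: &SchemaRef = df; // deref coercion
    Arc::clone(schema)
}
```

With something along these lines, the conversion shown above would reduce to cloning the inner Arc, and methods of the arrow schema (such as fields()) would be callable directly on a DfSchema.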
Components wishing to combine the information can easily zip the two together. We could even assist this by adding:
struct DfField<'a> {
    field: &'a Field,
    metadata: &'a DfFieldMetadata,
}
impl DfSchema {
    // Pairs each arrow field with its DataFusion-specific metadata.
    fn df_fields(&self) -> impl Iterator<Item = DfField<'_>> + '_ {
        self.schema.fields().iter().zip(&self.fields).map(|(field, metadata)| DfField { field, metadata })
    }
}
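To illustrate how a consumer might use such an iterator, here is a minimal self-contained sketch. Field and Schema are again stand-ins for the arrow types, and DfSchema::new and qualified_name are hypothetical helpers introduced only for this example.

```rust
use std::sync::Arc;

// Stand-ins for arrow's `Field`, `Schema` and `SchemaRef` (assumption).
pub struct Field {
    name: String,
}

impl Field {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

pub struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    pub fn new(fields: Vec<Field>) -> Self {
        Self { fields }
    }

    pub fn fields(&self) -> &[Field] {
        &self.fields
    }
}

pub type SchemaRef = Arc<Schema>;

pub struct DfFieldMetadata {
    pub qualifier: Option<String>,
}

pub struct DfSchema {
    schema: SchemaRef,
    fields: Vec<DfFieldMetadata>,
}

// Borrowed view combining an arrow field with its metadata.
pub struct DfField<'a> {
    pub field: &'a Field,
    pub metadata: &'a DfFieldMetadata,
}

impl DfSchema {
    // Hypothetical constructor: one metadata entry per arrow field.
    pub fn new(schema: SchemaRef, fields: Vec<DfFieldMetadata>) -> Self {
        assert_eq!(schema.fields().len(), fields.len());
        Self { schema, fields }
    }

    // Zips the arrow fields with the per-field metadata.
    pub fn df_fields(&self) -> impl Iterator<Item = DfField<'_>> + '_ {
        self.schema
            .fields()
            .iter()
            .zip(&self.fields)
            .map(|(field, metadata)| DfField { field, metadata })
    }
}

impl DfField<'_> {
    // Hypothetical helper: "qualifier.name" if qualified, else just "name".
    pub fn qualified_name(&self) -> String {
        match &self.metadata.qualifier {
            Some(q) => format!("{}.{}", q, self.field.name()),
            None => self.field.name().to_string(),
        }
    }
}
```

Because DfField only borrows from the schema, iterating this way copies neither the fields nor the qualifiers.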
Describe alternatives you've considered
We could not do this