Add support for Arrow Extension Types #4472

@wjones127

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Extension types are annotated in field metadata. This works well with record batches, but when exporting/importing an array over the C data interface, the extension type metadata is lost.
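To make the failure mode concrete, here is a minimal sketch using simplified stand-in types, not the real arrow crate API. `Field`, `DataType`, `uuid_field`, `export_array_type`, and the extension name `example.uuid` are all hypothetical; the only names taken from the Arrow format are the reserved metadata keys `ARROW:extension:name` and `ARROW:extension:metadata`.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a storage data type (not the real arrow `DataType`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    FixedSizeBinary(i32),
}

/// Simplified stand-in for a schema field. The extension annotation lives
/// in `metadata`, not in `data_type`.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

/// Builds a field annotated as a (hypothetical) `example.uuid` extension type.
#[allow(dead_code)]
fn uuid_field() -> Field {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "example.uuid".to_string());
    metadata.insert("ARROW:extension:metadata".to_string(), String::new());
    Field {
        name: "id".to_string(),
        data_type: DataType::FixedSizeBinary(16),
        metadata,
    }
}

/// Hypothetical array export: an array carries only its data type, so the
/// exported schema gets the storage type and no extension metadata.
#[allow(dead_code)]
fn export_array_type(array_type: &DataType) -> (DataType, HashMap<String, String>) {
    (array_type.clone(), HashMap::new())
}
```

Calling `export_array_type(&uuid_field().data_type)` in this sketch yields the storage type with an empty metadata map, which mirrors the loss described above: the annotation stays behind on the field.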

The C++ implementation solves this with an ExtensionType class whose metadata is always exported over the C data interface, as shown here:

https://github.com/apache/arrow/blob/b9aec9ad2b655817b8925462e4e2dd6973807e23/cpp/src/arrow/c/bridge.cc#L243-L252

Describe the solution you'd like

I'd propose adding a new enum variant to DataType:

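struct ExtensionType {
    // Name identifying the extension type
    name: String,
    // Serialized extension metadata
    metadata: String,
    // Underlying storage type
    storage_type: Box<DataType>,
}

enum DataType {
    // ... existing variants
    ExtensionType(ExtensionType),
}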

Then make sure the C data interface implementation handles exporting and importing this type.
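As a rough illustration of what that export/import handling could look like, the sketch below round-trips the proposed variant through a simplified schema representation. It is not the arrow crate's FFI code: `ExportedSchema`, `export_type`, and `import_type` are hypothetical names, `DataType` is reduced to two storage variants, and nested extension types are not handled. The metadata keys are the ones the C data interface reserves for extension arrays.

```rust
/// Simplified stand-in for the proposed `DataType` (two storage variants only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    FixedSizeBinary(i32),
    ExtensionType(ExtensionType),
}

#[derive(Debug, Clone, PartialEq)]
struct ExtensionType {
    name: String,
    metadata: String,
    storage_type: Box<DataType>,
}

/// Hypothetical exported schema: a storage type plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
struct ExportedSchema {
    storage_type: DataType,
    metadata: Vec<(String, String)>,
}

const EXT_NAME_KEY: &str = "ARROW:extension:name";
const EXT_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Export: unwrap the extension type into its storage type and write the
/// extension name and serialized metadata as key/value pairs.
#[allow(dead_code)]
fn export_type(data_type: &DataType) -> ExportedSchema {
    match data_type {
        DataType::ExtensionType(ext) => ExportedSchema {
            storage_type: (*ext.storage_type).clone(),
            metadata: vec![
                (EXT_NAME_KEY.to_string(), ext.name.clone()),
                (EXT_METADATA_KEY.to_string(), ext.metadata.clone()),
            ],
        },
        other => ExportedSchema {
            storage_type: other.clone(),
            metadata: Vec::new(),
        },
    }
}

/// Import: if the extension name key is present, rebuild the extension type
/// around the storage type; otherwise return the storage type unchanged.
#[allow(dead_code)]
fn import_type(schema: &ExportedSchema) -> DataType {
    let lookup = |key: &str| {
        schema
            .metadata
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.clone())
    };
    match lookup(EXT_NAME_KEY) {
        Some(name) => DataType::ExtensionType(ExtensionType {
            name,
            metadata: lookup(EXT_METADATA_KEY).unwrap_or_default(),
            storage_type: Box::new(schema.storage_type.clone()),
        }),
        None => schema.storage_type.clone(),
    }
}
```

Because the extension information travels inside the data type itself in this sketch, no registry lookup is needed on import; the receiver simply rebuilds the variant from the two metadata keys.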

Describe alternatives you've considered

We could add an extension type registry like C++, but that seems heavier than we really need.

Additional context

https://arrow.apache.org/docs/format/CDataInterface.html#extension-arrays

Previous discussions:

Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)