Incorporate object_store into arrow-rs repository #2030

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As described in influxdata/object_store_rs#41 and on https://lists.apache.org/thread/l2103pl85xkyq10c96z73d5t68f6tthd, there appears to be consensus for donating the object_store implementation used in DataFusion (and eventually maybe in the parquet reader) to Apache.

This ticket tracks the actual work required to do so, so that the current status is transparent and can be followed by those who are interested.

Here is the Rationale, copied from influxdata/object_store_rs#41 for clarity:

Rationale

  1. A common, high-quality object store abstraction for communicating with various remote object stores is useful for a range of projects and use cases.
  2. A library with a common API to access remote object stores is directly aligned with the Arrow mission of providing building blocks for modern high-performance analytics systems.
  3. The clear governance of Apache Arrow offers the best chance to build a unified and strong community around this crate, hopefully both increasing its adoption and attracting community contributions for its long-term evolution and maintenance.

Background

Object stores are increasingly important for analytic systems as more data is located in such systems; @yjshen donated an object store abstraction to Arrow DataFusion to allow DataFusion to read from local files, S3, HDFS, and others. In apache/datafusion#2489, the DataFusion community is proposing migrating from this original object store abstraction, which is part of the DataFusion project (itself part of Apache Arrow), to the code in this crate.

Provenance

The code in this crate was originally developed by InfluxData, largely by @carols10cents, for InfluxDB IOx. @tustvold has since extracted the code and released it as its own crate. Upon consideration, as described above, moving it to be an official part of Arrow would be beneficial for the long-term health of both this code and the arrow-rs and arrow-datafusion projects, and we would like to donate it to the community.

There is additional background in apache/datafusion#2677 (comment).

Plan

Labels

enhancement: Any new improvement worthy of an entry in the changelog
