Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As described in influxdata/object_store_rs#41 and on https://lists.apache.org/thread/l2103pl85xkyq10c96z73d5t68f6tthd, there appears to be consensus for donating the object_store implementation used in DataFusion (and eventually maybe in the parquet reader) to Apache.
This ticket tracks the actual work required to do so, so that the current status is transparent and can be followed by those who are interested.
Here is the copy/pasted Rationale from influxdata/object_store_rs#41 for clarity
Rationale
- A common, high quality object store abstraction for communicating with various remote object stores is useful for a range of projects and use cases.
- A library with a common API to access remote object stores is directly aligned with the Arrow mission of providing building blocks for modern high performance analytics systems
- The clear governance of Apache Arrow offers the best chance to build a unified and strong community around this crate, hopefully both increasing its adoption and attracting community contributions for its long term evolution and maintenance
Background
Object stores are increasingly important for analytic systems as more data is located in such systems; @yjshen donated an object store abstraction to Arrow DataFusion to allow DataFusion to read from local files, S3, HDFS, and others. In apache/datafusion#2489 the DataFusion community is proposing migrating from this original object store abstraction, which is part of the DataFusion project (itself part of Apache Arrow), to the code in this crate.
Provenance
The code in this crate was originally developed by InfluxData, largely by @carols10cents, for InfluxDB IOx. @tustvold has since extracted the code and released it as its own crate. Upon consideration, as described above, for the long term health of both this code and the arrow-rs and arrow-datafusion projects, moving it to be an official part of Arrow would be beneficial, and we would like to donate it to the community.
There is additional background in apache/datafusion#2677 (comment)
Plan
- Create a PR in arrow-rs with the proposed donation: Donate object_store code from object_store_rs to arrow-rs #2081
- IP clearance form (e.g. follow https://incubator.apache.org/ip-clearance/): IP clearance for object_store_rs donation arrow-rs-object-store#256
- Hold a vote on the mailing list https://lists.apache.org/thread/9dc2bbyy9wnvm7z6ndzhxoq9l3cx5t3p
- Acceptance vote: https://lists.apache.org/thread/5gfmvlf7lhs6lrcb0l8syq9s6dvs0mo8
- Merge Donate object_store code from object_store_rs to arrow-rs #2081 to arrow-rs
- Port object_store integration tests, use github actions #2148
- Port feat: Add stream upload (multi-part upload) influxdata/object_store_rs#20 to arrow-rs: Port Add stream upload (multi-part upload) #2147
- Verify there are no other changes to port from object_store_rs to arrow-rs
- Make a comment on all outstanding PRs to object_store_rs about new home and offer to help port over
- Move outstanding tickets from object_store_rs to arrow-rs
- PR to clean up / archive https://github.com/influxdata/object_store_rs and point at the new location - docs: remove code, and redirect to apache arrow site influxdata/object_store_rs#52
- Add appropriate crates.io ownership for https://crates.io/crates/object_store
- Figure out and document the versioning and release cadence for object_store arrow-rs-object-store#248
- Write a blog post about the donation and future hopes for object_store: https://www.influxdata.com/blog/rust-object-store-donation/