Consider adopting IOx ObjectStore abstraction #2489

@wjones127

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In another issue @alamb and @tustvold suggested we might want to use the IOx ObjectStore implementation.

A few nice points I'll mention about the IOx one:

  • They have some nice path utilities, including a CloudPath struct. That seems nicer than the current one with &str paths.
  • Includes implementations for S3, GCS, and Azure Blob Storage in the repo. There is no HDFS support yet.
  • Has implementations of put() for writing. There doesn't seem to be streaming write support (multi-part upload).

There are a few differences in the API:

Current API: https://github.com/apache/arrow-datafusion/blob/dfdeb42d7d646cffcf3cff26beefcecffc6cbe62/data-access/src/object_store/mod.rs#L77

IOx API: https://github.com/influxdata/influxdb_iox/blob/94e9ac610acfb94870154d976f66a4d4111b5668/object_store/src/lib.rs#L74

  • The IOx list() implementation evaluates prefixes on path segments: "Prefixes are evaluated on a path segment basis, i.e. foo/bar/ is a prefix of foo/bar/x but not of foo/bar_baz/x."
  • IOx doesn't have a synchronous read implementation.
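
To make the prefix difference concrete, here is a minimal sketch of segment-based prefix matching. The helper `is_segment_prefix` is a hypothetical name for illustration only; it is not part of the IOx or DataFusion API, and it assumes `/` as the delimiter and ignores empty segments.

```rust
/// Hypothetical helper (not the IOx API): returns true when every segment of
/// `prefix` equals the corresponding leading segment of `path`.
/// Assumes '/' as the delimiter; empty segments (e.g. from a trailing '/') are skipped.
fn is_segment_prefix(prefix: &str, path: &str) -> bool {
    let mut path_parts = path.split('/').filter(|s| !s.is_empty());
    prefix
        .split('/')
        .filter(|s| !s.is_empty())
        // Each prefix segment must match the next path segment exactly;
        // if the path runs out of segments first, the comparison fails.
        .all(|p| path_parts.next() == Some(p))
}
```

With this sketch, `is_segment_prefix("foo/bar/", "foo/bar/x")` holds while `is_segment_prefix("foo/bar/", "foo/bar_baz/x")` does not, whereas a plain string check such as `"foo/bar_baz/x".starts_with("foo/bar")` would match both.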

There of course exist other repos that this has implications for:

From what I've seen, it seems like we could reasonably switch to using the IOx ObjectStore directly. But if there's a good reason to keep the existing API, we could instead reuse the useful parts of the IOx implementation behind it.

cc @matthewmturner @kyotoYaho @roeap

Labels: enhancement (New feature or request)
