This repository was archived by the owner on Jul 27, 2022. It is now read-only.

AsyncWrite over multi-part upload #9

@wjones127

One idea I was exploring in datafusion-contrib/datafusion-objectstore-s3#54 was implementing the AsyncWrite trait as an abstraction over multi-part upload. Does that seem like an agreeable addition to this crate?

Multi-part uploads are helpful when uploading large files. For example, you can write Parquet files one row group at a time, uploading each row group's data as a part (though more likely there is some buffering in between to get good part sizes). This is the approach taken in the Arrow C++ S3 FileSystem. In fact, we could even upload parts in parallel for better throughput in some scenarios (something AWS recommends).
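
To make the buffering idea concrete, here is a rough sketch of a writer that accumulates bytes and emits fixed-size parts. It is only an illustration under several assumptions: it uses the synchronous `std::io::Write` as a dependency-free stand-in for `AsyncWrite` (the same buffering would sit inside `poll_write`), and `MultipartUpload`, `InMemoryUpload`, and `MultipartWriter` are hypothetical names, not types from this crate or any store's SDK.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a store's multi-part upload calls.
trait MultipartUpload {
    fn put_part(&mut self, part_number: usize, data: Vec<u8>) -> io::Result<()>;
    fn complete(&mut self) -> io::Result<()>;
}

/// In-memory implementation used only to exercise the sketch.
#[derive(Default)]
struct InMemoryUpload {
    parts: Vec<(usize, Vec<u8>)>,
    completed: bool,
}

impl MultipartUpload for InMemoryUpload {
    fn put_part(&mut self, part_number: usize, data: Vec<u8>) -> io::Result<()> {
        self.parts.push((part_number, data));
        Ok(())
    }

    fn complete(&mut self) -> io::Result<()> {
        self.completed = true;
        Ok(())
    }
}

/// Buffers incoming writes and uploads one part per `part_size` bytes.
struct MultipartWriter<U: MultipartUpload> {
    upload: U,
    buffer: Vec<u8>,
    part_size: usize,
    next_part: usize,
}

impl<U: MultipartUpload> MultipartWriter<U> {
    fn new(upload: U, part_size: usize) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        // Part numbers start at 1 in this sketch.
        Self { upload, buffer: Vec::new(), part_size, next_part: 1 }
    }

    fn send_part(&mut self, data: Vec<u8>) -> io::Result<()> {
        self.upload.put_part(self.next_part, data)?;
        self.next_part += 1;
        Ok(())
    }

    /// Uploads any remaining bytes as a final (possibly short) part,
    /// completes the upload, and hands back the uploader.
    fn finish(mut self) -> io::Result<U> {
        if !self.buffer.is_empty() {
            let data = std::mem::take(&mut self.buffer);
            self.send_part(data)?;
        }
        self.upload.complete()?;
        Ok(self.upload)
    }
}

impl<U: MultipartUpload> Write for MultipartWriter<U> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.buffer.extend_from_slice(buf);
        // Emit every full part currently sitting in the buffer.
        while self.buffer.len() >= self.part_size {
            let rest = self.buffer.split_off(self.part_size);
            let part = std::mem::replace(&mut self.buffer, rest);
            self.send_part(part)?;
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // A partial part stays buffered until `finish`.
        Ok(())
    }
}
```

Something like `MultipartWriter::new(InMemoryUpload::default(), 4)` followed by `write_all` and `finish` would split the input into 4-byte parts plus a shorter tail. A real implementation would presumably pick a much larger part size, and could dispatch the parts concurrently rather than one at a time.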

It seems that GCS supports this through its S3-compatible API (docs), and Azure Blob Storage has a notion of "block blobs" that might be applicable (docs).
