PiotrSrebrny

Which issue does this PR close?

Are these changes tested?

The changes were not tested separately; they should be covered by the existing tests.

Are there any user-facing changes?

The patch removes BufWriter from TrackedWrite, so ArrowWriter no longer wraps the provided Write in one. If the underlying writer performs poorly with small, repeated writes (e.g., a TCP socket), users should wrap it in a BufWriter themselves for better performance.
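
As a rough illustration of why the wrapping matters, here is a minimal sketch using only the standard library. `CountingSink` and the helper functions are hypothetical names made up for this example and are not part of the parquet crate; the sink simply counts how many `write` calls reach it, standing in for a writer where each call is expensive.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink (not from the parquet crate) that counts the
/// `write` calls and bytes that actually reach it.
#[derive(Default)]
struct CountingSink {
    writes: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Issue `n` four-byte writes, flush, and hand the writer back.
fn small_writes<W: Write>(mut w: W, n: usize) -> io::Result<W> {
    for _ in 0..n {
        w.write_all(b"abcd")?;
    }
    w.flush()?;
    Ok(w)
}

/// Number of `write` calls the sink sees without any buffering.
fn unbuffered_calls(n: usize) -> usize {
    small_writes(CountingSink::default(), n)
        .expect("writing to the sink cannot fail")
        .writes
}

/// Number of `write` calls the sink sees behind an 8 KiB `BufWriter`.
fn buffered_calls(n: usize) -> usize {
    let buffered = BufWriter::with_capacity(8192, CountingSink::default());
    let sink = small_writes(buffered, n)
        .expect("writing to the sink cannot fail")
        .into_inner()
        .ok()
        .expect("flushing the buffer cannot fail");
    assert_eq!(sink.bytes, n * 4); // no data is lost by buffering
    sink.writes
}
```

With 1000 four-byte writes (4000 bytes, below the 8 KiB buffer), the unbuffered sink receives 1000 calls while the buffered one receives a single call at flush time. A user passing a socket-like writer to ArrowWriter after this patch would presumably apply the same `BufWriter` wrapping to it first.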

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 3, 2025
@tustvold
Contributor

tustvold commented Oct 6, 2025

I think I would prefer to:

  • Fix flush
  • Add a separate option to create an unbuffered writer

See #8534 (comment)

@PiotrSrebrny
Author

I am fine with this. I can add a new_unbuffered() function to TrackedWrite.
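
One possible shape for such a constructor is sketched below. This is a simplified stand-in and not the crate's actual TrackedWrite: the type name `TrackedWriteSketch`, the private `Inner` enum, and the `inner_ref` accessor are assumptions made for illustration only.

```rust
use std::io::{self, BufWriter, Write};

/// Either a buffered or a direct handle to the wrapped writer.
/// Hypothetical helper type for this sketch.
enum Inner<W: Write> {
    Buffered(BufWriter<W>),
    Unbuffered(W),
}

/// Simplified stand-in for a byte-counting writer with an opt-out
/// from internal buffering. Not the parquet crate's implementation.
pub struct TrackedWriteSketch<W: Write> {
    inner: Inner<W>,
    bytes_written: usize,
}

impl<W: Write> TrackedWriteSketch<W> {
    /// Default constructor: small writes are coalesced by a `BufWriter`.
    pub fn new(inner: W) -> Self {
        Self { inner: Inner::Buffered(BufWriter::new(inner)), bytes_written: 0 }
    }

    /// Opt-out constructor: every write goes straight to `inner`.
    pub fn new_unbuffered(inner: W) -> Self {
        Self { inner: Inner::Unbuffered(inner), bytes_written: 0 }
    }

    /// Total bytes accepted so far, whether or not they were flushed.
    pub fn bytes_written(&self) -> usize {
        self.bytes_written
    }

    /// Borrow the wrapped writer (hypothetical accessor for inspection).
    pub fn inner_ref(&self) -> &W {
        match &self.inner {
            Inner::Buffered(w) => w.get_ref(),
            Inner::Unbuffered(w) => w,
        }
    }
}

impl<W: Write> Write for TrackedWriteSketch<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = match &mut self.inner {
            Inner::Buffered(w) => w.write(buf)?,
            Inner::Unbuffered(w) => w.write(buf)?,
        };
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        match &mut self.inner {
            // `BufWriter::flush` drains its buffer, then flushes the inner writer.
            Inner::Buffered(w) => w.flush(),
            Inner::Unbuffered(w) => w.flush(),
        }
    }
}
```

In this sketch the byte count is identical in both modes; the only difference is when the bytes reach the wrapped writer. With a `Vec<u8>` as the inner writer, the unbuffered variant shows the bytes immediately, while the buffered variant shows them only after `flush()`.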

@PiotrSrebrny
Author

To make this work, I will have to add an unbuffered option to WriterProperties that is propagated from ArrowWriter to SerializedFileWriter.

@PiotrSrebrny PiotrSrebrny force-pushed the remove-BufWriter-from-TrackedWrite branch 2 times, most recently from 8a8361d to da97dd1 Compare October 6, 2025 10:17
@PiotrSrebrny PiotrSrebrny changed the title [Parquet] Remove BufWriter from TrackedWrite [Parquet] Add Unbuffered writer to TrackedWrite Oct 6, 2025
@PiotrSrebrny PiotrSrebrny force-pushed the remove-BufWriter-from-TrackedWrite branch from da97dd1 to 6496cb3 Compare October 6, 2025 11:11
Development

Successfully merging this pull request may close these issues.

[Parquet] ArrowWriter flush does not work
2 participants