-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Parquet] Reuse buffer in ByteViewArrayDecoderPlain
#6930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ByteViewArrayDecoderPlain ByteViewArrayDecoderPlain
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XiangpengHao -- this looks great to me
arrow-buffer/src/buffer/immutable.rs
Outdated
|
|
||
| /// Auxiliary method to create a new Buffer | ||
| /// | ||
| /// This is convenient for converting a [`bytes::Bytes`] to a [`Buffer`] without copying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
FYI @kylebarron
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this instead be a From implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to recommend a From implementation in addition to the explicit function
My rationale is that Rust experts are likely to find the From implementation, but for beginners/intermediate, having an explicit method is easier to find
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name is very confusing, as it isn't clear what external means. This confusion is not helped by the fact arrow has a Bytes type that isn't currently public but the from_bytes method is.
Just adding a From implementation is a way to decouple this PR from cleaning this mess up #6754
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some ideas of how to document / make this easier to find -- be back shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a proposal to add From impls and documentation:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great! Let's wait #6939 to be merged and I'll rebase to that commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rebase to that commit
Done!
Co-authored-by: Raphael Taylor-Davies <[email protected]>
bc838a4 to
0111ae6
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @XiangpengHao
|
Thanks again everyone! |
* reuse buffer in view array * Update parquet/src/arrow/array_reader/byte_view_array.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * use From<Bytes> instead --------- Co-authored-by: Raphael Taylor-Davies <[email protected]>
* reuse buffer in view array * Update parquet/src/arrow/array_reader/byte_view_array.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * use From<Bytes> instead --------- Co-authored-by: Raphael Taylor-Davies <[email protected]>
* reuse buffer in view array * Update parquet/src/arrow/array_reader/byte_view_array.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * use From<Bytes> instead --------- Co-authored-by: Raphael Taylor-Davies <[email protected]>
Which issue does this PR close?
Part of #6921
Rationale for this change
Avoid adding a new buffer to the stringview buffer if the buffer-to-be-add is the same as the last buffer in the buffer list.
This is typically not a problem as we usually have large continuous reads. But it becomes a problem with row-level filtering where we can have hundreds/thousands of small reads over the same buffer.
What changes are included in this PR?
Are there any user-facing changes?