-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
@XiangpengHao found this on #6031 and I am breaking it into its own ticket
let block_id = output.append_block(self.buf.clone().into());I thought the into() will convert the Bytes into array_buffer::Buffer() without copying the data.
However, self.buf is bytes::Bytes, not arrow_buffer::Bytes (they confusingly having the same name). The consequence is that the code above will use this From impl
arrow-rs/arrow-buffer/src/buffer/immutable.rs
Lines 361 to 370 in 3ce8e84
| impl<T: AsRef<[u8]>> From<T> for Buffer { | |
| fn from(p: T) -> Self { | |
| // allocate aligned memory buffer | |
| let slice = p.as_ref(); | |
| let len = slice.len(); | |
| let mut buffer = MutableBuffer::new(len); | |
| buffer.extend_from_slice(slice); | |
| buffer.into() | |
| } | |
| } |
Which results in a copy of data
To avoid the copy, the code is changed to
// Here we convert `bytes::Bytes` into `arrow_buffer::Bytes`, which is zero copy
// Then we convert `arrow_buffer::Bytes` into `arrow_buffer:Buffer`, which is also zero copy
let bytes = arrow_buffer::Buffer::from_bytes(self.data.clone().into());
let block_id = output.append_block(bytes);Describe the solution you'd like
We should do something to prevent future very subtle mistakes (and possibly remove other instances of such mistakes in arrow-rs)
Describe alternatives you've considered
One thing we could do is to remove any From impls for Buffer that copy data and instead use an explicit function name. This would make it more explicit when copying was occuring, providing more control and making mistakes such as fixed in #6031 less likely
Something like this, for example
impl Buffer {
/// Creating a `Buffer` instance by copying the memory from a `AsRef<[u8]>` into a newly
/// allocated memory region.
pub fn new_from_slice<T: AsRef<[u8]>>(data: T) -> Self {
...
}Additional context