Skip to content

Remove From impls that copy data into arrow_buffer::Buffer #6033

@alamb

Description

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

@XiangpengHao found this on #6031 and I am breaking it into its own ticket

let block_id = output.append_block(self.buf.clone().into());

I thought the into() will convert the Bytes into array_buffer::Buffer() without copying the data.

However, self.buf is bytes::Bytes, not arrow_buffer::Bytes (they confusingly having the same name). The consequence is that the code above will use this From impl

impl<T: AsRef<[u8]>> From<T> for Buffer {
fn from(p: T) -> Self {
// allocate aligned memory buffer
let slice = p.as_ref();
let len = slice.len();
let mut buffer = MutableBuffer::new(len);
buffer.extend_from_slice(slice);
buffer.into()
}
}

Which results in a copy of data

To avoid the copy, the code is changed to

        // Here we convert `bytes::Bytes` into `arrow_buffer::Bytes`, which is zero copy
        // Then we convert `arrow_buffer::Bytes` into `arrow_buffer:Buffer`, which is also zero copy
        let bytes = arrow_buffer::Buffer::from_bytes(self.data.clone().into());
        let block_id = output.append_block(bytes);

Describe the solution you'd like
We should do something to prevent future very subtle mistakes (and possibly remove other instances of such mistakes in arrow-rs)

Describe alternatives you've considered
One thing we could do is to remove any From impls for Buffer that copy data and instead use an explicit function name. This would make it more explicit when copying was occuring, providing more control and making mistakes such as fixed in #6031 less likely

Something like this, for example

impl Buffer { 
  /// Creating a `Buffer` instance by copying the memory from a `AsRef<[u8]>` into a newly
  /// allocated memory region.
  pub fn new_from_slice<T: AsRef<[u8]>>(data: T) -> Self { 
   ...
}

Additional context

Metadata

Metadata

Assignees

Labels

api-changeChanges to the arrow APIenhancementAny new improvement worthy of a entry in the changelog

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions