Description
I'm opening this issue to discuss the appropriate representation for a buffer (i.e. an arbitrary contiguous sequence of bytes) in terminus-store. This discussion will help me understand the motivation and mechanics of the current approach and probe for reactions to an alternative approach, which I propose at the end. Please feel free to comment on anything or to correct my understanding if necessary.
Currently, the predominant view of a buffer appears to be M: AsRef<[u8]>. This type implies two things:
- A given data: M has the operation data.as_ref() that returns &[u8]. This gives a read-only view of a buffer that can be shared between threads without the option of writing to it.
- The struct containing the data: M owns the value referencing the buffer. There is no borrowing of references here.
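To make the two points above concrete, here is a minimal sketch of a struct that owns a data: M. The names ByteView and owned_examples are hypothetical and are not taken from terminus-store; the sketch only uses the standard library.

```rust
/// Hypothetical container (not from terminus-store), generic over any owned
/// value that can be viewed as a byte slice.
pub struct ByteView<M: AsRef<[u8]>> {
    data: M,
}

impl<M: AsRef<[u8]>> ByteView<M> {
    /// Takes ownership of `data`; no reference is borrowed, so the struct
    /// carries no lifetime parameter.
    pub fn new(data: M) -> Self {
        ByteView { data }
    }

    /// `as_ref()` only hands out `&[u8]`, so the view is read-only.
    pub fn len(&self) -> usize {
        self.data.as_ref().len()
    }
}

/// Builds one view over a plain `Vec<u8>` and one over a reference-counted
/// slice, and returns their lengths.
pub fn owned_examples() -> (usize, usize) {
    let from_vec = ByteView::new(vec![1u8, 2, 3]);
    // `Arc<[u8]>` implements `AsRef<[u8]>` directly; `Arc<Vec<u8>>` does not,
    // which is presumably why a wrapper type is needed for that case.
    let from_arc = ByteView::new(std::sync::Arc::<[u8]>::from(vec![4u8, 5]));
    (from_vec.len(), from_arc.len())
}
```

Note that AsRef<[u8]> on its own says nothing about threads; that part comes from the extra Send and Sync bounds.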
This appears to have been changed from a previously predominant view of a buffer as a slice: data: &'a [u8] (1deedbf, bf6416b, ad7dd42, e5a50a0, c6a14f9). This view meant:
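- The data: &'a [u8] cannot be moved into a thread that may outlive 'a (std::thread::spawn, for instance, requires 'static), so it is awkward to share between threads.
- The view into the data lasts no longer than the buffer's owner, who has the 'a lifetime.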
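For comparison, a sketch of the slice-based view. SliceView and scoped_sum are hypothetical names. As far as I can tell, a borrowed slice can still be read from scoped threads, which are guaranteed to finish before the owner goes away; what it cannot do is move into a std::thread::spawn closure, which requires 'static.

```rust
/// Hypothetical borrowed view; the `'a` lifetime ties it to the buffer's owner.
pub struct SliceView<'a> {
    data: &'a [u8],
}

impl<'a> SliceView<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        SliceView { data }
    }

    /// Sums the bytes of the view.
    pub fn sum(&self) -> u32 {
        self.data.iter().map(|&b| u32::from(b)).sum()
    }
}

/// Sums the two halves of `owner` on two scoped threads.
/// `std::thread::scope` joins both threads before returning, so the borrows
/// never outlive `owner`. The same closures would be rejected by
/// `std::thread::spawn`, because they are not `'static`.
pub fn scoped_sum(owner: &[u8]) -> u32 {
    let (left, right) = owner.split_at(owner.len() / 2);
    std::thread::scope(|s| {
        let a = s.spawn(|| SliceView::new(left).sum());
        let b = s.spawn(|| SliceView::new(right).sum());
        a.join().unwrap() + b.join().unwrap()
    })
}
```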
Now, given that the buffers currently seem to be backed by one of the two following structs:
    pub struct SharedVec(pub Arc<Vec<u8>>);
    pub struct SharedMmap(Option<Arc<FileBacking>>);
both of which contain an Arc, I presume that the data is being shared read-only between threads. (I'm actually not yet clear on where the sharing is occurring, so if you want to enlighten me, I'd appreciate it!) If there were no sharing, I think the slice approach would be better, since (a) there is less runtime work to manage usage of the buffers and (b) the type system keeps track of the lifetimes.
I think using M: AsRef<[u8]> is somewhat painful as a scheme for typing a buffer. It's too general and leads to trait bounds such as M: 'static + AsRef<[u8]> + Clone + Send + Sync in many places.
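As an illustration of how the bounds accumulate, here is a small sketch of a generic function that hands a copy of the buffer to a spawned thread. The function names are hypothetical; the where clause is the one quoted above.

```rust
/// Hypothetical helper: sums the bytes once on a spawned thread and once on
/// the calling thread, then adds the two results.
pub fn checksum_in_background<M>(data: M) -> u32
where
    // `'static + Send` are required by `std::thread::spawn`, `Clone` lets the
    // worker get its own handle, and `AsRef<[u8]>` exposes the bytes.
    // `Sync` is not exercised by this sketch; it matters once `&M` itself is
    // shared across threads.
    M: 'static + AsRef<[u8]> + Clone + Send + Sync,
{
    let for_worker = data.clone();
    let handle = std::thread::spawn(move || {
        for_worker.as_ref().iter().map(|&b| u32::from(b)).sum::<u32>()
    });
    let local: u32 = data.as_ref().iter().map(|&b| u32::from(b)).sum();
    local + handle.join().unwrap()
}

/// Usage with a reference-counted slice, which satisfies every bound.
pub fn checksum_example() -> u32 {
    checksum_in_background(std::sync::Arc::<[u8]>::from(vec![1u8, 2, 3]))
}
```

Every generic struct or function that touches the buffer and may cross a thread boundary has to repeat some version of this where clause.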
After doing some research, I think something like Bytes from the bytes crate would work better. Bytes is a thread-shareable container representing a contiguous sequence of bytes. It satisfies 'static + AsRef<[u8]> + Clone + Send + Sync. It also supports operations like split_to and split_off, which I think would work well when you want to segment a buffer into different representations. Replacing data: M with data: Bytes would make many of the trait bounds disappear.
Unfortunately, Bytes does not support memmap::Mmap, which means it would not suit terminus-store's current usage of AsRef<[u8]>. However, I've already implemented an adaptation of Bytes that does support memmap::Mmap. Others have, too. See tokio-rs/bytes#359.
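To give a rough idea of what a concrete struct could look like, here is a standard-library-only sketch. It is not the bytes crate's API and not the adaptation mentioned above; SharedBuf and its methods are hypothetical names, and only split_to is mirrored. The backing store is anything that is AsRef<[u8]> + Send + Sync + 'static, which I assume could include a memory-mapped file wrapper.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical read-only, thread-shareable buffer: a reference-counted
/// backing store plus the byte range this handle may see.
#[derive(Clone)]
pub struct SharedBuf {
    backing: Arc<dyn AsRef<[u8]> + Send + Sync>,
    range: Range<usize>,
}

impl SharedBuf {
    /// Wraps any owned backing store; the generic bound appears only here.
    pub fn new<B>(backing: B) -> Self
    where
        B: AsRef<[u8]> + Send + Sync + 'static,
    {
        let len = backing.as_ref().len();
        SharedBuf { backing: Arc::new(backing), range: 0..len }
    }

    /// Returns the bytes visible through this handle.
    pub fn as_slice(&self) -> &[u8] {
        let all: &[u8] = (*self.backing).as_ref();
        &all[self.range.clone()]
    }

    pub fn len(&self) -> usize {
        self.range.len()
    }

    /// Splits off and returns the first `at` bytes; `self` keeps the rest.
    /// No bytes are copied, only the reference count is bumped.
    pub fn split_to(&mut self, at: usize) -> SharedBuf {
        assert!(at <= self.len(), "split_to out of bounds");
        let mid = self.range.start + at;
        let head = SharedBuf {
            backing: Arc::clone(&self.backing),
            range: self.range.start..mid,
        };
        self.range.start = mid;
        head
    }
}

/// Splits a buffer into a header and a body and reads the header on another
/// thread, without any trait bounds at the call site.
pub fn split_demo() -> (Vec<u8>, Vec<u8>) {
    let mut body = SharedBuf::new(vec![10u8, 20, 30, 40, 50]);
    let header = body.split_to(2);
    let worker = std::thread::spawn(move || header.as_slice().to_vec());
    (worker.join().unwrap(), body.as_slice().to_vec())
}
```

The trade-off in this sketch is one dynamic dispatch per as_slice call in exchange for a non-generic type.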
Here are some questions prompted by the above:
- What's the best way to represent a contiguous sequence of bytes in terminus-store?
- Does it need to be read-only?
- Does it need to be shared between threads?
- Would it be useful to use a less general type than AsRef<[u8]>? Could that type be a struct instead of a set of trait bounds?