meta: Expose `Bytes` vtable #437

carllerche · 2020-10-20T17:09:49Z

At some point in the future, we will want to expose the vtable backing Bytes. This issue tracks relevant discussion as well as API issues that relate to the vtable.

Convert Bytes -> BytesMut 0.5 release prevents going back from Bytes to BytesMut #350
mmap [Question] Next steps about vtable and mmap? #359
try_unsplit Add Bytes::try_unsplit() #287


rrichardson · 2023-01-13T22:59:05Z

I've been researching a bunch of relevant tickets and solutions for the last couple of days. I'm going to attempt to summarize all of the issues/PRs so that we can discuss them in one place and hopefully move forward with a solution.

Introduction

Bytes has become the de-facto ref-counted buffer for asynchronous services and service frameworks. Interop with most high-level frameworks in the Tokio ecosystem requires using Bytes. However, Bytes features a limited set of allocation modes, so it is incompatible with many high-performance memory management systems. Recent interest in io_uring and its shared ownership model has highlighted this. There have also been a couple of requests to interop with memory-mapped data. In addition, recycling buffer pools/freelists need a way to "drop" themselves back into their pool instead of deallocating.

Why should I use Bytes over an Arc<Vec<u8>>? The answer is in Bytes' vtable. It allows differently behaving Bytes instances to offer the same homogeneous interface. This enables copy-on-write functionality, as well as supporting common operations on buffers that Rust won't normally allow, such as splitting a partially filled buffer to hand to a consumer while the other portion of the buffer continues to be written.

One might go so far as to say that Bytes is actually two orthogonal sets of functionality under one struct:

Buffer IO Operations (Read/Write/Cursor)
Buffer Lifecycle Management.

Level Setting

There have been repeated requests for 2 umbrellas of operations:

Constructing RefCounted instances that enclose 3rdPartyBuffers
1. 3rdPartyBuffer -> Bytes
2. 3rdPartyBuffer -> BytesMut
3. BytesMut -> 3rdPartyBuffer
4. Bytes -> 3rdPartyBuffer
Conversion and Interop between Bytes and BytesMut
1. split (checked and unchecked)
  1. Bytes -> BytesMut
  2. BytesMut -> Bytes
2. unsplit (allocating, try_*)
  1. (Bytes|BytesMut) -> (Bytes | BytesMut) -> (Bytes | BytesMut)
3. try_mut (Bytes -> BytesMut)

Current VTable Implementation

The VTable implementation currently looks like:

struct Vtable {
    /// fn(data, ptr, len)  to increment a refcount (which may require restructuring)
    pub clone: unsafe fn(&AtomicPtr<()>, *const u8, usize) -> Bytes,
    /// fn(data, ptr, len) To convert a Bytes to a Vec<u8>
    pub to_vec: unsafe fn(&AtomicPtr<()>, *const u8, usize) -> Vec<u8>,
    /// fn(data, ptr, len) To decrement or deallocate a Bytes instance
    pub drop: unsafe fn(&mut AtomicPtr<()>, *const u8, usize),
}
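
To make the role of the vtable concrete, here is a minimal std-only sketch of the same idea. It is not the crate's code: `MiniBytes`, `STATIC_VTABLE` and `SHARED_VTABLE` are hypothetical names, the `data` field is a plain `*const ()` rather than an `AtomicPtr<()>`, and `to_vec` is left out. It only shows how one struct can dispatch `clone` and `drop` to different backing storage through a table of function pointers.

```rust
use std::sync::Arc;

/// Simplified vtable: one set of function pointers per kind of backing storage.
struct Vtable {
    clone: unsafe fn(*const (), *const u8, usize) -> MiniBytes,
    drop: unsafe fn(*const (), *const u8, usize),
}

/// Cut-down stand-in for `Bytes` (hypothetical, for illustration only).
pub struct MiniBytes {
    ptr: *const u8,
    len: usize,
    data: *const (),
    vtable: &'static Vtable,
}

// Static data: cloning copies the pointer, dropping does nothing.
static STATIC_VTABLE: Vtable = Vtable { clone: static_clone, drop: static_drop };

unsafe fn static_clone(_data: *const (), ptr: *const u8, len: usize) -> MiniBytes {
    MiniBytes { ptr, len, data: std::ptr::null(), vtable: &STATIC_VTABLE }
}

unsafe fn static_drop(_data: *const (), _ptr: *const u8, _len: usize) {}

// Shared data: `data` is a raw `Arc<Vec<u8>>`, so clone/drop adjust its refcount.
static SHARED_VTABLE: Vtable = Vtable { clone: shared_clone, drop: shared_drop };

unsafe fn shared_clone(data: *const (), ptr: *const u8, len: usize) -> MiniBytes {
    // Add one strong reference for the new handle.
    unsafe { Arc::increment_strong_count(data as *const Vec<u8>) };
    MiniBytes { ptr, len, data, vtable: &SHARED_VTABLE }
}

unsafe fn shared_drop(data: *const (), _ptr: *const u8, _len: usize) {
    // Rebuild the Arc so its Drop releases one strong reference.
    drop(unsafe { Arc::from_raw(data as *const Vec<u8>) });
}

impl MiniBytes {
    pub fn from_static(s: &'static [u8]) -> Self {
        MiniBytes { ptr: s.as_ptr(), len: s.len(), data: std::ptr::null(), vtable: &STATIC_VTABLE }
    }

    pub fn from_vec(v: Vec<u8>) -> Self {
        let arc = Arc::new(v);
        let (ptr, len) = (arc.as_slice().as_ptr(), arc.len());
        MiniBytes { ptr, len, data: Arc::into_raw(arc) as *const (), vtable: &SHARED_VTABLE }
    }

    pub fn as_slice(&self) -> &[u8] {
        // Valid for as long as this handle keeps its storage alive.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for MiniBytes {
    fn clone(&self) -> Self {
        unsafe { (self.vtable.clone)(self.data, self.ptr, self.len) }
    }
}

impl Drop for MiniBytes {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data, self.ptr, self.len) }
    }
}
```

Callers only see `MiniBytes`; whether a clone bumps a refcount or just copies a pointer is decided by whichever vtable the constructor installed.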

New Functionality Required

As I will explain below: Exposing the vtable itself is not quite sufficient to meet the new requirements because it has no way to:

Construct a Bytes* instance from a 3rdPartyBuffer.
Unwrap a Bytes* instance back into a 3rdPartyBuffer.

As we all know, exposing a plain struct as a public interface is generally considered a bad idea. If we ever need to add/remove/alter fields to the vtable, it will break the API.

In addition, there is some buffer lifecycle management code that might be better to be stored in the VTable (see BytesMut::resize and Bytes::truncate)

For this reason, we need some added functionality that is specific to each underlying buffer type. (Either in the vtable or somewhere nearby)

resize :: (*instance, ptr, len) -> Result<(), usize>
1. This should be called before the actual slice resizing, to prepare the underlying buffer.
2. It should return Err if it is unable to accommodate the request (some buffer schemes can't resize)
to_byte :: (*instance) -> (*instance, ptr, len)
1. supply the necessary parts of a Bytes or BytesMut so the Bytes impl can c
from_byte :: (*instance, ptr, len) -> T
1. Given the components of a Bytes struct, re-create the original 3rdPartyBuffer
(optionally) type_id :: TypeId
1. We'd use this to ensure that the 3rdPartyBuffer that was the source, is also the dest if we want to "downcast" this instance back to the 3rdPartyBuffer
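
The optional `type_id` entry could work roughly as follows. This is a std-only sketch under assumptions: `Erased` and `drop_box` are hypothetical names, and the buffer is stored in a `Box` purely to keep the example short. It records the `TypeId` of the source type next to the type-erased pointer and refuses to hand the value back as any other type.

```rust
use std::any::TypeId;
use std::mem::ManuallyDrop;

/// Type-erased owner that remembers which concrete type it was built from.
pub struct Erased {
    data: *mut (),
    type_id: TypeId,
    drop_fn: unsafe fn(*mut ()),
}

/// Monomorphized per T, so the erased handle still knows how to free its value.
unsafe fn drop_box<T>(data: *mut ()) {
    drop(unsafe { Box::from_raw(data as *mut T) });
}

impl Erased {
    pub fn new<T: 'static>(value: T) -> Self {
        Erased {
            data: Box::into_raw(Box::new(value)) as *mut (),
            type_id: TypeId::of::<T>(),
            drop_fn: drop_box::<T>,
        }
    }

    /// Returns the original value only if `T` is the type it was created from.
    pub fn try_into_inner<T: 'static>(self) -> Result<T, Self> {
        if self.type_id != TypeId::of::<T>() {
            return Err(self);
        }
        // Skip our Drop impl: ownership moves back into the Box below.
        let this = ManuallyDrop::new(self);
        Ok(*unsafe { Box::from_raw(this.data as *mut T) })
    }
}

impl Drop for Erased {
    fn drop(&mut self) {
        unsafe { (self.drop_fn)(self.data) }
    }
}
```

A failed downcast returns the handle unchanged, so the caller can keep using it or try another type.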

Proposed Designs

There are 2 lobes to this:

How to expose the functionality in a 3rd Party API
How should the Bytes/Mut API change to reflect the T <--> Bytes/Mut functionality.

Item 2 should be rather trivial, and would likely be subject to bike-shedding, so this post will only discuss 1.

1. VTableBuilder Proposal

This was proposed by @Matthias247 in #287 (comment)

This mentions explicitly constructing a VTable object, which we may not want. We could instead use the VTable builder as an alternate Bytes constructor, so we don't have to make the VTable struct public.

    let mybytes = Bytes::with_vtable()
        .clone_fn(my_clone_fn)
        .resize_fn(my_resize_fn)
        .drop_fn(my_drop_fn)
        .from_bytes_fn(MyBufferType::from_bytes_parts)
        .build(const* MyBufferType);

The idea would be to offer an API that allows the user to supply fn impls/pointers for each of the known fields of the vtable, but we would still be free to add/alter fields in the VTable.
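
A minimal std-only sketch of that builder shape, under assumptions: `VtableBuilder`, `CustomBuf`, `fixed_resize` and `count_drop` are hypothetical names, the storage is a plain `Vec<u8>`, and the `Err(usize)` from the resize hook is assumed to carry the largest length the buffer can offer (the proposal does not say what the value means). The point is that the table itself stays private and every hook has a default, so new hooks can be added later without breaking callers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

pub type ResizeFn = fn(usize, usize) -> Result<(), usize>;
pub type DropFn = fn(Vec<u8>);

/// Private table: fields can be added later without touching the public API.
struct Vtable {
    resize: ResizeFn,
    on_drop: DropFn,
}

fn default_resize(_current_len: usize, _new_len: usize) -> Result<(), usize> {
    Ok(())
}

fn default_drop(buf: Vec<u8>) {
    drop(buf);
}

pub struct VtableBuilder {
    vtable: Vtable,
}

impl VtableBuilder {
    pub fn new() -> Self {
        VtableBuilder { vtable: Vtable { resize: default_resize, on_drop: default_drop } }
    }

    pub fn resize_fn(mut self, f: ResizeFn) -> Self {
        self.vtable.resize = f;
        self
    }

    pub fn drop_fn(mut self, f: DropFn) -> Self {
        self.vtable.on_drop = f;
        self
    }

    pub fn build(self, data: Vec<u8>) -> CustomBuf {
        CustomBuf { data, vtable: self.vtable }
    }
}

pub struct CustomBuf {
    data: Vec<u8>,
    vtable: Vtable,
}

impl CustomBuf {
    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Asks the hook first, then resizes the visible bytes.
    pub fn try_resize(&mut self, new_len: usize) -> Result<(), usize> {
        (self.vtable.resize)(self.data.len(), new_len)?;
        self.data.resize(new_len, 0);
        Ok(())
    }
}

impl Drop for CustomBuf {
    fn drop(&mut self) {
        // Hand the storage to the user-supplied hook instead of freeing it here.
        (self.vtable.on_drop)(std::mem::take(&mut self.data));
    }
}

// Example hooks: a buffer capped at 4 bytes, and a drop counter.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

pub fn fixed_resize(_current_len: usize, new_len: usize) -> Result<(), usize> {
    if new_len <= 4 { Ok(()) } else { Err(4) }
}

pub fn count_drop(_buf: Vec<u8>) {
    DROPS.fetch_add(1, Ordering::SeqCst);
}
```

With these hooks, `VtableBuilder::new().resize_fn(fixed_resize).drop_fn(count_drop).build(vec![0; 2])` yields a buffer that can grow to 4 bytes but rejects 5.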

Pros

Easiest solution to implement for the Bytes crate.

Cons

Exposes quite a few internal details of the VTable. If we do need to alter VTable, it could still break compatibility.
The actual implementations of the functions could be tricky to write.

2. dyn Trait Approach

This was proposed by several people, the most compelling example was by @quark-zju in their implementation of mini-bytes. #359 (comment)

This implementation very simply and elegantly allows a Bytes instance to use the buffer contents of a 3rdPartyBuffer as well as control its lifetime in a refcounted way, and with very little hassle on the part of the 3rd party.

(Note that the above example doesn't defer the clone/drop to the 3rdPartyBuffer instance. So it can't offer the optimizations that Bytes currently offers, however, that can easily be added)

The idea is this:

    trait ManagedBuffer {
        fn get_slice(&self) -> &[u8];
        fn inc_ref(&self) -> Option<Self>;
        fn dec_ref(&self) -> Option<Self>;
        fn resize(&self, new_len: usize) -> Option<Self>;
        ...
    }
    
    struct Bytes {
        ptr: *const u8,
        len: usize,
        owner: Box<dyn ManagedBuffer>,
    }
    
    impl Bytes {
        pub fn from_buffer_manager(bman: impl ManagedBuffer);
    }

So you impl ManagedBuffer for your 3rdPartyBuffer, and use it to construct a Bytes.

Others have suggested a similar approach, but to store the object as an Arc<dyn Any + Send + Sync + 'static> so that downcasting can be simply accomplished by the Arc implementation
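
A safe, std-only sketch of how these two ideas could combine, under assumptions: `SharedBytes`, `VecBuffer`, `as_slice`, `into_any` and `try_into_buffer` are hypothetical names, and the view is tracked with an index range rather than a raw pointer so the example needs no `unsafe`. The owner is reference counted through `Arc<dyn ManagedBuffer>`, and the trip back to the third-party type goes through `Arc<dyn Any + Send + Sync>` and `Arc::downcast`.

```rust
use std::any::Any;
use std::ops::Range;
use std::sync::Arc;

/// What a third-party buffer would implement (names are illustrative).
pub trait ManagedBuffer: Send + Sync + 'static {
    fn as_slice(&self) -> &[u8];
    /// Lets the handle recover the concrete type later.
    fn into_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync>;
}

/// Refcounted view into any `ManagedBuffer`.
#[derive(Clone)]
pub struct SharedBytes {
    owner: Arc<dyn ManagedBuffer>,
    range: Range<usize>,
}

impl SharedBytes {
    pub fn from_buffer<B: ManagedBuffer>(buf: B) -> Self {
        let len = buf.as_slice().len();
        SharedBytes { owner: Arc::new(buf), range: 0..len }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.owner.as_slice()[self.range.clone()]
    }

    /// Sub-view sharing the same owner; `range` is relative to this view.
    pub fn slice(&self, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= self.range.len());
        let start = self.range.start + range.start;
        SharedBytes { owner: Arc::clone(&self.owner), range: start..start + range.len() }
    }

    /// Gives the original buffer back if `B` is the type it was built from.
    pub fn try_into_buffer<B: ManagedBuffer>(self) -> Option<Arc<B>> {
        self.owner.into_any().downcast::<B>().ok()
    }
}

/// Example third-party buffer.
pub struct VecBuffer(pub Vec<u8>);

impl ManagedBuffer for VecBuffer {
    fn as_slice(&self) -> &[u8] {
        &self.0
    }

    fn into_any(self: Arc<Self>) -> Arc<dyn Any + Send + Sync> {
        self
    }
}
```

The downcast returns an `Arc<B>` rather than a bare `B` because other views may still share the owner.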

Pros

Intuitive for most developers
Simplifies the implementation of Bytes
Lets rustc manage the VTable, so we don't have to.
The operation to convert Bytes back into 3rdPartyBuffer is trivial and universal.

Cons

Significant change vs the current model.
Lifecycle management might get a bit more complicated.

3. Trait as VTable Builder Approach

This is sort of a hybrid of approaches 1 and 2. It offers more flexibility than either, but at the possible expense of greater complexity. This was implemented in PR (#567) by @HyeonuPark.

This approach features an expanded VTable, as well as a stateless Trait design. This differs from approach 2 because the trait exists to provide functionality to the VTable, as well as provide functionality to re-construct/downcast the Bytes type back to T.

Note that the above PR features a clone method that looks like: unsafe fn clone(data: &AtomicPtr<()>, ptr: *const u8, len: usize) -> Bytes; With the current Bytes API, this would require that the implementor of the BytesImpl trait know the internal construction of a Bytes object. That can be fixed with the addition of a fn from_parts(...) -> Self method to the Bytes impl.

Pros

No mention of the VTable to the public.
Expanding the trait won't break compatibility (if the new methods have default impls)

Cons

Higher complexity for both the Bytes impl and the 3rd party.

4. The In-a-Perfect-World Approach

As mentioned in the intro, Bytes encompasses two somewhat orthogonal sets of functionality. Buffer Lifecycle Management, and Buffer Read/Write operations.

This design follows the analogy of the Future and FutureExt traits. The impl Future does the dirty work of concurrency management, but can be implemented for a wide variety of use-cases. Then the FutureExt trait offers the high-level functionality.

So all Buffer lifecycle management, copy on write etc, would be implemented by the impl of trait ManagedBuffer, then the high-level IO functionality would be implemented in trait BytesExt.

The ManagedBuffer design would be similar to proposal 2, except that it would be designed in such a way that the implementation of ManagedBuffer is 100% responsible for itself, e.g. clone, drop, resize, into_parts, from_parts. It would be completely self contained.

In order to maintain compatibility with the existing API, Bytes and BytesMut would become thin wrappers that store a dyn ManagedBuffer and then provide a facade over the functions provided by BytesExt.

The implementor of ManagedBuffer could be any 3rd party type.

So it'd look something like:

pub trait ManagedBuffer {
    fn from_parts() -> Self;
    fn into_parts(this: Self) -> (...);
    fn clone() -> Self;
    fn drop() -> Self;
    fn resize(usize) -> Self;
}

pub trait BytesExt: ManagedBuffer {
    fn slice() -> Self {
        ...
    }
    fn len() -> usize {
        ...
    }
    ...
}

impl<T> BytesExt for T
where
    T: ManagedBuffer + ?Sized
{}

struct StaticBuffer {
    buf: &'static [u8]
} 

// this is where the STATIC_VTABLE impl would go.. more or less
impl ManagedBuffer for StaticBuffer {
 ....
}

...
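
A compilable miniature of that split, under assumptions: the method names `bytes` and `try_resize` are hypothetical, the lifecycle trait is reduced to two methods, and `Err(usize)` is assumed to report the largest length the buffer can offer. The lifecycle trait is implemented per buffer type, while the high-level operations come for free from a blanket impl, in the same way `FutureExt` layers on `Future`.

```rust
/// Lifecycle side: each buffer type is responsible for itself.
pub trait ManagedBuffer {
    fn bytes(&self) -> &[u8];
    /// Err carries the largest length this buffer can offer.
    fn try_resize(&mut self, new_len: usize) -> Result<(), usize>;
}

/// High-level side: written once, in terms of `ManagedBuffer` only.
pub trait BytesExt: ManagedBuffer {
    fn len(&self) -> usize {
        self.bytes().len()
    }

    fn is_empty(&self) -> bool {
        self.bytes().is_empty()
    }

    fn starts_with(&self, prefix: &[u8]) -> bool {
        self.bytes().starts_with(prefix)
    }
}

// Every ManagedBuffer, including `dyn ManagedBuffer`, gets the extension methods.
impl<T: ManagedBuffer + ?Sized> BytesExt for T {}

/// Static data: the view can shrink but never grow.
pub struct StaticBuffer {
    buf: &'static [u8],
}

impl StaticBuffer {
    pub fn new(buf: &'static [u8]) -> Self {
        StaticBuffer { buf }
    }
}

impl ManagedBuffer for StaticBuffer {
    fn bytes(&self) -> &[u8] {
        self.buf
    }

    fn try_resize(&mut self, new_len: usize) -> Result<(), usize> {
        if new_len <= self.buf.len() {
            self.buf = &self.buf[..new_len];
            Ok(())
        } else {
            Err(self.buf.len())
        }
    }
}

/// Heap data: resizing always succeeds, zero-filling new bytes.
impl ManagedBuffer for Vec<u8> {
    fn bytes(&self) -> &[u8] {
        self
    }

    fn try_resize(&mut self, new_len: usize) -> Result<(), usize> {
        self.resize(new_len, 0);
        Ok(())
    }
}
```

A thin `Bytes` wrapper could then hold a `Box<dyn ManagedBuffer>` and forward to the `BytesExt` methods, which is the facade described above.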

rrichardson · 2023-01-17T17:05:13Z

pinging @ipoupaille for comment, since they made a PR (#558) for VTable exposure as well.

chenyuanrun · 2023-02-11T07:20:41Z

I need this to avoid memory copy between rust and c++ code.

djkoloski · 2023-12-03T15:43:13Z

A couple people are asking for ways to convert rkyv's AlignedVec into an axum::body::Body without copying the bytes. It looks like custom Bytes constructors are required to do this.

maxburke · 2024-03-19T18:40:19Z

Hi! I'd also like to throw an up-vote in on this issue. I've got some memory mapped files I would like to expose a zero-copy Bytes interface on, and right now the only way I've been able to do that is by using Bytes::with_vtable. So, if it could be made public, perhaps behind a feature gate, it would make things easier.

scottlamb · 2024-03-19T20:21:57Z

> I've got some memory mapped files I would like to expose a zero-copy Bytes interface on,

YMMV depending on whether your files are disk-backed or SSD-backed, whether you can mlock them (in terms of affording the RAM and the syscall cost), and whether you're accessing the Bytes from an async thread, but...my experience has been this combo is a bad idea because it causes unpredictable slow major faults that stall the event-driven threads for a long time. More here.

maxburke · 2024-04-02T21:42:07Z

> YMMV depending on whether your files are disk-backed or SSD-backed, whether you can mlock them (in terms of affording the RAM and the syscall cost), and whether you're accessing the Bytes from an async thread, but...my experience has been this combo is a bad idea because it causes unpredictable slow major faults that stall the event-driven threads for a long time. More here.

I think I've got a pretty good idea of the risks; in our case we have a cache fronting an object store, so if we miss we're going to hit S3 (or equivalent) anyways, which is much slower than faulting from SSD, which all our machines have for local storage.

JackKelly · 2024-04-03T11:33:40Z

I'd also like to up-vote the idea of exposing vtable! I'll explain my use-case, and my requirements, in the hope that this is relevant to the current discussion about exposing the vtable...

Background on my use-case

I'm working on a set of crates for high-speed I/O using io_uring with O_DIRECT. The aim is to load millions of small chunks per second from locally-attached SSDs. My I/O crate will return a crossbeam::channel of chunks. In a subsequent crate, a Rayon threadpool will consume this channel of chunks, and process the chunks. I'm not planning to use async/await.

`O_DIRECT` and buffer alignment

The O_DIRECT flag means that, when a file is read, the data skips the Linux page cache and is read directly from disk into the user-space memory buffer. When using O_DIRECT, Linux may require that the memory buffer's start address and end address be aligned to the filesystem's logical block-size¹. Different file systems have different logical block-sizes, and different alignment requirements, so we won't know the alignment requirements until runtime. For example, ext4 (on my system) requires the start and end of the buffer to be aligned to 512 bytes.

Buffer recycling

I'd also like to support parallel decompression of chunks of compressed data, where we can recycle buffers after data has been decompressed.

Requirements

I'd love to be able to use Bytes for my use-case, because Bytes is so well-known in the Rust ecosystem.

But, at the moment, I can't use Bytes (and, instead, I'm considering implementing my own minimal ref-counted buffer).

To be able to use Bytes, I'd need Bytes to allow my code to set the alignment at runtime. I guess this functionality could be exposed in two ways:

Bytes could implement something like fn Bytes::new_aligned(len: usize, alignment: usize) -> Bytes.
Bytes could be constructed from an existing buffer. e.g. by implementing something like Bytes::from_raw_parts, but - crucially - the user would need to pass alignment information to Bytes.

Crucially, in both of these situations, vtable.drop would need to know the alignment. This is because std::alloc::dealloc needs to be passed the same layout that was used to allocate the buffer.
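
The layout requirement can be sketched with the standard allocator API alone. `AlignedBuffer` is a hypothetical type, not part of Bytes, and rejecting lengths that are not a multiple of the alignment is an assumption taken from the start-and-end alignment requirement described above. The buffer stores the `Layout` it was allocated with, so the drop path can pass exactly the same layout to `dealloc`.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Heap buffer whose alignment is chosen at runtime.
pub struct AlignedBuffer {
    ptr: NonNull<u8>,
    /// Kept so that Drop can hand the same layout back to the allocator.
    layout: Layout,
}

impl AlignedBuffer {
    /// Returns None for a zero length, an invalid alignment, or a length
    /// that is not a multiple of the alignment.
    pub fn new(len: usize, align: usize) -> Option<Self> {
        if len == 0 {
            return None;
        }
        // Fails unless `align` is a non-zero power of two.
        let layout = Layout::from_size_align(len, align).ok()?;
        if len % align != 0 {
            return None;
        }
        // Safety: the layout has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Some(AlignedBuffer { ptr, layout })
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // Safety: `ptr` came from `alloc_zeroed` with this exact layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Any design that lets the owner carry per-instance state (proposals 2 and 4 in the table below) can hold a value like this; a vtable made only of stateless function pointers has nowhere to keep the layout.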

I'd also love Bytes to have a mechanism for recycling buffers. But, for my use-case, buffer recycling is currently a "nice-to-have", not a "must-have".

Which "proposed designs" would enable my use-case?

| Proposal name | Enables my use-case? | Explanation |
|---|---|---|
| 1. VTableBuilder Proposal | ❌ | As this currently stands, this proposal doesn't appear to be able to store an `alignment: usize`. We could hardcode an alignment in `my_drop_fn` but that won't work for my use-case because I won't know the alignment until runtime. So, unfortunately, this proposal doesn't satisfy my requirements, AFAICT. |
| 2. dyn Trait Approach | ✅ | Looks good! I can store `alignment` in a custom struct (and my custom struct would `impl ManagedBuffer`). |
| 3. Trait as VTable Builder Approach | ❌? | I must admit I haven't fully wrapped my head around this approach yet! But I don't think this would support arbitrary alignment (because `BytesImpl::from_bytes_parts` doesn't accept an `alignment`). But I could be wrong! |
| 4. The In-a-Perfect-World Approach | ✅ | I think this would work for my use-case (for the same reasons as proposal 2 would work). |

Maybe my use-case isn't a good fit for `Bytes`?

Maybe my use-case is a bad fit for Bytes 🙂.

My use-case doesn't need some of the main features that Bytes offers (different implementations of the same API, growing/truncating buffers). And I do need some features that Bytes doesn't yet have (setting the alignment; recycling buffers).

So, please don't feel bad if your response to my comment is "Bytes shouldn't enable this use-case" 😄!

allada · 2024-04-22

Our use case
We use an in-memory key-value cache looking something like LruHashMap<Digest, Bytes>. GRPC requests (via tonic) then frequently ask for the blob at Digest and we can just make a refcopy of the Bytes struct and pass it directly to tonic to send off. This is super useful, because we care a lot about reducing data copies.

Lately we've been having memory fragmentation issues because these Bytes are often very small (~50% of the data is less than 500 bytes). We would like to instead be able to allocate a large slab of memory (let's say 30GB) and then, when a Bytes object that is going to live in this LruHashMap is allocated, have the real pointer point to a place in the slab that a non-global allocator (like mimalloc or jemalloc) will manage for us. If there is not enough space in the slab, we can start evicting items from the LruHashMap until there is enough space for the allocator to give the requested pointer+size out.

This should be fairly simple to implement if we could have control of the underlying pointer that Bytes uses & the destructor.

amunra · 2024-10-23T14:36:19Z

Hello, I've hit a nasty segfault in our software since we were (without success) working around the limitation of not being able to create Bytes from an mmap.

Due to perf constraints we could not de-optimise the code and create allocated copies in memory, so I've bitten the bullet and I'm hoping that you will find #742 acceptable.

I've taken a different path to addressing this issue: Previous attempts have been very generic, but risked exposing much of Bytes's internal VTable implementation.

My implementation takes a simpler approach of just keeping an owner object around for the duration that is necessary.
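
The owner idea can be illustrated with a std-only sketch. This is not the code from #742: `View`, `Pool` and `PooledBuf` are hypothetical names, and a recycling pool stands in for the mmap. The view keeps the owner alive through an `Arc`, and the owner's own `Drop` decides what happens to the memory once the last view is gone.

```rust
use std::sync::{Arc, Mutex};

/// Freelist of reusable buffers.
#[derive(Default)]
pub struct Pool {
    free: Mutex<Vec<Vec<u8>>>,
}

impl Pool {
    /// Reuses a pooled buffer when one is available, otherwise allocates.
    pub fn take(pool: &Arc<Pool>, len: usize) -> PooledBuf {
        let mut buf = pool.free.lock().unwrap().pop().unwrap_or_default();
        buf.clear();
        buf.resize(len, 0);
        PooledBuf { buf, pool: Arc::clone(pool) }
    }

    pub fn free_count(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// Owner type: dropping it returns the storage to the pool.
pub struct PooledBuf {
    buf: Vec<u8>,
    pool: Arc<Pool>,
}

impl AsRef<[u8]> for PooledBuf {
    fn as_ref(&self) -> &[u8] {
        &self.buf
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        let buf = std::mem::take(&mut self.buf);
        self.pool.free.lock().unwrap().push(buf);
    }
}

/// Cheaply cloneable view that keeps its owner alive.
pub struct View<O> {
    owner: Arc<O>,
    start: usize,
    end: usize,
}

impl<O: AsRef<[u8]>> View<O> {
    pub fn from_owner(owner: O) -> Self {
        let end = AsRef::<[u8]>::as_ref(&owner).len();
        View { owner: Arc::new(owner), start: 0, end }
    }

    pub fn as_slice(&self) -> &[u8] {
        &AsRef::<[u8]>::as_ref(&*self.owner)[self.start..self.end]
    }
}

impl<O> Clone for View<O> {
    fn clone(&self) -> Self {
        View { owner: Arc::clone(&self.owner), start: self.start, end: self.end }
    }
}
```

The buffer goes back to the pool only when the last clone of the view is dropped, which is the "drop back into the pool instead of deallocating" behaviour asked for earlier in the thread.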

I hope you may find this useful. I'm hoping it's in a shape that can be merged in - or close enough with some feedback and edits.

Thank you!

carllerche mentioned this issue Oct 20, 2020

Add Bytes::try_unsplit() #287

Open

HyeonuPark mentioned this issue Aug 20, 2022

Custom vtable API #567

Draft

rklaehn mentioned this issue Sep 30, 2022

Allow creating custom Bytes instances using constructor that takes an Arc<dyn Any> #571

Closed

rklaehn mentioned this issue Feb 3, 2023

Trade Bytes for a Trait aws/s2n-quic#1619

Open

rrichardson mentioned this issue Feb 7, 2023

Introduce SharedBuf trait for Bytes VTable #596

Open

Darksonn mentioned this issue Feb 24, 2023

Bytes::with_vtable Can it be opened for external use #526

Closed

rklaehn mentioned this issue Jun 21, 2023

async local io n0-computer/iroh#1115

Merged

tustvold mentioned this issue Aug 16, 2023

Add safe zero-copy conversion from bytes::Bytes (#4254) apache/arrow-rs#4260

Merged

This was referenced Mar 26, 2024

Implement Bytes::from_raw_parts #684

Closed

Maybe use Bytes within AlignedBuffer JackKelly/light-speed-io#112

Closed

aleksdmladenovic mentioned this issue Jun 26, 2024

Resolved Bytes vtable TraceMachina/nativelink#1057

Closed

4 tasks

robert3005 mentioned this issue Jul 12, 2024

Replace tokio Bytes with Bytes implementation that can have arbitrary alignment spiraldb/vortex#454

Open

Dushistov mentioned this issue Jul 21, 2024

allow sending messages with static or shared data snapview/tungstenite-rs#175

Open

LDeakin mentioned this issue Aug 27, 2024

Initial direct I/O support in FilesystemStore LDeakin/zarrs#58

Merged

5 tasks

amunra mentioned this issue Oct 23, 2024

feat: Bytes::from_owner #742

Merged

kylebarron mentioned this issue Oct 23, 2024

Use 'static lifetime in BoxFuture for all object-store APIs apache/arrow-rs#6587

Open
