meta: Expose Bytes vtable #437

Open
carllerche opened this issue Oct 20, 2020 · 10 comments

@carllerche
Member

At some point in the future, we will want to expose the vtable backing Bytes. This issue tracks relevant discussion as well as API issues that relate to the vtable.

@rrichardson
Contributor

rrichardson commented Jan 13, 2023

I've been researching a bunch of relevant tickets and solutions for the last couple of days. I'm going to attempt to summarize all of the issues/PRs so that we can discuss them in one place and hopefully move forward with a solution.

Introduction

Bytes has become the de facto ref-counted buffer for asynchronous services and service frameworks. Interop with most high-level frameworks in the Tokio ecosystem requires using Bytes. However, Bytes features a limited set of allocation modes, so it is incompatible with many high-performance memory management systems. Recent interest in io_uring and its shared ownership model has highlighted this. There have also been a couple of requests to interop with memory-mapped data. In addition, recycling buffer pools/freelists need a way to "drop" themselves back into their pool instead of deallocating.

Why should I use Bytes over an Arc<Vec<u8>>? The answer is in Bytes' vtable. It allows differently behaving Bytes instances to offer the same homogeneous interface. This enables copy-on-write functionality, as well as common operations on buffers that Rust won't normally allow, such as splitting a partially filled buffer to hand one portion to a consumer while the other portion continues to be written.

One might go so far as to say that Bytes is actually two orthogonal sets of functionality under one struct:

  1. Buffer IO Operations (Read/Write/Cursor)
  2. Buffer Lifecycle Management.

Level Setting

There have been repeated requests for 2 umbrellas of operations:

  1. Constructing RefCounted instances that enclose 3rdPartyBuffers
    1. 3rdPartyBuffer -> Bytes
    2. 3rdPartyBuffer -> BytesMut
    3. BytesMut -> 3rdPartyBuffer
    4. Bytes -> 3rdPartyBuffer
  2. Conversion and Interop between Bytes and BytesMut
    1. split (checked and unchecked)
      1. Bytes -> BytesMut
      2. BytesMut -> Bytes
    2. unsplit (allocating, try_*)
      1. (Bytes|BytesMut) -> (Bytes | BytesMut) -> (Bytes | BytesMut)
    3. try_mut (Bytes -> BytesMut)

Current VTable Implementation

The VTable implementation currently looks like:

struct Vtable {
    /// fn(data, ptr, len)  to increment a refcount (which may require restructuring)
    pub clone: unsafe fn(&AtomicPtr<()>, *const u8, usize) -> Bytes,
    /// fn(data, ptr, len) To convert a Bytes to a Vec<u8>
    pub to_vec: unsafe fn(&AtomicPtr<()>, *const u8, usize) -> Vec<u8>,
    /// fn(data, ptr, len) To decrement or deallocate a Bytes instance
    pub drop: unsafe fn(&mut AtomicPtr<()>, *const u8, usize),
}
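
To make the mechanism concrete, here is a minimal, std-only sketch of how a table of function pointers lets two differently backed buffers share one type. `MiniBytes`, `MiniVtable`, and both vtables are hypothetical names for illustration. The sketch uses a plain `*const ()` instead of `AtomicPtr<()>` and omits `to_vec`, so it is a simplification rather than the crate's actual implementation.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for the private Vtable shown above.
struct MiniVtable {
    clone: unsafe fn(data: *const (), ptr: *const u8, len: usize) -> MiniBytes,
    drop: unsafe fn(data: *const (), ptr: *const u8, len: usize),
}

struct MiniBytes {
    ptr: *const u8,
    len: usize,
    data: *const (), // opaque per-backing state, interpreted only by the vtable
    vtable: &'static MiniVtable,
}

// Backing 1: a &'static [u8]. Cloning copies the view; dropping does nothing.
unsafe fn static_clone(_data: *const (), ptr: *const u8, len: usize) -> MiniBytes {
    MiniBytes { ptr, len, data: std::ptr::null(), vtable: &STATIC_VTABLE }
}
unsafe fn static_drop(_data: *const (), _ptr: *const u8, _len: usize) {}
static STATIC_VTABLE: MiniVtable = MiniVtable { clone: static_clone, drop: static_drop };

// Backing 2: an Arc<Vec<u8>> whose raw pointer is stored in `data`.
unsafe fn shared_clone(data: *const (), ptr: *const u8, len: usize) -> MiniBytes {
    // Bump the refcount without consuming the handle we already hold.
    unsafe { Arc::increment_strong_count(data as *const Vec<u8>) };
    MiniBytes { ptr, len, data, vtable: &SHARED_VTABLE }
}
unsafe fn shared_drop(data: *const (), _ptr: *const u8, _len: usize) {
    // Rebuild the Arc so its normal Drop decrements (and maybe frees).
    drop(unsafe { Arc::from_raw(data as *const Vec<u8>) });
}
static SHARED_VTABLE: MiniVtable = MiniVtable { clone: shared_clone, drop: shared_drop };

impl MiniBytes {
    fn from_static(s: &'static [u8]) -> Self {
        MiniBytes { ptr: s.as_ptr(), len: s.len(), data: std::ptr::null(), vtable: &STATIC_VTABLE }
    }

    fn from_shared(v: Vec<u8>) -> Self {
        let arc = Arc::new(v);
        let (ptr, len) = (arc.as_slice().as_ptr(), arc.len());
        MiniBytes { ptr, len, data: Arc::into_raw(arc) as *const (), vtable: &SHARED_VTABLE }
    }

    fn as_slice(&self) -> &[u8] {
        // Valid because the backing outlives every handle that refers to it.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Clone for MiniBytes {
    fn clone(&self) -> Self {
        unsafe { (self.vtable.clone)(self.data, self.ptr, self.len) }
    }
}

impl Drop for MiniBytes {
    fn drop(&mut self) {
        unsafe { (self.vtable.drop)(self.data, self.ptr, self.len) }
    }
}
```

Callers only ever see `MiniBytes`; which clone/drop strategy runs is decided by the vtable pointer stored in each instance.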

New Functionality Required

As I will explain below: Exposing the vtable itself is not quite sufficient to meet the new requirements because it has no way to:

  1. Construct a Bytes* instance from a 3rdPartyBuffer.
  2. Unwrap a Bytes* instance back into a 3rdPartyBuffer.

As we all know, exposing a plain struct as a public interface is generally considered a bad idea. If we ever need to add/remove/alter fields to the vtable, it will break the API.

In addition, there is some buffer lifecycle management code that might be better stored in the VTable (see BytesMut::resize and Bytes::truncate).

For this reason, we need some added functionality that is specific to each underlying buffer type. (Either in the vtable or somewhere nearby)

  1. resize :: (*instance, ptr, len) -> Result<(), usize>
    1. This should be called before the actual slice resizing, to prepare the underlying buffer.
    2. It should return Err if it is unable to accommodate the request (some buffer schemes can't resize)
  2. to_byte :: (*instance) -> (*instance, ptr, len)
    1. supply the necessary parts of a Byte or ByteMut so the Bytes impl can c
  3. from_byte :: (*instance, ptr, len) -> T
    1. Given the components of a Bytes struct, re-create the original 3rdPartyBuffer
  4. (optionally) type_id :: TypeId
    1. We'd use this to ensure that the 3rdPartyBuffer that was the source, is also the dest if we want to "downcast" this instance back to the 3rdPartyBuffer

Proposed Designs

There are 2 lobes to this:

  1. How to expose the functionality in a 3rd Party API
  2. How should the Bytes/Mut API change to reflect the T <--> Bytes/Mut functionality.

Item 2 should be rather trivial, and would likely be subject to bike-shedding, so this post will only discuss 1.

1. VTableBuilder Proposal

This was proposed by @Matthias247 in #287 (comment)

This mentions explicitly constructing a VTable object, which we may not want. We could instead use the VTable builder as an alternate Bytes constructor, so we don't have to make the VTable struct public.

    let mybytes = Bytes::with_vtable()
        .clone_fn(my_clone_fn)
        .resize_fn(my_resize_fn)
        .drop_fn(my_drop_fn)
        .from_bytes_fn(MyBufferType::from_bytes_parts)
        .build(my_buffer_ptr); // my_buffer_ptr: *const MyBufferType

The idea would be to offer an API that allows the user to supply fn impls/pointers for each of the known fields of the vtable, but we would still be free to add/alter fields in the VTable.
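
As a rough illustration of why a builder keeps the table itself private, the following std-only sketch collects function pointers and fills in defaults for anything the caller does not supply. All names (`Hooks`, `HooksBuilder`, `pool_resize`) are hypothetical, the signatures are deliberately simplified, and the meaning of the `Err(usize)` payload (here: the length that could not be satisfied) is an assumption.

```rust
// Private table: callers never name its fields, so fields can be added later.
struct Hooks {
    clone_fn: fn(&[u8]) -> Vec<u8>,
    resize_fn: fn(usize) -> Result<(), usize>,
}

// Defaults used when the caller does not override a hook.
fn default_clone(src: &[u8]) -> Vec<u8> {
    src.to_vec()
}
fn default_resize(requested: usize) -> Result<(), usize> {
    Err(requested) // assume "cannot resize" unless told otherwise
}

struct HooksBuilder {
    hooks: Hooks,
}

impl HooksBuilder {
    fn new() -> Self {
        Self { hooks: Hooks { clone_fn: default_clone, resize_fn: default_resize } }
    }

    fn clone_fn(mut self, f: fn(&[u8]) -> Vec<u8>) -> Self {
        self.hooks.clone_fn = f;
        self
    }

    fn resize_fn(mut self, f: fn(usize) -> Result<(), usize>) -> Self {
        self.hooks.resize_fn = f;
        self
    }

    fn build(self) -> Hooks {
        self.hooks
    }
}

// Example third-party hook: a pool that only hands out blocks up to 4 KiB.
fn pool_resize(requested: usize) -> Result<(), usize> {
    if requested <= 4096 { Ok(()) } else { Err(requested) }
}
```

A caller would write `HooksBuilder::new().resize_fn(pool_resize).build()`. If a new hook is added later with a sensible default, that call keeps compiling.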

Pros

  1. Easiest solution to implement for the Bytes crate.

Cons

  1. Exposes quite a few internal details of the VTable. If we do need to alter VTable, it could still break compatibility.
  2. The functions themselves could be tricky to implement correctly.

2. dyn Trait Approach

This was proposed by several people, the most compelling example was by @quark-zju in their implementation of mini-bytes. #359 (comment)

This implementation very simply and elegantly allows a Bytes instance to use the buffer contents of a 3rdPartyBuffer as well as control its lifetime in a refcounted way, and with very little hassle on the part of the 3rd party.

(Note that the above example doesn't defer the clone/drop to the 3rdPartyBuffer instance. So it can't offer the optimizations that Bytes currently offers, however, that can easily be added)

The idea is this:

    trait ManagedBuffer {
        // Borrow the bytes currently held by the buffer.
        fn get_slice(&self) -> &[u8];
        // Refcount hooks, deferred to the 3rdPartyBuffer.
        fn inc_ref(&self) -> Option<Self>;
        fn dec_ref(&self) -> Option<Self>;
        fn resize(&self, new_len: usize) -> Option<Self>;
        ...
    }

    struct Bytes {
        ptr: *const u8,
        len: usize,
        owner: Box<dyn ManagedBuffer>,
    }

    impl Bytes {
        pub fn from_buffer_manager(bman: impl ManagedBuffer);
    }

So you impl ManagedBuffer for your 3rdPartyBuffer, and use it to construct a Bytes.

Others have suggested a similar approach, but to store the object as an Arc<dyn Any + Send + Sync + 'static> so that downcasting can be simply accomplished by the Arc implementation
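
A minimal sketch of that Arc-based variant, using only std. `OwnedBytes`, `from_owner`, and `downcast_owner` are hypothetical names, not an existing API. It assumes the owner's `AsRef<[u8]>` always returns the same stable heap region for as long as the owner is alive.

```rust
use std::any::Any;
use std::sync::Arc;

#[derive(Clone)]
struct OwnedBytes {
    // Type-erased owner; keeps the backing memory alive for every clone.
    owner: Arc<dyn Any + Send + Sync>,
    ptr: *const u8,
    len: usize,
}

impl OwnedBytes {
    fn from_owner<T>(owner: T) -> Self
    where
        T: AsRef<[u8]> + Send + Sync + 'static,
    {
        let owner = Arc::new(owner);
        // Capture the view once; the owner no longer moves inside the Arc.
        let view: &[u8] = AsRef::<[u8]>::as_ref(&*owner);
        let (ptr, len) = (view.as_ptr(), view.len());
        OwnedBytes { owner, ptr, len }
    }

    fn as_slice(&self) -> &[u8] {
        // Assumption: the owner's buffer is stable while `self.owner` lives.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }

    // Recover the concrete owner type, if it matches what was stored.
    fn downcast_owner<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        Arc::clone(&self.owner).downcast::<T>().ok()
    }
}
```

Here the downcast is handled entirely by `Arc<dyn Any + Send + Sync>::downcast`, so no separate type_id hook is needed. The trade-off is that clone and drop are always plain Arc operations.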

Pros

  1. Intuitive for most developers
  2. Simplifies the implementation of Bytes
  3. Lets rustc manage the VTable, so we don't have to.
  4. The operation to convert Bytes back into 3rdPartyBuffer is trivial and universal.

Cons

  1. Significant change vs the current model.
  2. Lifecycle management might get a bit more complicated.

3. Trait as VTable Builder Approach

This is sort of a hybrid of approaches 1 and 2. It offers more flexibility than both, but at the possible expense of greater complexity. This was implemented in PR #567 by @HyeonuPark.

This approach features an expanded VTable, as well as a stateless Trait design. This differs from approach 2 because the trait exists to provide functionality to the VTable, as well as provide functionality to re-construct/downcast the Bytes type back to T.

Note that the above PR features a clone method that looks like: unsafe fn clone(data: &AtomicPtr<()>, ptr: *const u8, len: usize) -> Bytes; With the current Bytes API, this would require that the implementor of the BytesImpl trait know the internal construction of a Bytes object. That can be fixed with the addition of a fn from_parts(...) -> Self method to the Bytes impl.

Pros

  1. No mention of the VTable to the public.
  2. Expanding the trait won't break compatibility (if the new methods have default impls)

Cons

  1. Higher complexity for both the Bytes impl and the 3rd party.
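
The "stateless trait feeds a vtable" idea can be sketched with std alone. The names below (`BufferImpl`, `VtableFor`, `vtable_of`, `MmapBuf`, `PoolBuf`) are hypothetical and the hooks are toy ones. The sketch is not the design from the PR, only an illustration of deriving a `&'static` table of function pointers from a trait whose methods take no `self`.

```rust
use std::marker::PhantomData;

// Private table of plain function pointers.
struct Vtable {
    name: fn() -> &'static str,
    can_resize: fn(usize) -> bool,
}

// Stateless trait: no `self`, so each method maps directly to a fn pointer.
trait BufferImpl {
    fn name() -> &'static str;

    // Added "later" with a default, so existing implementors keep compiling.
    fn can_resize(_new_len: usize) -> bool {
        false
    }
}

// Helper type that carries one vtable per implementor as an associated const.
struct VtableFor<T>(PhantomData<T>);

impl<T: BufferImpl + 'static> VtableFor<T> {
    const VTABLE: Vtable = Vtable { name: T::name, can_resize: T::can_resize };
}

fn vtable_of<T: BufferImpl + 'static>() -> &'static Vtable {
    &VtableFor::<T>::VTABLE
}

struct MmapBuf;
impl BufferImpl for MmapBuf {
    fn name() -> &'static str {
        "mmap"
    }
}

struct PoolBuf;
impl BufferImpl for PoolBuf {
    fn name() -> &'static str {
        "pool"
    }
    fn can_resize(new_len: usize) -> bool {
        new_len <= 4096
    }
}
```

The third party only implements the trait. The vtable type stays private and is produced by `vtable_of::<T>()`.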

4. The In-a-Perfect-World Approach

As mentioned in the intro, Bytes encompasses two somewhat orthogonal sets of functionality. Buffer Lifecycle Management, and Buffer Read/Write operations.

This design follows the analogy of the Future and FutureExt traits. The impl Future does the dirty work of concurrency management, but can be implemented for a wide variety of use-cases. Then the FutureExt trait offers the high-level functionality.

So all Buffer lifecycle management, copy on write etc, would be implemented by the impl of trait ManagedBuffer, then the high-level IO functionality would be implemented in trait BytesExt.

The ManagedBuffer design would be similar to proposal 2, except that it would be designed in such a way that the implementation of ManagedBuffer is 100% responsible for itself, e.g. clone, drop, resize, into_parts, from_parts. It would be completely self contained.

In order to maintain compatibility with the existing API, Bytes and BytesMut would become thin wrappers that store a dyn ManagedBuffer and then provide a facade over the functions provided by BytesExt.

The implementor of ManagedBuffer could be any 3rd party type.

So it'd look something like:

pub trait ManagedBuffer {
    fn from_parts() -> Self;
    fn into_parts(this: Self) -> (...);
    fn clone() -> Self;
    fn drop() -> Self;
    fn resize(usize) -> Self;
}

pub trait BytesExt: ManagedBuffer {
    fn slice() -> Self {
        ...
    }
    fn len() -> usize {
        ...
    }
    ...
}

// Blanket impl: every ManagedBuffer gets the high-level API.
impl<T> BytesExt for T
where
    T: ManagedBuffer + ?Sized,
{
}

struct StaticBuffer {
    buf: &'static [u8],
}

// this is where the STATIC_VTABLE impl would go.. more or less
impl ManagedBuffer for StaticBuffer {
    ...
}

... 
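
A compilable sketch of the split described above, in the spirit of Future/FutureExt. It is heavily reduced: the lifecycle trait has a single method, and the names `VecBuffer`, `FacadeBytes`, and `first_n` are hypothetical additions for illustration only.

```rust
// Lifecycle side: the only thing an implementor must provide in this sketch.
trait ManagedBuffer {
    fn as_slice(&self) -> &[u8];
}

// High-level side: default methods written purely against ManagedBuffer.
trait BytesExt: ManagedBuffer {
    fn len(&self) -> usize {
        self.as_slice().len()
    }

    // First `n` bytes, clamped to the buffer length.
    fn first_n(&self, n: usize) -> &[u8] {
        let s = self.as_slice();
        &s[..n.min(s.len())]
    }
}

// Blanket impl, including unsized types such as `dyn ManagedBuffer`.
impl<T: ManagedBuffer + ?Sized> BytesExt for T {}

struct StaticBuffer {
    buf: &'static [u8],
}
impl ManagedBuffer for StaticBuffer {
    fn as_slice(&self) -> &[u8] {
        self.buf
    }
}

struct VecBuffer(Vec<u8>);
impl ManagedBuffer for VecBuffer {
    fn as_slice(&self) -> &[u8] {
        &self.0
    }
}

// Thin facade that stores a trait object and forwards to BytesExt.
struct FacadeBytes {
    inner: Box<dyn ManagedBuffer>,
}

impl FacadeBytes {
    fn new(inner: impl ManagedBuffer + 'static) -> Self {
        Self { inner: Box::new(inner) }
    }
    fn len(&self) -> usize {
        self.inner.len()
    }
    fn first_n(&self, n: usize) -> &[u8] {
        self.inner.first_n(n)
    }
}
```

Because the blanket impl covers `?Sized` types, the facade can call `BytesExt` methods directly on its `Box<dyn ManagedBuffer>`.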

@rrichardson
Contributor

pinging @ipoupaille for comment, since they made a PR (#558) for VTable exposure as well.

@chenyuanrun

I need this to avoid memory copy between rust and c++ code.

@djkoloski

A couple people are asking for ways to convert rkyv's AlignedVec into an axum::body::Body without copying the bytes. It looks like custom Bytes constructors are required to do this.

@maxburke

Hi! I'd also like to throw an up-vote in on this issue. I've got some memory mapped files I would like to expose a zero-copy Bytes interface on, and right now the only way I've been able to do that is by using Bytes::with_vtable. So, if it could be made public, perhaps behind a feature gate, it would make things easier.

@scottlamb

> I've got some memory mapped files I would like to expose a zero-copy Bytes interface on,

YMMV depending on whether your files are disk-backed or SSD-backed, whether you can mlock them (in terms of affording the RAM and the syscall cost), and whether you're accessing the Bytes from an async thread, but...my experience has been this combo is a bad idea because it causes unpredictable slow major faults that stall the event-driven threads for a long time. More here.

@maxburke

maxburke commented Apr 2, 2024

> YMMV depending on whether your files are disk-backed or SSD-backed, whether you can mlock them (in terms of affording the RAM and the syscall cost), and whether you're accessing the Bytes from an async thread, but...my experience has been this combo is a bad idea because it causes unpredictable slow major faults that stall the event-driven threads for a long time. More here.

I think I've got a pretty good idea of the risks; in our case we have a cache fronting an object store, so if we miss we're going to hit S3 (or equivalent) anyways, which is much slower than faulting from SSD, which all our machines have for local storage.

@JackKelly

JackKelly commented Apr 3, 2024

I'd also like to up-vote the idea of exposing vtable! I'll explain my use-case, and my requirements, in the hope that this is relevant to the current discussion about exposing the vtable...

Background on my use-case

I'm working on a set of crates for high-speed I/O using io_uring with O_DIRECT. The aim is to load millions of small chunks per second from locally-attached SSDs. My I/O crate will return a crossbeam::channel of chunks. In a subsequent crate, a Rayon threadpool will consume this channel of chunks, and process the chunks. I'm not planning to use async/await.

O_DIRECT and buffer alignment

The O_DIRECT flag means that, when a file is read, the data skips the Linux page cache and is read directly from disk into the user-space memory buffer. When using O_DIRECT, Linux may require that the memory buffer's start address and end address be aligned to the filesystem's logical block size [1]. Different file systems have different logical block sizes, and different alignment requirements, so we won't know the alignment requirements until runtime. For example, ext4 (on my system) requires the start and end of the buffer to be aligned to 512 bytes.

Buffer recycling

I'd also like to support parallel decompression of chunks of compressed data, where we can recycle buffers after data has been decompressed.

Requirements

I'd love to be able to use Bytes for my use-case, because Bytes is so well-known in the Rust ecosystem.

But, at the moment, I can't use Bytes (and, instead, I'm considering implementing my own minimal ref-counted buffer).

To be able to use Bytes, I'd need Bytes to allow my code to set the alignment at runtime. I guess this functionality could be exposed in two ways:

  1. Bytes could implement something like fn Bytes::new_aligned(len: usize, alignment: usize) -> Bytes.
  2. Bytes could be constructed from an existing buffer. e.g. by implementing something like Bytes::from_raw_parts, but - crucially - the user would need to pass alignment information to Bytes.

Crucially, in both of these situations, vtable.drop would need to know the alignment. This is because std::alloc::dealloc needs to be passed the same layout that was used to allocate the buffer.
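
To illustrate the layout requirement, here is a small std-only sketch of a buffer whose alignment is chosen at runtime. `AlignedBuf` is a hypothetical type, not something Bytes provides. The point is that the `Layout` used for allocation has to be stored so the same one can be handed to `dealloc`.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

struct AlignedBuf {
    ptr: NonNull<u8>,
    // Remembered so that Drop can pass the identical layout to dealloc.
    layout: Layout,
}

impl AlignedBuf {
    // `align` is only known at runtime (e.g. queried from the filesystem).
    // Returns None for a zero length or an invalid (non power-of-two) alignment.
    fn new(len: usize, align: usize) -> Option<Self> {
        if len == 0 {
            return None; // the global allocator must not be asked for 0 bytes
        }
        let layout = Layout::from_size_align(len, align).ok()?;
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Some(Self { ptr, layout })
    }

    fn as_slice(&self) -> &[u8] {
        // The allocation is zero-initialised and `layout.size()` bytes long.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // Must be the same layout (size and alignment) used to allocate.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Any design that lets a drop hook see per-instance state like `layout` would cover this. A stateless drop function with a hardcoded alignment would not.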

I'd also love Bytes to have a mechanism for recycling buffers. But, for my use-case, buffer recycling is currently a "nice-to-have", not a "must-have".
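
For reference, buffer recycling on drop can be sketched with std alone. `Pool` and `PooledBuf` are hypothetical names. The sketch only shows a buffer returning itself to a freelist instead of deallocating, not how that would be wired into Bytes.

```rust
use std::sync::{Arc, Mutex};

// Shared freelist of reusable allocations.
#[derive(Clone, Default)]
struct Pool {
    free: Arc<Mutex<Vec<Vec<u8>>>>,
}

struct PooledBuf {
    buf: Vec<u8>,
    pool: Pool,
}

impl Pool {
    // Reuse an idle allocation if one exists, otherwise allocate a new one.
    fn get(&self, capacity: usize) -> PooledBuf {
        let mut buf = self.free.lock().unwrap().pop().unwrap_or_default();
        buf.clear();
        buf.reserve(capacity);
        PooledBuf { buf, pool: self.clone() }
    }

    // Number of buffers currently waiting for reuse.
    fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        // Hand the allocation back to the pool instead of freeing it.
        let buf = std::mem::take(&mut self.buf);
        self.pool.free.lock().unwrap().push(buf);
    }
}
```

With a custom drop hook, the last reference to a Bytes-like handle could run this kind of logic rather than the default deallocation.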

Which "proposed designs" would enable my use-case?

| Proposal name | Enables my use-case? | Explanation |
| --- | --- | --- |
| 1. VTableBuilder Proposal | ❌ | As this currently stands, this proposal doesn't appear to be able to store an alignment: usize. We could hardcode an alignment in my_drop_fn but that won't work for my use-case because I won't know the alignment until runtime. So, unfortunately, this proposal doesn't satisfy my requirements, AFAICT. |
| 2. dyn Trait Approach | ✅ | Looks good! I can store alignment in a custom struct (and my custom struct would impl ManagedBuffer). |
| 3. Trait as VTable Builder Approach | ❌? | I must admit I haven't fully wrapped my head around this approach yet! But I don't think this would support arbitrary alignment (because BytesImpl::from_bytes_parts doesn't accept an alignment). But I could be wrong! |
| 4. The In-a-Perfect-World Approach | ✅ | I think this would work for my use-case (for the same reasons as proposal 2 would work). |

Maybe my use-case isn't a good fit for Bytes?

Maybe my use-case is a bad fit for Bytes 🙂.

My use-case doesn't need some of the main features that Bytes offers (different implementations of the same API, growing/truncating buffers). And I do need some features that Bytes doesn't yet have (setting the alignment; recycling buffers).

So, please don't feel bad if your response to my comment is "Bytes shouldn't enable this use-case" 😄!

Related

Footnotes

  1. See the notes on O_DIRECT at the bottom of the open(2) man page.

@allada

allada commented Apr 22, 2024

I'd also like to express a desire to have the ability to take over control of how memory is managed in the Bytes struct (without a template allocator).

Our use case
We use an in-memory key-value cache looking something like LruHashMap<Digest, Bytes>. gRPC requests (via tonic) then frequently ask for the blob at Digest and we can just make a refcopy of the Bytes struct and pass it directly to tonic to send off. This is super useful, because we care a lot about reducing data copies.

Lately we've been having memory fragmentation issues because these Bytes are often very small (~50% of the data is less than 500 bytes). We would like to instead be able to allocate a large slab of memory (let's say 30GB) and then, when a Bytes object that is going to live in this LruHashMap is allocated, have the real pointer point to a place in the slab that a non-global allocator (like mimalloc or jemalloc) will manage for us. If there is not enough space in the slab, we can start evicting items from the LruHashMap until there is enough space for the allocator to give the requested pointer+size out.

This should be fairly simple to implement if we could have control of the underlying pointer that Bytes uses & the destructor.

@amunra
Contributor

amunra commented Oct 23, 2024

Hello, I've hit a nasty segfault in our software since we were (without success) working around the limitation of not being able to create Bytes from an mmap.

Due to perf constraints we could not de-optimise the code and create allocated copies in memory, so I've bitten the bullet and I'm hoping that you will find #742 acceptable.

I've taken a different path to addressing this issue: Previous attempts have been very generic, but risked exposing much of Bytes's internal VTable implementation.

My implementation takes a simpler approach of just keeping an owner object around for the duration that is necessary.

I hope you may find this useful. I'm hoping it's in a shape that can be merged in - or close enough with some feedback and edits.

Thank you!
