From 6fb7a281fbe6bd7d4c609a4c4e849d67b37e0bfd Mon Sep 17 00:00:00 2001 From: John Millikin Date: Wed, 29 Nov 2023 13:18:18 +0900 Subject: [PATCH 1/2] RFC: Trait for `!Sized` thin pointers --- text/3536-unsized-thin-pointers.md | 286 +++++++++++++++++++++++++++++ 1 file changed, 286 insertions(+) create mode 100644 text/3536-unsized-thin-pointers.md diff --git a/text/3536-unsized-thin-pointers.md b/text/3536-unsized-thin-pointers.md new file mode 100644 index 00000000000..e731b07f566 --- /dev/null +++ b/text/3536-unsized-thin-pointers.md @@ -0,0 +1,286 @@ +- Feature Name: `unsized_thin_pointers` +- Start Date: 2023-11-29 +- RFC PR: [rust-lang/rfcs#3536](https://github.com/rust-lang/rfcs/pull/3536) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Enable user code to define dynamically-sized thin pointers. Such types are +`!Sized`, but references to them are pointer-sized (i.e. not "fat pointers"). +The implementation of [`core::mem::size_of_val()`][size_of_val] delegates to +a new `core::mem::DynSized` trait at runtime. + +[size_of_val]: https://doc.rust-lang.org/core/mem/fn.size_of_val.html + +# Motivation +[motivation]: #motivation + +Enable ergonomic and efficient references to dynamically-sized values that +are capable of computing their own size. + +It should be possible to declare a Rust type that is `!Sized`, but has +references that are pointer-sized and therefore only require a single register +on most architectures. + +In particular this RFC aims to support a common pattern in other low-level +languages, such as C, where a value may consist of a fixed-layout header +followed by dynamically-sized data: + +```c +struct __attribute__((aligned(8))) request { + uint32_t size; + uint16_t id; + uint16_t flags; + /* uint8_t request_data[]; */ +}; + +void handle_request(struct request *req) { /* ... */ } +``` + +This pattern is used frequently in zero-copy APIs that transmit structured data +between address spaces of differing trust levels. + +# Background +[motivation]: #motivation + +There are currently two approved RFCs that cover similar functionality: +* [RFC 1861] adds `extern type` for declaring types that are opaque to Rust's + type system. One of the capabilities available to extern types is that they + can be embedded into a `struct` as the last field, and that `struct` will + become an unsized type with thin references. + + Stabilizing `extern type` is currently blocked on questions of how to handle + Rust layout intrinsics such as [`core::mem::size_of_val()`][size_of_val] and + [`core::mem::align_of_val()`][align_of_val] for fully opaque types. + +* [RFC 2580] adds traits and intrinsics for custom DSTs either with or without + associated "fat pointer" metadata. A custom DST with thin references can be + represented as `Pointee`. + + Stabilizing custom DSTs is currently blocked on multiple questions involving + the content and representation of complex metadata, such as `&dyn` vtables. + +In both of these cases the ability to declare custom DSTs with thin references +is a minor footnote to the overall feature, and stabilization is blocked by +issues unrelated to thin-pointer DSTs. + +The objective of this RFC is to extract custom thin-pointer DSTs into its own +feature, which would hopefully be free of known issues and could be stabilized +without significant changes to the compiler or ecosystem. + +[RFC 1861]: https://rust-lang.github.io/rfcs/1861-extern-types.html +[RFC 2580]: https://rust-lang.github.io/rfcs/2580-ptr-meta.html + +[align_of_val]: https://doc.rust-lang.org/core/mem/fn.align_of_val.html + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +The unsafe trait `core::mem::DynSized` may be implemented for a `!Sized` type +to configure how the size of a value is computed from a reference. References +to a type that implements `DynSized` are not required to store the value size +as pointer metadata. + +If a type that implements `DynSized` has no other associated pointer metadata +(such as a vtable), then references to that type will have the same size and +layout as a normal pointer. + +```rust +#[repr(C, align(8))] +struct Request { + size: u32, + id: u16, + flags: u16, + data: [u8], +} + +unsafe impl core::mem::DynSized for Request { + fn size_of_val(&self) -> usize { + usize::try_from(self.size).unwrap_or(usize::MAX) + } +} + +// size_of::<&Request>() == size_of::<*const ()>() +``` + +The `DynSized` trait has a single required method, `size_of_val()`, which +has the same semantics as `core::mem::size_of_val()`. + +```rust +// core::mem +pub unsafe trait DynSized { + // Returns the size of the pointed-to value in bytes. + fn size_of_val(&self) -> usize; +} +``` + +It is an error to `impl DynSized` for a type that is `Sized`. In other words, +the following code is invalid: + +```rust +#[repr(C, align(8))] +struct SizedRequest { + size: u32, + id: u16, + flags: u16, + data: [u8; 1024], +} + +// Compiler error: `impl DynSized` on a type that isn't `!Sized`. +unsafe impl core::mem::DynSized for SizedRequest { + fn size_of_val(&self) -> usize { + usize::try_from(self.size).unwrap_or(usize::MAX) + } +} +``` + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The `core::mem::DynSized` trait acts as a signal to the compiler that the +size of a value can be computed dynamically by the user-provided trait +implementation. If references to that type would otherwise be of the layout +`(ptr, usize)` due to being `!Sized`, then they can be reduced to `ptr`. + +The `DynSized` trait does not _guarantee_ that a type will have thin pointers, +it merely enables it. This definition is intended to be compatible with RFC +2580, in that types with complex pointer metadata would continue to have fat +pointers. Such types may choose to implement `DynSized` by extracting their +custom pointer metadata from `&self`. + +Implementing `DynSized` does not affect alignment, so the questions of how to +handle unknown alignments of RFC 1861 `extern type` DSTs do not apply. + +In current Rust, a DST used as a `struct` field must be the final field of the +`struct`. This restriction remains unchanged, as the offsets of any fields after +a DST would be impossible to compute statically. +- This also implies that any given `struct` may have at most one field that + implements `DynSized`. + +A `struct` with a field that implements `DynSized` will also implicitly +implement `DynSized`. The implicit implementation of `DynSized` computes the +size of the struct up until the `DynSized` field, and then adds the result of +calling `DynSized::size_of_val()` on the final field. +- This implies it's not permitted to manually `impl DynSize` for a type that + contains a field that implements `DynSize`. + +# Drawbacks +[drawbacks]: #drawbacks + +## Mutability of value sizes + +If the size of a value is stored in the value itself, then that implies it can +change at runtime. + +```rust +struct MutableSize { size: usize } +unsafe impl core::mem::DynSized for MutableSize { + fn size_of_val(&self) -> usize { self.size } +} + +let mut v = MutableSize { size: 8 }; +println!("{:?}", core::mem::size_of_val(&v)); // prints "8" +v.size = 16; +println!("{:?}", core::mem::size_of_val(&v)); // prints "16" +``` + +There may be existing code that assumes `size_of_val()` is constant for a given +value, which is true in today's Rust due to the nature of fat pointers, but +would no longer be true if `size_of_val()` is truly dynamic. + +Alternatively, the API contract for `DynSized` implementations could require +that the result of `size_of_val()` not change for the lifetime of the allocated +object. This would likely be true for nearly all interesting use cases, and +would let `DynSized` values be stored in a `Box`. + +## Compatibility with existing fat-pointer DSTs + +It may be desirable for certain existing stabilized DSTs to implement +`DynSized` -- for example, it is a natural fit for the planned redefinition of +[`&core::ffi::CStr`][cstr] as a thin pointer. + +[cstr]: https://doc.rust-lang.org/core/ffi/struct.CStr.html + +Such a change to existing types might be backwards-incompatible for code that +embeds those types as a `struct` field, because it would change the reference +layout. For example, the following code compiles in stable Rust v1.73 but would +be a compilation error if `&CStr` does not have the same layout as `&[u8]`. + +```rust +struct ContainsCStr { + cstr: core::ffi::CStr, +} +impl ContainsCStr { + fn as_bytes(&self) -> &[u8] { + unsafe { core::mem::transmute(self) } + } +} +``` + +The above incompatibility of a redefined `&CStr` exists regardless of this RFC, +but it's worth noting that implementing `DynSized` would be a backwards +incompatible change for existing DSTs. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This design is less generic than some of the alternatives (including custom DSTs +and extern types), but has the advantage being much more tightly scoped and +therefore is expected to have no major blockers. It directly addresses one of +the pain points for use of Rust in a low-level performance-sensitive codebase, +while avoiding large-scale language changes to the extent possible. + +Without this change, people will continue to either use thick-pointer DSTs +(reducing performance relative to C), or write Rust types that claim to be +`Sized` but actually aren't (the infamous `_data: [u8; 0]` hack). + +# Prior art +[prior-art]: #prior-art + +The canonical prior art is the C language idiom of a `struct` that's implicitly +followed by a dynamically-sized value. This idiom was standardized in C99 under +the term "flexible array member": + +> As a special case, the last element of a structure with more than one named +> member may have an incomplete array type; this is called a flexible array +> member. [...] However, when a `.` (or `->`) operator has a left operand that +> is (a pointer to) a structure with a flexible array member and the right +> operand names that member, it behaves as if that member were replaced with the +> longest array (with the same element type) that would not make the structure +> larger than the object being accessed; + +The use of flexible array members (either with C99 syntax or not) is widespread +in C APIs, especially when sending structured data between processes ([IPC]) or +between a process and the kernel. For example, the Linux kernel's [FUSE] +protocol communicates with userspace via length-prefixed dynamically-sized +request/response buffers. + +They're also common when implementing low-level network protocols, which have +length-delimited frames comprising a fixed-layout header followed by a variable +amount of payload data. + +[IPC]: https://en.wikipedia.org/wiki/Inter-process_communication +[FUSE]: https://www.kernel.org/doc/html/v6.3/filesystems/fuse.html + +In the context of Rust, the two RFCs mentioned earlier both cover thin-pointer +DSTs as part of their more general extensions to the Rust type system: +- [RFC 1861: `extern_types`](https://rust-lang.github.io/rfcs/1861-extern-types.html) +- [RFC 2580: `ptr_metadata`](https://rust-lang.github.io/rfcs/2580-ptr-meta.html) + +Also, there have been non-approved RFC proposals involving thin-pointer DSTs: +- [[rfcs/pull#709] truly unsized types](https://github.com/rust-lang/rfcs/pull/709) +- [[rfcs/pull#1524] Custom Dynamically Sized Types](https://github.com/rust-lang/rfcs/pull/1524) +- [[rfcs/pull#2255] More implicit bounds (?Sized, ?DynSized, ?Move)](https://github.com/rust-lang/rfcs/issues/2255) + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +None so far + +# Future possibilities +[future-possibilities]: #future-possibilities + +None so far. Further exploration of opaque types and/or custom pointer metadata +already has separate dedicated RFCs. This one is just to get an MVP for types +that should be `!Sized` without fat pointers. From 4624b5e9fba05cb5bee546f86c97dae1f92fbb5a Mon Sep 17 00:00:00 2001 From: John Millikin Date: Fri, 1 Dec 2023 19:32:16 +0900 Subject: [PATCH 2/2] Add links to C++ FAM proposals. --- text/3536-unsized-thin-pointers.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/text/3536-unsized-thin-pointers.md b/text/3536-unsized-thin-pointers.md index e731b07f566..cd346bf43de 100644 --- a/text/3536-unsized-thin-pointers.md +++ b/text/3536-unsized-thin-pointers.md @@ -263,6 +263,11 @@ amount of payload data. [IPC]: https://en.wikipedia.org/wiki/Inter-process_communication [FUSE]: https://www.kernel.org/doc/html/v6.3/filesystems/fuse.html +C++ does not currently support flexible array members; proposals to add them +(or similar functionality) include: +- [Exploring classes of runtime size](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4025.pdf) +- [Flexible Array Members for C++](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1039r0.html) + In the context of Rust, the two RFCs mentioned earlier both cover thin-pointer DSTs as part of their more general extensions to the Rust type system: - [RFC 1861: `extern_types`](https://rust-lang.github.io/rfcs/1861-extern-types.html)