RFC: Trait for !Sized thin pointers #3536
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,291 @@ | ||
- Feature Name: `unsized_thin_pointers` | ||
- Start Date: 2023-11-29 | ||
- RFC PR: [rust-lang/rfcs#3536](https://github.com/rust-lang/rfcs/pull/3536) | ||
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Enable user code to define dynamically-sized thin pointers. Such types are | ||
`!Sized`, but references to them are pointer-sized (i.e. not "fat pointers"). | ||
The implementation of [`core::mem::size_of_val()`][size_of_val] delegates to | ||
a new `core::mem::DynSized` trait at runtime. | ||
|
||
[size_of_val]: https://doc.rust-lang.org/core/mem/fn.size_of_val.html | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Enable ergonomic and efficient references to dynamically-sized values that | ||
are capable of computing their own size. | ||
|
||
It should be possible to declare a Rust type that is `!Sized`, but has | ||
references that are pointer-sized and therefore only require a single register | ||
on most architectures. | ||
|
||
In particular this RFC aims to support a common pattern in other low-level | ||
languages, such as C, where a value may consist of a fixed-layout header | ||
followed by dynamically-sized data: | ||
|
||
```c | ||
struct __attribute__((aligned(8))) request { | ||
    uint32_t size; | ||
    uint16_t id; | ||
    uint16_t flags; | ||
    /* uint8_t request_data[]; */ | ||
}; | ||
|
||
void handle_request(struct request *req) { /* ... */ } | ||
``` | ||
|
||
This pattern is used frequently in zero-copy APIs that transmit structured data | ||
between address spaces of differing trust levels. | ||
|
||
# Background | ||
[background]: #background | ||
|
||
There are currently two approved RFCs that cover similar functionality: | ||
* [RFC 1861] adds `extern type` for declaring types that are opaque to Rust's | ||
type system. One of the capabilities available to extern types is that they | ||
can be embedded into a `struct` as the last field, and that `struct` will | ||
become an unsized type with thin references. | ||
|
||
Stabilizing `extern type` is currently blocked on questions of how to handle | ||
Rust layout intrinsics such as [`core::mem::size_of_val()`][size_of_val] and | ||
[`core::mem::align_of_val()`][align_of_val] for fully opaque types. | ||
|
||
* [RFC 2580] adds traits and intrinsics for custom DSTs either with or without | ||
associated "fat pointer" metadata. A custom DST with thin references can be | ||
represented as `Pointee<Metadata = ()>`. | ||
|
||
Stabilizing custom DSTs is currently blocked on multiple questions involving | ||
the content and representation of complex metadata, such as `&dyn` vtables. | ||
|
||
In both of these cases the ability to declare custom DSTs with thin references | ||
is a minor footnote to the overall feature, and stabilization is blocked by | ||
issues unrelated to thin-pointer DSTs. | ||
|
||
The objective of this RFC is to extract custom thin-pointer DSTs into their own | ||
feature, which would hopefully be free of known issues and could be stabilized | ||
without significant changes to the compiler or ecosystem. | ||
|
||
[RFC 1861]: https://rust-lang.github.io/rfcs/1861-extern-types.html | ||
[RFC 2580]: https://rust-lang.github.io/rfcs/2580-ptr-meta.html | ||
|
||
[align_of_val]: https://doc.rust-lang.org/core/mem/fn.align_of_val.html | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
The unsafe trait `core::mem::DynSized` may be implemented for a `!Sized` type | ||
to configure how the size of a value is computed from a reference. References | ||
to a type that implements `DynSized` are not required to store the value size | ||
as pointer metadata. | ||
|
||
If a type that implements `DynSized` has no other associated pointer metadata | ||
(such as a vtable), then references to that type will have the same size and | ||
layout as a normal pointer. | ||
|
||
```rust | ||
#[repr(C, align(8))] | ||
struct Request { | ||
    size: u32, | ||
    id: u16, | ||
    flags: u16, | ||
    data: [u8], | ||
} | ||
|
||
unsafe impl core::mem::DynSized for Request { | ||
    fn size_of_val(&self) -> usize { | ||
        usize::try_from(self.size).unwrap_or(usize::MAX) | ||
    } | ||
} | ||
|
||
// size_of::<&Request>() == size_of::<*const ()>() | ||
``` | ||
|
||
The `DynSized` trait has a single required method, `size_of_val()`, which | ||
has the same semantics as `core::mem::size_of_val()`. | ||
|
||
```rust | ||
// core::mem | ||
pub unsafe trait DynSized { | ||
    // Returns the size of the pointed-to value in bytes. | ||
    fn size_of_val(&self) -> usize; | ||
} | ||
``` | ||
|
||
It is an error to `impl DynSized` for a type that is `Sized`. In other words, | ||
the following code is invalid: | ||
|
||
```rust | ||
#[repr(C, align(8))] | ||
struct SizedRequest { | ||
    size: u32, | ||
    id: u16, | ||
    flags: u16, | ||
    data: [u8; 1024], | ||
} | ||
|
||
// Compiler error: `impl DynSized` on a type that isn't `!Sized`. | ||
unsafe impl core::mem::DynSized for SizedRequest { | ||
    fn size_of_val(&self) -> usize { | ||
        usize::try_from(self.size).unwrap_or(usize::MAX) | ||
    } | ||
} | ||
``` | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
The `core::mem::DynSized` trait acts as a signal to the compiler that the | ||
size of a value can be computed dynamically by the user-provided trait | ||
implementation. If references to that type would otherwise be of the layout | ||
`(ptr, usize)` due to being `!Sized`, then they can be reduced to `ptr`. | ||
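
As a point of reference, the difference between the two layouts can already be observed on stable Rust. The following is only an illustrative sketch: the helper name `reference_sizes` is not part of this RFC, and the two-word size of `&[u8]` reflects how current compilers lay out slice references rather than anything introduced here.

```rust
use core::mem::size_of;

/// Returns the size in bytes of a thin reference (`&u8`) and of a
/// fat reference (`&[u8]`) on the current target.
/// Hypothetical helper, used only for illustration.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&[u8]>())
}

fn main() {
    let (thin, fat) = reference_sizes();
    // A thin reference is one pointer wide; a slice reference also
    // carries the element count as pointer metadata.
    println!("thin = {} bytes, fat = {} bytes", thin, fat);
}
```

A type covered by this RFC would be expected to land in the first category even though it is `!Sized`.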
|
||
The `DynSized` trait does not _guarantee_ that a type will have thin pointers; | ||
it merely enables them. This definition is intended to be compatible with RFC | ||
Only

If the fat pointer is

The intention of the wording here is that the optimization is only valid if the type's references are thick pointers due to being

I don't think this is well defined. It's not just an optimisation it will also affect things like FFI so it needs to be very clear when it is applied. What do you mean by "only valid if the type's references are thick pointers due to being !Sized" even

I'm using "optimization" in the same sense as the niche optimization that lets

The niche optimization is guaranteed for certain type combinations, but the exact rules are not fully specified and it's not guaranteed that any two types with human-discernable niches can be optimized.

Yes, hence

Consider the following example program:

```rust
use core::mem;

struct Request {
    len_le: u32,
    _data: [u8],
}

impl Request {
    fn new<'a>(data: &'a [u8]) -> Option<&'a Request> {
        if data.len() < mem::size_of::<u32>() { return None; }
        let data_ptr = data.as_ptr();
        if (data_ptr as usize) % mem::align_of::<u32>() != 0 { return None; }
        let req: &'a Self = unsafe {
            mem::transmute(core::slice::from_raw_parts(data_ptr, 0))
        };
        if data.len() != req.len() { return None; }
        Some(req)
    }

    fn len(&self) -> usize {
        usize::try_from(u32::from_le(self.len_le)).unwrap_or(usize::MAX)
    }

    fn as_bytes(&self) -> &[u8] {
        let len = self.len();
        unsafe {
            let data_ptr = mem::transmute::<&Self, &[u8]>(self).as_ptr();
            mem::transmute(core::slice::from_raw_parts(data_ptr, len))
        }
    }
}

fn main() {
    #[repr(C, align(4))]
    struct Aligned<const N: usize>([u8; N]);
    let req_data = Aligned([9, 0, 0, 0, 1, 2, 3, 4, 5]);
    let request = Request::new(&req_data.0[..]).unwrap();
    println!("request.len(): {:?}", request.len());
    println!("request.as_bytes(): {:?}", request.as_bytes());
}
```

The

But there is a need for that extra usize, if I do
||
2580, in that types with complex pointer metadata would continue to have fat | ||
pointers. Such types may choose to implement `DynSized` by extracting their | ||
custom pointer metadata from `&self`. | ||
|
||
Implementing `DynSized` does not affect alignment, so the questions of how to | ||
handle unknown alignments of RFC 1861 `extern type` DSTs do not apply. | ||
|
||
In current Rust, a DST used as a `struct` field must be the final field of the | ||
`struct`. This restriction remains unchanged, as the offsets of any fields after | ||
a DST would be impossible to compute statically. | ||
- This also implies that any given `struct` may have at most one field that | ||
implements `DynSized`. | ||
|
||
A `struct` with a field that implements `DynSized` will also implicitly | ||
implement `DynSized`. The implicit implementation of `DynSized` computes the | ||
size of the struct up until the `DynSized` field, and then adds the result of | ||
calling `DynSized::size_of_val()` on the final field. | ||
- This implies it's not permitted to manually `impl DynSized` for a type that | ||
contains a field that implements `DynSized`. | ||
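
The implicit implementation described above could be sketched in user code roughly as follows. Everything here is an assumption made for illustration: `DynSizedSketch` stands in for the proposed `core::mem::DynSized`, `Payload` and `Envelope` are hypothetical types, and a zero-length array stands in for the trailing dynamically-sized data because stable Rust cannot yet express it with thin references. The sketch adds the field offset to the field's reported size and performs no rounding up to the struct's alignment, since the text above does not mention any.

```rust
/// Hypothetical stand-in for the proposed `core::mem::DynSized` trait.
trait DynSizedSketch {
    fn size_of_val(&self) -> usize;
}

/// Hypothetical inner value: a length prefix followed by trailing data.
#[allow(dead_code)]
#[repr(C)]
struct Payload {
    /// Total size of this value in bytes, including `len` itself.
    len: u32,
    /// Stand-in for the trailing dynamically-sized data.
    data: [u8; 0],
}

impl DynSizedSketch for Payload {
    fn size_of_val(&self) -> usize {
        self.len as usize
    }
}

/// Hypothetical outer struct whose final field is dynamically sized.
#[allow(dead_code)]
#[repr(C)]
struct Envelope {
    id: u64,
    payload: Payload,
}

// What an implicit implementation might compute: the size of the struct
// up to the final field, plus that field's own reported size.
impl DynSizedSketch for Envelope {
    fn size_of_val(&self) -> usize {
        let base = self as *const Self as usize;
        let field = &self.payload as *const Payload as usize;
        (field - base).saturating_add(self.payload.size_of_val())
    }
}

fn main() {
    let env = Envelope { id: 7, payload: Payload { len: 12, data: [] } };
    // 8 bytes up to `payload`, plus the 12 bytes reported by `payload`.
    println!("{}", env.size_of_val()); // prints "20"
}
```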
Comment on lines +161 to +166

Unfortunately I think this is UB as written, consider

Maybe there's another trait that needs to exist to express the concept that the size of a value can be determined just from its pointer:

```rust
trait SizeOfValFromPtr {
    fn size_of_val_from_ptr(&self) -> usize;
}
impl<T: Sized> SizeOfValFromPtr<T> { ... }
impl<T: ?Sized + Pointee<Metadata = usize>> SizeOfValFromPtr<T> { ... }
```

Wait, I might be missing something, but how would a

For

The more I think about it, the more I think I need to look again at why

The lang team has expressed an unwillingness to add more `?` bounds as they're weird and hard to teach. It's possible that decision might be changed with a good enough argument but I'd be surprised.
||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
## Mutability of value sizes | ||
|
||
If the size of a value is stored in the value itself, then that implies it can | ||
change at runtime. | ||
|
||
```rust | ||
struct MutableSize { size: usize } | ||
unsafe impl core::mem::DynSized for MutableSize { | ||
    fn size_of_val(&self) -> usize { self.size } | ||
} | ||
|
||
let mut v = MutableSize { size: 8 }; | ||
println!("{:?}", core::mem::size_of_val(&v)); // prints "8" | ||
v.size = 16; | ||
println!("{:?}", core::mem::size_of_val(&v)); // prints "16" | ||
``` | ||
|
||
There may be existing code that assumes `size_of_val()` is constant for a given | ||
value, which is true in today's Rust due to the nature of fat pointers, but | ||
would no longer be true if `size_of_val()` is truly dynamic. | ||
|
||
Alternatively, the API contract for `DynSized` implementations could require | ||
that the result of `size_of_val()` not change for the lifetime of the allocated | ||
object. This would likely be true for nearly all interesting use cases, and | ||
would let `DynSized` values be stored in a `Box`. | ||
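
One reason a stable size matters for `Box`: the allocator API expects the layout passed at deallocation to match the one used at allocation, and `Box` derives that layout from the value when it is dropped. The sketch below shows the relationship using today's `Box<[u8]>`; the helper name `layout_of` is illustrative and not part of this RFC.

```rust
use std::alloc::Layout;

/// Layout computed from the value itself, i.e. what `size_of_val()` and
/// `align_of_val()` report. Hypothetical helper for illustration.
fn layout_of(value: &[u8]) -> Layout {
    Layout::for_value(value)
}

fn main() {
    let boxed: Box<[u8]> = vec![0u8; 16].into_boxed_slice();
    let at_alloc = Layout::array::<u8>(16).expect("16 bytes fits in a Layout");
    // The layout derived from the value equals the layout of a 16-byte
    // allocation; a value whose reported size changed after allocation
    // would break this equality.
    assert_eq!(layout_of(&boxed), at_alloc);
    println!("{:?}", layout_of(&boxed));
}
```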
Nothing written here would prevent DynSized values being stored in Box (either via coercion or from_raw) so this does need to work or be banned somehow.

After giving it some thought,

Users that need to alloc+free a

In what situation could you change the size of an object? You'd have to be very careful to stay inside your allocation. In that situation it feels like you have a separate length and
||
|
||
## Compatibility with existing fat-pointer DSTs | ||
|
||
It may be desirable for certain existing stabilized DSTs to implement | ||
`DynSized` -- for example, it is a natural fit for the planned redefinition of | ||
[`&core::ffi::CStr`][cstr] as a thin pointer. | ||
|
||
[cstr]: https://doc.rust-lang.org/core/ffi/struct.CStr.html | ||
|
||
Such a change to existing types might be backwards-incompatible for code that | ||
embeds those types as a `struct` field, because it would change the reference | ||
layout. For example, the following code compiles in stable Rust v1.73 but would | ||
be a compilation error if `&CStr` does not have the same layout as `&[u8]`. | ||
|
||
```rust | ||
struct ContainsCStr { | ||
    cstr: core::ffi::CStr, | ||
} | ||
impl ContainsCStr { | ||
    fn as_bytes(&self) -> &[u8] { | ||
        unsafe { core::mem::transmute(self) } | ||
    } | ||
} | ||
``` | ||
|
||
The above incompatibility of a redefined `&CStr` exists regardless of this RFC, | ||
but it's worth noting that implementing `DynSized` would be a | ||
backwards-incompatible change for existing DSTs. | ||
What is the backwards incompatibility specifically? I'm not sure we guarantee the layout of pointers to DSTs atm.

I might be misunderstanding the question, but the backwards-incompatibility is that the transmutability of DSTs is exposed to user code and depended on heavily by third-party libraries that use type punning for type-state stuff. It's guaranteed that

Can you point to the documentation that says that a

https://doc.rust-lang.org/reference/type-layout.html#the-transparent-representation guarantees that

https://doc.rust-lang.org/nomicon/exotic-sizes.html#dynamically-sized-types-dsts specifies that a slice reference is a pointer to the start of the slice plus an element count. A pointer to

A potential alternative is to have a special attribute that must be paired with

```rust
#[repr(dyn_sized)] // enables *and requires* a `DynSized` impl
struct Foo([u8]);
unsafe impl DynSized for Foo { ... }
```

Changing the

The docs so clearly say that you should not rely on the internal representation of

Though that blurb does seem to contradict itself with the "(the
||
|
||
# Rationale and alternatives | ||
Have you considered just having everything automatically implement

In this world:
|
||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
This design is less generic than some of the alternatives (including custom DSTs | ||
and extern types), but has the advantage of being much more tightly scoped and | ||
therefore is expected to have no major blockers. It directly addresses one of | ||
the pain points for use of Rust in a low-level performance-sensitive codebase, | ||
while avoiding large-scale language changes to the extent possible. | ||
|
||
Without this change, people will continue to either use thick-pointer DSTs | ||
(reducing performance relative to C), or write Rust types that claim to be | ||
`Sized` but actually aren't (the infamous `_data: [u8; 0]` hack). | ||
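
For readers unfamiliar with the hack, a minimal sketch on stable Rust follows. The names `RequestHeader`, `AlignedBuf`, `view_header`, and `demo` are illustrative only, and the field layout mirrors the `request` struct from the motivation section. The point is that `size_of_val()` reports only the fixed header size, so the compiler's notion of the value's size disagrees with the length recorded in the header.

```rust
use core::mem::{align_of, size_of, size_of_val};

/// Header that claims to be `Sized`; the zero-length array marks where the
/// trailing data would begin. Hypothetical type for illustration.
#[allow(dead_code)]
#[repr(C, align(8))]
struct RequestHeader {
    size: u32,
    id: u16,
    flags: u16,
    _data: [u8; 0],
}

/// 8-byte-aligned backing storage for the example.
#[repr(C, align(8))]
struct AlignedBuf([u8; 16]);

/// Views the start of `buf` as a header if it is long enough and aligned.
fn view_header(buf: &[u8]) -> Option<&RequestHeader> {
    if buf.len() < size_of::<RequestHeader>() {
        return None;
    }
    if (buf.as_ptr() as usize) % align_of::<RequestHeader>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, and every field is a
    // plain integer for which any bit pattern is valid.
    Some(unsafe { &*buf.as_ptr().cast::<RequestHeader>() })
}

/// Returns (`size_of_val` of the header, size recorded in the header).
fn demo() -> (usize, u32) {
    let mut buf = AlignedBuf([0u8; 16]);
    // Record a total request size of 16 bytes in the `size` field.
    buf.0[..4].copy_from_slice(&16u32.to_ne_bytes());
    let hdr = view_header(&buf.0).expect("buffer is aligned and large enough");
    (size_of_val(hdr), hdr.size)
}

fn main() {
    let (static_size, recorded_size) = demo();
    // The type system sees 8 bytes; the header says the request is 16.
    println!("size_of_val = {}, header.size = {}", static_size, recorded_size);
}
```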
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
The canonical prior art is the C language idiom of a `struct` that's implicitly | ||
followed by a dynamically-sized value. This idiom was standardized in C99 under | ||
the term "flexible array member": | ||
|
||
> As a special case, the last element of a structure with more than one named | ||
> member may have an incomplete array type; this is called a flexible array | ||
> member. [...] However, when a `.` (or `->`) operator has a left operand that | ||
> is (a pointer to) a structure with a flexible array member and the right | ||
> operand names that member, it behaves as if that member were replaced with the | ||
> longest array (with the same element type) that would not make the structure | ||
> larger than the object being accessed; | ||
|
||
The use of flexible array members (either with C99 syntax or not) is widespread | ||
in C APIs, especially when sending structured data between processes ([IPC]) or | ||
between a process and the kernel. For example, the Linux kernel's [FUSE] | ||
protocol communicates with userspace via length-prefixed dynamically-sized | ||
request/response buffers. | ||
|
||
They're also common when implementing low-level network protocols, which have | ||
length-delimited frames comprising a fixed-layout header followed by a variable | ||
amount of payload data. | ||
|
||
[IPC]: https://en.wikipedia.org/wiki/Inter-process_communication | ||
[FUSE]: https://www.kernel.org/doc/html/v6.3/filesystems/fuse.html | ||
|
||
C++ does not currently support flexible array members; proposals to add them | ||
(or similar functionality) include: | ||
- [Exploring classes of runtime size](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4025.pdf) | ||
- [Flexible Array Members for C++](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1039r0.html) | ||
|
||
In the context of Rust, the two RFCs mentioned earlier both cover thin-pointer | ||
DSTs as part of their more general extensions to the Rust type system: | ||
- [RFC 1861: `extern_types`](https://rust-lang.github.io/rfcs/1861-extern-types.html) | ||
- [RFC 2580: `ptr_metadata`](https://rust-lang.github.io/rfcs/2580-ptr-meta.html) | ||
|
||
Also, there have been non-approved RFC proposals involving thin-pointer DSTs: | ||
- [[rfcs/pull#709] truly unsized types](https://github.com/rust-lang/rfcs/pull/709) | ||
- [[rfcs/pull#1524] Custom Dynamically Sized Types](https://github.com/rust-lang/rfcs/pull/1524) | ||
- [[rfcs/issues#2255] More implicit bounds (?Sized, ?DynSized, ?Move)](https://github.com/rust-lang/rfcs/issues/2255) | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
None so far. | ||
|
||
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
None so far. Further exploration of opaque types and/or custom pointer metadata | ||
already has separate dedicated RFCs. This one is just to get an MVP for types | ||
that should be `!Sized` without fat pointers. |
Why? In your header example having a `data: [u8]` as a trailing member forces that data to not have any struct padding in it (because it's undef), I guess you could use `data: MaybeUninit<[u8]>`, but wouldn't it be simpler to allow this on any struct? Which I think would allow:

I don't want to allow code like that. The behavior of `size_of_val()` on an `UnknownVariant` would be ambiguous.

Restricting `DynSized` to be `!Sized` means that there's no question about which definition of "value size" is being requested -- it's the one determinable from the reference.

I'd be interested to know how you'd expect people to write code like that then. Would you have to duplicate the fields of `Header` in all the variants?

I would expect code something like this:

The `Request` struct is `!Sized`, each of the variant structs are `Sized` -- in this example you basically have a tagged `union` where you've reserved the right to have future variants contain `[u8]`.

Ah ok, I think that works, I suspect in most cases you'd want `data: [MaybeUninit<u8>]` to deal with padding, but I agree that works.

If you wanted to support something like this, maybe something like:

and treat it a bit like an enum with user defined layout, where the fat pointer metadata would encode which union field is active (with a "non selected" case for when you've yet to determine the variant). Implicit in that is that the dynamic size is header + size of selected variant field, rather than size of the overall union.

(Obviously today that would be treated as a normal struct with a union field, so you'd need some way to indicate this magic DST variant thing.)

I might be misunderstanding, but I don't see how that would work. Consider the case of an IPv4 packet -- the data isn't any particular format, it's just `[u8]`, and the packet header can be used to discover the data length.

Your solution would imply a structure like this:

Sorry that was a digression that doesn't have much to do with this rfc.