From f0b49f071df36d00de6e944e4d2185352a48f240 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Sat, 29 Oct 2022 16:57:38 +0200
Subject: [PATCH 01/20] add maybe-dangling RFC

---
 text/0000-maybe-dangling.md | 247 ++++++++++++++++++++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 text/0000-maybe-dangling.md

diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md
new file mode 100644
index 00000000000..c04a768e211
--- /dev/null
+++ b/text/0000-maybe-dangling.md
@@ -0,0 +1,247 @@
+# `maybe_dangling`
+
+- Feature Name: `maybe_dangling`
+- Start Date: 2022-09-30
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+Declare that references and `Box` inside a new `MaybeDangling` type do not need to satisfy any memory-dependent validity properties (such as `dereferenceable` and `noalias`).
+
+# Motivation
+[motivation]: #motivation
+
+### Example 1
+
+Sometimes one has to work with references or boxes that either are already deallocated, or might get deallocated too early.
+This comes up particularly often with `ManuallyDrop`.
+For example, the following code is UB at the time of writing this RFC:
+
+```rust=
+fn id<T>(x: T) -> T { x }
+
+fn unsound(x: Box<i32>) {
+    let mut x = ManuallyDrop::new(x);
+    unsafe { ManuallyDrop::drop(&mut x) };
+    id(x); // or `let y = x;` or `mem::forget(x);`.
+}
+
+unsound(Box::new(42));
+```
+It is unsound because we are passing a dangling `ManuallyDrop<Box<i32>>` to `id`.
+In terms of invariants required by the language ("validity invariants"), `ManuallyDrop` is a regular `struct`, so all its fields have to be valid, but that means the `Box` needs to be valid, so in particular it must point to allocated memory -- but when `id` is invoked, the `Box` has already been deallocated.
+Given that `ManuallyDrop` is specifically designed to allow dropping the `Box` early, this is a big footgun (that people do [run into in practice](https://github.com/rust-lang/miri/issues/1508)).
+
+### Example 2
+
+There exist more complex versions of this problem, relating to a subtle aspect of the (currently poorly documented) aliasing requirements of Rust:
+when a reference is passed to a function as an argument (including nested in a struct), then that reference must remain live throughout the function.
+(In LLVM terms: we are annotating that reference with `dereferenceable`, which means "dereferenceable for the entire duration of this function call"). In [issue #101983](https://github.com/rust-lang/rust/issues/101983), this leads to a bug in `scoped_thread`.
+There we have a function that invokes a user-supplied `impl FnOnce` closure, roughly like this:
+```rust=
+// Not showing all the `'lifetime` tracking, the point is that
+// this closure might live shorter than `thread`.
+fn thread(control: ..., closure: impl FnOnce() + 'lifetime) {
+    closure();
+    control.signal_done();
+    // A lot of time can pass here.
+}
+```
+The closure has a non-`'static` lifetime, meaning clients can capture references to on-stack data.
+The surrounding code ensures that `'lifetime` lasts at least until `signal_done` is triggered, which ensures that the closure never accesses dangling data.
+
+However, note that `thread` continues to run even after `signal_done`! Now consider what happens if the closure captures a reference of lifetime `'lifetime`:
+- The type of `closure` is a struct (the implicit unnameable closure type) with a `&'lifetime mut T` field.
+  References passed to a function must be live for the entire duration of the call.
+- The closure runs, `signal_done` runs.
+  Then -- potentially -- this thread gets scheduled away and the main thread runs, seeing the signal and returning to the user.
+  Now `'lifetime` ends and the memory the reference points to might be deallocated.
+- Now we have UB! The reference that was passed to `thread` with the promise of remaining live for the entire duration of the function, actually got deallocated while the function still runs. Oops.
+
+### Example 3
+
+As a third example, consider a type that wants to store a "pointer together with some data borrowed from that pointer", like the `owning_ref` crate. This will usually boil down to something like this:
+
+```rust=
+unsafe trait StableDeref: Deref {}
+
+struct OwningRef<U, T: StableDeref<Target=U>> {
+    buffer: T,
+    ref_: NonNull<U>, // conceptually borrows from `buffer`.
+}
+```
+
+Such a type is unsound when `T` is `&mut U` or `Box<U>` because those types are assumed by the compiler to be unique, so any time `OwningRef` is passed around, the compiler can assume that `buffer` is a unique pointer -- an assumption that this code breaks because `ref_` points to the same memory!
+
+### Goal of this RFC
+
+The goal of this RFC is to
+- make the first example UB-free without code changes
+- make the second example UB-free without needing to add `unsafe` code
+- make it possible to define a type like the third example
+
+(Making the 2nd example UB-free without code changes would incur cost across the ecosystem, see the alternatives discussed below.)
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+To handle situations like this, Rust has a special type called `MaybeDangling<P>`:
+references and boxes in `P` do *not* have to be dereferenceable or follow aliasing guarantees.
+They still have to be non-null and aligned, and it has to at least be *possible* that there exists valid data behind that reference (i.e., `MaybeDangling<&!>` is still invalid), but the rules are relaxed when compared with just a plain `P`.
+Also note that safe code can still generally assume that every `MaybeDangling<P>` it encounters is a valid `P`, but within unsafe code this makes it possible to store data of arbitrary type without making reference guarantees (this is similar to `ManuallyDrop`).
+
+The `ManuallyDrop<T>` type internally wraps `T` in a `MaybeDangling`.
+
+This means that the first example is actually fine:
+the dangling `Box` was passed inside a `ManuallyDrop`, so there is no UB.
+
+The 2nd example can be fixed by passing the closure in a `MaybeDangling`:
+```rust=
+// Argument is passed as `MaybeDangling` since we might actually keep
+// it around after its lifetime ends (at which point the caller can
+// start dropping memory it points to).
+fn thread(control: ..., closure: MaybeDangling<impl FnOnce() + 'lifetime>) {
+    closure.into_inner()();
+    control.signal_done();
+    // A lot of time can pass here.
+}
+```
+
+The 3rd example can be fixed by storing the `buffer` inside a `MaybeDangling`, which disables its aliasing requirements:
+
+```rust=
+struct OwningRef<U, T: StableDeref<Target=U>> {
+    buffer: MaybeDangling<T>,
+    ref_: NonNull<U>, // conceptually borrows from `buffer`.
+}
+```
+
+As long as the `buffer` field is not used, the pointer stored in `ref_` will remain valid.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+The standard library contains a type `MaybeDangling<P>` that is safely convertible with `P` (i.e., the safety invariant is the same), and that has all the same niches as `P`, but that does allow passing around dangling boxes and references within unsafe code.
+
+"Behavior considered undefined" is adjusted as follows:
+
+```diff
+- * Breaking the pointer aliasing rules. `&mut T` and `&T` follow LLVM’s
+-   scoped noalias model, except if the &T contains an UnsafeCell.
++ * Breaking the pointer aliasing rules. `Box<T>`, `&mut T` and `&T` follow LLVM’s
++   scoped noalias model, except for `UnsafeCell<_>` inside the `T`.
++   References must not be dangling while they are live, again except for
++   `UnsafeCell<_>` inside the `T`. (The exact liveness duration is not
++   specified, but it is certainly upper-bounded by the syntactic lifetime
++   assigned by the borrow checker. When a reference is passed to a function,
++   it is live at least as long as that function call.) All this also
++   applies when values of these types are passed in a field of a compund
++   type, except behind pointer indirections and when the pointers or
++   references are inside `MaybeDangling`.
+[...]
+ * Producing an invalid value, even in private fields and locals.
+   "Producing" a value happens any time a value is assigned to or
+   read from a place, passed to a function/primitive operation or
+   returned from a function/primitive operation. The following
+   values are invalid (at their respective type):
+[...]
+- * A reference or Box that is dangling, unaligned, or points to an
+-   invalid value.
++ * A reference or `Box<T>` that is unaligned or null, or whose pointee
++   type `T` is uninhabited. Furthermore, except when this value occurs
++   inside a `MaybeDangling`, if the reference/`Box` is dangling or points
++   to an invalid value, it is itself invalid.
+```
+
+*Note: this might seem to alter the aliasing rules compared to the current reference more than just by adding a `MaybeDangling` exception (specifically when it talks about the liveness duration of references), but really it just clarifies semnatics we have applied since Rust 1.0, and incorporates [#98017](https://github.com/rust-lang/rust/pull/98017).*
+
+Another way to think about this is: most types only have "by-value" requirements for their validity, i.e., they only require that the bit pattern be of a certain shape.
+References and boxes are the sole exception, they also require some properties of the memory they point to (e.g., they need to be dereferenceable).
+`MaybeDangling<T>` is a way to "truncate" `T` to its by-value invariant, which changes nothing for most types, but means that references and boxes are allowed as long as their bit patterns are fine (aligned and non-null) and as long as there *conceivably could be* a state of memory that makes them valid (`T` is inhabited).
+
+codegen is adjusted as follows:
+
+- When computing LLVM attributes, we traverse through newtypes such that `Newtype<&mut i32>` is marked as `dereferenceable(4) noalias aligned(4)`.
+  When traversing below `MaybeDangling`, no memory-related attributes such as `dereferenceable` or `noalias` are emitted. Other value-related attributes such as `aligned` are still emitted. (Really this happens as part of computing the `ArgAttributes` in the function ABI, and that is the code that needs to be adjusted.)
+
+Miri is adjusted as follows:
+
+- During Stacked Borrows retagging, when recursively traversing the value to search for references and boxes to retag, we stop the traversal when encountering a `MaybeDangling`.
+  (Note that by default, Miri will not do any such recursion, and only retag bare references.
+  But that is not sound, given that we do emit `noalias` for newtyped references and boxes.
+  The `-Zmiri-retag-fields` flag makes retagging "peer into" compound types to retag all references it can find.
+  This flag needs to become the default to make Miri actually detect all UB in the LLVM IR we generate. This RFC says that that traversal stops at `MaybeDangling`.)
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+- For users of `ManuallyDrop` that don't need this exception, we might miss optimizations if we start allowing example 1.
+- We are accumulating quite a few of these marker types to control various aspects of Rust's validity and aliasing rules:
+  we already have `UnsafeCell` and `MaybeUninit`, and we are likely going to need a "mutable reference version" of `UnsafeCell` to properly treat self-referential types.
+  It's easy to get lost in this sea of types and mix up what exactly they are acting on and how.
+  In particular, it is easy to think that one should do `&mut MaybeDangling<T>` (which is useless, it should be `MaybeDangling<&mut T>`) -- this type applies in the exact opposite way compared to `UnsafeCell` (where one uses `&UnsafeCell<T>`, and `UnsafeCell<&T>` is useless).
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+- The most obvious alternative is to declare `ManuallyDrop` to be that magic type with the memory model exception.
+  This has the disadvantage that one risks memory leaks when all one wants to do is pass around data of some `T` without upholding reference liveness.
+  For instance, the third example would have to remember to call `drop` on the `buffer`.
+- The other alternative is to change the memory model such that the example code is fine as-is.
+  There are several variants of this:
+  - [Make all examples legal] All newtype wrappers behave the way `MaybeDangling` is specified in this RFC.
+    This means it is impossible to do zero-cost newtype-wrapping of references and boxes, which is against the Rust value of zero-cost abstractions.
+    It is also a non-compositional surprise for type semantics to be altered through a newtype wrapper.
+  - [Make examples 1+2 legal] Or we leave newtype wrappers untouched, but rule that boxes (and references) don't actually have to be dereferenceable.
+    This is just listed for completeness' sake, removing all those optimizations is unlikely to make our codegen folks happy. It is also insufficient for example 3, which is about aliasing, not dereferencability.
+  - [Make only the 2nd example legal] We could remove the part about references always being live for at least as long as the functions they are passed to.
+    This corresponds to replacing the LLVM `dereferenceable` attribute by a (planned but not yet implemented) `dereferenceable-on-entry`, which matches the semantics of references in C++.
+    But that does not solve the problem of the `ManuallyDrop<Box<T>>` footgun, i.e., the first example.
+    (We would have to change the rules for `Box` for that, saying it does not need to be dereferenceable at all.)
+    Nor does it help the 3rd example.
+    Also this loses some very desirable optimizations, such as
+    ```rust
+    fn foo(x: &i32) -> i32 {
+        let val = *x;
+        bar();
+        return val; // optimize to `*v`, avoid saving `val` across the call.
+    }
+    ```
+    Under the adjusted rules, `x` could stop being live in the middle of the execution of `foo`, so it might not be live any more when the `return` is executed.
+    Therefore the compiler is not allowed to insert a new use of `x` there.
+- We could more directly expose ways to manipulate the underlying LLVM attributes (`dereferenceable`, `noalias`) using by-value wrappers.
+  (When adjusting the pointee type, such as in `&UnsafeCell<T>`, we already provide a bunch of fine-grained control.)
+  However there exist other backends, and LLVM attributes were designed for C/C++/Swift, not Rust. The author would argue that we should first think of the semantics we want, and then find ways to best express them in LLVM, not the other way around.
+  And while situations are conceivable where one wants to disable only `noalias` or only `dereferenceable`, it is unclear whether they are worth the extra complexity.
+  (On the pointee side, Rust used to have a `Unique` type, that still exists internally in the standard library, which was intended to provide `noalias` without any form of `dereferenceable`. It was deemed better to not expose this.)
+- Instead of saying that all fields of all compound types still must abide by the aliasing rules, we could restrict this to fields of `repr(transparent)` types.
+  That would solve the 2nd and 3rd example without any code changes.
+  It would make it impossible to package up multiple references (in a struct with multiple reference-typed fields) in a way that their aliasing guarantees are still in full force.
+  Right now, we actually *do* emit `noalias` for the 2nd and 3rd example, so codegen of existing types would have to be changed under this alternative.
+  It would not help for the first example.
+- Finally we could do nothing and declare all examples as intentional UB.
+  The 2nd and 3rd example could use `MaybeUninit` to pass around the closure / the buffer in a UB-free way.
+  That will however require `unsafe` code, and leaves `ManuallyDrop<Box<T>>` with its footgun (1st example).
+
+# Prior art
+[prior-art]: #prior-art
+
+The author cannot think of prior art in other languages; the issue arises because of Rust's unique combination of strong safety guarantees with low-level types such as `ManuallyDrop` that manage memory allocation in a very precise way.
+
+Inside Rust, we do have precedent for wrapper types altering language semantics; most prominently, there are `UnsafeCell` and `MaybeUninit`.
+Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `MaybeUninit`, acts "around references": `MaybeDangling<&T>` vs `&UnsafeCell<T>`.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- What should the type be called?
+  `MaybeDangling` is somewhat misleading since the safety invariant still requires everything to be dereferenceable, only the validity invariant is relaxed.
+  This is a bit like `ManuallyDrop` which supports dropping via an `unsafe` function but its safety invariant says that the data is not dropped (so that it can implement `Deref` and `DerefMut` and a safe `into_inner`).
+  Furthermore, the type also allows maybe-aliasing references, not just maybe-dangling references.
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+- None that the author can think of -- this arguably closes a gap in our ability to express and manipulate the aliasing guarantees of types that are being passed around.

From d28f05243880a6ae3931c565d1e05fec0d7c4ac1 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Mon, 31 Oct 2022 11:13:16 +0100
Subject: [PATCH 02/20] fix markdown quirks

---
 text/0000-maybe-dangling.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md
index c04a768e211..b1e27173163 100644
--- a/text/0000-maybe-dangling.md
+++ b/text/0000-maybe-dangling.md
@@ -19,7 +19,7 @@ Sometimes one has to work with references or boxes that either are already deall
 This comes up particularly often with `ManuallyDrop`.
 For example, the following code is UB at the time of writing this RFC:
 
-```rust=
+```rust
 fn id<T>(x: T) -> T { x }
 
 fn unsound(x: Box<i32>) {
@@ -40,7 +40,7 @@ There exist more complex versions of this problem, relating to a subtle aspect o
 when a reference is passed to a function as an argument (including nested in a struct), then that reference must remain live throughout the function.
 (In LLVM terms: we are annotating that reference with `dereferenceable`, which means "dereferenceable for the entire duration of this function call"). In [issue #101983](https://github.com/rust-lang/rust/issues/101983), this leads to a bug in `scoped_thread`.
 There we have a function that invokes a user-supplied `impl FnOnce` closure, roughly like this:
-```rust=
+```rust
 // Not showing all the `'lifetime` tracking, the point is that
 // this closure might live shorter than `thread`.
 fn thread(control: ..., closure: impl FnOnce() + 'lifetime) {
@@ -64,7 +64,7 @@ However, note that `thread` continues to run even after `signal_done`! Now consi
 
 As a third example, consider a type that wants to store a "pointer together with some data borrowed from that pointer", like the `owning_ref` crate. This will usually boil down to something like this:
 
-```rust=
+```rust
 unsafe trait StableDeref: Deref {}
 
 struct OwningRef<U, T: StableDeref<Target=U>> {
@@ -98,7 +98,7 @@ This means that the first example is actually fine:
 the dangling `Box` was passed inside a `ManuallyDrop`, so there is no UB.
 
 The 2nd example can be fixed by passing the closure in a `MaybeDangling`:
-```rust=
+```rust
 // Argument is passed as `MaybeDangling` since we might actually keep
 // it around after its lifetime ends (at which point the caller can
 // start dropping memory it points to).
@@ -111,7 +111,7 @@ fn thread(control: ..., closure: MaybeDangling<impl FnOnce() + 'lifetime>) {
 
 The 3rd example can be fixed by storing the `buffer` inside a `MaybeDangling`, which disables its aliasing requirements:
 
-```rust=
+```rust
 struct OwningRef<U, T: StableDeref<Target=U>> {
     buffer: MaybeDangling<T>,
     ref_: NonNull<U>, // conceptually borrows from `buffer`.
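
The last hunk above touches the `OwningRef` fix for the third example. The RFC's rationale notes that, without a dedicated type, the buffer could be carried in a `MaybeUninit` at the cost of `unsafe` code. The following is a minimal sketch of that workaround, not the RFC's implementation: `Inert` and `OwningRefSketch` are hypothetical names, the `StableDeref` bound is replaced by an `unsafe` constructor whose caller promises a stable target address, and the sketch has not been checked under Miri.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical user-space stand-in for the proposed `MaybeDangling<P>`.
/// `MaybeUninit<P>` places no validity requirements on its contents, which is
/// why the RFC lists it as a workaround. Unlike the proposed type, it loses
/// the niches of `P` and needs `unsafe` to get the value back out.
pub struct Inert<P> {
    inner: MaybeUninit<P>,
}

impl<P> Inert<P> {
    pub fn new(value: P) -> Self {
        Inert { inner: MaybeUninit::new(value) }
    }

    /// Borrows the wrapped value; its usual requirements apply again here.
    pub fn get(&self) -> &P {
        // SAFETY: `inner` is initialized in `new` and only moved out by
        // `into_inner`, which consumes `self`.
        unsafe { self.inner.assume_init_ref() }
    }

    pub fn into_inner(self) -> P {
        // Suppress `Drop` so the value is not dropped a second time.
        let this = ManuallyDrop::new(self);
        // SAFETY: initialized in `new`; `this` is never used again.
        unsafe { this.inner.assume_init_read() }
    }
}

impl<P> Drop for Inert<P> {
    fn drop(&mut self) {
        // SAFETY: initialized in `new` and not yet moved out, because
        // `into_inner` wraps `self` in `ManuallyDrop` before reading.
        unsafe { self.inner.assume_init_drop() }
    }
}

/// Sketch of the third example: `ref_` points into memory owned by `buffer`.
pub struct OwningRefSketch<U, T: Deref<Target = U>> {
    buffer: Inert<T>,
    ref_: NonNull<U>, // conceptually borrows from `buffer`
}

impl<U, T: Deref<Target = U>> OwningRefSketch<U, T> {
    /// Assumption: the caller only passes owners whose target address stays
    /// stable when the owner is moved (for example `Box<U>`).
    pub unsafe fn new(buffer: T) -> Self {
        // Wrap first, then derive the pointer, so the owner is not moved
        // as a bare `T` after the pointer has been created.
        let buffer = Inert::new(buffer);
        let ref_ = NonNull::from(&**buffer.get());
        OwningRefSketch { buffer, ref_ }
    }

    pub fn get(&self) -> &U {
        // SAFETY: `ref_` points into the allocation kept alive by `buffer`.
        unsafe { self.ref_.as_ref() }
    }

    pub fn into_buffer(self) -> T {
        self.buffer.into_inner()
    }
}
```

With this sketch, `unsafe { OwningRefSketch::new(Box::new(42)) }` can be moved around and still read through `get()`. The proposed `MaybeDangling` would give the same effect without the `unsafe` blocks and without giving up niches.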
From baf3d9ca6b9111570bdb9f26bc1a32cb99502ef3 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 31 Oct 2022 11:18:22 +0100 Subject: [PATCH 03/20] to Deref or not to Deref --- text/0000-maybe-dangling.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index b1e27173163..0ce577ca12a 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -240,6 +240,7 @@ Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `M `MaybeDangling` is somewhat misleading since the safety invariant still requires everything to be dereferenceable, only the validity invariant is relaxed. This is a bit like `ManuallyDrop` which supports dropping via an `unsafe` function but its safety invariant says that the data is not dropped (so that it can implement `Deref` and `DerefMut` and a safe `into_inner`). Furthermore, the type also allows maybe-aliasing references, not just maybe-dangling references. +- Should `MaybeDangling` implement `Deref` and `DerefMut` like `ManuallyDrop` does, or should accessing the inner data be more explicit since that is when the aliasing and validity requirements do come back in full force? # Future possibilities [future-possibilities]: #future-possibilities From c147c8a566378a909443807d04993157d4081e44 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 31 Oct 2022 11:32:21 +0100 Subject: [PATCH 04/20] rebase the reference diff --- text/0000-maybe-dangling.md | 34 ++++++++++++++++------------------ 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 0ce577ca12a..92aa8292a39 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -128,24 +128,22 @@ The standard library contains a type `MaybeDangling

` that is safely convertib "Behavior considered undefined" is adjusted as follows: ```diff -- * Breaking the pointer aliasing rules. `&mut T` and `&T` follow LLVM’s -- scoped noalias model, except if the &T contains an UnsafeCell. -+ * Breaking the pointer aliasing rules. `Box`, `&mut T` and `&T` follow LLVM’s -+ scoped noalias model, except for `UnsafeCell<_>` inside the `T`. -+ References must not be dangling while they are live, again except for -+ `UnsafeCell<_>` inside the `T`. (The exact liveness duration is not -+ specified, but it is certainly upper-bounded by the syntactic lifetime -+ assigned by the borrow checker. When a reference is passed to a function, -+ it is live at least as long as that function call.) All this also -+ applies when values of these types are passed in a field of a compund -+ type, except behind pointer indirections and when the pointers or -+ references are inside `MaybeDangling`. + * Breaking the [pointer aliasing rules]. `Box`, `&mut T` and `&T` follow LLVM’s + scoped noalias model, except if the `&T` contains an [`UnsafeCell`]. + References must not be dangling while they are live. (The exact liveness + duration is not specified, but it is certainly upper-bounded by the syntactic + lifetime assigned by the borrow checker. When a reference is passed to a + function, it is live at least as long as that function call, again except if + the `&T` contains an [`UnsafeCell`].) All this also applies when values of + these types are passed in a (nested) field of a compound type, but not behind +- pointer indirections. ++ pointer indirections and also not for values inside a `MaybeDangling<_>`. [...] - * Producing an invalid value, even in private fields and locals. - "Producing" a value happens any time a value is assigned to or - read from a place, passed to a function/primitive operation or - returned from a function/primitive operation. 
The following - values are invalid (at their respective type): + * Producing an invalid value, even in private fields and locals. + "Producing" a value happens any time a value is assigned to or + read from a place, passed to a function/primitive operation or + returned from a function/primitive operation. The following + values are invalid (at their respective type): [...] - * A reference or Box that is dangling, unaligned, or points to an - invalid value. @@ -155,7 +153,7 @@ The standard library contains a type `MaybeDangling

` that is safely convertib + to an invalid value, it is itself invalid. ``` -*Note: this might seem to alter the aliasing rules compared to the current reference more than just by adding a `MaybeDangling` exception (specifically when it talks about the liveness duration of references), but really it just clarifies semnatics we have applied since Rust 1.0, and incorporates [#98017](https://github.com/rust-lang/rust/pull/98017).* +*Note: this diff is based on [an updated version of the referece](https://github.com/rust-lang/reference/pull/1290).* Another way to think about this is: most types only have "by-value" requirements for their validity, i.e., they only require that the bit pattern be of a certain shape. References and boxes are the sole exception, they also require some properties of the memory they point to (e.g., they need to be dereferenceable). From 982b51d7134f2af1c4070663709afd5c027479e8 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 2 Nov 2022 09:08:10 +0100 Subject: [PATCH 05/20] trait interaction --- text/0000-maybe-dangling.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 92aa8292a39..d9efc64be2d 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -124,6 +124,7 @@ As long as the `buffer` field is not used, the pointer stored in `ref_` will rem [reference-level-explanation]: #reference-level-explanation The standard library contains a type `MaybeDangling

` that is safely convertible with `P` (i.e., the safety invariant is the same), and that has all the same niches as `P`, but that does allow passing around dangling boxes and references within unsafe code. +`MaybeDangling` propagates auto traits and has (at least) `derive(Copy, Clone, Debug)`. "Behavior considered undefined" is adjusted as follows: From 3c7735aebc04cb71233538876a2ba5d2507284dd Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 4 Nov 2022 18:33:13 +0100 Subject: [PATCH 06/20] fix typo Co-authored-by: teor --- text/0000-maybe-dangling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index d9efc64be2d..1cbe11faf8c 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -205,7 +205,7 @@ Miri is adjusted as follows: fn foo(x: &i32) -> i32 { let val = *x; bar(); - return val; // optimize to `*v`, avoid saving `val` across the call. + return val; // optimize to `*x`, avoid saving `val` across the call. } ``` Under the adjusted rules, `x` could stop being live in the middle of the execution of `foo`, so it might not be live any more when the `return` is executed. From f1064b0f6b43962e857278b088fd949bbc68ae0f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 6 Nov 2022 17:27:26 +0100 Subject: [PATCH 07/20] fix typo Co-authored-by: bl-ue <54780737+bl-ue@users.noreply.github.com> --- text/0000-maybe-dangling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 1cbe11faf8c..dc8f39cc5dc 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -154,7 +154,7 @@ The standard library contains a type `MaybeDangling

` that is safely convertib + to an invalid value, it is itself invalid. ``` -*Note: this diff is based on [an updated version of the referece](https://github.com/rust-lang/reference/pull/1290).* +*Note: this diff is based on [an updated version of the reference](https://github.com/rust-lang/reference/pull/1290).* Another way to think about this is: most types only have "by-value" requirements for their validity, i.e., they only require that the bit pattern be of a certain shape. References and boxes are the sole exception, they also require some properties of the memory they point to (e.g., they need to be dereferenceable). From 53d8d841df9db0e2b81f97ff4977e41aac13355d Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 2 Jan 2023 13:25:09 +0100 Subject: [PATCH 08/20] clarification --- text/0000-maybe-dangling.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index dc8f39cc5dc..641d13b92a8 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -89,6 +89,7 @@ The goal of this RFC is to To handle situations like this, Rust has a special type called `MaybeDangling

`: references and boxes in `P` do *not* have to be dereferenceable or follow aliasing guarantees. +This applies inside nested references/boxes inside `P` as well. They still have to be non-null and aligned, and it has to at least be *possible* that there exists valid data behind that reference (i.e., `MaybeDangling<&!>` is still invalid), but the rules are relaxed when compared with just a plain `P`. Also note that safe code can still generally assume that every `MaybeDangling

` it encounters is a valid `P`, but within unsafe code this makes it possible to store data of arbitrary type without making reference guarantees (this is similar to `ManuallyDrop`). From 481f2cafb4319de85685a61d3c297d1a316fac6e Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 19 Jul 2023 20:46:14 +0200 Subject: [PATCH 09/20] add some real-world examples --- text/0000-maybe-dangling.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 641d13b92a8..37b958aa84c 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -84,6 +84,10 @@ The goal of this RFC is to (Making the 2nd example UB-free without code changes would incur cost across the ecosystem, see the alternatives discussed below.) +The examples described above are far from artificial, here are some real-world crates that need `MaybeDangling` to ensure their soundness (some currently crudely work-around that problem with `MaybeUninit` but that is really not satisfying): +- [Yoke](https://github.com/unicode-org/icu4x/issues/3696) and [Yoke again](https://github.com/unicode-org/icu4x/issues/2095) (the first needs opting-out of `dereferenceable` for the yoke, the latter needs opting-out of `noalias` for both yoke and cart) +- [ouroboros](https://github.com/joshua-maros/ouroboros/issues/88) + # Guide-level explanation [guide-level-explanation]: #guide-level-explanation From f5c12b302a0605ece5c7511a13a55139f653142b Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 1 Aug 2023 12:16:33 +0200 Subject: [PATCH 10/20] add comparison with some related types --- text/0000-maybe-dangling.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 37b958aa84c..d17fcbf5901 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -178,6 +178,13 @@ Miri is adjusted as follows: The `-Zmiri-retag-fields` flag makes retagging "peer into" compound types to 
retag all references it can find. This flag needs to become the default to make Miri actually detect all UB in the LLVM IR we generate. This RFC says that that traversal stops at `MaybeDangling`.) +### Comparison with some other types that affect aliasing + +- `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell` is also not special. +- [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467): disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased` is also not special. +- `MaybeDangling`: disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDangling<&[mut] T>` is special. `&[mut] MaybeDangling` is not special at all and basically like `&[mut] T`. + + # Drawbacks [drawbacks]: #drawbacks From 4d2393184818215e69eec8cec315960c0c32f08b Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 24 Aug 2023 07:44:08 +0200 Subject: [PATCH 11/20] attempts at clarification --- text/0000-maybe-dangling.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index d17fcbf5901..ba5efba18c1 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -94,8 +94,10 @@ The examples described above are far from artificial, here are some real-world c To handle situations like this, Rust has a special type called `MaybeDangling<P>`: references and boxes in `P` do *not* have to be dereferenceable or follow aliasing guarantees. This applies inside nested references/boxes inside `P` as well. -They still have to be non-null and aligned, and it has to at least be *possible* that there exists valid data behind that reference (i.e., `MaybeDangling<&!>` is still invalid), but the rules are relaxed when compared with just a plain `P`. +They still have to be non-null and aligned, and it has to at least be *possible* that there exists valid data behind that reference (i.e., `MaybeDangling<&!>` is still invalid). Also note that safe code can still generally assume that every `MaybeDangling<P>` it encounters is a valid `P`, but within unsafe code this makes it possible to store data of arbitrary type without making reference guarantees (this is similar to `ManuallyDrop`). +In other words, `MaybeDangling<P>` is entirely like `P`, except that the rules that relate to the contents of memory that pointers in `P` point to (dereferencability and aliasing restrictions) are suspended when the pointers are not being actively used. +You can think of the `P` as being "suspended" or "inert". The `ManuallyDrop` type internally wraps `T` in a `MaybeDangling`. @@ -129,7 +131,7 @@ As long as the `buffer` field is not used, the pointer stored in `ref_` will rem [reference-level-explanation]: #reference-level-explanation The standard library contains a type `MaybeDangling<P>` that is safely convertible with `P` (i.e., the safety invariant is the same), and that has all the same niches as `P`, but that does allow passing around dangling boxes and references within unsafe code. -`MaybeDangling` propagates auto traits and has (at least) `derive(Copy, Clone, Debug)`. +`MaybeDangling<P>` propagates auto traits, drops the `P` when it is dropped, and has (at least) `derive(Copy, Clone, Debug)`. "Behavior considered undefined" is adjusted as follows: @@ -200,6 +202,7 @@ Miri is adjusted as follows: - The most obvious alternative is to declare `ManuallyDrop` to be that magic type with the memory model exception. This has the disadvantage that one risks memory leaks when all one wants to do is pass around data of some `T` without upholding reference liveness. For instance, the third example would have to remember to call `drop` on the `buffer`. + This alternative has the advantage that we avoid introducing another type, and it is future-compatible with factoring that aspect of `ManuallyDrop` into a dedicated type in the future. - The other alternative is to change the memory model such that the example code is fine as-is. There are several variants of this: - [Make all examples legal] All newtype wrappers behave the way `MaybeDangling` is specified in this RFC. @@ -248,10 +251,11 @@ Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `M [unresolved-questions]: #unresolved-questions - What should the type be called? - `MaybeDangling` is somewhat misleading since the safety invariant still requires everything to be dereferenceable, only the validity invariant is relaxed. + `MaybeDangling` is somewhat misleading since the safety invariant still requires everything to be dereferenceable, only the requirement of dereferencability and noalias is relaxed. This is a bit like `ManuallyDrop` which supports dropping via an `unsafe` function but its safety invariant says that the data is not dropped (so that it can implement `Deref` and `DerefMut` and a safe `into_inner`). Furthermore, the type also allows maybe-aliasing references, not just maybe-dangling references.
-- Should `MaybeDangling` implement `Deref` and `DerefMut` like `ManuallyDrop` does, or should accessing the inner data be more explicit since that is when the aliasing and validity requirements do come back in full force? + Other possible names might be things like `InertPointers` or `SuspendedPointers`. +- Should `MaybeDangling` implement `Deref` and `DerefMut` like `ManuallyDrop` does, or should accessing the inner data be more explicit since that is when the aliasing and dereferencability requirements do come back in full force? # Future possibilities [future-possibilities]: #future-possibilities From 8f5f5bf51619d7327595361555fc52a4821b8861 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 28 Aug 2023 15:50:10 +0200 Subject: [PATCH 12/20] fix syntax error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Bartłomiej Maryńczak --- text/0000-maybe-dangling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index ba5efba18c1..9b703f9bcd3 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -22,7 +22,7 @@ For example, the following code is UB at the time of writing this RFC: ```rust fn id(x: T) -> T { x } -fn unsound(x: Box) +fn unsound(x: Box) { let mut x = ManuallyDrop::new(x); unsafe { x.drop() }; id(x); // or `let y = x;` or `mem::forget(x);`. 
From 2077313332dc260f1352eb870251a4bf55dfd634 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 12 Sep 2023 19:09:03 +0200 Subject: [PATCH 13/20] add possible alternative: attribute instead of type --- text/0000-maybe-dangling.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 9b703f9bcd3..fe2d92a236d 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -203,7 +203,12 @@ Miri is adjusted as follows: This has the disadvantage that one risks memory leaks when all one wants to do is pass around data of some `T` without upholding reference liveness. For instance, the third example would have to remember to call `drop` on the `buffer`. This alternative has the advantage that we avoid introducing another type, and it is future-compatible with factoring that aspect of `ManuallyDrop` into a dedicated type in the future. -- The other alternative is to change the memory model such that the example code is fine as-is. +- Another tempting alternative is to attach the special meaning not to a type, but an attribute. + We could have a `#[maybe_dangling]` attribute that can be attached to ADTs, such that references and `Box` inside that type are not required to be dereferenceable or non-aliasing as the type gets moved around. + This has the advantage that user can attach the attribute to their own type and directly access the fields, so e.g. `MyType` can have a `Box` field and all of the magic of `Box` is still available, + but the type can be moved around freely without worrying about aliasing. For the compiler and Miri implementation this would barely make a difference; + we would simply stop recursing into fields when encountering any type with that attribute (rather than only stopping when encountering the magic `MaybeDangling` type). +- Another alternative is to change the memory model such that the example code is fine as-is. 
There are several variants of this: - [Make all examples legal] All newtype wrappers behave the way `MaybeDangling` is specified in this RFC. This means it is impossible to do zero-cost newtype-wrapping of references and boxes, which is against the Rust value of zero-cost abstractions. From 572f23bc7d254c0701f66a09919da264ac7547b4 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 27 Sep 2023 20:18:37 +0200 Subject: [PATCH 14/20] fix wording --- text/0000-maybe-dangling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index fe2d92a236d..c93419ce3b1 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -256,7 +256,7 @@ Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `M [unresolved-questions]: #unresolved-questions - What should the type be called? - `MaybeDangling` is somewhat misleading since the safety invariant still requires everything to be dereferenceable, only the requirement of dereferencability and noalias is relaxed. + `MaybeDangling` is somewhat misleading since the *safety* invariant still requires everything to be dereferenceable, only the *validity* requirement of dereferenceability and noalias is relaxed. This is a bit like `ManuallyDrop` which supports dropping via an `unsafe` function but its safety invariant says that the data is not dropped (so that it can implement `Deref` and `DerefMut` and a safe `into_inner`). Furthermore, the type also allows maybe-aliasing references, not just maybe-dangling references. Other possible names might be things like `InertPointers` or `SuspendedPointers`. 
From c5a4988d4444be81818afb4001c83ae8b8723f5b Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 29 Sep 2023 16:54:32 +0200 Subject: [PATCH 15/20] future possibility: attribute / Box magic --- text/0000-maybe-dangling.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index c93419ce3b1..8cf06c53a2e 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -203,11 +203,6 @@ Miri is adjusted as follows: This has the disadvantage that one risks memory leaks when all one wants to do is pass around data of some `T` without upholding reference liveness. For instance, the third example would have to remember to call `drop` on the `buffer`. This alternative has the advantage that we avoid introducing another type, and it is future-compatible with factoring that aspect of `ManuallyDrop` into a dedicated type in the future. -- Another tempting alternative is to attach the special meaning not to a type, but an attribute. - We could have a `#[maybe_dangling]` attribute that can be attached to ADTs, such that references and `Box` inside that type are not required to be dereferenceable or non-aliasing as the type gets moved around. - This has the advantage that user can attach the attribute to their own type and directly access the fields, so e.g. `MyType` can have a `Box` field and all of the magic of `Box` is still available, - but the type can be moved around freely without worrying about aliasing. For the compiler and Miri implementation this would barely make a difference; - we would simply stop recursing into fields when encountering any type with that attribute (rather than only stopping when encountering the magic `MaybeDangling` type). - Another alternative is to change the memory model such that the example code is fine as-is. 
There are several variants of this: - [Make all examples legal] All newtype wrappers behave the way `MaybeDangling` is specified in this RFC. @@ -265,4 +260,6 @@ Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `M # Future possibilities [future-possibilities]: #future-possibilities -- None that the author can think of -- this arguably closes a gap in our ability to express and manipulate the aliasing guarantees of types that are being passed around. +- One issue with this proposal is the "yet another wrapper type" syndrome, which leads to lots of syntactic salt and also means one loses the special `Box` magic (such as moving out of fields). + This could be mitigated by either providing an attribute that attaches `MaybeDangling` semantics to an arbitrary type, or by making `Box` magic more widely available (`DerefMove`/`DerefPure`-style traits). + Both of these are largely orthogonal to `MaybeDangling` though, and we'd probably want the `MaybeDangling` type as the "canonical" type for this even if the attribute existed (e.g., for cases like example 2). From 64bf786d281bdbab3e27ec91c1058ea3d0fa3c24 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 7 Dec 2023 07:16:08 +0100 Subject: [PATCH 16/20] mention the maybe-dangling crate --- text/0000-maybe-dangling.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 8cf06c53a2e..8a7f09ba1e5 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -247,6 +247,9 @@ The author cannot think of prior art in other languages; the issue arises becaus Inside Rust, we do have precedent for wrapper types altering language semantics; most prominently, there are `UnsafeCell` and `MaybeUninit`. Notice that `UnsafeCell` acts "behind references" while `MaybeDangling`, like `MaybeUninit`, acts "around references": `MaybeDangling<&T>` vs `&UnsafeCell`. 
+There is a [crate](https://docs.rs/maybe-dangling) offering these semantics on stable Rust via `MaybeUninit`. +(This is not "prior" art, it was published after this RFC came out. "Related work" would be more apt. Alas, the RFC template forces this structure on us.) + # Unresolved questions [unresolved-questions]: #unresolved-questions From d6bfbe76ab79403f75ebc56624a5c3b3477db403 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 30 Apr 2024 16:41:19 +0200 Subject: [PATCH 17/20] UnsafeAliased -> UnsafePinned --- text/0000-maybe-dangling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-maybe-dangling.md b/text/0000-maybe-dangling.md index 8a7f09ba1e5..34677800204 100644 --- a/text/0000-maybe-dangling.md +++ b/text/0000-maybe-dangling.md @@ -183,7 +183,7 @@ Miri is adjusted as follows: ### Comparison with some other types that affect aliasing - `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell` is also not special. -- [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467): disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased` is also not special. +- [`UnsafePinned`](https://github.com/rust-lang/rfcs/pull/3467): disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafePinned` is special. `UnsafePinned<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafePinned` is also not special. - `MaybeDangling`: disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDangling<&[mut] T>` is special. 
`&[mut] MaybeDangling` is not special at all and basically like `&[mut] T`. From be7581cc6697e242b46f3e741a5191b9969d0ef2 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 9 May 2024 14:43:43 +0200 Subject: [PATCH 18/20] set RFC number --- text/{0000-maybe-dangling.md => 3336-maybe-dangling.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename text/{0000-maybe-dangling.md => 3336-maybe-dangling.md} (99%) diff --git a/text/0000-maybe-dangling.md b/text/3336-maybe-dangling.md similarity index 99% rename from text/0000-maybe-dangling.md rename to text/3336-maybe-dangling.md index 34677800204..ff0b2629d89 100644 --- a/text/0000-maybe-dangling.md +++ b/text/3336-maybe-dangling.md @@ -2,7 +2,7 @@ - Feature Name: `maybe_dangling` - Start Date: 2022-09-30 -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- RFC PR: [rust-lang/rfcs#3336](https://github.com/rust-lang/rfcs/pull/3336) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) # Summary From ee08c28c7524152612bdc357b0c66c91d6e15bac Mon Sep 17 00:00:00 2001 From: Travis Cross Date: Mon, 20 May 2024 07:57:26 +0000 Subject: [PATCH 19/20] Cleanup trailing whitespace --- text/3336-maybe-dangling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/3336-maybe-dangling.md b/text/3336-maybe-dangling.md index ff0b2629d89..b0c4766cc31 100644 --- a/text/3336-maybe-dangling.md +++ b/text/3336-maybe-dangling.md @@ -106,8 +106,8 @@ the dangling `Box` was passed inside a `ManuallyDrop`, so there is no UB. The 2nd example can be fixed by passing the closure in a `MaybeDangling`: ```rust -// Argument is passed as `MaybeDangling` since we might actually keep -// it around after its lifetime ends (at which point the caller can +// Argument is passed as `MaybeDangling` since we might actually keep +// it around after its lifetime ends (at which point the caller can // start dropping memory it points to). 
fn thread(control: ..., closure: MaybeDangling) { closure.into_inner()(); From d04f509aa28d9bc9e014e0adea7ab990d9be6fde Mon Sep 17 00:00:00 2001 From: Travis Cross Date: Mon, 20 May 2024 07:57:34 +0000 Subject: [PATCH 20/20] Prepare RFC 3336 to be merged The FCP for RFC 3336 has completed. Let's prepare it to be merged. --- text/3336-maybe-dangling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3336-maybe-dangling.md b/text/3336-maybe-dangling.md index b0c4766cc31..22f19787653 100644 --- a/text/3336-maybe-dangling.md +++ b/text/3336-maybe-dangling.md @@ -3,7 +3,7 @@ - Feature Name: `maybe_dangling` - Start Date: 2022-09-30 - RFC PR: [rust-lang/rfcs#3336](https://github.com/rust-lang/rfcs/pull/3336) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) +- Tracking Issue: [rust-lang/rust#118166](https://github.com/rust-lang/rust/issues/118166) # Summary [summary]: #summary