Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Unsafe Set Enum Discriminants #3727

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

jamesmunns
Copy link
Member

@jamesmunns jamesmunns commented Nov 8, 2024

Summary

This RFC proposes a way to write the discriminant of an enum when building it "from scratch". This introduces two new library components, an unsafe set_discriminant function, and a discriminant_of! macro, which can be used with any enum, regardless of repr or any other details.

This RFC is a follow-on to the offset_of_enum feature, tracked by rust-lang/rust#120141.

Rendered

@ehuss ehuss added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Nov 8, 2024
@jamesmunns

This comment was marked as resolved.

@ehuss ehuss added T-lang Relevant to the language team, which will review and decide on the RFC. T-opsem Relevant to the operational semantics team, which will review and decide on the RFC. labels Nov 8, 2024

When this function is called, it MUST be called AFTER fully initializing the variant of the enum completely, as in some cases it may be necessary to read back these values. This is discussed more in the next section.

Semantically, `set_discriminant` is specified to optionally write the discriminant (when necessary), and read-back the discriminant. If the read-back discriminant does not match the expected value, then the behavior is undefined.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RalfJung I would appreciate if you could let me know if I got this right! I don't totally understand the "mandatory readback", so if this could be stated better, please feel free to propose alternate wording.

Copy link
Contributor

@traviscross traviscross Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good also for this (or the following) section to be extended with more details about why this is the correct semantic. I think I see the case for and agree that setting a MaybeUninit<Option<&u8>> to Some with set_discriminant should represent an assertion that the relevant bits aren't zero at the point of that call, but it's natural to wonder about this, and so it'd be good to describe it in some detail.

Copy link
Member

@RalfJung RalfJung Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds about right, though I would use some of the terms a bit differently. Specifically, set_discriminant will never read or write anything but the discriminant. However, enum layout is unspecified, and the discriminant can overlap with arbitrary values of the body.

It may be worth spelling out exactly what the safety requirements are: when writing the discriminant of variant V, then all fields of V that overlap with the discriminant must be set to values that satisfy the language invariant of their type. In practice, we don't guarantee which fields overlap with the discriminant, so all fields must be properly initialized.

This can replace the discussion of "read back" etc, that is just an implementation detail. Likewise, you can keep the example setting None, but I would frame it differently: because None has no fields, the requirement is trivially met. So the instruction is not "responsible for also initializing the whole body of the variant", all it really has to ensure is that future reads of the discriminant behave correctly.

It'd be good also for this (or the following) section to be extended with more details about why this is the correct semantic.

The reason we introduced the readback is to make it so that

SetDiscriminant(place, variant_idx);
_x = GetDiscriminant(place);

can be optimized to

SetDiscriminant(place, variant_idx);
_x = variant_idx;

This seems like an obviously desirable transformation, so we must ensure that e.g. writing the Some discriminant can only return successfully if the next GetDiscriminant indeed would return Some.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RalfJung Would the optimization you have in mind be impeded if we merely say that any enum value which is not wrapped in MaybeUninit is valid? Ie, that initialization of the variant and fields may happen in any order, but that initialization must be complete by the time that MaybeUninit::assume_init is called? That would permit set_discriminant to not need to constrain the existing content of the enum since it would basically just be a fancy ptr::write, and it would make it easier to reason about for callers. I assume there's a reason that this simpler approach isn't viable, but I'm curious what that reason is.

Copy link
Member

@RalfJung RalfJung Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question. It seems you are proposing to just not do the optimization. But we think we should reserve the possibility to do that optimization, and then it follows immediately that we need the UB.

EDIT: Ah, I misread "impeded". Yes, it would be heavily impeded, the optimization just becomes impossible.

IMO it'd also be rather strange to say that setting the discriminant of Option<&T> to Some on an all-null value is just a NOP and permitted. The programmer asked us to do something nonsensical.

@jamesmunns
Copy link
Member Author

Addressed all review comments so far. Link to the diff of changes since the initial version: https://github.com/rust-lang/rfcs/pull/3727/files/5c74b691972d36d30e27f253e8e2673d9930316f..eb486ef529eea98b7d2d9b0ffcd2254ac16a4ea0

@coolreader18
Copy link

coolreader18 commented Nov 14, 2024

What's the current legality of initializing niche-optimized enums?

let mut x: Option<NonNull<u8>> = None;
(&raw mut x).cast::<NonNull<u8>>().write(NonNull::dangling()); // UB?

I suppose there are guarantees made about the layout of Option for FFI purposes, but this feels... sketchy (though miri says it's fine, including with more complex niches like char). That to say, unless these semantics are already defined, I feel like the section about calling set_discriminant on niche-optimized enums should be strengthened from "encouraged for explicitness" to "should"/"must". Maybe with an exception for Option/Result-like enums with a single niche at 0, since FFI-compatibility guarantees are already made for those, but I'd err on not allowing it if possible.

@matthieu-m
Copy link

I was thinking about the (possible) future alternative receivers for set_discriminant:

  • Having to call .as_mut_ptr() is not so bad, what is more annoying is the loss of guarantees. Yes my &mut MaybeUninit has given me a non-null & sufficiently-aligned pointer, thank you very much. This is the true boilerplate.
  • Having set_discriminant_mu and set_discriminant_nn (or whatever) feels... heavy. Remembering the specific names is going to be annoying.
  • A SetDiscriminant::set_discriminant trait or a set_discriminant<T: SealedDiscriminant>(t: T, ..) method muddle safety guarantees: it makes it harder, at the call site, to ensure that the expected receiver is used, and it makes it brittle with regard to refactorings. Change the "receiver" from NonNull<T> to *mut T? Make sure the pointer is non-null now!
  • Inherent methods on NonNull and MaybeUninit directly is another possibility. If just called set_discriminant they would have some of the disadvantages of the trait-based approach, but for MaybeUninit a write_discriminant inherent method could be more in keeping with the current write... API and the different name would prevent a maintenance mistake.

So, all in all, for now I see two alternatives I like:

  • Don't introduce any variant. Bit boilerplatey in safety guarantees, but very explicit, and explicit is not so bad for unsafe.
  • Only introduce a single variant: MaybeUninit::<T>::write_discriminant(&mut self, disc: Discriminant<T>). It's natural to have, and offers a lot of guarantees that *mut T doesn't -- non null, sufficiently aligned, exclusive access -- leaving only "correctly initialized" to fulfill.

Comment on lines +290 to +298
## Should we provide alternate forms of `set_discriminant`?

There was some discussion when writing this RFC what the type of the first argument to `set_discriminant` should be:

* `&mut MaybeUninit<T>`
* `NonNull<T>`
* `*mut T`

`*mut T` was chosen as the most general option, however it is likely desirable to accept other forms as well for convenience.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved questions are things that are intended to be required to be answered before stabilization. In this case, the other forms could always be added later, so it may be worth moving this to a future possibility instead.

That is, unless you mean for there to be an open question about whether we'd want the *mut T form at all, but I think @RalfJung sufficiently addressed that in the Zulip thread about how it would be inconsistent with our many other such functions for there to not be a *mut T form of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original draft, I had proposed using &mut MaybeUninit<T>, rather than *mut T. When I swapped that over I left this section, but refactored it. I probably could remove it, but there is still some open discussion about what API should there be on day one.

I'll likely remove this in the near future, as I am somewhat convinced that if there were to be only one interface, it should be on *mut T.

Comment on lines +243 to +251
## Specifically known types vs unknown types

This RFC does not invalidate any currently-accepted ways of initializing enums manually without these new functions. For example, enums with a [primitive representation], or niche represented enums with [discriminant elision], can be soundly created today.

When specific types with these qualities are used, it is not required, but still allowed, to use the `set_discriminant` function to set the discriminant.

However, when authoring generic or macro code, which may potentially accept types that do not have these qualities, it is necessary to call `set_discriminant` to fully initialize the enum in the general case.

For example, if a macro was used to populate an `Option<U>` value, and users could chose `U: u32` (which does not have a niche repr), OR `U: &u32` (which does have a niche repr), then the author of the macro should call `set_discriminant` to soundly initialize the `Option<U>` in all cases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coolreader18 (replying in inline to point to the relevant part of the RFC where this is clarified)

What's the current legality of initializing niche-optimized enums?

let mut x: Option<NonNull<u8>> = None;
(&raw mut x).cast::<NonNull<u8>>().write(NonNull::dangling()); // UB?

For Option<NonNull<u8>> specifically, this is allowed and fully defined. For "null-pointer-optimized"/"discriminant-elision-guaranteed" types T, Option<T> is guaranteed to have the same size, alignment, and function call ABI as T, and the same representation for Some. NonNull is one such type.

The documentation explicitly calls out mem::transmute::<T, Option<T>> as being allowed, but (&raw mut x).cast::<T>().write(t); is effectively the same as x = std::mem::transmute::<T, Option<T>>(t); (with the exception of not dropping the old value of x, which is not relevant for Option<NonNull>), so ptr::writeing a t: T into an Option<T> will give Some(t) for "null-pointer-optimized" T.

(These guarantees also apply to some instantiations of Result; IIUC the guarantees currently do not formally extend to all "Option-like" enums, only to core::option::Option and core::result::Result specifically).


For general enums without this guarantee, or for Option<OtherTypes>, you do need some way to explicitly write the discriminant, even if it is functionally a NOP. For #[repr(Int and/or C)] enums, this can be done without additional language support because the layout and value of the discriminant/tag is guaranteed, but for repr(Rust) enums there is no explicit layout guarantee, so to be fully sound1 and general, set_discriminant is needed.

Footnotes

  1. you could argue that if size_of::<T>() == size_of::<MyEnum<T>>() where MyEnum has a variant ThatVariant { only_field: T }, then that enum must have an elided discriminant, and thus writing a T: t into it writes ThatVariant(t). This is probably sound logic, but since it is based on non-guaranteed behavior it could break if in a compiler update size_of::<T>() != size_of::<MyEnum<T>>() and IMO should not be relied upon unless guarantees are established.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the guarantee was extended to "Option-like" enums recently and there are some in progress documentation updates.

* The enum `T` and the selected variant's fields must be well-aligned for reading and writing
* This function is allowed to write and read-back both the discriminant and body (whether they each exist or not, and whether the discriminant and body are separate or not). This function also may do this in an unsynchronized manner (not requiring locks or atomic operations), which means exclusive access is required

If the `T` used for `*mut T` or `Discriminant<T>` when calling this function is NOT an enum, then this function is specified as a no-op. This is not undefined behavior, but later calls to `assume_init` or usage of the value `T` may be undefined behavior if the `T` was not properly initialized. This is also allowed (but not required) to cause a debug assertion failure to aid in testing (implementor's choice).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust does have other types that have discriminants, namely the generators that are created by lowering async. So we have to be a bit careful in how this is worded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. T-opsem Relevant to the operational semantics team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants