Goal: Accept partial initialization + use of records created via such #54987

pnkfelix · 2018-10-11T09:53:42Z

Spawned off of #21232

In the long-term, we want to accept code like this:

struct S { x: u32, y: u32 }
fn main() { let mut s: S; s.x = 10; s.y = 30; a_use_of(&s); another_use_of(s); }
fn a_use_of(_s: &S) { /* ... */ }
fn another_use_of(_s: S) { /* ... */ }

We probably also want to start accepting this too:

struct S { x: u32, y: u32 }
fn main() { let mut s: S; s.x = 10; a_use_of_field(s.x); }
fn a_use_of_field(_x: u32) { /* ... */ }

(But that second example is more debatable. I don't think there's any debate about the first example.)

See #54986 for the short-term goal of rejecting all partial initializations until we can actually use the results you get from partial initialization.

leonardo-m · 2018-10-11T10:55:25Z

fn main() { let mut s: S; s.x = 10; a_use_of_field(s.x); }

If there's an invariant of the struct fields you can't use one struct field before all of them are initialized.

pnkfelix · 2018-10-11T21:11:20Z

I think @leonardo-m 's previous comment is meant to be an argument against accepting the code snippet they quoted.

Here are my main responses to that:

Usually if there's some sort of invariant of the fields in a struct, it is enforced via privacy. (I.e., by making the fields non-pub, and making sure your public API upholds the invariant of interest.) This feature has no effect on that methodology.
One big instance where struct invariants are super important is destructors: You don't want to get into a situation where a struct is only partially initialized and then its destructor gets called. Or rather, if the language allowed that, it would be basically impossible to write the code for a destructor. Rust already has a rule ensuring that you cannot move out of the fields of an ADT that implements Drop. This feature has no effect on that rule, either.

pnkfelix · 2018-10-16T13:55:00Z

tagging as NLL-deferred because we know we aren't going to try to implement this as part of the 2018 edition.

pnkfelix · 2018-12-21T14:56:11Z

@Centril has an in-progress draft of an RFC here: Centril/rfcs#16

pnkfelix · 2018-12-21T14:56:54Z

Re-triaging for #56754. P-low since any change here deserves a (currently "unwritten" or at least unposted) RFC.

jethrogb · 2019-05-01T22:16:21Z

There are some concerns about the current state of this in #60450

tmandry · 2019-07-27T03:23:27Z

How should we handle yielding from a generator while a local variable is partially initialized? e.g.

let mut gen = || {
  let mut x: (i32, i32);
  x.0 = 11;
  yield;
  x.1 = 22;
}

I'd prefer that this wasn't allowed when generators stabilize. It would simplify some layout code and potentially enable more optimizations. It's possible we decide to allow it eventually, but I'd rather not be locked into that decision from the start.

Note that I'm fine with allowing this in cases where the local is not stored in the generator (i.e. it is never used after the yield). This is only useful in the case of the second example from the issue description.

cc #60889

RalfJung · 2019-08-05T21:38:38Z

See #63230 (comment) for some discussion of this subject, and in particular the interaction with "uninhabited structs are ZST" (which seems trivial to do if we entirely rule out partial initialization).

So, not allowing partial initialization is beneficial not just for generators, but also for type layout in general.

@Centril

…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung

@Centril

…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung

@Centril

…ized, r=Centril Make use of possibly uninitialized data [E0381] a hard error This is one of the behaviors we no longer allow in NLL. Since it can lead to undefined behavior, I think it's definitely worth making it a hard error without waiting to turn off migration mode (rust-lang#58781). Closes rust-lang#60450. My ulterior motive here is making it impossible to leave variables partially initialized across a yield (see rust-lang#60889, discussion at rust-lang#63035), so tests are included for that. cc rust-lang#54987 --- I'm not sure if bypassing the buffer is a good way of doing this. We could also make a `force_errors_buffer` or similar that gets recombined with all the errors as they are emitted. But this is simpler and seems fine to me. r? @Centril cc @cramertj @nikomatsakis @pnkfelix @RalfJung

hanna-kruppe · 2019-08-07T14:49:25Z

As @RalfJung himself noted in the linked discussion, the proposal seems at least very difficult for anything that works with "generic MIR + substs" rather than monomorphizing, chiefly miri. One could imagine performing the equivalent work every time a function is entered, but the cost is likely considerable.

A similar but even more serious problem I see is about unsafe code that performs partial or piecewise initalization manually. For example, adapting the usual (String, !) example (using rust-lang/rfcs#2582 syntax to avoid UB):

fn construct_pair_inplace(
    out: &mut MaybeUninit<(String, !)>,
    mk_string: impl FnOnce() -> String,
    mk_void: impl FnOnce() -> !,
) {
    let ptr = out.as_mut_ptr();
    ptr::write(&raw mut (*ptr).0, mk_string());
    ptr::write(&raw mut (*ptr).1, mk_void());
}

There's no writes to a "sibling of an uninhabited field" visible here, just taking pointers and offsetting them and passing them to other functions (and if you want to consider ptr::write an intrinsic, just imagine it being hidden behind more function boundaries). Those pointers have to point to something large enough to hold the inhabited String, and considering everything else that may happen with the pointers (e.g., returning them up the stack slot, stashing them in a long-lived data structure) they can't just point to made-up local temporaries.

RalfJung · 2019-08-07T15:17:27Z

@rkruppe so under a proposal that makes (String, !) a ZST, that code would be UB. Do you think that is a problem on its own? I think I'd be fine with it.

I am not sure if "the proposal" you refer to is "make structs with uninhabited fields ZST", or "make structs with uninhabited fields ZST and still allow partial initialization in safe code".

hanna-kruppe · 2019-08-07T17:31:50Z

Yes, I think it's a problem, otherwise I wouldn't have brought it up. Well, not literally that one piece of code, more the general pattern of piecewise initalization via out pointer (whether MaybeUninit is involved or not). It's is an extremely reasonable thing to do, and it's especially attractive while we still don't have reliable NRVO/emplacement, which doesn't seem to change any time soon.

If that kind of code suddenly becomes UB as soon as any component of the overall aggregate being initialized is uninhabited, then that would be pretty uncool. Everyone writing such code would need to remember to add subtle additional checks¹ (even more subtle than the special casing already required for ZSTs in some code, as uninhabited types are even rarer and more obscure), and if they care about still executing the side effects of the initialization of the inhabited parts, they'll need a lot of extra complexity for that too. I don't think that'll work out, instead we'll probably get a pile of unsound code that will rarely get in contact with uninhabited types but blow up every time it does.

Keep in mind that it'll rarely be as obviously involving an uninhabited type as in the above (kind of minimal) example, more commonly it will involve generics or a large (possibly auto-generated) struct/enum. Essentially, "what if one of these types is uninhabited" will have to get a dedicated slot on unsafe code audit checklists.

Edit: ¹or rule out that any uninhabited types can be involved, but this seems difficult to ensure in most code for reasons outlined above, and in any case still adds an extra burden to those writing that code.

I am not sure if "the proposal" you refer to is "make structs with uninhabited fields ZST", or "make structs with uninhabited fields ZST and still allow partial initialization in safe code".

Sorry for being unclear, I meant the latter w.r.t. complexity imposed on miri (and other consumers of generic MIR).

RalfJung · 2019-08-07T20:55:59Z

Sorry for being unclear, I meant the latter w.r.t. complexity imposed on miri (and other consumers of generic MIR).

Now I am confused because the rest of your text applies equally to the former.^^

I think for unsafe code it is okay to put the burden on that code to show that it is not initializing something uninhabited. But maybe I am underestimating the complexity of that.

Looking at the generalized version of your example:

fn construct_pair_inplace<T, U>(
    out: &mut MaybeUninit<(T, U)>,
    mk0: impl FnOnce() -> T,
    mk1: impl FnOnce() -> U,
) {
    let ptr = out.as_mut_ptr();
    ptr::write(&raw mut (*ptr).0, mk0());
    ptr::write(&raw mut (*ptr).1, mk1());
}

I think saying this is UB if either T or U are uninhabited is fine. If we don't care where inside this function the UB arises, the only difference from today is that right now this is not UB if U = ! and mk0 diverges or panics.

hanna-kruppe · 2019-08-08T07:56:20Z

Now I am confused because the rest of your text applies equally to the former.^^

Ugh, sorry for all the confusion. I was trying to say "the problem for miri applies only to one of these proposals, the other one regarding unsafe code applies to everything", but code patterns like let x: (String, !) = (..., panic!()); are allowed anyway and generate exactly the kind of MIR that's problematic for miri. So scratch that, my position is both of these problems apply to any proposal that makes uninhabited aggregates ZSTs.

I think saying this is UB if either T or U are uninhabited is fine. If we don't care where inside this function the UB arises, the only difference from today is that right now this is not UB if U = ! and mk0 diverges or panics.

(I have misgivings about using this very simplified, if generic, example as running example, but let's go with it for now.)

One question to answer is, how can someone rewrite that function to be sound? There's a few ways I can think of, but none of them are good. One could simply bail out if the overall type is a ZST:

if size_of_val(out) == 0 {
    return;
}

but this omits side effects from mk0 and mk1, and also affects inhabited ZSTs. The latter can be fixed with a new intrinsic, I think, but even then it makes the function actually return when it really shouldn't (because logically it's trying to initialize something that can't be initialized). To do better, we'll have to proceed and check field-by-field, like so:

if is_uninhabited {
    mk0(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk0());
}
if is_uninhabited {
    mk1(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk1());
}
// ... and so on for all other fields

(I can imagine a variant that doesn't try to preserve all side effects and just preserves divergence behavior, but it'll still need an amount of code linear in the number of fields to figure out which part it needs to call to diverge, assuming the exact kind of divergence matters.)

This is... not great. And it's just what programmers would write if they knew to do that. As I said before, I think this is an obscure obligation that most programmers will not get right, or even think to try to handle. It's exactly the kind of pitfalls that makes it unreasonably difficult to write sound unsafe code IMO. I am really unsure whether being able to make uninhabited aggregates into ZSTs is worth this. Does anyone know of real instances of performance/memory problems caused by them not being ZSTs? (Keep in mind that only product types are affected, Result<T, !> can still be the same size as T regardless.)

RalfJung · 2019-08-08T09:08:45Z

code patterns like let x: (String, !) = (..., panic!());

That's a very good point! MIR construction would have to be very smart about this.

@tmandry but isn't this also a problem for generators? What happens with something like

let x: (String, !) = (String::new(), { yield; panic!() });

In MIR, this looks a lot like partial initialization. So won't it have all the same problems?

One question to answer is, how can someone rewrite that function to be sound?

So you are saying this function actually is sound under our current semantics? Wow, I think you are right. And it wouldn't be sound any more with new semantics using the following safe client:

let mut x = MaybeUninit::uninit();
construct_pair_inplace(&mut x,
    || 0i32,
    || panic!(),
);

I think you convinced me that we cannot make these types ZST.

hanna-kruppe · 2019-08-08T09:19:47Z

That's a very good point! MIR construction would have to be very smart about this.

Worse, since this too can be made generic, there's really nothing MIR construction can do about it (unless we give up on placement entirely and always evaluate subexpressions into isolated temporaries first, but that's unacceptable IMO). It has to be "fixed up" at monomorphization time, with all the problems that entails.

tmandry · 2019-08-09T00:59:01Z

@tmandry but isn't this also a problem for generators? What happens with something like
let x: (String, !) = (String::new(), { yield; panic!() });
In MIR, this looks a lot like partial initialization. So won't it have all the same problems?

Today it's not a problem, because of how we emit the MIR (we assign each piece to a temporary, and then build the aggregate all at once). I asked @eddyb if depending on this was too brittle, and they said yes, it might change in the future.

So it seems likely that #63230 was just a stopgap and we'll have to be smarter and/or more conservative in the future about the possibility of partial initialization in generators.

Aaron1011 · 2019-08-09T02:03:37Z

It has to be "fixed up" at monomorphization time, with all the problems that entails.

@rkruppe: Can you elaborate on that? The way I see it, this shouldn't be too complicated. Going back to your example user-written code:

if is_uninhabited {
    mk0(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk0());
}
if is_uninhabited {
    mk1(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk1());
}

this is just want we'd want monomorphization/miri to do. Setting aside partial initialization for the moment, we know that the uninhabited type we're 'constructing' can never escape the function (as any code after the creationof uninhabited type is unreachable).

For example, when monomorphizing or interpreting:

let x: (String, !) = (String::new(), { yield; panic!() });

We can simply write the String to a temporary. This wouldn't require us to make any changes to the MIR itself - that is, we could choose to assign directly to the (possibly uninhabited field), and detect this when the concrete type is known.

While this would result in some additional complexity in codegen and Miri, it would both simply and optimize actual Rust code. 'Uninhabited types are zero size' is a much nicer rule than 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable).'

RalfJung · 2019-08-09T07:22:55Z

@tmandry

we'll have to be smarter and/or more conservative in the future about the possibility of partial initialization in generators.

Is there any place we are tracking this specifically for generators? Or is the issue here sufficient?

@Aaron1011

The way I see it, this shouldn't be too complicated.

What you are proposing sounds extremely complicated to me.^^

If that chain of ifs is your proposal for how the MIR would look like, one big issue with that is that it duplicates the initializer expression. Sure, here it is just mk0(), but it can be arbitrarily complicated, and we probably don't want to blow up the MIR by duplicating them.

(FWIW, the test could be just "if size of struct greater 0", because if this is a ZST struct then "copying" the result of the computation into the struct is redundant in any case. We don't need a new "is uninhabited" intrinsic.)

While this would result in some additional complexity in codegen and Miri,

Given that this affects MIR semantics, I think it would also result in additional complexity in any other MIR consumer -- borrowcheck and MIR optimizations, specifically. And making the life of these passes as easy as we can is basically the only reason MIR exists at all. So if they all become too complicated through things like this, MIR failed at its job.

'Uninhabited types are zero size' is a much nicer rule than 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable).'

'The size of a struct is the sum of the sizes of its fields and padding' is also a much nicer rule than 'The size of a struct is the sum of the sizes of its fields and padding, except under certain circumstances (despite still being partially initializable).'

I would not say 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable)' is a rule we currently have, so it seems silly to compare with that. There just is no connection between inhabitedness and size, end of story. So we are trading the 'size of a struct is the sum' rule against the 'uninhabited types are ZST' rule.

And note that even with your proposal 'uninhabited types are ZST' will likely not be true. For example, there are proposals to make &! intrinsically uninhabited so that we can just remove &self methods on ! (and more generally all uninhabited types) from the generated code -- but &! can never be a ZST, references are layout-compatible with pointers. So the rule would be more like 'many uninhabited types are ZST, but not all of them'.

tmandry · 2019-08-15T22:06:53Z

Is there any place we are tracking this specifically for generators? Or is the issue here sufficient?

Opened #63616.

This was referenced Oct 11, 2018

borrow-checker allows partial reinit of struct that has been moved away, but no use of it. #21232

Closed

NLL: Reject partial init of uninitialized record (struct or tuple) #54986

Closed

Havvy added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Oct 11, 2018

pnkfelix added the A-NLL Area: Non-lexical lifetimes (NLL) label Oct 11, 2018

Centril added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Oct 11, 2018

pnkfelix added the NLL-deferred label Oct 16, 2018

pnkfelix mentioned this issue Oct 17, 2018

reject partial init and reinit of uninitialized data #54941

Merged

This was referenced Nov 10, 2018

Enum variant types rust-lang/rfcs#2593

Closed

[Stabilization] Pin APIs #55766

Closed

hanna-kruppe mentioned this issue Nov 22, 2018

Unconditionally fully initialized variable errors with possibly uninitialized #56159

Closed

pnkfelix mentioned this issue Dec 19, 2018

Pattern errors are too imprecise and should be removed in favor of MIR borrowck #45600

Closed

pnkfelix added P-low Low priority and removed NLL-deferred labels Dec 21, 2018

matthewjasper added the NLL-complete Working towards the "valid code works" goal label Jan 19, 2019

Centril mentioned this issue May 1, 2019

assign to part of possibly uninitialized variable is flagged as UB (but only in 2018 edition) #60450

Closed

pnkfelix mentioned this issue May 22, 2019

Add FAQ for NLL migration #60809

Merged

This was referenced Jul 29, 2019

Use MaybeUninit to produce more correct layouts #63035

Merged

Make use of possibly uninitialized data [E0381] a hard error #63230

Merged

glaebhoerl mentioned this issue Aug 12, 2019

Validity of references: Memory-related properties rust-lang/unsafe-code-guidelines#77

Closed

Centril mentioned this issue Oct 21, 2019

Dereferencing raw pointer to invalid data? rust-lang/unsafe-code-guidelines#216

Closed

shepmaster mentioned this issue Oct 29, 2019

partially initialized structs in local variables, similar to partially moved structs, perhaps should be possible rust-lang/rfcs#448

Open

pnkfelix mentioned this issue Nov 21, 2019

Delay an is_local_ever_initialized call. #66537

Merged

RalfJung mentioned this issue Mar 18, 2022

The size of (T, Infallible) is not zero #93032

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Goal: Accept partial initialization + use of records created via such #54987

Goal: Accept partial initialization + use of records created via such #54987

pnkfelix commented Oct 11, 2018 •

edited

Loading

leonardo-m commented Oct 11, 2018

pnkfelix commented Oct 11, 2018 •

edited

Loading

pnkfelix commented Oct 16, 2018

pnkfelix commented Dec 21, 2018

pnkfelix commented Dec 21, 2018

jethrogb commented May 1, 2019

tmandry commented Jul 27, 2019

RalfJung commented Aug 5, 2019

hanna-kruppe commented Aug 7, 2019 •

edited

Loading

RalfJung commented Aug 7, 2019

hanna-kruppe commented Aug 7, 2019 •

edited

Loading

RalfJung commented Aug 7, 2019

hanna-kruppe commented Aug 8, 2019

RalfJung commented Aug 8, 2019

hanna-kruppe commented Aug 8, 2019

tmandry commented Aug 9, 2019 •

edited

Loading

Aaron1011 commented Aug 9, 2019

RalfJung commented Aug 9, 2019 •

edited

Loading

tmandry commented Aug 15, 2019

Goal: Accept partial initialization + use of records created via such #54987

Goal: Accept partial initialization + use of records created via such #54987

Comments

pnkfelix commented Oct 11, 2018 • edited Loading

leonardo-m commented Oct 11, 2018

pnkfelix commented Oct 11, 2018 • edited Loading

pnkfelix commented Oct 16, 2018

pnkfelix commented Dec 21, 2018

pnkfelix commented Dec 21, 2018

jethrogb commented May 1, 2019

tmandry commented Jul 27, 2019

RalfJung commented Aug 5, 2019

hanna-kruppe commented Aug 7, 2019 • edited Loading

RalfJung commented Aug 7, 2019

hanna-kruppe commented Aug 7, 2019 • edited Loading

RalfJung commented Aug 7, 2019

hanna-kruppe commented Aug 8, 2019

RalfJung commented Aug 8, 2019

hanna-kruppe commented Aug 8, 2019

tmandry commented Aug 9, 2019 • edited Loading

Aaron1011 commented Aug 9, 2019

RalfJung commented Aug 9, 2019 • edited Loading

tmandry commented Aug 15, 2019

pnkfelix commented Oct 11, 2018 •

edited

Loading

pnkfelix commented Oct 11, 2018 •

edited

Loading

hanna-kruppe commented Aug 7, 2019 •

edited

Loading

hanna-kruppe commented Aug 7, 2019 •

edited

Loading

tmandry commented Aug 9, 2019 •

edited

Loading

RalfJung commented Aug 9, 2019 •

edited

Loading