WIP: Avoid storing captured upvars in generators twice if possible #89213

Closed
wants to merge 2 commits

Conversation

@Kobzol (Contributor) commented Sep 24, 2021

Hi!
I tried to address the first problem described in #62958, but I keep running into dead ends. So I wanted to share my code and ask whether what I'm doing makes any sense and, if not, what other directions I could try. I'll describe the current state, both to confirm that I understand it correctly and to help potential reviewers refresh their memory of these things.

Description of the current state

Upvars in generators (captured variables and parameters of async functions) are currently stored at the start of the generator struct. At the beginning of the generator function code (in the first basic block), these upvars are stored into locals. If these locals are accessed across an await point, they will be included inside the generator layout as new fields, even though they are already stored at the very beginning of the generator struct.

For example, this generator takes 12 bytes (4 bytes for the upvar a, 4 bytes for the Ready future plus discriminant and padding, 4 bytes for the local referencing a), even though 8 bytes would suffice, because the a parameter could reuse the same memory slot.

async fn wait(a: u32) {
    std::future::ready(()).await;
    drop(a);
}
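The effect is directly observable: the future returned by an async fn can be measured with std::mem::size_of_val without ever polling it. This is a minimal sketch; the exact size (12 vs. 8 bytes) depends on the rustc version, so the assertion only checks a lower bound.

```rust
use std::mem;

// Same shape as the example above.
async fn wait(a: u32) {
    std::future::ready(()).await;
    drop(a);
}

fn main() {
    // Creating the future does not run the body, so no executor is needed.
    let fut = wait(42);
    let size = mem::size_of_val(&fut);
    // The generator must at least hold the captured `a: u32`; the exact
    // total depends on the compiler version, so we don't assert it here.
    assert!(size >= mem::size_of::<u32>());
    println!("future size: {} bytes", size);
}
```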

Even though this example may look contrived, this situation probably comes up quite often: arguments/upvars with Drop impls are implicitly dropped at the end of the generator, which causes them to be duplicated even though they seem never to be accessed at all:

async fn wait(a: HasDrop) {
    std::future::ready(()).await;
} // Here `a` is dropped implicitly, therefore it's included twice in the generator struct

The current layout of generators looks like this:

Upvar #0
Upvar #1
...
Discriminant
Promoted field #0
Promoted field #1
...
Variants with overlapped fields

Where promoted fields are locals that cannot be overlapped. The upvars and discriminant are stored as fields on the generator struct, while the promoted fields are just referenced by the individual variants.
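Going back to the wait(a: u32) example, the layout above can be modeled with a hand-written repr(C) struct to show where the duplication cost comes from. All names here are illustrative, not compiler internals; the real layout is computed in rustc's layout code.

```rust
use std::mem;

// Hand-written model of the current generator layout for `wait(a: u32)`.
#[repr(C)]
struct GeneratorModel {
    upvar_0: u32,     // captured `a`, always at the start of the struct
    discriminant: u8, // which variant (Unresumed, Suspend0, Returned, ...)
    // 3 bytes of padding
    promoted_0: u32,  // promoted local that duplicates upvar_0
}

fn main() {
    // 4 (upvar) + 1 (discriminant) + 3 (padding) + 4 (promoted copy) = 12.
    assert_eq!(mem::size_of::<GeneratorModel>(), 12);
    // If the promoted copy could reuse the upvar slot, 8 bytes would suffice:
    // 4 (upvar) + 1 (discriminant) + 3 (padding).
    println!("model size: {} bytes", mem::size_of::<GeneratorModel>());
}
```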

Solution

I tried to take the upvar locals, and store them into the unresumed variant of the generator. That by itself seems like the right thing to do. However, then I had to modify the generator layout code and that's where issues started to crop up.

Layout modification 1

First I tried to generate the layout for the unresumed variant in such a way that its fields (containing the saved upvar locals) point to the beginning of the generator struct. So with e.g. two 1-byte upvars, the layout looked like this:

fields: [
    offset: 0, // first upvar
    offset: 1, // second upvar
    offset: 2  // discriminant
],
variants: [
    variant 0 (unresumed): {
        fields: [
            offset: 0, // local referencing first upvar
            offset: 1  // local referencing second upvar
        ]
    }
]

This actually worked for simple cases, but when running more complicated tests from the ui suite, codegen ended with errors. It seemed like the offsets referenced invalid fields and the sizes of the individual upvar fields didn't match up with the field offsets; it was a mess. It looked like I had broken some invariants in the layout code (or I had a different bug there).

Question: Is this layout correct? Is it OK that some memory slots (the upvars) are both included in the fields of the struct AND referenced by a variant that is itself StructKind::Prefixed by the size of the upvars and the discriminant (which means that the variant touches memory slots "outside" of its range)? Maybe the unresumed variant could somehow (conceptually) start at the beginning of the struct?

Anyway, I thought that this approach was not correct, so I tried something else. This version can be found here.

Layout modification 2

In my second attempt, to avoid the "overlap" of the upvar references from the approach above, I tried to remove the upvars from the generator struct fields entirely, so that the generator struct layout only has the discriminant in its fields. This is a rather disruptive change, because I had to find all code that creates generator upvar accesses and change it to access the unresumed variant fields instead of the fields of the generator itself.

I'm not sure if this is the correct approach though. After changing all of the places that I found, I began to run into MIR cycle issues. At this point I realized that it would be better to open a PR than to try other "random" approaches without knowing whether this is the correct thing to do.

The current code in this PR contains the second approach, which turns all generator field accesses into unresumed variant field accesses. As you can see, it's heavily WIP; I just wanted to ask if (and how) I should continue, or whether this doesn't lead anywhere :) I'll be glad for any feedback.

MIR cycle error that I'm getting
error[E0391]: cycle detected when borrow-checking `wait::{closure#0}`
  --> src/main.rs:18:23
   |
18 |   async fn wait(a: u32) {
   |  _______________________^
19 | |     std::future::ready(1).await;
20 | |     drop(a);
21 | | }
   | |_^
   |
note: ...which requires optimizing MIR for `wait::{closure#0}`...
  --> src/main.rs:18:23
   |
18 |   async fn wait(a: u32) {
   |  _______________________^
19 | |     std::future::ready(1).await;
20 | |     drop(a);
21 | | }
   | |_^
note: ...which requires elaborating drops for `wait::{closure#0}`...
  --> src/main.rs:18:23
   |
18 |   async fn wait(a: u32) {
   |  _______________________^
19 | |     std::future::ready(1).await;
20 | |     drop(a);
21 | | }
   | |_^
   = note: ...which again requires borrow-checking `wait::{closure#0}`, completing the cycle
note: cycle used when borrow-checking `wait`
  --> src/main.rs:18:1
   |
18 | async fn wait(a: u32) {
   | ^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0391`.
warning: `test-crate` (bin "test-crate") generated 1 warning
error: could not compile `test-crate` due to previous error; 1 warning emitted

Related issue: #62958

r? @tmandry


@camelid (Member) commented Oct 31, 2021

Looks like highfive didn't recognize the r? in the PR description for some reason, so r? @tmandry

@camelid camelid added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 31, 2021
@tmandry (Member) left a comment


Overall the approach looks correct.

I suspect the cycle is being caused by the downcasts that are now being inserted, causing MIR type check to need more information. Likely the culprit is the general downcast typing code wanting to know more about the variant type, causing it to ask for the generator layout and thus the optimized MIR and so on.

It's possible we can special case this code to recognize accesses to the Unresumed variant and redirect to the list of upvars for that generator, which is available prior to calculating all of that.

An alternative is to work around the issue by keeping as much as possible the same. We would allow upvar field accesses in MIR to be generated and checked just as before, as just a field directly on the generator itself. In type layout descriptions multiple fields can have the same offset. So what we can do is remove the upvars when computing the prefix offsets of the layout, but keep them as fields in the type description of the generator struct. The offsets of the fields would be copied from their offsets in the Unresumed variant.
(EDIT: You may need to disregard some of my comments on the layout code below for this approach)

Both of these approaches can work – the second means some of the "frontend" changes can be reverted and this knowledge can be contained within the layout code, but it is sort of a dirty hack. It would also make debuginfo worse, because these fields would appear to always be valid when they aren't, so we would have to add knowledge of this approach back into debuginfo. So in some ways the first approach is preferable, though it could turn out to be a hack too.

@@ -1498,7 +1502,9 @@ impl<'tcx> LayoutCx<'tcx, TyCtxt<'tcx>> {
let mut used_variants = BitSet::new_empty(info.variant_fields.len());
for assignment in &assignments {
if let Assigned(idx) = assignment {
used_variants.insert(*idx);
if *idx != VariantIdx::new(0) {

nit: Remove the variant after this loop rather than add another branch


(Comment that this is correct because the 0 variant is the one we would merge into?)

remap.entry(locals[saved_local]).or_insert((
tys[saved_local],
unresumed_idx,
local.as_usize() - 3,

Use the length of variant_fields[unresumed_idx] to determine the field number.

@@ -1075,3 +1078,4 @@ mod misc;
mod scope;

pub(crate) use expr::category::Category as ExprCategory;
use rustc_target::abi::VariantIdx;

not public, this should go at the top

// They are also included in the 0th (unresumed) variant
// The promoted locals are placed directly after the upvars.
// Because of this the rest of the code can handle upvars and promoted locals in a generic
// way.
let prefix_layouts = substs
.as_generator()
.prefix_tys()

These should no longer be here (also prefix_tys as a method needs to be removed or renamed).

This is where we actually compute the field offsets of the prefix.

Comment on lines 1520 to 1524
for upvar in 0..upvar_count {
let local = GeneratorSavedLocal::new(upvar);
ineligible_locals.remove(local);
assignments[local] = Ineligible(Some(upvar as u32));
}

None of this should be needed now

for (idx, local) in ineligible_locals.iter().enumerate() {
assignments[local] = Ineligible(Some(idx as u32));
// Skip over tag and upvars
assignments[local] = Ineligible(Some((upvar_count + 1 + idx) as u32));

revert


// Promoted fields - upvars and promoted locals
let offsets_promoted = offsets;
let inverse_memory_index_promoted = inverse_memory_index;

I think we still need to decrease everything by 1

Don't remember why the code is like this, but anyway


// Outer fields - upvars and tag
let after_tag = tag_index + 1;
let offsets_outer: Vec<_> = vec![offsets[upvar_count].clone()];

Now it's just the tag

@tmandry tmandry added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 2, 2021
@Kobzol (Contributor, Author) commented Nov 2, 2021

Thank you for your review! In the meantime, I was experimenting with the second approach (changing the prefix calculations, but keeping the upvar fields). I sort of got it to work and was able to progress to more complex test cases, where I found a problem that blocks progress on either of these approaches.

Regardless of which layout approach we take, in rustc_mir_transform/src/generator.rs I need to be able to find which locals actually refer to upvars, so that I can recognize which locals can actually be "overlapped" with the upvars.

I assumed that locals stored from upvars would just be located in _3, _4, ..., 3 + upvar_count, but sadly that is not the case.
For example this code (extracted from a test):

// run-pass

#![feature(generators, generator_trait)]

use std::marker::Unpin;
use std::ops::{GeneratorState, Generator};
use std::pin::Pin;

struct W<T>(T);

// This impl isn't safe in general, but the generator used in this test is movable
// so it won't cause problems.
impl<T: Generator<(), Return = ()> + Unpin> Iterator for W<T> {
    type Item = T::Yield;

    fn next(&mut self) -> Option<Self::Item> {
        match Pin::new(&mut self.0).resume(()) {
            GeneratorState::Complete(..) => None,
            GeneratorState::Yielded(v) => Some(v),
        }
    }
}

fn test() -> impl Generator<(), Return=(), Yield=u8> + Unpin {
    || {
        for i in 1..6 {
            yield i
        }
    }
}

fn main() {
    let end = 11;

    let closure_test = |start| {
        move || {
            for i in start..end {
                yield i
            }
        }
    };

    assert!(W(test()).chain(W(closure_test(6))).eq(1..11));
}

will produce this MIR:

// MIR for `main::{closure#0}::{closure#0}` after CheckPackedRef

fn main::{closure#0}::{closure#0}(_1: [generator@src/main.rs:36:9: 40:10], _2: ()) -> ()
yields u8
 {
    debug start => (_1.0: u8);           // in scope 0 at src/main.rs:35:25: 35:30
    debug end => (_1.1: u8);             // in scope 0 at src/main.rs:33:9: 33:12
    let mut _0: ();                      // return place in scope 0 at src/main.rs:36:17: 36:17
    let mut _3: std::ops::Range<u8>;     // in scope 0 at src/main.rs:37:22: 37:32
    let mut _4: std::ops::Range<u8>;     // in scope 0 at src/main.rs:37:22: 37:32
    let mut _5: u8;                      // in scope 0 at src/main.rs:37:22: 37:27
    let mut _6: u8;                      // in scope 0 at src/main.rs:37:29: 37:32
    let mut _7: std::ops::Range<u8>;     // in scope 0 at src/main.rs:37:22: 37:32
    let mut _8: ();                      // in scope 0 at src/main.rs:36:9: 40:10
    let _10: ();                         // in scope 0 at src/main.rs:37:22: 37:32
    let mut _11: std::option::Option<u8>; // in scope 0 at src/main.rs:37:22: 37:32
    let mut _12: &mut std::ops::Range<u8>; // in scope 0 at src/main.rs:37:22: 37:32
    let mut _13: &mut std::ops::Range<u8>; // in scope 0 at src/main.rs:37:22: 37:32
    let mut _14: isize;                  // in scope 0 at src/main.rs:37:17: 37:18
    let mut _16: u8;                     // in scope 0 at src/main.rs:37:17: 37:18
    let mut _17: !;                      // in scope 0 at src/main.rs:37:13: 39:14
    let _19: ();                         // in scope 0 at src/main.rs:38:17: 38:24
    let mut _20: u8;                     // in scope 0 at src/main.rs:38:23: 38:24
    scope 1 {
        debug iter => _7;                // in scope 1 at src/main.rs:37:22: 37:32
        let mut _9: u8;                  // in scope 1 at src/main.rs:37:22: 37:32
        scope 2 {
            debug __next => _9;          // in scope 2 at src/main.rs:37:22: 37:32
            let _15: u8;                 // in scope 2 at src/main.rs:37:17: 37:18
            let _18: u8;                 // in scope 2 at src/main.rs:37:17: 37:18
            scope 3 {
                debug val => _15;        // in scope 3 at src/main.rs:37:17: 37:18
            }
            scope 4 {
                debug i => _18;          // in scope 4 at src/main.rs:37:17: 37:18
            }
        }
    }

    bb0: {
        StorageLive(_3);                 // scope 0 at src/main.rs:37:22: 37:32
        StorageLive(_4);                 // scope 0 at src/main.rs:37:22: 37:32
        StorageLive(_5);                 // scope 0 at src/main.rs:37:22: 37:27
        _5 = (_1.0: u8);                 // scope 0 at src/main.rs:37:22: 37:27
        StorageLive(_6);                 // scope 0 at src/main.rs:37:29: 37:32
        _6 = (_1.1: u8);                 // scope 0 at src/main.rs:37:29: 37:32
        _4 = std::ops::Range::<u8> { start: move _5, end: move _6 }; // scope 0 at src/main.rs:37:22: 37:32
        StorageDead(_6);                 // scope 0 at src/main.rs:37:31: 37:32
        StorageDead(_5);                 // scope 0 at src/main.rs:37:31: 37:32
        _3 = <std::ops::Range<u8> as IntoIterator>::into_iter(move _4) -> [return: bb1, unwind: bb18]; // scope 0 at src/main.rs:37:22: 37:32
                                         // mir::Constant
                                         // + span: src/main.rs:37:22: 37:32
                                         // + literal: Const { ty: fn(std::ops::Range<u8>) -> <std::ops::Range<u8> as std::iter::IntoIterator>::IntoIter {<std::ops::Range<u8> as std::iter::IntoIterator>::into_iter}, val: Value(Scalar(<ZST>)) }
    }
...

The two upvars (_1.0 and _1.1) are stored into _5 and _6, not into _3 and _4, which completely breaks the assumptions I have made so far. I tried to create a visitor that visits the first basic block and detects locals that are assigned from fields of _1 (the generator state parameter) to find the "upvar locals". However, it turned out that in more complex cases (maybe because of disjoint closure captures?), the RHS of these assignments can be really complex.

At this point I stopped and decided to ask you whether this makes sense at all. A few approaches come to mind for solving this:

  • Somehow "mark" the corresponding locals in the MIR when they are generated by the code that assigns the captured upvars to locals. But maybe this information could disappear after MIR transforms? Or we could store some metadata into e.g. the generator info.
  • Is there some existing way of correctly finding if the local "comes from" an upvar? I think that writing an ad-hoc visitor might not be the best solution here.

I think that this has to be resolved first, because no layout approach will work if I'm not able to reliably detect which locals should be treated specially.
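To make the detection problem concrete, here is a toy sketch of the conservative approach on a hypothetical mini-MIR. Stmt, Rvalue, and upvar_locals are invented for illustration (a real implementation would use rustc's MIR visitor machinery): a local counts as an upvar local only if its single assignment is a direct field read of _1, and any re-assignment disqualifies it.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Rvalue {
    UpvarField(usize), // `_1.<field>` with no further projections
    Other,             // anything else (complex projections, calls, ...)
}

struct Stmt {
    dest: usize, // local being assigned, e.g. 5 for `_5`
    rvalue: Rvalue,
}

/// Returns `(local, upvar_index)` pairs for locals assigned exactly once,
/// directly from an upvar field. Any other assignment pattern pessimizes
/// the local, so complex cases are simply left un-overlapped.
fn upvar_locals(stmts: &[Stmt]) -> Vec<(usize, usize)> {
    let mut assigns: HashMap<usize, Vec<Rvalue>> = HashMap::new();
    for s in stmts {
        assigns.entry(s.dest).or_default().push(s.rvalue);
    }
    let mut out: Vec<(usize, usize)> = assigns
        .into_iter()
        .filter_map(|(local, rvs)| match rvs.as_slice() {
            [Rvalue::UpvarField(idx)] => Some((local, *idx)),
            _ => None,
        })
        .collect();
    out.sort();
    out
}

fn main() {
    // Mirrors the MIR above: _5 = _1.0, _6 = _1.1, _4 built from _5/_6.
    let stmts = [
        Stmt { dest: 5, rvalue: Rvalue::UpvarField(0) },
        Stmt { dest: 6, rvalue: Rvalue::UpvarField(1) },
        Stmt { dest: 4, rvalue: Rvalue::Other },
    ];
    assert_eq!(upvar_locals(&stmts), vec![(5, 0), (6, 1)]);
}
```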

@tmandry (Member) commented Nov 10, 2021

We load the upvars into locals in the generator MIR transform, which is also where you would need access to that information, so it should be a matter of threading that state through different places in that file. If you needed that kind of information in the layout code for some reason, you could add it to the GeneratorLayout struct.

@Kobzol (Contributor, Author) commented Nov 11, 2021

Are you sure that this is indeed performed in the generator MIR transform? I originally thought so too, but when I dump the MIR, I can already see the upvars stored into locals long before this transform runs, e.g.:

001-000-CheckPackedRef.after.mir

bb0: {
    StorageLive(_3);                 // scope 0 at src/main.rs:37:22: 37:32
    StorageLive(_4);                 // scope 0 at src/main.rs:37:22: 37:32
    StorageLive(_5);                 // scope 0 at src/main.rs:37:22: 37:27
    _5 = (_1.0: u8);                 // scope 0 at src/main.rs:37:22: 37:27
    StorageLive(_6);                 // scope 0 at src/main.rs:37:29: 37:32
    _6 = (_1.1: u8);                 // scope 0 at src/main.rs:37:29: 37:32
    _4 = std::ops::Range::<u8> { start: move _5, end: move _6 }; // scope 0 at src/main.rs:37:22: 37:32
    StorageDead(_6);                 // scope 0 at src/main.rs:37:31: 37:32
    StorageDead(_5);                 // scope 0 at src/main.rs:37:31: 37:32
    _3 = <std::ops::Range<u8> as IntoIterator>::into_iter(move _4) -> [return: bb1, unwind: bb18]; // scope 0 at src/main.rs:37:22: 37:32
                                     // mir::Constant
                                     // + span: src/main.rs:37:22: 37:32
                                     // + literal: Const { ty: fn(std::ops::Range<u8>) -> <std::ops::Range<u8> as std::iter::IntoIterator>::IntoIter {<std::ops::Range<u8> as std::iter::IntoIterator>::into_iter}, val: Value(Scalar(<ZST>)) }
}

and I couldn't find any code inside the generator MIR transform that would do this. If I understand it correctly, it's performed long before the generator transform, by code that lowers closures. But maybe I got it wrong.

@tmandry (Member) commented Nov 17, 2021

I thought I saw some code that did it in the generator transform, but it looks like I remembered wrong. Then maybe we need to produce some artifact similar to GeneratorLayout (but for upvar mapping) in whatever MIR transform does this, so we have all the info we need later on. (Or maybe it's much simpler than that, and we just need to encapsulate the mapping logic in a method on closure types somewhere; see e.g. upvar_tys. That seems pretty likely to be simple, but I'm not sure.)

@Kobzol (Contributor, Author) commented Nov 23, 2021

So, this was a great detective hunt for me :D But I finally found the place, and it's not pretty (unless I'm missing something). Sadly, it seems like the upvars are lowered in a place that does not have a lot of information about upvars. It happens here.

In theory, I could do something like this with the UpvarRefs:

// Avoid creating a temporary
ExprKind::VarRef { .. }
| ExprKind::UpvarRef { .. }
| ExprKind::PlaceTypeAscription { .. }
| ExprKind::ValueTypeAscription { .. } => {
    debug_assert!(Category::of(&expr.kind) == Some(Category::Place));

    let place = unpack!(block = this.as_place(block, expr));
    let rvalue = Rvalue::Use(this.consume_by_copy_or_move(place));

    if this.generator_kind.is_some() {
        if let ExprKind::UpvarRef { .. } = &expr.kind {
            let local = destination.local;
            assert!(destination.projection.is_empty());
            match place.projection[0] {
                PlaceElem::Field(field, _) => {
                    println!("Storing upvar {:?} into {:?}", field.as_usize(), local);
                    // Store into Builder and then propagate to generator MIR transform
                }
                _ => panic!("Unexpected upvar field")
            }
        }
    }

    this.cfg.push_assign(block, source_info, destination, rvalue);
    block.unit()
}

but I'm not sure if it's much better than reconstructing the upvar references from an already lowered MIR in the generator transform.

I also found out that sometimes it's not an UpvarRef but a VarRef that is being stored into a local, which is a bit weird. For example here:

struct B {
    x: u32
}

struct A {
    b: B,
    c: u32
}

fn main() {
    let a = A {
        b: B {
            x: 0
        },
        c: 1
    };
    let gen = move || {
        drop(a.b.x);
        yield;
    };
    println!("{}", a.c);
    println!("{}", std::mem::size_of_val(&gen));
}

With edition 2021 and precise captures, it generates a VarRef with this MIR:

bb0: {
    StorageLive(_3);                 // scope 0 at src/main.rs:25:9: 25:20
    StorageLive(_4);                 // scope 0 at src/main.rs:25:14: 25:19
    _4 = (_1.0: u32);                // scope 0 at src/main.rs:25:14: 25:19
    _3 = std::mem::drop::<u32>(move _4) -> [return: bb1, unwind: bb4]; // scope 0 at src/main.rs:25:9: 25:20
                                     // mir::Constant
                                     // + span: src/main.rs:25:9: 25:13
                                     // + literal: Const { ty: fn(u32) {std::mem::drop::<u32>}, val: Value(Scalar(<ZST>)) }
}

With edition 2018, it generates a VarRef with this MIR:

 bb0: {
    StorageLive(_3);                 // scope 0 at src/main.rs:25:9: 25:20
    StorageLive(_4);                 // scope 0 at src/main.rs:25:14: 25:19
    _4 = (((_1.0: A).0: B).0: u32);  // scope 0 at src/main.rs:25:14: 25:19
    _3 = std::mem::drop::<u32>(move _4) -> [return: bb1, unwind: bb4]; // scope 0 at src/main.rs:25:9: 25:20
                                     // mir::Constant
                                     // + span: src/main.rs:25:9: 25:13
                                     // + literal: Const { ty: fn(u32) {std::mem::drop::<u32>}, val: Value(Scalar(<ZST>)) }
}

When the captured location is simpler, e.g.

fn main() {
    let a = 0;
    let gen = move || {
        drop(a);
        yield;
    };
    println!("{}", std::mem::size_of_val(&gen));
}

it generates an UpvarRef instead (even though the MIR is the same here as with precise captures).

Am I missing a simpler way? Maybe I could just ignore all of this and go back to extracting the upvar references from already-lowered MIR in the generator transform. If an upvar didn't have the simple _3 = _1.0 (_local = _1.upvar_id) format, I would just ignore it and not overlap it.

@tmandry (Member) commented Nov 30, 2021

So after poking around some, the place the locals actually get introduced is in as_operand while building the MIR.

It doesn't seem like we can thread that information through – instead what we can do is walk the MIR looking for locals assigned to _1.(whatever) and mark those as overlapping with that upvar. We can be conservative (e.g. pessimize if the local is ever assigned again) since we don't want to introduce possible bugs and I think we only care about optimizing exactly what as_operand does for now.

@Kobzol (Contributor, Author) commented Nov 30, 2021

I tried walking the MIR to collect upvar locals before, it should work for simple cases, but then there are also cases like this (in edition 2018):

_4 = (((_1.0: A).0: B).0: u32);

or some other more complex projections, which made the detection more difficult. I also found places where these upvar local stores were not in the first BB but in some later one, so I may need to go through all BBs; that shouldn't be an issue, I suppose.

But in general this could work. I'll try the MIR walk again and see what problems I run into.

@Kobzol (Contributor, Author) commented Dec 23, 2021

While writing the visitor, I hit a case that I hadn't realized was possible before: a generator where the code writes to an upvar:

#![feature(generators, generator_trait)]

use std::ops::Generator;
use std::pin::Pin;

fn main() {
    let mut a = 5;
    let mut b = || {
        yield;
        a = 1;
    };
}
// MIR for `main::{closure#0}` before SimplifyCfg-early-opt

fn main::{closure#0}(_1: [generator@src/main.rs:8:17: 11:6], _2: ()) -> ()
yields ()
 {
    debug a => (*(_1.0: &mut i32));      // in scope 0 at src/main.rs:7:9: 7:14
    let mut _0: ();                      // return place in scope 0 at src/main.rs:8:20: 8:20
    let _3: ();                          // in scope 0 at src/main.rs:9:9: 9:14
    let mut _4: ();                      // in scope 0 at src/main.rs:9:9: 9:14

    bb0: {
        StorageLive(_3);                 // scope 0 at src/main.rs:9:9: 9:14
        StorageLive(_4);                 // scope 0 at src/main.rs:9:9: 9:14
        _4 = ();                         // scope 0 at src/main.rs:9:9: 9:14
        _3 = yield(move _4) -> [resume: bb1, drop: bb2]; // scope 0 at src/main.rs:9:9: 9:14
    }

    bb1: {
        StorageDead(_4);                 // scope 0 at src/main.rs:9:13: 9:14
        StorageDead(_3);                 // scope 0 at src/main.rs:9:14: 9:15
        (*(_1.0: &mut i32)) = const 1_i32; // scope 0 at src/main.rs:10:9: 10:14
        _0 = const ();                   // scope 0 at src/main.rs:8:20: 11:6
        return;                          // scope 0 at src/main.rs:11:6: 11:6
    }

    bb2: {
        StorageDead(_4);                 // scope 0 at src/main.rs:9:13: 9:14
        StorageDead(_3);                 // scope 0 at src/main.rs:9:14: 9:15
        generator_drop;                  // scope 0 at src/main.rs:8:17: 11:6
    }

    bb3 (cleanup): {
        resume;                          // scope 0 at src/main.rs:8:17: 11:6
    }
}

In this case, there is not even any read from _1.x, even though there is an upvar (this broke my assertions).
This complicates the analysis a bit, since we also have to consider the case where upvar locals are later invalidated by these writes (or maybe in the case of such a write no local will be generated at all?).

Anyway, it got me thinking about my approach. In all my implementation attempts so far, it was really messy trying to special case locals that are assigned from an upvar through the generator transform and layout generation, since there are a lot of non-obvious assumptions in the code and this special casing was making it a bit painful.

If I understand it correctly, upvars have a fixed location in the generator struct and cannot be overlapped with anything (well, they could be in theory, but not with the current implementation). The problem with the current situation is that these upvars are stored into locals that are then also included in the generator struct, which essentially doubles the upvar memory cost of the generator.

I wonder if a simpler approach might be to simply detect which locals come from upvars, and then, instead of special-casing these locals in the transform (and mainly layout!) code, just remove these locals and replace their usages with direct references to the generator fields:

// before
_3 = _1.0; // here upvar is stored into a local, the local will be needlessly included in the generator
_4 = _3; // here the local is used

// after
_4 = _1.0; // we got rid of the local altogether

Or, if such field references cannot be used everywhere and locals are really needed, we could just ignore the upvar locals w.r.t overlapping and memory storage, and re-generate them with a load of each generator upvar/field after every yield point.

// before
_3 = _1.0;
_4 = _3;
yield;
_6 = _3;

// after
_3 = _1.0;  // this local will not be stored in the generator at all
_4 = _3;
yield;
_5 = _1.0; // create a new local that will reload the value of the upvar directly from the generator fields
_6 = _5;

Actually, if we added "upvar reloading after yield", I suppose that we wouldn't even need to special case the upvar locals in any way. They wouldn't be used across a yield, so they shouldn't be stored inside the generator even with the current implementation.

Does that make sense? Could it work?
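A toy sketch of the reload-after-yield rewrite, on a hypothetical mini-MIR, to make the idea concrete. Inst and reload_after_yield are invented for illustration; a real pass would rewrite rustc_middle::mir statements instead.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Inst {
    LoadUpvar { dest: usize, upvar: usize }, // _dest = _1.<upvar>
    Copy { dest: usize, src: usize },        // _dest = _src
    Yield,
}

/// After each Yield, redirect later uses of an upvar local to a fresh local
/// that reloads the upvar, so the original local never lives across a yield.
fn reload_after_yield(insts: &[Inst], mut next_local: usize) -> Vec<Inst> {
    let mut upvar_of: HashMap<usize, usize> = HashMap::new(); // local -> upvar
    let mut renamed: HashMap<usize, usize> = HashMap::new();  // old -> reloaded
    let mut out = Vec::new();
    for inst in insts {
        match inst {
            Inst::LoadUpvar { dest, upvar } => {
                upvar_of.insert(*dest, *upvar);
                out.push(inst.clone());
            }
            Inst::Yield => {
                // Previous reloads are stale after another yield.
                renamed.clear();
                out.push(Inst::Yield);
                // Reload every known upvar local into a fresh local.
                for (&local, &upvar) in &upvar_of {
                    let fresh = next_local;
                    next_local += 1;
                    out.push(Inst::LoadUpvar { dest: fresh, upvar });
                    renamed.insert(local, fresh);
                }
            }
            Inst::Copy { dest, src } => {
                let src = *renamed.get(src).unwrap_or(src);
                out.push(Inst::Copy { dest: *dest, src });
            }
        }
    }
    out
}

fn main() {
    // before: _3 = _1.0; _4 = _3; yield; _6 = _3;
    let before = [
        Inst::LoadUpvar { dest: 3, upvar: 0 },
        Inst::Copy { dest: 4, src: 3 },
        Inst::Yield,
        Inst::Copy { dest: 6, src: 3 },
    ];
    // after: _3 = _1.0; _4 = _3; yield; _7 = _1.0; _6 = _7;
    let after = reload_after_yield(&before, 7);
    assert_eq!(after.last(), Some(&Inst::Copy { dest: 6, src: 7 }));
}
```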

@tmandry (Member) commented Jan 5, 2022

@Kobzol I think that could work; it would fix some of the existing issues while leaving some future room for improvement on the table. (Namely, making it possible to overlap upvars with other stored locals.)

@Kobzol Kobzol force-pushed the generator-parameter-overlap branch from dafd14e to 0af63de Compare January 25, 2022 15:41
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 25, 2022
@Kobzol (Contributor, Author) commented Jan 25, 2022

I tried the first strategy: replacing locals that are assigned from generator upvar fields with the upvar field accesses themselves. It works for simple cases, but it's probably too simplistic for more complex situations; in certain test cases it panics because some types don't match when I replace the locals.

It also doesn't help with situations like these:

_3 = _1.0; // store upvar into a local
_4 = _3; // :(
// use _4

because it will just produce this:

nop;
_4 = _1.0;
// use _4

which doesn't improve the overlapping situation.

For that I would need to propagate the information that _3 is an upvar local into _4, which sounds like move propagation, which sounds scary. But maybe there is a simpler way of doing this?

In general, it feels a bit backward that the previous code stores the upvar fields into locals and I'm then trying to revert this operation and lift the locals back into generator state field accesses. It would be much neater if the locals didn't exist in the first place and all MIR in the generator body would just reference the generator fields directly, but I'm not sure if that's possible/reasonable.
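The propagation step described a couple of paragraphs up ("_3 is a copy of an upvar field, so _4 = _3 can become _4 = _1.0") can be sketched as a tiny forward pass over a toy string form of MIR. Everything below is illustrative, not rustc code:

```rust
use std::collections::HashMap;

// Toy forward propagation of upvar copies: whenever a local is assigned a
// value known to be a generator field (`_1.N`), remember that fact and
// rewrite later copies of that local to read the field directly. A real
// pass would also have to invalidate entries on mutation and delete the
// now-dead defining statements (the `nop;` in the comment above).
fn propagate_upvar_copies(body: &[&str]) -> Vec<String> {
    let mut upvar_of: HashMap<String, String> = HashMap::new();
    let mut out = Vec::new();
    for stmt in body {
        match stmt.trim_end_matches(';').split_once(" = ") {
            Some((lhs, rhs)) => {
                // if the rhs is a local known to be an upvar copy, inline the field
                let rhs = upvar_of.get(rhs).cloned().unwrap_or_else(|| rhs.to_string());
                if rhs.starts_with("_1.") {
                    upvar_of.insert(lhs.to_string(), rhs.clone());
                }
                out.push(format!("{lhs} = {rhs};"));
            }
            None => out.push(stmt.to_string()),
        }
    }
    out
}
```

The "scary" part Kobzol alludes to is exactly what this toy ignores: proving that neither the local nor the upvar field is written to between the copy and its uses.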

@Dylan-DPC
Member

@Kobzol any updates?

@Kobzol
Contributor Author

Kobzol commented Mar 3, 2022

I got a bit stuck with rewriting the locals, but I think there's still potential, I'll try to get to it in the near future.

@tmandry
Member

tmandry commented Mar 29, 2022

In general, it feels a bit backward that the previous code stores the upvar fields into locals and I'm then trying to revert this operation and lift the locals back into generator state field accesses. It would be much neater if the locals didn't exist in the first place and all MIR in the generator body would just reference the generator fields directly, but I'm not sure if that's possible/reasonable.

Agreed, maybe the right approach is to make the MIR building code (as_operand) smarter and avoid storing into locals in the first place, at least in simpler cases. I unfortunately don't have a lot of background on that code or why it's structured that way, so that will take some investigating.

@tmandry
Member

tmandry commented Apr 26, 2022

Agreed, maybe the right approach is to make the MIR building code (as_operand) smarter and avoid storing into locals in the first place, at least in simpler cases. I unfortunately don't have a lot of background on that code or why it's structured that way, so that will take some investigating.

@oli-obk You seem to be knowledgeable in this area; do you have any thoughts on this?

The quick summary is that when as_operand moves upvars to locals in generators, it makes it hard to optimize their layout, and we end up reserving space for two copies of anything that's held over an await point.

We've attempted to fix this directly in the generator_transform MIR pass but it's proving quite difficult. If we could avoid these moves into locals it would help. I believe they always happen at the beginning of the function; it might help to move them closer to where they're used. Otherwise if we could have the upvars (which are projections from the generator struct) be used directly instead of being moved into a local first, that would definitely help.

Is it feasible to change MIR building in this way?
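The duplication is directly observable with size_of_val; a minimal program, using the example from the PR description (the concrete number depends on the compiler version, so it is not asserted here — the discussion puts it at 12 bytes for this shape, versus the 8 that should be possible):

```rust
use std::mem::size_of_val;

async fn wait(a: u32) {
    std::future::ready(()).await;
    drop(a);
}

fn main() {
    let fut = wait(0);
    // 4 bytes for the `a` upvar, plus space for the local copy of `a` held
    // across the await, plus discriminant/padding; ideally the copy would
    // overlap with the upvar slot. The exact size varies by compiler version.
    println!("generator size = {} bytes", size_of_val(&fut));
}
```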

@oli-obk oli-obk self-assigned this Apr 26, 2022
@oli-obk
Contributor

oli-obk commented Apr 28, 2022

We're generally trying to avoid making MIR building more complicated. I think it could be beneficial in general to avoid creating temporaries for upvars (and possibly for function arguments in general).

I am not certain that as_operand is the right place, though. I think the problem must be in some caller of as_operand. as_operand will be called for the a argument to drop in

async fn wait(a: u32) {
    std::future::ready(()).await;
    drop(a);
}

I'm guessing we unconditionally turn all upvars into local variables at the start and then reference those local variables. Thus the local variables live across generator state changes. If we just allow directly accessing the upvars, we need to make sure that we don't end up in a situation where the upvars get mutated in multiple places that each thought they had their own copy. I'm not sure that can happen, but we need to be aware of it and make sure it doesn't.

Maybe it could suffice to run the dest prop pass on generators? That could make the temporaries dead code and get them removed. It is buggy right now, but it's being fixed as we speak over in #96451
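As a rough picture of what dest prop does here, a toy over straight-line (lhs, rhs) assignments — the real DestinationPropagation pass works on MIR with liveness and conflict analysis, not strings:

```rust
// Toy destination propagation: when a destination is assigned from a
// previously defined temporary, rename the temporary's defining statement
// to write into the destination directly and drop the copy. This assumes
// each temporary is moved exactly once, which the real pass has to verify.
fn dest_prop(body: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    for (lhs, rhs) in body {
        if let Some(def) = out.iter_mut().rev().find(|(l, _)| l == rhs) {
            // `lhs = rhs` where `rhs` was defined earlier: write there directly
            def.0 = lhs.to_string();
        } else {
            out.push((lhs.to_string(), rhs.to_string()));
        }
    }
    out
}
```

Applied to the generator case, `_3 = _1.0; _4 = _3;` collapses to `_4 = _1.0;`, which removes the temporary that would otherwise be stored in the generator.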

@Kobzol
Contributor Author

Kobzol commented Apr 28, 2022

While I was working on this, I thought several times that if I just could run destprop on the generator function, it would solve a lot (maybe all) of the issues. The linked PR looks promising! After destprop gets "revived", we can try to use it here.

@Dylan-DPC
Member

@Kobzol any updates on this?

@Kobzol
Contributor Author

Kobzol commented Jul 8, 2022

Well, I was still waiting for a working dest prop, which seems to be in progress. But I suppose that using dest prop here will probably warrant a different approach, and my current experiments didn't seem to lead anywhere; they worked only for simple cases.

So I think that this can be closed now and after Dest prop is working, I'll take another look.

@Dylan-DPC
Member

ok thanks. closing it then

@Dylan-DPC Dylan-DPC closed this Jul 8, 2022
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 8, 2022