-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Avoid storing captured upvars in generators twice if possible #89213
Conversation
This comment has been minimized.
This comment has been minimized.
Looks like highfive didn't recognize the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the approach looks correct.
I suspect the cycle is being caused by the downcasts that are now being inserted, causing MIR type check to need more information. Likely the culprit is the general downcast typing code wanting to know more about the variant type, causing it to ask for the generator layout and thus the optimized MIR and so on.
It's possible we can special case this code to recognize accesses to the Unresumed variant and redirect to the list of upvars for that generator, which is available prior to calculating all of that.
An alternative is to work around the issue by keeping as much as possible the same. We would allow upvar field accesses in MIR to be generated and checked just as before, as just a field directly on the generator itself. In type layout descriptions multiple fields can have the same offset. So what we can do is remove the upvars when computing the prefix offsets of the layout, but keep them as fields in the type description of the generator struct. The offsets of the fields would be copied from their offsets in the Unresumed variant.
(EDIT: You may need to disregard some of my comments on the layout code below for this approach)
Both of these approaches can work – the second means some of the "frontend" changes can be reverted and this knowledge can be contained within the layout code, but it is sort of a dirty hack. It would also make debuginfo worse, because these fields would appear to always be valid when they aren't, so we would have to add knowledge of this approach back into debuginfo. So in some ways the first approach is preferable, though it could turn out to be a hack too.
@@ -1498,7 +1502,9 @@ impl<'tcx> LayoutCx<'tcx, TyCtxt<'tcx>> { | |||
let mut used_variants = BitSet::new_empty(info.variant_fields.len()); | |||
for assignment in &assignments { | |||
if let Assigned(idx) = assignment { | |||
used_variants.insert(*idx); | |||
if *idx != VariantIdx::new(0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Remove the variant after this loop rather than add another branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Comment that this is correct because the 0 variant is the one we would merge into?)
remap.entry(locals[saved_local]).or_insert(( | ||
tys[saved_local], | ||
unresumed_idx, | ||
local.as_usize() - 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the length of variant_fields[unresumed_idx]
to determine the field number.
@@ -1075,3 +1078,4 @@ mod misc; | |||
mod scope; | |||
|
|||
pub(crate) use expr::category::Category as ExprCategory; | |||
use rustc_target::abi::VariantIdx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not public, this should go at the top
// They are also included in the 0th (unresumed) variant | ||
// The promoted locals are placed directly after the upvars. | ||
// Because of this the rest of the code can handle upvars and promoted locals in a generic | ||
// way. | ||
let prefix_layouts = substs | ||
.as_generator() | ||
.prefix_tys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should no longer be here (also prefix_tys
as a method needs to be removed or renamed).
This is where we actually compute the field offsets of the prefix.
for upvar in 0..upvar_count { | ||
let local = GeneratorSavedLocal::new(upvar); | ||
ineligible_locals.remove(local); | ||
assignments[local] = Ineligible(Some(upvar as u32)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of this should be needed now
for (idx, local) in ineligible_locals.iter().enumerate() { | ||
assignments[local] = Ineligible(Some(idx as u32)); | ||
// Skip over tag and upvars | ||
assignments[local] = Ineligible(Some((upvar_count + 1 + idx) as u32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
|
||
// Promoted fields - upvars and promoted locals | ||
let offsets_promoted = offsets; | ||
let inverse_memory_index_promoted = inverse_memory_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need to decrease everything by 1
Don't remember why the code is like this, but anyway
|
||
// Outer fields - upvars and tag | ||
let after_tag = tag_index + 1; | ||
let offsets_outer: Vec<_> = vec![offsets[upvar_count].clone()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's just the tag
Thank you for your review! In the meantime, I was trying to experiment with the second approach (changing the prefix calculations, but keeping upvar fields). I sort of got it to work and I was able to progress to more complex test cases, where I found a problem which blocks progress on either of these approaches. Nevermind what layout approach we take, in I assumed that locals stored from upvars will just be located in Code// run-pass
#![feature(generators, generator_trait)]
use std::marker::Unpin;
use std::ops::{GeneratorState, Generator};
use std::pin::Pin;
struct W<T>(T);
// This impl isn't safe in general, but the generator used in this test is movable
// so it won't cause problems.
impl<T: Generator<(), Return = ()> + Unpin> Iterator for W<T> {
type Item = T::Yield;
fn next(&mut self) -> Option<Self::Item> {
match Pin::new(&mut self.0).resume(()) {
GeneratorState::Complete(..) => None,
GeneratorState::Yielded(v) => Some(v),
}
}
}
fn test() -> impl Generator<(), Return=(), Yield=u8> + Unpin {
|| {
for i in 1..6 {
yield i
}
}
}
fn main() {
let end = 11;
let closure_test = |start| {
move || {
for i in start..end {
yield i
}
}
};
assert!(W(test()).chain(W(closure_test(6))).eq(1..11));
} will produce this MIR: MIR
The two upvars ( At this point I stopped and decided to ask you whether this makes sense at all. There are some approaches that come to mind how this could be solved:
I think that this has to be resolved first, because any layout approaches won't work if I'm not able to reliably detect which locals should be treated specially. |
We load the upvars into locals in the generator MIR transform, which is also where you would need access to that information, so it should be a matter of threading that state through different places in that file. If you needed that kind of information in the layout code for some reason, you could add it to the GeneratorLayout struct. |
Are you sure that this is indeed performed in the MIR generator transform? I originally also thought so, but when I dump the MIR, I can already see the upvars stored into locals long before this transforms runs, e.g.:
and I couldn't find any code inside the generator MIR transform that would do this. If I understand it correcly, it's performed long before the generator transform, by code that lays out closures. But maybe I got it wrong. |
I thought I saw some code that did it in the generator transform, but looks like I remembered wrong. Then maybe we need to produce some artifact similar to |
So, it was a great detective hunt for me :D But I finally found the place, and it's not pretty (unless I'm missing something). Sadly it seems like the upvars are lowered in a place that does not have a lot of information regarding upvars. It happens here. In theory, I could do something like this with the // Avoid creating a temporary
ExprKind::VarRef { .. }
| ExprKind::UpvarRef { .. }
| ExprKind::PlaceTypeAscription { .. }
| ExprKind::ValueTypeAscription { .. } => {
debug_assert!(Category::of(&expr.kind) == Some(Category::Place));
let place = unpack!(block = this.as_place(block, expr));
let rvalue = Rvalue::Use(this.consume_by_copy_or_move(place));
if this.generator_kind.is_some() {
if let ExprKind::UpvarRef { .. } = &expr.kind {
let local = destination.local;
assert!(destination.projection.is_empty());
match place.projection[0] {
PlaceElem::Field(field, _) => {
println!("Storing upvar {:?} into {:?}", field.as_usize(), local);
// Store into Builder and then propagate to generator MIR transform
}
_ => panic!("Unexpected upvar field")
}
}
}
this.cfg.push_assign(block, source_info, destination, rvalue);
block.unit()
} but I'm not sure if it's much better than reconstructing the upvar references from an already lowered MIR in the generator transform. Also I found out that sometimes it's not struct B {
x: u32
}
struct A {
b: B,
c: u32
}
fn main() {
let a = A {
b: B {
x: 0
},
c: 1
};
let gen = move || {
drop(a.b.x);
yield;
};
println!("{}", a.c);
println!("{}", std::mem::size_of_val(&gen));
} With edition 2021 and precise captures, it generates a
With edition 2018, it generates a
When the captured location is simpler, e.g. fn main() {
let a = 0;
let gen = move || {
drop(a);
yield;
};
println!("{}", std::mem::size_of_val(&gen));
} it generates an Am I missing a simpler way? Maybe I could just ignore all of this and go back to extracting the upvar references from already lowered MIR in the generator transform. If an upvar would not have the simple |
So after poking around some, the place the locals actually get introduced is in It doesn't seem like we can thread that information through – instead what we can do is walk the MIR looking for locals assigned to |
I tried walking the MIR to collect upvar locals before, it should work for simple cases, but then there are also cases like this (in edition
or some other more complex projections, which made the detection more difficult. I also found some places where these upvar local stores were not in the first BB, but in some later one, so I may need to go through all BBs, but that shouldn't be an issue I suppose. But in general this could work. I'll try the MIR walk again and see what problems I run into. |
While writing the visitor, I hit a case that I didn't realized before is possible - generator where the code writes to an upvar: #![feature(generators, generator_trait)]
use std::ops::Generator;
use std::pin::Pin;
fn main() {
let mut a = 5;
let mut b = || {
yield;
a = 1;
};
} MIR
In this case, there is not even any read from Anyway, it got me thinking about my approach. In all my implementation attempts so far, it was really messy trying to special case locals that are assigned from an upvar through the generator transform and layout generation, since there are a lot of non-obvious assumptions in the code and this special casing was making it a bit painful. If I understand it correctly, upvars have a fixed location in the generator struct and they cannot be overlapped with anything (well they could be in theory, but not with the current implementation). The problem with the current situation is that these upvars are stored into locals that are then being included in the generator struct, which essentially doubles the upvar memory cost of the generator. I wonder if a simpler approach might be to simply detect which locals are coming from the upvars, and then instead of special casing these locals in the transform (and mainly layout!) code, we would just remove these locals and replace their usages with direct references to the generator fields:
Or, if such field references cannot be used everywhere and locals are really needed, we could just ignore the upvar locals w.r.t overlapping and memory storage, and re-generate them with a load of each generator upvar/field after every yield point.
Actually if we added "upvar reloading after yield", I suppose that we wouldn't even need to special case the upvar locals in any way. They wouldn't be used across yield, so they shouldn't be stored inside the generator even with the current implementation. Does that make sense? Could it work? |
@Kobzol I think that could work; it would fix some of the existing issues while leaving some future room for improvement on the table. (Namely, making it possible to overlap upvars with other stored locals.) |
dafd14e
to
0af63de
Compare
I tried the first strategy, to replace locals that are assigned from generator upvar fields by the upvar field accessed themselves. It works for simple cases, but it's probably too simplistic for more complex situations, in certain test cases it panics because some types don't match when I replace the locals. It also doesn't help with situations like these:
because it will just produce this:
which doesn't improve the overlapping situation. For that I would need to propagate the information that In general, it feels a bit backward that the previous code stores the upvar fields into locals and I'm then trying to revert this operation and lift the locals back into generator state field accesses. It would be much neater if the locals didn't exist in the first place and all MIR in the generator body would just reference the generator fields directly, but I'm not sure if that's possible/reasonable. |
@Kobzol any updates? |
I got a bit stuck with rewriting the locals, but I think there's still potential, I'll try to get to it in the near future. |
Agreed, maybe the right approach is to make the MIR building code ( |
@oli-obk You seem to be knowledgeable in this area; do you have any thoughts on this? The quick summary is that when We've attempted to fix this directly in the generator_transform MIR pass but it's proving quite difficult. If we could avoid these moves into locals it would help. I believe they always happen at the beginning of the function; it might help to move them closer to where they're used. Otherwise if we could have the upvars (which are projections from the generator struct) be used directly instead of being moved into a local first, that would definitely help. Is it feasible to change MIR building in this way? |
we're generally trying to avoid making mir building more complicated. I think it could be beneficial in general to avoid creating temporaries for upvars (and possibly even function arguments in general). I am not certain that async fn wait(a: u32) {
std::future::ready(()).await;
drop(a);
} I'm guessing we unconditionally turn all upvars into a local variable at the start and then reference that local variable. Thus the local variable lives across generator state changes. If we can just allow directly accessing the upvars, we need to make sure that we don't end up in a situation where the upvars get mutated in multiple places that thought they had a copy. Not sure that can happen, but we need to be aware of it and make sure it doesn't happen. maybe it could suffice to run the dest prop pass on generators, that could make the temporaries dead code and get them removed? It is buggy right now, but it's being fixed as we speak over in #96451 |
While I was working on this, I thought several times that if I just could run destprop on the generator function, it would solve a lot (maybe all) of the issues. The linked PR looks promising! After destprop gets "revived", we can try to use it here. |
@Kobzol any updates on this? |
Well, I was still waiting for a working dest prop, which seems to be in progress. But I suppose that using dest prop here will probably warrant a different approach, and my current experiments didn't seem to lead anywhere, they worked only for simple cases. So I think that this can be closed now and after Dest prop is working, I'll take another look. |
ok thanks. closing it then |
Hi!
I tried to address the first problem described in #62958, but I'm repeatedly running into dead-ends. So I wanted to share my code to ask if what I'm doing makes any sense and if not, what other directions could I try. I'll try to describe the current state, both to confirm if I understand it correctly and possibly to get potential reviewers to remember these things quicker.
Description of the current state
Upvars in generators (captured variables and parameters of
async
functions) are currently stored at the start of the generator struct. At the beginning of the generator function code (in the first basic block), these upvars are stored into locals. If these locals are accessed across anawait
point, they will be included inside the generator layout as new fields, even though they are already stored at the very beginning of the generator struct.For example, this generator has
12
bytes (4 bytes upvara
, 4 bytesReady
future + discriminant + padding, 4 bytes local referencinga
), even though it could only have8
, because thea
parameter could reuse the same memory slot.Even though this example may look contrived, this situation probably comes up quite often, since
Drop
arguments/upvars will be implicitly dropped at the end of the generator, which causes them to be duplicated even though it seems like they are not accessed at all:The current layout of generators looks like this:
Where promoted fields are locals that cannot be overlapped. The upvars and discriminant are stored as fields on the generator struct, while the promoted fields are just referenced by the individual variants.
Solution
I tried to take the upvar locals, and store them into the unresumed variant of the generator. That by itself seems like the right thing to do. However, then I had to modify the generator layout code and that's where issues started to crop up.
Layout modification 1
First I tried to generate the layout for the unresumed variant in such a way that it's fields (containing saved upvar locals) will point to the beginning of the generator struct. So with e.g. two 1-byte upvars, the layout looked like this:
This actually worked for simple cases, but when running more complicated tests form the
ui
suite, the codegen was ending with errors. It seemed like the offsets were referencing some invalid fields, the sizes of the individual upvar fields and the field offsets didn't match up, it was a mess. It looked like I broke some invariants in the layout code (or I had a different bug there).Question: Is this layout correct? Is it OK that some memory slots (the upvars) are both included in the fields of the struct AND they are also referenced by a variant, which itself is
StructKind::Prefixed
by the size of the upvars and the discriminant (which means that the variant touches memory slots "outside" of its range). Maybe the unresumed variant could somehow start at the beginning of the struct (conceptually)?Anyway, I thought that this approach is not correct, so I tried something else. This version can be found here.
Layout modification 2
In my second attempt, to avoid the "overlap" of the upvar references from the above approach, I tried to remove the upvars from the generator struct fields. Therefore the generator struct layout will only have the discriminant in its fields. This is a rather disruptive change, because I had to find all code that creates generator upvar accesses and change it so that it accesses the unresumed variant fields instead of the fields of the generator itself.
I'm not sure if this is the correct approach though. And after changing all of the places that I have found, I began to run into MIR cycle issues. At this point I realized that it would be better to open a PR than to try other "random" approaches without knowing if it's the correct thing to do.
The current code in this PR contains the second approach which turns all generator fields accesses into unresumed variant field accesses. As you can see, it's heavily WIP, I just wanted to ask if (and how) should I continue or if this doesn't lead anywhere :) I'll be glad for any feedback.
MIR cycle error that I'm getting
Related issue: #62958
r? @tmandry