Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce more fine-grained borrowed locals analysis for generators #112050

Closed
wants to merge 20 commits into from

Conversation

b-naber
Copy link
Contributor

@b-naber b-naber commented May 28, 2023

In MaybeBorrowedLocals we currently use occurrences of borrows and StorageDeads to determine which locals need to be kept live across suspension points. This is a fairly conservative analysis which can cause generator sizes to grow needlessly as in #96084. This PR introduces a more fine-grained approach to determine locals that need to be live due to outstanding references/pointers by performing a liveness analysis of refs/ptrs and a dependency graph between refs/ptr and the Locals they refer to.

Fixes #96084
Fixes #94067

@rustbot
Copy link
Collaborator

rustbot commented May 28, 2023

r? @davidtwco

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 28, 2023
@rustbot
Copy link
Collaborator

rustbot commented May 28, 2023

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@b-naber
Copy link
Contributor Author

b-naber commented May 28, 2023

r? @cjgillot maybe?

@rustbot rustbot assigned cjgillot and unassigned davidtwco May 28, 2023
@rust-log-analyzer

This comment has been minimized.

@b-naber
Copy link
Contributor Author

b-naber commented May 28, 2023

Forgot to properly handle projections when building the graph, marking this as a draft until I fix that.

@b-naber b-naber marked this pull request as draft May 28, 2023 23:27
Copy link
Contributor

@cjgillot cjgillot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @b-naber.

I'm not sure I understand how this works and how it is sound.
I left a few comments with my first questions.

compiler/rustc_mir_dataflow/src/impls/live_borrows.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_dataflow/src/impls/live_borrows.rs Outdated Show resolved Hide resolved
}
}

impl<'a, 'tcx> GenKillAnalysis<'tcx> for LiveBorrows<'a, 'tcx> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this reuse an existing DefUse analysis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any existing DefUse analysis to match this analysis.

compiler/rustc_mir_dataflow/src/impls/live_borrows.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_dataflow/src/impls/live_borrows.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_dataflow/src/impls/live_borrows.rs Outdated Show resolved Hide resolved
}
ty::Param(_) => {
self.has_params = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to recurse inside ADTs?


self.super_operand(value, location);
}
_ => self.super_terminator(terminator, location),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Asm blocks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to still implement this, will have to read more about asm blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still haven't implemented this. I'm not super familiar with inline asm, so I'd prefer to wait for confirmation on whether the rest of the implementation is fine before investing more time.

self.visit_operand(func, location);
for arg in args {
self.visit_operand(arg, location);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if arg contains a pointer, which is stored somewhere, and returned later?
For instance:

enum Stash<T> { A, B(*const T) }

impl<T> Stash<T> {
    fn stash(&mut self, x: *const T) -> *const T {
        match *self {
            Stash::A => { *self = Stash::B(x); std::mem::null::<T>() }
            Stash::B(y) => { *self = Stash::B(y); x }
        }
    }
}

async fn foo<T>(x: T, y: T) {
    let mut stash = Stash::A;
    stash.stash(&raw const x);
    let p = stash.stash(&raw const y); // `p` points to `x`.
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this case should be handled by the introduction of NodeKind::LocalWithRefs now.

@rust-log-analyzer

This comment has been minimized.

@b-naber
Copy link
Contributor Author

b-naber commented May 30, 2023

@cjgillot Thanks for the review. I've changed the implementation to address your unsafety concerns (though there are still some problems) and improved the documentation. Hope the idea behind this PR becomes clearer now.

@rust-log-analyzer

This comment has been minimized.

/// `TerminatorKind::Yield` if we pass in any references, pointers or composite values that
/// might correspond to references, pointers or exposed pointers (`NodeKind::LocalWithRef`s).
/// The rationale for this is that the return values of both of these terminators might themselves
/// contain any of the references or pointers passed as arguments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation is explaining the algorithm that is implemented. That's not the most important part though. The BorrowDependencies struct and its associated logic first and foremost represents an analysis of a mir body. The documentation needs to describe what the results from that analysis are. In other words, this currently says something like "this analysis works by doing steps 1, 2, and 3." I'm looking for something that says, "if the result of this analysis is a graph that looks like G, then you can conclude that at location L in the body, thing X is true" (where this'll be something about liveness or whatever).

Copy link
Contributor Author

@b-naber b-naber Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BorrowDependencies is only responsible for constructing the dependency graph. I tried to elaborate in more detail what results the analysis as a whole produces in the doc comment. I still think that the comment for BorrowDependencies is fairly adequate given that it's a private type and only used in the live borrows analysis.

@cjgillot cjgillot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 11, 2023
@b-naber
Copy link
Contributor Author

b-naber commented Jun 29, 2023

Sorry for taking so long to make progress on this PR.

I've updated this to account for the following unsoundness scenarios:

  • exposed pointers (i.e. a cast of a raw pointer to usize)
    These count towards the liveness of the Local that is behind the raw pointer, e.g. for
    _5 = _4 as usize (where _4 = AddressOf(_3)), we keep _3 alive on any use site of _5.
    The same holds for any re-assignments of the variable containing the exposed pointer, e.g.
    for _6 = _5 or _6.0 = _5 we keep _3 alive for any use site of _6.
    Note: we cover Locals corresponding to exposed pointers or re-assignments of those exposed pointers
    under the concept of a LocalWithRefs. LocalWithRefs are Locals that might contain references or
    pointers and that need to be kept alive if they're behind a reference or pointer.
    E.g. for struct Foo<'a> {p: &'a Bar} and _4 = Foo { p: _3 }, where _3 = Ref(_2), any reference or
    pointer of _4 requires both _3 and 4 to be kept alive. LocalWithRefs are also used for Locals
    corresponding to destination places of function calls or yield statements, since they could refer to a
    reference, pointer or another LocalWithRefs that was returned from the function call or yield statement.
  • function calls and yield statements could allow for any Operand corresponding to a borrow or LocalWithRefs
    to be moved into the destination place.
    destination places count towards the liveness of the Local that could be behind any ref/ptr or LocalWithRefs
    corresponding to an Operand, e.g. for TerminatorKind::Call { func, args: [_3, _4], destination: _5, .. },
    where _3 has type Ref or RawPtr or is a LocalWithRefs we keep any Local which _3 could refer
    to live on any use site of _5.
  • function calls could move refs/ptrs or LocalWithRefs in arguments into Locals behind mutable references or shared references
    that allow for interior mutability.
    E.g. let Op1 and Op2 be Operands of a function call and suppose that at least one corresponds to either a mutable
    references or shared references that allow for interior mutability and the other is a shared reference or a LocalWithRefs.
    Let Locals2 be the set of Locals that need to be kept alive due to the borrows corresponding to Op2
    Then we need to start tracking all Locals in Locals2 for the borrow corresponding to Op1, since the Place
    corresponding to Op2 might be moved into Op1 in then call.

Are there other situations?

Something I've observed is that while this analysis does reduce the number of borrows we would need to store across a yield point, one thing that reduces the effectiveness of the analysis in shrinking generator sizes is that we take into account the Places in drop statements in both the MaybeLiveLocals analysis and also in LiveBorrows (because we haven't decided on exposing self through pointers in the destructor being UB). The problem is that we might have something like this:

struct Foo<T>(T, &mut *const T);
impl<T> Drop for Foo<T> {
    fn drop(&mut self) {
        *self.1 = &self.0 as *const T;
    }
}

Now in the new analysis we can infer when this could potentially happen (the dropped place would be LocalWithRefs). So we could in principle not gen these Locals in the borrowed locals analysis in certain situations. If we were to ignore these dropped Locals in the local liveness analysis (which I think we should be able to, a drop terminator is essentially a function call with a mutable borrow argument, so only the borrowed analysis should be responsible for this), we would have the potential to move Terminators in front of the yield points (if they're not live or borrowed after the yield point that is of course). Doing this would allow us to really reduce generator sizes, but I'm not sure whether we can actually do this. Are there other reasons why we couldn't move Terminators for Locals that aren't live or borrowed after a yield point in front of the yield point?

@b-naber b-naber marked this pull request as ready for review June 29, 2023 20:48
@b-naber b-naber added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 21, 2023
Copy link
Contributor

@cjgillot cjgillot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I have a lot of trouble understanding what this analysis does.

The implementation tries to do several thing at once, which makes the information difficult to follow. From what I grasp:

  1. you compute a graph Local -> Set of locals that roughly means "local's value may contain a pointer to one of those locals".
  2. from liveness analysis that takes into account this "points to" information, and considers that a local is live if one pointer to it is.

If that's what it does, could you formulate it so? Can (1) be a standalone analysis in its own file to be used by (2)?

You mix a lot of type-based reasoning with value-based reasoning. We are in runtime MIR territory, so I'd rather have only value-based reasoning. This will be more robust to handle exposed pointer addresses. Type-based reasoning should only be used to check if a type has interior mutability.

//! static move || {
//! let bar = Bar {}; // `Local1`
//! let bar2 = Bar {}; // `Local2`
//! let bar_ptr = &bar as *const Bar; // live locals: [`Local1`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just use the local's name in the list, that's easier to follow.

//! This module defines a liveness analysis for `Local`s that are live due to outstanding references
//! or raw pointers. We can, however, not solely rely on tracking references and pointers due to soundness
//! issues. Exposed raw pointers (i.e. those cast to `usize`) and function calls would make this simple
//! analysis unsound, so we have to handle them as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you start the description at a higher-level? = plain English with as few concepts as possible.
I've read the comment several times, and I'm not sure what this analysis is for.

//! The same holds for any re-assignments of the variable containing the exposed pointer, e.g.
//! for `_6 = _5` or `_6.0 = _5` we keep `_3` alive for any use site of `_6`.
//! Note: we cover `Local`s corresponding to exposed pointers or re-assignments of those exposed pointers
//! under the concept of a `LocalWithRefs`. `LocalWithRefs` are `Local`s that might contain references or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define the different concepts earlier, as they are not just useful for exposed pointers.

Comment on lines +112 to +116
//! * `NodeKind::Borrow` is used for `Local`s that correspond to borrows (`_5` in our example). We equate
//! re-borrows with the `Node` that corresponds to the original borrow.
//! * `NodeKind::LocalWithRefs` is used for `Local`s that aren't themselves refs/ptrs, but contain
//! `Local`s that correspond to refs/ptrs or other `Local`s with `Node`s of kind `NodeKind::LocalWithRef`s.
//! `LocalWithRefs` is also used for exposed pointers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can both concepts be merged? Ie have 2 kinds of locals:

  • locals that do not contain references/pointers to locals;
  • locals that may contain references/pointers to locals.

//! want to also treat them like `Local`s with `Node`s of kind `NodeKind::Borrow` as they ultimately
//! could also contain references or pointers that refer to other `Local`s. So we want a
//! path in the graph from a `NodeKind::LocalWithRef`s node to the `NodeKind::Local` nodes, whose borrows
//! they might contain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment.
If you have a borrowed borrow _8 = &_5, shouldn't it be kept live as long as _8 lives?

Comment on lines +587 to +591
// FIXME maybe cache those?
let mut visitor = MaybeContainsInteriorMutabilityVisitor::new(self.tcx);
ty.visit_with(&mut visitor);

visitor.has_interior_mut || visitor.reached_depth_limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FIXME maybe cache those?
let mut visitor = MaybeContainsInteriorMutabilityVisitor::new(self.tcx);
ty.visit_with(&mut visitor);
visitor.has_interior_mut || visitor.reached_depth_limit
!ty.is_freeze(self.tcx, self.param_env)

#[instrument(skip(self), level = "debug")]
fn visit_statement(&mut self, statement: &Statement<'tcx>, location: Location) {
match &statement.kind {
StatementKind::Assign(assign) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a local is assigned twice? Do you create 2 edges?

//! `TerminatorKind::Yield`, the destination `Place` or resume place, resp., might contain
//! these references, pointers or `LocalWithRefs`, hence we have to be conservative
//! and keep the `destination` `Local` and `resume_arg` `Local` live.
//! ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this graph models.
In itself, what does an edge mean?
I'd have expected _5 -> _4 to mean _5's value contains a pointer to _4, but that does not fit with your description.

}

/// Uses the dependency graph to find all locals that we need to keep live for a given
/// `Node` (or more specically the `Local` corresponding to that `Node`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reformulate this without the Node, as your don't return one.

}

#[instrument(skip(self), level = "debug")]
fn visit_rvalue(&mut self, rvalue: &Rvalue<'tcx>, location: Location) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use visit_assign instead of using a current_local state.

@bors
Copy link
Contributor

bors commented Aug 9, 2023

☔ The latest upstream changes (presumably #88936) made this pull request unmergeable. Please resolve the merge conflicts.

@cjgillot cjgillot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 10, 2023
@Dylan-DPC
Copy link
Member

@b-naber any updates on this?

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 4, 2023
Stop considering moved-out locals when computing auto traits for generators

Addresses rust-lang#94067 (but does not fix it since drop-tracking-mir is unstable)

This PR, unlike rust-lang#110420 or rust-lang#112050, does not attempt to reduce the number of live locals across suspension points. It however ignores moved-out locals for trait purposes. So this PR solves the non-send issue, but not the size issue.

Suggested by `@RalfJung` in [rust-lang/unsafe-code-guidelines#188](rust-lang/unsafe-code-guidelines#188 (comment))

cc `@b-naber` who's working on this from a different perspective.

r? `@cjgillot`
@Dylan-DPC
Copy link
Member

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

@Dylan-DPC Dylan-DPC closed this Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
8 participants