Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize MIR for comparison of references #112542

Closed
wants to merge 1 commit into from

Conversation

JulianKnodt
Copy link
Contributor

Addresses #111442 (comment)

This pass looks for taking a reference, deref-for-copy, then dereference. It then will directly stick a dst = *src after this, making the previous instructions redundant and able to be removed by other passes.

I'm not sure whether peephole optimizations are still being used/have they been deemed worth having, but it seemed to me to be doable.

I'm also not totally set on where to put this pass, as it should go before any unused instruction cleanup, but I haven't looked at the other passes to see where it fits.

@rustbot
Copy link
Collaborator

rustbot commented Jun 12, 2023

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 12, 2023
@rustbot
Copy link
Collaborator

rustbot commented Jun 12, 2023

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rust-log-analyzer

This comment has been minimized.

@cjgillot
Copy link
Contributor

Could you explain how you designed this pass, and what assumptions it makes on semantics of references?

This pass looks redundant with ReferencePropagation. What is the difference? What would be the upside of having 2 different passes?

You say you rely on unused instruction cleanup. Is this necessary, or can this pass cleanup after itself?
There is currently only one DSE pass, which runs fairly late.

@rust-log-analyzer

This comment has been minimized.

@JulianKnodt
Copy link
Contributor Author

This pass was designed directly as a peephole optimization to handle the case of comparing references of integers (altho I extended it here to primitives).

Specifically it looks exactly for

// _4 is a primitive without references. (there is not mut in any of the below references)
_2 = &(Shared | Shallow) _1
_3 = copy_for_deref(*2)
_4 = *_3

And upon finding this, it will then insert at the <<:

_2 = &(Shared | Shallow) _1
_3 = copy_for_deref(*2)
_4 = *_3
<< _4 = *_1

Where all the previous lines can then be dropped.
It assumes that locals are in ascending order (which may be a poor assumption), and will track the maximum local seen that it is operating on. It will terminate if there are no more locals that are able to be optimization below the maximum seen local.

As for the differences of this pass and ReferencePropagation, I've read thru ReferencePropagation a number of times (and I still fail to understand all of it), but as far as I can tell they are somewhat similar, but according to the top doc comment ultimately RefProp seeks to optimize when the final assignment is a reference, where for this the final assignment is a dereference. RefProp is also much more general, so if this could be integrated into RefProp that would be great.

I don't think there is a specific upside to having 2 separate passes, but this pass (imo) is very targeted, whereas the existing RefProp impl is much more general and I think has a higher run-time cost. Of course, any argument on cost should just be benchmarked, as I don't know if this is actually pulling any weight. Another thing is that this pass runs in reverse instruction order (not sure if the existing RefProp pass does), so it doesn't need to allocate to track locations.

For what I've implemented thus far, it is necessary to have instructions be cleaned up. I don't do any checks for aliasing, so if an intermediate copy is read elsewhere it can't be safely removed currently. I think a benefit of this is that I don't have to think about StorageLive/StorageDead as _3: _1.

I'll have to check if I put it before the DSE. Should the pass clean up after itself?

Since you created the RefProp pass, would you have an idea where this could be integrated into it? If you think that would be preferable I can try that.

@cjgillot
Copy link
Contributor

This pass was designed directly as a peephole optimization to handle the case of comparing references of integers (altho I extended it here to primitives).

Specifically it looks exactly for

// _4 is a primitive without references. (there is not mut in any of the below references)
_2 = &(Shared | Shallow) _1
_3 = copy_for_deref(*2)
_4 = *_3

And upon finding this, it will then insert at the <<:

_2 = &(Shared | Shallow) _1
_3 = copy_for_deref(*2)
_4 = *_3
<< _4 = *_1

Where all the previous lines can then be dropped. It assumes that locals are in ascending order (which may be a poor assumption), and will track the maximum local seen that it is operating on. It will terminate if there are no more locals that are able to be optimization below the maximum seen local.

Yes, this assumption is wrong in general. If this is only a heuristic, this is all right, but having an order-agnostic pass would be much better.

Other soundness issue: you need to check that there is no indirect mutation of the locals you reason about.

As for the differences of this pass and ReferencePropagation, I've read thru ReferencePropagation a number of times (and I still fail to understand all of it), but as far as I can tell they are somewhat similar, but according to the top doc comment ultimately RefProp seeks to optimize when the final assignment is a reference, where for this the final assignment is a dereference. RefProp is also much more general, so if this could be integrated into RefProp that would be great.

I don't think there is a specific upside to having 2 separate passes, but this pass (imo) is very targeted, whereas the existing RefProp impl is much more general and I think has a higher run-time cost. Of course, any argument on cost should just be benchmarked, as I don't know if this is actually pulling any weight. Another thing is that this pass runs in reverse instruction order (not sure if the existing RefProp pass does), so it doesn't need to allocate to track locations.

For what I've implemented thus far, it is necessary to have instructions be cleaned up. I don't do any checks for aliasing, so if an intermediate copy is read elsewhere it can't be safely removed currently. I think a benefit of this is that I don't have to think about StorageLive/StorageDead as _3: _1.

I'll have to check if I put it before the DSE. Should the pass clean up after itself?

Yes. It'd be better. In your example, overwriting the assignment _4 = *_3 by _4 = *_1 may be the simplest solution. With that, calling remove_unused_definitions at the end of the pass will be enough.

Since you created the RefProp pass, would you have an idea where this could be integrated into it? If you think that would be preferable I can try that.

ReferencePropagation is designed around SSA properties. This pass is more of a bb-local symbolic execution, so it is better separate.

One question that was discussed at length in ReferencePropagation: how do you handle mutable references & pointers? Using a local which is mutably borrowed between two used of that mutable borrow may be UB, so we need to take care not to introduce such uses.

a = &mut b
c = &a
d = *c // Must not become b
e = *a

@apiraino apiraino added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 13, 2023
@bors
Copy link
Contributor

bors commented Jul 14, 2023

☔ The latest upstream changes (presumably #109025) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC
Copy link
Member

@JulianKnodt any updates on this?

@JulianKnodt
Copy link
Contributor Author

been meaning to get around to this, I think I won't prioritize it for now. If @cjgillot wants to somehow combine it into other MIR passes that's good with me, but if y'all busy with other things that's fine too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants