Deduplicate obligations in opt_normalize_projection_type
#91186
Conversation
r? @estebank (rust-highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 34812ba124bf254ef86abf1b61c5801d58b2375e with merge 1937cf62d0cb08000a49346c146830a7de817de3...
☀️ Try build successful - checks-actions
Queued 1937cf62d0cb08000a49346c146830a7de817de3 with parent dd549dc, future comparison URL.
Finished benchmarking commit (1937cf62d0cb08000a49346c146830a7de817de3): comparison url. Summary: This change led to very large relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Mostly the same results, except a different …
Force-pushed from 34812ba to 441d34f (Compare)
ping @estebank
☔ The latest upstream changes (presumably #91945) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 441d34f to df24a7f (Compare)
Rerunning perf since #90423 landed. @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit df24a7fe0c8de8459b11d2022276c9222f8c1191 with merge a00e9c1413f84459311a0b7c905264d28c199c08...
Queued a00e9c1413f84459311a0b7c905264d28c199c08 with parent c5ecc15, future comparison URL.
Finished benchmarking commit (a00e9c1413f84459311a0b7c905264d28c199c08): comparison url. Summary: This change led to moderate relevant regressions 😿 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
let expected_new = iter.len();
let combined_size = initial_size + expected_new;

if combined_size <= 16 || combined_size <= current_capacity {
Could using SmallVec be an alternative to making these checks yourself?
The checks here are to determine when to skip deduplication work, not directly to reduce memory footprint (although it does reduce it indirectly by preventing blowup from duplicate items).
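For illustration, here is a minimal, self-contained sketch of that gating idea using a plain Vec and a generic element type; the function name extend_dedup and the bounds are hypothetical stand-ins, not the PR's actual code:

```rust
// Hypothetical sketch of the gating described above: only pay for
// deduplication when the incoming items would force the Vec to grow.
fn extend_dedup<T: PartialEq + Clone>(obligations: &mut Vec<T>, new_items: &[T]) {
    let initial_size = obligations.len();
    let current_capacity = obligations.capacity();
    let expected_new = new_items.len();
    let combined_size = initial_size + expected_new;

    // Small vecs, and vecs that can absorb the new items without
    // reallocating, are extended as-is; scanning them for duplicates
    // would cost more than the memory it saves.
    if combined_size <= 16 || combined_size <= current_capacity {
        obligations.extend_from_slice(new_items);
        return;
    }

    // Otherwise drop duplicates before growing the allocation.
    for item in new_items {
        if !obligations.contains(item) {
            obligations.push(item.clone());
        }
    }
}
```

The cut-off of 16 and the capacity check mirror the diff above: the linear duplicate scan is only paid when the Vec would reallocate anyway.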
};
};

self.obligations.retain(|obligation| !is_duplicate(obligation));
I know that we don't dedup earlier because of cache effects (a perf optimization), but doesn't that mean that you always incur the O(n) cost of deduplicating the obligations caught here? Can we check what the perf implications would be of having state in the struct for "self.obligations is already deduped"?
> but doesn't that mean that you always incur the O(n) cost of deduplicating the obligations caught here?

No, not always, only when the vec is grown; doing it on every call would result in O(n²) time. It's amortized O(n) because we only do it when the vec would grow.

> Can we check what the perf implications would be of having state in the struct for "self.obligations is already deduped"?

Currently the dedup struct is only constructed temporarily in method scope; it lives much shorter than the vec, so it can't remember whether the vec has already been deduped. It would require larger changes to plumb the dedup state through all the method calls that pass the &mut Vec<Obligations> around. That's what I meant with the following in the PR description:

> or we need to start plumbing the deduplication or a sorted set all the way through the APIs that currently use Vec.
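To make the suggestion concrete, this is a rough sketch of what such a stateful wrapper could look like if it owned the Vec for its whole lifetime; the names and bounds are illustrative, and this is not what the PR implements, precisely because the PR's wrapper only borrows the Vec inside a single method:

```rust
// Illustrative only: a wrapper that owns the obligations and remembers
// whether they are currently duplicate-free, so the full dedup pass runs
// at most once instead of on every extend.
struct DedupedObligations<T: PartialEq> {
    obligations: Vec<T>,
    deduped: bool,
}

impl<T: PartialEq> DedupedObligations<T> {
    fn extend(&mut self, new_items: impl IntoIterator<Item = T>) {
        // Establish the "no duplicates" invariant once.
        if !self.deduped {
            let mut unique = Vec::with_capacity(self.obligations.len());
            for item in self.obligations.drain(..) {
                if !unique.contains(&item) {
                    unique.push(item);
                }
            }
            self.obligations = unique;
            self.deduped = true;
        }
        // Later extends only need to check the incoming items.
        for item in new_items {
            if !self.obligations.contains(&item) {
                self.obligations.push(item);
            }
        }
    }
}
```

Threading something like this through rustc would mean touching every API that currently passes the plain &mut Vec around, which is the larger refactor the comment above refers to.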
ObligationsDedup { obligations: vec }
}

pub fn extend<'b>(&mut self, iter: impl ExactSizeIterator<Item = Cow<'b, Obligation<'tcx, T>>>)
I'm somewhat confused. Why does Item need to be a Cow? I'm not seeing where borrows make sense, but I might be missing something ^_^

Wouldn't passing a &[Obligation<'tcx, T>] give us maybe better perf? Because doing obligations.into_iter().map(...) in the callers is incurring new allocations in the hot path.
> I'm not seeing where borrows make sense, but I might be missing something ^_^

Items from the projection cache are borrowed; items from a fresh project_type + normalization are owned.

> Because doing obligations.into_iter().map(...) in the callers is incurring new allocations in the hot path.

Do you mean obligations.extend(result.obligations.into_iter().map(Cow::Owned));? That doesn't allocate new obligations, it just stuffs already owned ones into a Cow::Owned enum variant.
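As a self-contained illustration of that split (with a stand-in Obligation type; the PR's real extend signature and types differ), Cow lets one extend path borrow cached items while another moves owned items through without any extra allocation:

```rust
use std::borrow::Cow;

// Stand-in type for illustration; the real rustc Obligation is more complex.
#[derive(Clone, PartialEq)]
struct Obligation(u32);

fn extend_from<'a>(sink: &mut Vec<Obligation>, iter: impl Iterator<Item = Cow<'a, Obligation>>) {
    for ob in iter {
        // `into_owned` clones only for `Cow::Borrowed`; `Cow::Owned` items
        // are moved through unchanged, so wrapping already-owned obligations
        // in `Cow::Owned` does not allocate new obligations.
        sink.push(ob.into_owned());
    }
}

fn main() {
    let cached = vec![Obligation(1), Obligation(2)]; // e.g. projection cache hits
    let fresh = vec![Obligation(3)]; // e.g. freshly produced obligations

    let mut sink = Vec::new();
    // Cache items stay borrowed until they are actually needed...
    extend_from(&mut sink, cached.iter().map(Cow::Borrowed));
    // ...while fresh items are passed by value, wrapped in Cow::Owned.
    extend_from(&mut sink, fresh.into_iter().map(Cow::Owned));
    assert_eq!(sink.len(), 3);
}
```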
(reminder: once this patch lands on nightly, it's a good candidate for a beta backport, as mentioned on Zulip)
@apiraino I'm unsure whether the PR should be taken as-is or refactored to replace the … If it's wanted for beta inclusion, I guess the smaller scope makes more sense.
Force-pushed from df24a7f to a83720c (Compare)
Force-pushed from a83720c to f29ef49 (Compare)
r? @jackh726 I'll take over the review here, but it might take me a couple of days due to holidays
Let's check perf with the new benchmark. @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit f29ef49 with merge d644ece99d13e11fc68b6183ddb5b79b4ca2b14b...
☀️ Try build successful - checks-actions
Queued d644ece99d13e11fc68b6183ddb5b79b4ca2b14b with parent 8c7f2bf, future comparison URL.
Finished benchmarking commit (d644ece99d13e11fc68b6183ddb5b79b4ca2b14b): comparison url. Summary: This benchmark run shows 4 relevant improvements 🎉 but 14 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Does this still fix a perf regression for something not covered by the benchmarks? @the8472
@jackh726 not that I am aware of. There still is some duplication happening in obligations that needs to be deduplicated, but there already is deduplication happening later in … It does show up as a few potential max-rss reductions in the latest benchmarks, but considering that this isn't consistent with previous perf runs, that's possibly noise. I think there's still some optimization potential here, but it'll require more changes than just this. I'll close it for now.
This adds a second deduplication site (in addition to the original one at impl_or_trait_obligations) in opt_normalize_projection_type, since that's where the OOMing allocations occurred in #74456. Fixes an OOM and compile-time blowup in the reduced test case on that issue.

Since it only touches one place where obligations are processed, it might not fix all of these blowups. If more places need deduplication, then either they also need to have the vec locally wrapped in ObligationsDedup, or we need to start plumbing the deduplication or a sorted set all the way through the APIs that currently use Vec.

Perf results from the original PR: #90913. The deep-vector debug incr-unchanged regression is most likely spurious due to incremental verification noise, so the perf impact is mostly positive.
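To make the "locally wrapped in ObligationsDedup" option concrete, here is a rough, simplified sketch under assumed names; the PR's real type additionally takes Cow items and applies the size gating shown earlier, so treat this only as an outline:

```rust
// Simplified stand-in for the PR's ObligationsDedup: borrow an existing
// Vec at one call site and deduplicate while extending it. The element
// type and the lack of size gating are simplifications for illustration.
struct ObligationsDedup<'a, T: PartialEq> {
    obligations: &'a mut Vec<T>,
}

impl<'a, T: PartialEq> ObligationsDedup<'a, T> {
    fn new(obligations: &'a mut Vec<T>) -> Self {
        ObligationsDedup { obligations }
    }

    fn extend(&mut self, new_items: impl IntoIterator<Item = T>) {
        for item in new_items {
            if !self.obligations.contains(&item) {
                self.obligations.push(item);
            }
        }
    }
}

fn main() {
    let mut obligations = vec![1, 2, 3];
    // Only the normalization call site wraps the vec for the duration of
    // this extend; all other code keeps passing the plain &mut Vec around.
    ObligationsDedup::new(&mut obligations).extend([2, 3, 4]);
    assert_eq!(obligations, vec![1, 2, 3, 4]);
}
```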