Add an early-exit to overlapping_impls
#69010
This triggers in approx. 37% of all calls when building the stm32f0(x2) crate, saving a small amount of time.
r? @estebank (rust_highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
⌛ Trying commit d0bbdc3 with merge 96ce90522e7dba6b32713401457c46f5d2156065...
☀️ Try build successful - checks-azure
@rust-timer build 96ce905
Queued 96ce90522e7dba6b32713401457c46f5d2156065 with parent 71c7e14, future comparison URL.
Finished benchmarking try commit 96ce90522e7dba6b32713401457c46f5d2156065, comparison URL.
Hmm, this made things slightly worse for a few benchmarks, and shows no improvements
I checked against nrf52810-pac, and there it reduces the time spent in coherence by ~45-50% (from ~5 seconds to 2.6), so it definitely helps in some cases.
Would adding crates impacted by this to the rustc perf suite make sense? The fact that the improvements you see locally do not appear in the suite suggests that these code paths are not properly exercised.
Indeed, and I am planning to do that. However, we plan to make a number of changes to the code generator for these crates, which could change the parts of rustc they stress. I also plan to change it so the generated code builds faster all around (in addition to improving rustc performance in general).
I am slightly worried about the mild regressions, but I believe that this should have a net positive impact. @rust-lang/compiler, let's keep an eye out for this in case there are more regressions in the wild. @bors r+ rollup=never
📌 Commit d0bbdc3 has been approved by
🌲 The tree is currently closed for pull requests below priority 100, this pull request will be tested once the tree is reopened
@bors r- as it needs a rebase.
syn-opt is extremely noisy, it probably didn't actually improve. I'll have to re-check the results anyways since other optimizations have landed since.
[triagebot] I've seen eddyb mentioning that
Yeah, the issue is #69060, and we keep seeing this pretty often.
Triaged
@jonas-schievink waiting on a rebase
The benchmark still looks like a slight regression. I'll have to measure this again, but I've landed some other improvements in this area, so this might not even be necessary anymore. For now, closing this until I have more time to work on compile-time improvements again.
Try fast_reject::simplify_type in coherence before doing full check

This is a reattempt at landing rust-lang#69010 (by `@jonas-schievink`). The change adds a fast path to coherence checking that tests whether the types could possibly unify at all, since full coherence checking can be somewhat expensive.

This has big effects on code generated by the [`windows`](https://github.com/microsoft/windows-rs) crate, which in some cases spends as much as 20% of compilation time in the `specialization_graph_of` query. In local benchmarks this took a compilation that previously took ~500 seconds down to ~380 seconds.

This is surely not going to make a difference on much smaller crates, so the question is whether it will have a negative impact. rust-lang#69010 was closed because some of the perf suite crates did show small regressions.

Additional discussion of this issue is happening [here](https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/windows-rs.20perf).
This is hit in approx. 37% of all calls when building the stm32f0(x2) crate, saving a small amount of time.
This can probably be done more effectively than using `simplify_type`, since that only seems to descend by one layer.
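To illustrate the idea, here is a minimal sketch of a one-layer fast reject placed in front of a more expensive overlap check. It is a toy model, not rustc's code: `Ty`, `Simplified`, `simplify`, `definitely_disjoint`, `unify`, and `may_overlap` are all hypothetical names, and `unify` is a simplified stand-in for the full coherence check (it ignores, for example, that a repeated type parameter must bind consistently).

```rust
/// Toy type representation (hypothetical; not rustc's `Ty`).
#[derive(Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
    /// A named type constructor with generic arguments, e.g. `Vec<T>`.
    Adt(&'static str, Vec<Ty>),
    /// A generic type parameter, which can unify with anything.
    Param(&'static str),
}

/// Only the outermost constructor of a type: "one layer" of information.
#[derive(Debug, PartialEq, Eq)]
enum Simplified {
    Int,
    Bool,
    Adt(&'static str),
}

/// Returns `None` when the outermost layer says nothing (a type parameter).
fn simplify(ty: &Ty) -> Option<Simplified> {
    match ty {
        Ty::Int => Some(Simplified::Int),
        Ty::Bool => Some(Simplified::Bool),
        Ty::Adt(name, _) => Some(Simplified::Adt(*name)),
        Ty::Param(_) => None,
    }
}

/// Cheap early exit: two types with different known outermost constructors
/// can never unify, so the impls cannot overlap.
fn definitely_disjoint(a: &Ty, b: &Ty) -> bool {
    match (simplify(a), simplify(b)) {
        (Some(x), Some(y)) => x != y,
        _ => false, // at least one side is unknown; cannot reject cheaply
    }
}

/// Simplified stand-in for the expensive full check (an assumption).
fn unify(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Param(_), _) | (_, Ty::Param(_)) => true,
        (Ty::Adt(n1, args1), Ty::Adt(n2, args2)) => {
            n1 == n2
                && args1.len() == args2.len()
                && args1.iter().zip(args2).all(|(x, y)| unify(x, y))
        }
        _ => a == b,
    }
}

/// Overlap check with the early exit in front of the full check.
fn may_overlap(a: &Ty, b: &Ty) -> bool {
    if definitely_disjoint(a, b) {
        return false;
    }
    unify(a, b)
}
```

In this model, `Vec<Int>` against `Option<Int>` is rejected by the fast path alone. `Vec<Int>` against `Vec<Bool>` is not, because both simplify to the same outermost constructor, so the full check still has to run. That is the sense in which a one-layer simplification leaves some cheap rejections on the table.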