-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] mir-opt: promoting const read-only arrays #125916
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…=<try> [WIP] mir-opt: promoting const read-only arrays Modified from a copy of PromoteTemps. It's kind of a hack so nothing fancy and easy to follow and review. I'll attempt to reuse structures from PromoteTemps when there is [consensus for this pass][zulip]. Compiler is doing more work now with this opt. So I don't think this pass improves compiler performance. But anyway, for statistics, can I get a perf run? cc rust-lang#73825 r? ghost [zulip]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
&[&promote_pass, &simplify::SimplifyCfg::PromoteConsts, &coverage::InstrumentCoverage], | ||
&[ | ||
&promote_pass, | ||
&promote_array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is something that should not have user-visible effects (e.g. affecting dropck, const eval UB or borrowck), it should be run as part of the regular runtime optimization pipeline
let array_promoted = promote_array.promoted_fragments.into_inner(); | ||
promoted.extend(array_promoted); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which does mean you won't be able to use the existing promotion scheme, but would need to start looking into create_def
and query feeding, which is probably not ready to support this use case yet. I have not yet given it much thought what is needed to fully support that, but if you want we can look into this together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then again, if all we're doing is creating non-generic static items, that already has precedent (we do that for nested statics), so likely you can do the same in an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though in that case I would expect this to fall out of GVN or some similar optimization, not be its own separate path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With GVN or similar you don't even need to create new constants and MIR bodies, you can just stick the fully evaluated constant into a MIR constant
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…=<try> [WIP] mir-opt: promoting const read-only arrays Modified from a copy of PromoteTemps. It's kind of a hack so nothing fancy or easy to follow and review. I'll to reuse structures from PromoteTemps when there is [consensus for this pass][zulip]. Compiler is doing more work now with this opt. So I don't think this pass improves compiler performance. But anyway, for statistics, can I get a perf run? cc rust-lang#73825 r? ghost ### Current status - Waiting for [consensus][zulip]. - Fail simd tests: tests/assembly/simd-intrinsic-mask-load.rs#x86-avx512 - *~Fail test on nested arrays~*: hack fix, may possibly fail on struct containings arrays. - Maybe rewrite to [use GVN with mentor from oli][mentor] [zulip]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [mentor]: rust-lang#125916 (comment)
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c415513): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -5.8%, secondary -2.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 673.596s -> 672.754s (-0.13%) |
For the SIMD mention: the long-term goal is to stop allowing projections into |
This comment has been minimized.
This comment has been minimized.
32d9976
to
dca7207
Compare
Can I get another perf run before switching to use GVN? |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…=<try> [WIP] mir-opt: promoting const read-only arrays Modified from a copy of PromoteTemps. It's kind of a hack so nothing fancy or easy to follow and review. I'll to reuse structures from PromoteTemps when there is [consensus for this pass][zulip]. Compiler is doing more work now with this opt. So I don't think this pass improves compiler performance. But anyway, for statistics, can I get a perf run? cc rust-lang#73825 r? ghost ### Current status - [ ] Waiting for [consensus][zulip]. - [ ] Maybe rewrite to [use GVN with mentor from oli][mentor] - [x] ~ICE on unstable feature: tests/assembly/simd-intrinsic-mask-load.rs#x86-avx512.~ In particular `Simd([literal array])` now transformed to `Simd(array_var)`. Maybe I should ignore array in constructor. - [x] *~Fail test on nested arrays~* [zulip]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [mentor]: rust-lang#125916 (comment)
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (63ac52a): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -8.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 673.707s -> 672.398s (-0.19%) |
[WIP] gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used PromoteTemps pass. Fix rust-lang#73825 ### Current status - [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F). r? ghost
Closing in favor of #126444. |
[WIP] gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used PromoteTemps pass. Fix rust-lang#73825 ### Current status - [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F). r? ghost
promote_consts: some clean-up after experimenting This is some clean-up after experimenting in rust-lang#125916, Prefer to review commit-by-commit.
gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used `PromoteTemps` pass. This allows promoting constant local arrays as anonymous constants. So that's in codegen for a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series of `store` on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case. See more in rust-lang#73825 or [zulip][opsem] for more info. [Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not. [Prior discussions on zulip][opsem]. This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`. ![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1) Fix rust-lang#73825 r? cjgillot ### Unresolved questions - [ ] Should we ignore nested arrays? I think that promoting nested arrays is bloating codegen. - [ ] Should stack_threshold be at least 32 bytes? Like the benchmark showed. If yes, the test should be updated to make arrays larger than 32 bytes. - [x] ~Is this concerning that `call(move _1)` is now `call(const [array])`?~ It reverted back to `call(move _1)` [opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [bench]: rust-lang/rust-clippy#12854 (comment) [perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used `PromoteTemps` pass. This allows promoting constant local arrays as anonymous constants. So that's in codegen for a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series of `store` on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case. See more in rust-lang#73825 or [zulip][opsem] for more info. [Here is a simple micro benchmark][bench] that shows the performance differences between promoting arrays or not. [Prior discussions on zulip][opsem]. This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`. ![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1) Fix rust-lang#73825 r? cjgillot ### Unresolved questions - [ ] Should we ignore nested arrays? I think that promoting nested arrays is bloating codegen. - [ ] Should stack_threshold be at least 32 bytes? Like the benchmark showed. If yes, the test should be updated to make arrays larger than 32 bytes. - [x] ~Is this concerning that `call(move _1)` is now `call(const [array])`?~ It reverted back to `call(move _1)` [opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F [bench]: rust-lang/rust-clippy#12854 (comment) [perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
Modified from a copy of PromoteTemps. It's kind of a hack so nothing fancy or easy to follow and review.
I'll to reuse structures from PromoteTemps when there is consensus for this pass.
Compiler is doing more work now with this opt. So I don't think this pass improves compiler performance.
But anyway, for statistics, can I get a perf run?
cc #73825
r? ghost
Current status
ICE on unstable feature: tests/assembly/simd-intrinsic-mask-load.rs#x86-avx512.In particular
Simd([literal array])
now transformed toSimd(array_var)
. Maybe I should ignore array in constructor.Fail test on nested arrays