[WIP] mir-opt: promoting const read-only arrays #125916

Closed
wants to merge 17 commits into from

Conversation

tesuji
Contributor

@tesuji tesuji commented Jun 3, 2024

Modified from a copy of PromoteTemps. It's kind of a hack, so nothing fancy or easy to follow and review.
I'll attempt to reuse structures from PromoteTemps when there is consensus for this pass.

The compiler does more work with this opt, so I don't think this pass improves compiler performance.
But anyway, for statistics, can I get a perf run?

cc #73825

r? ghost

Current status

  • Waiting for consensus.
  • Maybe rewrite to use GVN, with mentoring from oli
  • ICE on unstable feature: tests/assembly/simd-intrinsic-mask-load.rs#x86-avx512.
    In particular, Simd([literal array]) is now transformed to Simd(array_var). Maybe I should ignore arrays in constructors.
  • Test failure on nested arrays

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 3, 2024
@rustbot
Collaborator

rustbot commented Jun 3, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@Urgau
Member

Urgau commented Jun 3, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 3, 2024
@bors
Contributor

bors commented Jun 3, 2024

⌛ Trying commit 42d586c with merge 360f92e...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 3, 2024
@bors
Contributor

bors commented Jun 3, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 3, 2024
&[&promote_pass, &simplify::SimplifyCfg::PromoteConsts, &coverage::InstrumentCoverage],
&[
    &promote_pass,
    &promote_array,
Contributor

Since this should not have user-visible effects (e.g. affecting dropck, const eval UB, or borrowck), it should run as part of the regular runtime optimization pipeline.

Comment on lines +370 to +361
let array_promoted = promote_array.promoted_fragments.into_inner();
promoted.extend(array_promoted);
Contributor

Which does mean you won't be able to use the existing promotion scheme; you would need to start looking into create_def and query feeding, which is probably not ready to support this use case yet. I have not yet given much thought to what is needed to fully support that, but if you want, we can look into this together.

Contributor

then again, if all we're doing is creating non-generic static items, that already has precedent (we do that for nested statics), so likely you can do the same in an optimization.

Contributor

Though in that case I would expect this to fall out of GVN or some similar optimization, not be its own separate path

Contributor

With GVN or similar you don't even need to create new constants and MIR bodies, you can just stick the fully evaluated constant into a MIR constant
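
To make the suggestion above concrete, here is a minimal sketch of the idea on a toy IR. The `Operand` and `Rvalue` types and the `fold_array` function are hypothetical names invented for illustration; they are not rustc's MIR types or its GVN implementation. The sketch only shows the shape of the rewrite: when every element of an array aggregate is already a known constant, the aggregate can be replaced in place by a single constant value, with no new body or item created.

```rust
// Toy IR for illustration only. These are NOT rustc's MIR types;
// every name below is an assumption made for this sketch.
#[derive(Clone, Debug, PartialEq)]
enum Operand {
    /// A value known at compile time.
    Const(i64),
    /// A value read from a local variable at runtime.
    Local(usize),
}

#[derive(Clone, Debug, PartialEq)]
enum Rvalue {
    /// `[a, b, c]` assembled element by element.
    Array(Vec<Operand>),
    /// A fully evaluated array stored as one constant.
    ConstArray(Vec<i64>),
}

/// Replace an array aggregate with a single constant if, and only if,
/// every element is already a constant. Anything else is returned unchanged.
fn fold_array(rvalue: &Rvalue) -> Rvalue {
    match rvalue {
        Rvalue::Array(elems) => {
            // Collecting into `Option<Vec<_>>` yields `None` as soon as
            // one element is not a constant.
            let consts: Option<Vec<i64>> = elems
                .iter()
                .map(|op| match op {
                    Operand::Const(v) => Some(*v),
                    Operand::Local(_) => None,
                })
                .collect();
            match consts {
                Some(values) => Rvalue::ConstArray(values),
                None => rvalue.clone(),
            }
        }
        Rvalue::ConstArray(_) => rvalue.clone(),
    }
}

fn main() {
    let all_const = Rvalue::Array(vec![Operand::Const(1), Operand::Const(2)]);
    let mixed = Rvalue::Array(vec![Operand::Const(1), Operand::Local(3)]);
    println!("{:?}", fold_array(&all_const));
    println!("{:?}", fold_array(&mixed));
}
```

The real pass would also have to decide when such a replacement is profitable and sound, which this sketch does not attempt.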

@lqd
Member

lqd commented Jun 4, 2024

@bors try @rust-timer queue

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 4, 2024
### Current status
- Waiting for [consensus][zulip].
- Failing SIMD tests: tests/assembly/simd-intrinsic-mask-load.rs#x86-avx512
- *~Test failure on nested arrays~*: hack fix, may possibly fail on structs containing arrays.
- Maybe rewrite to [use GVN with mentoring from oli][mentor]

[zulip]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F
[mentor]: rust-lang#125916 (comment)
@bors
Contributor

bors commented Jun 4, 2024

⌛ Trying commit 0a91619 with merge c415513...

@bors
Contributor

bors commented Jun 4, 2024

☀️ Try build successful - checks-actions
Build commit: c415513 (c4155130fd61e7fa5e1b138de6a817f1cfb4e2fb)

@rust-timer
Collaborator

Finished benchmarking commit (c415513): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.3%] | 13 |
| Regressions ❌ (secondary) | 0.6% | [0.5%, 0.7%] | 9 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.1%] | 1 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.3%] | 13 |

Max RSS (memory usage)

Results (primary -5.8%, secondary -2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.8% | [-5.8%, -5.8%] | 1 |
| Improvements ✅ (secondary) | -2.2% | [-2.2%, -2.2%] | 1 |
| All ❌✅ (primary) | -5.8% | [-5.8%, -5.8%] | 1 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 8 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.0% | [-0.0%, -0.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.0% | [-0.0%, 0.0%] | 9 |

Bootstrap: 673.596s -> 672.754s (-0.13%)
Artifact size: 318.88 MiB -> 318.85 MiB (-0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 4, 2024
@scottmcm
Member

scottmcm commented Jun 4, 2024

For the SIMD mention: the long-term goal is to stop allowing projections into repr(simd) types at all, just Transmute. So whatever fix is easiest is fine, as that situation will stop happening hopefully-soon.

@tesuji tesuji force-pushed the mir-opt-const-array-locals branch from 32d9976 to dca7207 Compare June 10, 2024 06:57
@tesuji
Contributor Author

tesuji commented Jun 10, 2024

Can I get another perf run before switching to use GVN?

@Kobzol
Contributor

Kobzol commented Jun 10, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 10, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 10, 2024
### Current status
- [ ] Waiting for [consensus][zulip].
- [ ] Maybe rewrite to [use GVN with mentoring from oli][mentor]
- [x] ~ICE on unstable feature: tests/assembly/simd-intrinsic-mask-load.rs#x86-avx512.~
  In particular, `Simd([literal array])` is now transformed to `Simd(array_var)`. Maybe I should ignore arrays in constructors.
- [x] *~Test failure on nested arrays~*

[zulip]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F
[mentor]: rust-lang#125916 (comment)
@bors
Contributor

bors commented Jun 10, 2024

⌛ Trying commit dca7207 with merge 63ac52a...

@bors
Contributor

bors commented Jun 10, 2024

☀️ Try build successful - checks-actions
Build commit: 63ac52a (63ac52aeb8179ad1a9d0a60cc0cf82812d3ddb65)

@rust-timer
Collaborator

Finished benchmarking commit (63ac52a): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.3%] | 17 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.3%] | 17 |

Max RSS (memory usage)

Results (primary -8.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -8.1% | [-8.1%, -8.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -8.1% | [-8.1%, -8.1%] | 1 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary -0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.0% | [-0.0%, -0.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.0% | [-0.0%, -0.0%] | 1 |

Bootstrap: 673.707s -> 672.398s (-0.19%)
Artifact size: 319.82 MiB -> 319.84 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 10, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 13, 2024
[WIP] gvn: Promote/propagate const local array

Rewrite of rust-lang#125916, which used the PromoteTemps pass.

Fix rust-lang#73825

### Current status

- [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F).

r? ghost
@tesuji
Contributor Author

tesuji commented Jun 14, 2024

Closing in favor of #126444.
But there may be clean-up commits for PromoteTemps.

@tesuji tesuji closed this Jun 14, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 14, 2024
@tesuji tesuji deleted the mir-opt-const-array-locals branch June 16, 2024 09:34
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 21, 2024
promote_consts: some clean-up after experimenting

This is some clean-up after experimenting in rust-lang#125916.
Prefer to review commit-by-commit.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 14, 2024
gvn: Promote/propagate const local array

Rewrite of rust-lang#125916, which used the `PromoteTemps` pass.

This allows promoting constant local arrays to anonymous constants, so that in codegen for
a local array, rustc outputs `llvm.memcpy` (which is easy for LLVM to optimize) instead of a series
of `store`s to the stack (a.k.a. in-place initialization). This puts rustc on par with clang in this specific case.
See rust-lang#73825 or [zulip][opsem] for more info.
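
As a source-level analogy for the paragraph above, the sketch below contrasts a read-only local array with a hand-promoted equivalent. The function names `lookup_local` and `lookup_promoted` and the table contents are made up for illustration. The actual pass works on MIR rather than on source code, and what LLVM finally emits for either function depends on optimization level and target, so treat this only as a picture of the intended transformation.

```rust
/// Before: the array is a read-only local, built inside the function.
/// Without promotion, this may be initialized on the stack element by element.
fn lookup_local(i: usize) -> u32 {
    let table = [10u32, 20, 30, 40, 50, 60, 70, 80];
    table[i % table.len()]
}

/// After (hand-written equivalent): the same data lives in read-only memory,
/// so no per-call element-wise initialization is needed.
fn lookup_promoted(i: usize) -> u32 {
    static TABLE: [u32; 8] = [10, 20, 30, 40, 50, 60, 70, 80];
    TABLE[i % TABLE.len()]
}

fn main() {
    // Both versions must be observably identical; only codegen should differ.
    for i in 0..16 {
        assert_eq!(lookup_local(i), lookup_promoted(i));
    }
    println!("both lookups agree");
}
```

The point of doing this in the compiler is that users can keep writing the first form and still get the codegen of the second.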

[Here is a simple micro benchmark][bench] that shows the performance difference between promoting arrays and not promoting them.

[Prior discussions on zulip][opsem].

This patch [saves about 600 KB][perf] (~0.5%) of `librustc_driver.so`.
![image](https://github.com/rust-lang/rust/assets/15225902/0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1)

Fix rust-lang#73825

r? cjgillot

### Unresolved questions
- [ ] Should we ignore nested arrays?
    I think that promoting nested arrays bloats codegen.
- [ ] Should stack_threshold be at least 32 bytes, as the benchmark suggested?
    If yes, the test should be updated to make arrays larger than 32 bytes.
- [x] ~Is it concerning that `call(move _1)` is now `call(const [array])`?~
  It was reverted back to `call(move _1)`.
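
The `stack_threshold` question can be pictured with a small size check. This is only a sketch under assumptions: the constant `STACK_THRESHOLD_BYTES`, the function `exceeds_stack_threshold`, and the strict `>` comparison are all hypothetical choices made here, and the 32-byte value is taken from the open question rather than from any confirmed rustc setting.

```rust
use std::mem::size_of;

/// Hypothetical cutoff, taken from the open question above.
/// Not a value confirmed by rustc.
const STACK_THRESHOLD_BYTES: usize = 32;

/// Returns true when the array's total size is strictly larger than the
/// threshold, i.e. when promotion might be considered worthwhile.
/// The strict comparison is an assumption of this sketch.
fn exceeds_stack_threshold<T, const N: usize>(_array: &[T; N]) -> bool {
    size_of::<[T; N]>() > STACK_THRESHOLD_BYTES
}

fn main() {
    let small = [0u8; 16]; // 16 bytes
    let exact = [0u64; 4]; // exactly 32 bytes
    let large = [0u32; 16]; // 64 bytes
    println!("small: {}", exceeds_stack_threshold(&small));
    println!("exact: {}", exceeds_stack_threshold(&exact));
    println!("large: {}", exceeds_stack_threshold(&large));
}
```

Under this reading, a test array would need more than 32 bytes of data to exercise the promoted path.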

[opsem]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F
[bench]: rust-lang/rust-clippy#12854 (comment)
[perf]: https://perf.rust-lang.org/compare.html?start=f9515fdd5aa132e27d9b580a35b27f4b453251c1&end=7e160d4b55bb5a27be0696f45db247ccc2e166d9&stat=size%3Alinked_artifact&tab=artifact-size
Labels
A-const-prop Area: Constant propagation A-mir-opt Area: MIR optimizations perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.