Perform instsimplify before inline to eliminate some trivial calls #128265

Merged Jul 29, 2024 (1 commit)

DianQK
Member

@DianQK DianQK commented Jul 27, 2024

I am currently working on #128081. In the current pipeline, we can get the following clone statements (godbolt):

    bb0: {
        StorageLive(_2);
        _2 = ((*_1).0: i32);
        StorageLive(_3);
        _3 = ((*_1).1: u64);
        _0 = Foo { a: move _2, b: move _3 };
        StorageDead(_3);
        StorageDead(_2);
        return;
    }

Analyzing such statements is simple and fast: there are no branches or interfering statements to consider. However, getting the MIR into this form requires running InstSimplify, ReferencePropagation, and SimplifyCFG at least once. I could introduce a new pass, but I think the best place for this analysis would be within InstSimplify.
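For reference, MIR of this shape is what one would expect from a derived `Clone` on a two-field struct. The following is a minimal sketch, assuming the godbolt source resembles it: the field names `a`, `b` and the types `i32`, `u64` are taken from the MIR above, while `clone_foo` and `main` are hypothetical additions for illustration.

```rust
// Two-field struct matching the `Foo { a: ..., b: ... }` aggregate in the MIR above.
#[derive(Clone, Debug, PartialEq)]
struct Foo {
    a: i32,
    b: u64,
}

// Hypothetical wrapper: cloning through a reference corresponds to the
// field-by-field reads `(*_1).0` and `(*_1).1` followed by rebuilding `Foo`.
fn clone_foo(foo: &Foo) -> Foo {
    foo.clone()
}

fn main() {
    let original = Foo { a: 1, b: 2 };
    let copy = clone_foo(&original);
    // Both fields are plain integers, so the clone is equivalent to a bitwise copy.
    assert_eq!(original, copy);
    println!("{:?}", copy);
}
```

Because every field is a plain integer, the whole body amounts to a copy of `*_1`, which is the kind of pattern the straight-line form makes easy to recognize.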

I put InstSimplify before Inline, which takes some of the burden away from Inline.

r? @saethlin

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 27, 2024
@rustbot
Collaborator

rustbot commented Jul 27, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@DianQK
Member Author

DianQK commented Jul 27, 2024

Hmm…

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 27, 2024
@bors
Contributor

bors commented Jul 27, 2024

⌛ Trying commit 5dc8f5f with merge 3689ca8...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 27, 2024
…=<try>

 Perform instsimplify before inline to eliminate some trivial calls

I am currently working on rust-lang#128081. In the current pipeline, we can get the following clone statements ([godbolt](https://rust.godbolt.org/z/931316fhP)):

```
    bb0: {
        StorageLive(_2);
        _2 = ((*_1).0: i32);
        StorageLive(_3);
        _3 = ((*_1).1: u64);
        _0 = Foo { a: move _2, b: move _3 };
        StorageDead(_3);
        StorageDead(_2);
        return;
    }
```

Analyzing such statements will be simple and fast. We don't need to consider branches or some interfering statements. However, this requires us to run `InstSimplify`, `ReferencePropagation`, and `SimplifyCFG` at least once. I can introduce a new pass, but I think the best place for it would be within `InstSimplify`.

I put `InstSimplify` before `Inline`, which takes some of the burden away from `Inline`.

r? `@saethlin`
@bors
Contributor

bors commented Jul 27, 2024

☀️ Try build successful - checks-actions
Build commit: 3689ca8 (3689ca8f298a3bf6116ce7aaacb123f88df258df)

@rust-timer
Collaborator

Finished benchmarking commit (3689ca8): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.1% | [0.3%, 2.7%] | 5 |
| Regressions ❌ (secondary) | 1.5% | [0.5%, 2.4%] | 2 |
| Improvements ✅ (primary) | -0.4% | [-0.9%, -0.2%] | 13 |
| Improvements ✅ (secondary) | -0.4% | [-0.5%, -0.3%] | 3 |
| All ❌✅ (primary) | -0.0% | [-0.9%, 2.7%] | 18 |

Max RSS (memory usage)

Results (primary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 3.6% | [2.3%, 6.1%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.2% | [-6.9%, -2.4%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [-6.9%, 6.1%] | 7 |

Cycles

Results (primary -3.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 2.3% | [2.0%, 2.6%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.4% | [-6.6%, -4.6%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.5% | [-6.6%, 2.6%] | 8 |

Binary size

Results (primary -0.1%, secondary -0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.2% | [0.0%, 1.0%] | 44 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 3 |
| Improvements ✅ (primary) | -0.3% | [-1.0%, -0.0%] | 45 |
| Improvements ✅ (secondary) | -0.7% | [-1.8%, -0.0%] | 19 |
| All ❌✅ (primary) | -0.1% | [-1.0%, 1.0%] | 89 |

Bootstrap: 770.829s -> 772.303s (0.19%)
Artifact size: 328.92 MiB -> 328.97 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 27, 2024
Comment on lines 33 to 34
_4 = _14;
_3 = _4;
Member

I think you need to update the pass name in tests/mir-opt/dataflow-const-prop/slice_len.rs

Member Author

Ah, I also changed all the names from InstSimplify-before-inline to InstSimplify-after-simplifycfg, which reduces the number of changes.

@DianQK DianQK force-pushed the instsimplify-before-inline branch 3 times, most recently from 8686a55 to 95b83d7 Compare July 28, 2024 08:33
@DianQK
Member Author

DianQK commented Jul 28, 2024

Details

The job x86_64-gnu-llvm-17 failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)

------
 > importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:3aacb9c90579defe09351ac5e8ee504359f8054da6326ff19038f1b7c90e3cb2aafe33685c6d9b76ee8d2ccbd187ca80c46ab5380485abdd8c0ce7d69cd8d8fd:
------
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-17]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-17', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-17/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id            := 99999999
---
failures:

---- [incremental] tests/incremental/hashes/call_expressions.rs stdout ----

error in revision `cfail2`: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/incremental/hashes/call_expressions.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "--check-cfg" "cfg(FALSE,cfail1,cfail2,cfail3,cfail4,cfail5,cfail6)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/hashes/call_expressions/call_expressions.inc" "-Z" "incremental-verify-ich" "-O" "--error-format" "json" "--json" "future-incompat" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/hashes/call_expressions" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/hashes/call_expressions/auxiliary" "-Z" "query-dep-graph" "-O" "-Zincremental-ignore-spans"
--- stderr -------------------------------
--- stderr -------------------------------
error: `optimized_mir(change_to_ufcs)` should be clean but is not
   |
LL | pub fn change_to_ufcs() {
   | ^^^^^^^^^^^^^^^^^^^^^^^

I'm not sure what happened here. I added optimized_mir to except based on the history.

@DianQK
Member Author

DianQK commented Jul 28, 2024

Execution time difference for ripgrep-13.0.0:

| function | time(s) | delta |
| --- | --- | --- |
| LLVM_passes | 3.042 | 0.338 |
| optimized_mir | 0.007 | 0.007 |
| codegen_crate | 0.226 | -0.127 |

It looks like this has triggered more optimization analysis by LLVM. I'm looking into it. I'm not sure what happened with image-0.24.1. But I believe the increase in their time is not caused by instsimplify itself.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 28, 2024
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081. Currently being blocked by rust-lang#128265.

`@rustbot` label +S-blocked

r? `@saethlin`
@saethlin
Member

Perf reports like the above (no changes to LLVM itself, but build time changes primarily in opt builds) are usually due to MIR inlining changes. This change probably reduces the size of some functions, which makes them inlined, which is sometimes good and sometimes bad.
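To illustrate why shrinking a function can flip an inlining decision, here is a toy sketch. It is not rustc's cost model: `Body`, `should_inline`, the statement counts, and the threshold of 10 are all hypothetical and chosen only to show the effect.

```rust
/// Hypothetical stand-in for a function body; only its statement count matters here.
struct Body {
    name: &'static str,
    statements: usize,
}

/// Toy inlining rule (an assumption, not rustc's actual heuristic):
/// inline a callee when it has at most `threshold` statements.
fn should_inline(callee: &Body, threshold: usize) -> bool {
    callee.statements <= threshold
}

fn main() {
    let threshold = 10; // arbitrary budget chosen for illustration
    let before = Body { name: "callee (before simplification)", statements: 12 };
    let after = Body { name: "callee (after simplification)", statements: 9 };

    // An earlier simplification removed a few statements, so the same callee
    // now fits under the budget and becomes an inlining candidate.
    for body in [&before, &after] {
        println!("{}: inline = {}", body.name, should_inline(body, threshold));
    }
}
```

Under such a rule, a small reduction in callee size changes which calls get inlined, and the downstream effect on LLVM can go in either direction.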

@saethlin
Member

I'm not sure what happened here. I added optimized_mir to except based on the history.

Yeah, that's the usual experience with these tests. They should probably have a blessing mechanism rather than manually updating the except list.

@DianQK
Member Author

DianQK commented Jul 28, 2024

Perf reports like the above (no changes to LLVM itself, but build time changes primarily in opt builds) are usually due to MIR inlining changes. This change probably reduces the size of some functions, which makes them inlined, which is sometimes good and sometimes bad.

This makes sense to me. There have been many changes to the binary size. I tried different codegen-units values locally: the default of 16 shows a regression, while lower values look like an improvement. If I turn off LTO ("thin local LTO"), the problem does not reproduce.

@saethlin
Member

saethlin commented Jul 28, 2024

While it would be educational to know exactly what's up with the perf changes, whatever you find doesn't change whether the existing changes here are an improvement. I suspect it might motivate additional improvement, like what happened with #123174. I've wanted a tool that can generate a summary of the MIR inlining changes between two builds, because that would really explain exactly what's up here. I've had little luck attempting to slap something together by doing textual analysis of --emit=mir or rustc logs.

If you want to squash the 3 commits together, you can do that. I'm happy either way.

@bors delegate=DianQk

@bors
Contributor

bors commented Jul 28, 2024

✌️ @DianQK, you can now approve this pull request!

If @saethlin told you to "r=me" after making some further change, please make that change, then do @bors r=@saethlin

@DianQK
Member Author

DianQK commented Jul 28, 2024

Locally, I can see that the time spent in the LLVM backend (instruction selection) has increased, so I suspect the inlining has either added code or triggered some kind of slow matching pattern.

@DianQK DianQK force-pushed the instsimplify-before-inline branch from 95b83d7 to ac1c81b Compare July 28, 2024 22:14
@DianQK
Member Author

DianQK commented Jul 28, 2024

@bors r=saethlin

@bors
Contributor

bors commented Jul 28, 2024

📌 Commit ac1c81b has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 28, 2024
@DianQK
Member Author

DianQK commented Jul 28, 2024

I suspect it might motivate additional improvement, like what happened with #123174. I've wanted a tool that can generate a summary of the MIR inlining changes between two builds, because that would really explain exactly what's up here. I've had little luck attempting to slap something together by doing textual analysis of --emit=mir or rustc logs.

It may make sense to have an LLVM-like --stats argument that summarizes the transformation results of various passes. I would also like us to have usable runtime performance tests.

@DianQK
Member Author

DianQK commented Jul 29, 2024

@bors r-
Conflict with #125443.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 29, 2024
@bors
Contributor

bors commented Jul 29, 2024

☔ The latest upstream changes (presumably #125443) made this pull request unmergeable. Please resolve the merge conflicts.

@DianQK DianQK force-pushed the instsimplify-before-inline branch from ac1c81b to ae681c9 Compare July 29, 2024 10:14
@DianQK
Member Author

DianQK commented Jul 29, 2024

@bors r=saethlin

@bors
Contributor

bors commented Jul 29, 2024

📌 Commit ae681c9 has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 29, 2024
@bors
Contributor

bors commented Jul 29, 2024

⌛ Testing commit ae681c9 with merge 4db3d12...

@DianQK
Member Author

DianQK commented Jul 29, 2024

Considering the increased time in LLVM, it might be useful to summarize the changes in IR:

# before
$ opt -passes=instcount --stats llir-a526d7ce45fd2284e0e7c7556ccba2425b9d25e5-ripgrep-13.0.0-Opt-Full --disable-output

 64915 instcount - Number of Call insts
 53656 instcount - Number of basic blocks
  5650 instcount - Number of non-external functions
332469 instcount - Number of instructions (of all types)

# after
$ opt -passes=instcount --stats llir-3689ca8f298a3bf6116ce7aaacb123f88df258df-ripgrep-13.0.0-Opt-Full --disable-output

 64614 instcount - Number of Call insts
 53643 instcount - Number of basic blocks
  5648 instcount - Number of non-external functions
331044 instcount - Number of instructions (of all types)

I can see that the IR has become smaller (note: this might not be the IR I want to examine).
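To put numbers on "smaller", the deltas between the two runs can be computed directly. The sketch below only does arithmetic on the counts quoted above; `percent_change` is a hypothetical helper name, not part of any tool mentioned here.

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

fn main() {
    // (metric, before, after), copied from the two `opt --stats` outputs above.
    let stats: [(&str, u64, u64); 4] = [
        ("Call insts", 64915, 64614),
        ("basic blocks", 53656, 53643),
        ("non-external functions", 5650, 5648),
        ("instructions (all types)", 332469, 331044),
    ];

    for &(name, before, after) in stats.iter() {
        let delta = after as i64 - before as i64;
        println!(
            "{:<26} {:>6} ({:+.2}%)",
            name,
            delta,
            percent_change(before, after)
        );
    }
}
```

The call and total instruction counts each drop by a bit under half a percent, while the basic block and function counts barely move.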

@bors
Contributor

bors commented Jul 29, 2024

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing 4db3d12 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 29, 2024
@bors bors merged commit 4db3d12 into rust-lang:master Jul 29, 2024
7 checks passed
@rustbot rustbot added this to the 1.82.0 milestone Jul 29, 2024
@rust-timer
Collaborator

Finished benchmarking commit (4db3d12): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.2% | [0.2%, 2.6%] | 4 |
| Regressions ❌ (secondary) | 0.5% | [0.5%, 0.5%] | 1 |
| Improvements ✅ (primary) | -0.5% | [-0.8%, -0.2%] | 12 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.3%] | 2 |
| All ❌✅ (primary) | -0.0% | [-0.8%, 2.6%] | 16 |

Max RSS (memory usage)

Results (primary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 6.1% | [4.2%, 8.0%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.1% | [-5.9%, -4.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [-5.9%, 8.0%] | 4 |

Cycles

Results (primary 1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 2.1% | [1.7%, 2.4%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.1% | [-1.1%, -1.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [-1.1%, 2.4%] | 3 |

Binary size

Results (primary -0.1%, secondary -0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.2% | [0.0%, 1.0%] | 42 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 3 |
| Improvements ✅ (primary) | -0.3% | [-1.0%, -0.0%] | 44 |
| Improvements ✅ (secondary) | -0.6% | [-1.7%, -0.0%] | 20 |
| All ❌✅ (primary) | -0.1% | [-1.0%, 1.0%] | 86 |

Bootstrap: 768.493s -> 770.467s (0.26%)
Artifact size: 331.80 MiB -> 331.84 MiB (0.01%)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 29, 2024
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081. Currently being blocked by rust-lang#128265.

`@rustbot` label +S-blocked

r? `@saethlin`
@DianQK DianQK deleted the instsimplify-before-inline branch July 30, 2024 00:21
@pnkfelix
Member

Visiting for weekly rustc-perf triage

  • main primary regressions are to ripgrep opt full and image opt-full
  • these changes were anticipated during review, seems likely result of changes to inlining decisions
  • marked as triaged

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Jul 31, 2024
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.