Perform instsimplify before inline to eliminate some trivial calls #128265

DianQK · 2024-07-27T07:28:03Z

I am currently working on #128081. In the current pipeline, we can get the following clone statements (godbolt):

    bb0: {
        StorageLive(_2);
        _2 = ((*_1).0: i32);
        StorageLive(_3);
        _3 = ((*_1).1: u64);
        _0 = Foo { a: move _2, b: move _3 };
        StorageDead(_3);
        StorageDead(_2);
        return;
    }

Analyzing such statements is simple and fast, because we don't need to consider branches or interfering statements. However, this requires us to run InstSimplify, ReferencePropagation, and SimplifyCFG at least once. I could introduce a new pass, but I think the best place for it is within InstSimplify.

I put InstSimplify before Inline, which takes some of the burden away from Inline.
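For readers who want to reproduce MIR of this shape, here is a minimal sketch, assuming it comes from a derived Clone on a two-field struct. The field names and types are read off the MIR above; the struct definition and the `clone_foo` wrapper are assumptions, not necessarily the exact godbolt source.

```rust
// Hypothetical reproduction: field names and types come from the MIR above;
// the struct definition and `clone_foo` are assumptions for illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct Foo {
    pub a: i32,
    pub b: u64,
}

// The derived impl expands to roughly
// `Foo { a: Clone::clone(&self.a), b: Clone::clone(&self.b) }`.
// Each field clone is a call on a primitive type, so it amounts to a plain
// copy of the field.
pub fn clone_foo(foo: &Foo) -> Foo {
    foo.clone()
}
```

Calling `clone_foo(&Foo { a: 1, b: 2 })` returns an equal `Foo`. As far as I can tell, the per-field `Clone::clone` calls on primitives are the kind of trivial call the title refers to.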

r? @saethlin

rustbot · 2024-07-27T07:28:11Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

DianQK · 2024-07-27T08:15:14Z

Hmm…

@bors try @rust-timer queue

bors · 2024-07-27T08:16:25Z

⌛ Trying commit 5dc8f5f with merge 3689ca8...


bors · 2024-07-27T10:01:35Z

☀️ Try build successful - checks-actions
Build commit: 3689ca8 (3689ca8f298a3bf6116ce7aaacb123f88df258df)

rust-timer · 2024-07-27T12:06:35Z

Finished benchmarking commit (3689ca8): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.1%	[0.3%, 2.7%]	5
Regressions ❌ (secondary)	1.5%	[0.5%, 2.4%]	2
Improvements ✅ (primary)	-0.4%	[-0.9%, -0.2%]	13
Improvements ✅ (secondary)	-0.4%	[-0.5%, -0.3%]	3
All ❌✅ (primary)	-0.0%	[-0.9%, 2.7%]	18

Max RSS (memory usage)

Results (primary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.6%	[2.3%, 6.1%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.2%	[-6.9%, -2.4%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[-6.9%, 6.1%]	7

Cycles

Results (primary -3.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.3%	[2.0%, 2.6%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-5.4%	[-6.6%, -4.6%]	6
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-3.5%	[-6.6%, 2.6%]	8

Binary size

Results (primary -0.1%, secondary -0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 1.0%]	44
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	3
Improvements ✅ (primary)	-0.3%	[-1.0%, -0.0%]	45
Improvements ✅ (secondary)	-0.7%	[-1.8%, -0.0%]	19
All ❌✅ (primary)	-0.1%	[-1.0%, 1.0%]	89

Bootstrap: 770.829s -> 772.303s (0.19%)
Artifact size: 328.92 MiB -> 328.97 MiB (0.01%)

compiler/rustc_mir_transform/src/lib.rs

saethlin · 2024-07-27T18:00:35Z

tests/mir-opt/dataflow-const-prop/slice_len.main.DataflowConstProp.32bit.panic-abort.diff

-          _4 = _14;
-          _3 = _4;


I think you need to update the pass name in tests/mir-opt/dataflow-const-prop/slice_len.rs

Ah, I also renamed every occurrence of InstSimplify-before-inline to InstSimplify-after-simplifycfg, which reduces the number of changes.

DianQK · 2024-07-28T08:35:24Z

Details

The job x86_64-gnu-llvm-17 failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)

------
 > importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:3aacb9c90579defe09351ac5e8ee504359f8054da6326ff19038f1b7c90e3cb2aafe33685c6d9b76ee8d2ccbd187ca80c46ab5380485abdd8c0ce7d69cd8d8fd:
------
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-17]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-17', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-17/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id            := 99999999
---
failures:

---- [incremental] tests/incremental/hashes/call_expressions.rs stdout ----

error in revision `cfail2`: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/incremental/hashes/call_expressions.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "--check-cfg" "cfg(FALSE,cfail1,cfail2,cfail3,cfail4,cfail5,cfail6)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/hashes/call_expressions/call_expressions.inc" "-Z" "incremental-verify-ich" "-O" "--error-format" "json" "--json" "future-incompat" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/hashes/call_expressions" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/hashes/call_expressions/auxiliary" "-Z" "query-dep-graph" "-O" "-Zincremental-ignore-spans"
--- stderr -------------------------------
--- stderr -------------------------------
error: `optimized_mir(change_to_ufcs)` should be clean but is not
   |
LL | pub fn change_to_ufcs() {
   | ^^^^^^^^^^^^^^^^^^^^^^^

I'm not sure what happened here. I added optimized_mir to except based on the history.

DianQK · 2024-07-28T08:56:18Z

Execution time difference for ripgrep-13.0.0:

function	time(s)	delta
LLVM_passes	3.042	0.338
optimized_mir	0.007	0.007
codegen_crate	0.226	-0.127

It looks like this has triggered more optimization analysis by LLVM. I'm looking into it. I'm not sure what happened with image-0.24.1. But I believe the increase in their time is not caused by instsimplify itself.

Simplify the canonical clone method and the copy-like forms to copy Fixes rust-lang#128081. Currently being blocked by rust-lang#128265. `@rustbot` label +S-blocked r? `@saethlin`

saethlin · 2024-07-28T14:49:12Z

Perf reports like the above (no changes to LLVM itself, but build time changes primarily in opt builds) are usually due to MIR inlining changes. This change probably reduces the size of some functions, which makes them inlined, which is sometimes good and sometimes bad.
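To make the mechanism concrete, here is a toy model of a size-based inlining decision. Everything in it (the names `decide` and `cost_after_simplify`, the cost unit, the threshold value used in the example) is hypothetical and is not rustc's actual cost model; it only shows how shrinking a callee can flip the decision.

```rust
/// Toy model only: the names, the cost metric, and any threshold passed in
/// are hypothetical. rustc's real inliner is more involved.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    Inline,
    Keep,
}

/// Inline the callee if its estimated cost does not exceed the threshold.
pub fn decide(callee_cost: usize, threshold: usize) -> Decision {
    if callee_cost <= threshold {
        Decision::Inline
    } else {
        Decision::Keep
    }
}

/// Cost of a callee after an earlier simplification pass removed `removed`
/// units of cost. Saturating, so the result never underflows.
pub fn cost_after_simplify(cost: usize, removed: usize) -> usize {
    cost.saturating_sub(removed)
}
```

With a threshold of 50, a callee of cost 55 is kept as a call, but the same callee simplified down to cost 45 gets inlined. Whether that extra inlining helps or hurts depends on the caller.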

saethlin · 2024-07-28T14:54:46Z

I'm not sure what happened here. I added optimized_mir to except based on the history.

Yeah, that's the usual experience with these tests. They should probably have a blessing mechanism rather than manually updating the except list.

DianQK · 2024-07-28T15:01:41Z

Perf reports like the above (no changes to LLVM itself, but build time changes primarily in opt builds) are usually due to MIR inlining changes. This change probably reduces the size of some functions, which makes them inlined, which is sometimes good and sometimes bad.

This makes sense to me. There have been many changes to the binary size. I tried different codegen-units values locally: the default of 16 is a regression, while lower values look like an improvement. If I turn off LTO ("thin local LTO"), the problem does not reproduce.

saethlin · 2024-07-28T15:04:38Z

While it would be educational to know exactly what's up with the perf changes, whatever you find doesn't change whether the existing changes here are an improvement. I suspect it might motivate additional improvement, like what happened with #123174. I've wanted a tool that can generate a summary of the MIR inlining changes between two builds, because that would really explain exactly what's up here. I've had little luck attempting to slap something together by doing textual analysis of --emit=mir or rustc logs.
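A rough sketch of what such a summary could look like, assuming each build's inlining decisions have already been extracted into lines of the form `caller -> callee`. That line format, and the names `parse_edges`, `diff`, and `InlineDiff`, are hypothetical; rustc does not emit this format as far as I know.

```rust
use std::collections::BTreeSet;

/// One inlining decision: (caller, callee).
pub type Edge = (String, String);

/// Parse lines of the hypothetical form `caller -> callee`.
/// Lines without `->` are ignored.
pub fn parse_edges(log: &str) -> BTreeSet<Edge> {
    log.lines()
        .filter_map(|line| {
            let (caller, callee) = line.split_once("->")?;
            Some((caller.trim().to_string(), callee.trim().to_string()))
        })
        .collect()
}

/// Inlining decisions that differ between two builds.
pub struct InlineDiff {
    /// Inlined in the second build only.
    pub gained: Vec<Edge>,
    /// Inlined in the first build only.
    pub lost: Vec<Edge>,
}

/// Compare two sets of decisions; results are sorted because the inputs are.
pub fn diff(before: &BTreeSet<Edge>, after: &BTreeSet<Edge>) -> InlineDiff {
    InlineDiff {
        gained: after.difference(before).cloned().collect(),
        lost: before.difference(after).cloned().collect(),
    }
}
```

The hard part is the extraction step this sketch skips: getting reliable `caller -> callee` pairs out of --emit=mir output or rustc logs in the first place.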

If you want to squash the 3 commits together, you can do that. I'm happy either way.

@bors delegate=DianQk

bors · 2024-07-28T15:04:40Z

✌️ @DianQK, you can now approve this pull request!

If @saethlin told you to "r=me" after making some further change, please make that change, then do @bors r=@saethlin

DianQK · 2024-07-28T15:04:44Z

Locally I can see that the LLVM backend (instruction selection) time has increased, so I think inlining has either added code or triggered some kind of slow matching pattern.

DianQK · 2024-07-28T22:15:53Z

@bors r=saethlin

bors · 2024-07-28T22:15:55Z

📌 Commit ac1c81b has been approved by saethlin

It is now in the queue for this repository.

DianQK · 2024-07-28T22:29:10Z

I suspect it might motivate additional improvement, like what happened with #123174. I've wanted a tool that can generate a summary of the MIR inlining changes between two builds, because that would really explain exactly what's up here. I've had little luck attempting to slap something together by doing textual analysis of --emit=mir or rustc logs.

It may make sense to have an LLVM-like --stats argument that summarizes the transformation results of the various passes. I would also like us to have usable runtime performance tests.

DianQK · 2024-07-29T06:00:29Z

@bors r-
Conflict with #125443.

bors · 2024-07-29T06:06:07Z

☔ The latest upstream changes (presumably #125443) made this pull request unmergeable. Please resolve the merge conflicts.

DianQK · 2024-07-29T10:21:28Z

@bors r=saethlin

bors · 2024-07-29T10:21:31Z

📌 Commit ae681c9 has been approved by saethlin

It is now in the queue for this repository.

bors · 2024-07-29T12:37:00Z

⌛ Testing commit ae681c9 with merge 4db3d12...

DianQK · 2024-07-29T12:49:28Z

Given the increased time spent in LLVM, it might be useful to summarize the changes in the IR:

# before
$ opt -passes=instcount --stats llir-a526d7ce45fd2284e0e7c7556ccba2425b9d25e5-ripgrep-13.0.0-Opt-Full --disable-output

 64915 instcount - Number of Call insts
 53656 instcount - Number of basic blocks
  5650 instcount - Number of non-external functions
332469 instcount - Number of instructions (of all types)

# after
$ opt -passes=instcount --stats llir-3689ca8f298a3bf6116ce7aaacb123f88df258df-ripgrep-13.0.0-Opt-Full --disable-output

 64614 instcount - Number of Call insts
 53643 instcount - Number of basic blocks
  5648 instcount - Number of non-external functions
331044 instcount - Number of instructions (of all types)

I can see that the IR has become smaller (note: this might not be the IR I want to examine).
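To put numbers on "smaller", the deltas can be computed directly from the two instcount outputs above. This is a small sketch; the `Stat` type and helper names are made up for illustration, and only the counts themselves come from the output.

```rust
/// One instcount line, before and after the change.
/// The type and helper names are hypothetical; the counts are copied from
/// the `opt -passes=instcount --stats` output above.
pub struct Stat {
    pub name: &'static str,
    pub before: u64,
    pub after: u64,
}

pub const STATS: [Stat; 4] = [
    Stat { name: "Call insts", before: 64915, after: 64614 },
    Stat { name: "basic blocks", before: 53656, after: 53643 },
    Stat { name: "non-external functions", before: 5650, after: 5648 },
    Stat { name: "instructions (of all types)", before: 332469, after: 331044 },
];

/// Absolute change; negative means the IR got smaller.
pub fn delta(s: &Stat) -> i64 {
    s.after as i64 - s.before as i64
}

/// Relative change in percent of the `before` value.
pub fn percent(s: &Stat) -> f64 {
    delta(s) as f64 / s.before as f64 * 100.0
}

/// One formatted line per statistic.
pub fn report() -> Vec<String> {
    STATS
        .iter()
        .map(|s| format!("{}: {} ({:.2}%)", s.name, delta(s), percent(s)))
        .collect()
}
```

This gives 301 fewer calls, 13 fewer basic blocks, 2 fewer functions, and 1425 fewer instructions, so each count shrinks by well under 1%.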

bors · 2024-07-29T14:59:04Z

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing 4db3d12 to master...

rust-timer · 2024-07-29T16:17:10Z

Finished benchmarking commit (4db3d12): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.2%	[0.2%, 2.6%]	4
Regressions ❌ (secondary)	0.5%	[0.5%, 0.5%]	1
Improvements ✅ (primary)	-0.5%	[-0.8%, -0.2%]	12
Improvements ✅ (secondary)	-0.3%	[-0.4%, -0.3%]	2
All ❌✅ (primary)	-0.0%	[-0.8%, 2.6%]	16

Max RSS (memory usage)

Results (primary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.1%	[4.2%, 8.0%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-5.1%	[-5.9%, -4.3%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.5%	[-5.9%, 8.0%]	4

Cycles

Results (primary 1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.1%	[1.7%, 2.4%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.1%	[-1.1%, -1.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.0%	[-1.1%, 2.4%]	3

Binary size

Results (primary -0.1%, secondary -0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 1.0%]	42
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	3
Improvements ✅ (primary)	-0.3%	[-1.0%, -0.0%]	44
Improvements ✅ (secondary)	-0.6%	[-1.7%, -0.0%]	20
All ❌✅ (primary)	-0.1%	[-1.0%, 1.0%]	86

Bootstrap: 768.493s -> 770.467s (0.26%)
Artifact size: 331.80 MiB -> 331.84 MiB (0.01%)

pnkfelix · 2024-07-31T19:03:53Z

Visiting for weekly rustc-perf triage

main primary regressions are to ripgrep opt full and image opt-full
these changes were anticipated during review, seems likely result of changes to inlining decisions
marked as triaged

@rustbot label: +perf-regression-triaged

rustbot assigned saethlin Jul 27, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 27, 2024

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 27, 2024

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 27, 2024

saethlin reviewed Jul 27, 2024

saethlin reviewed Jul 27, 2024

DianQK force-pushed the instsimplify-before-inline branch 3 times, most recently from 8686a55 to 95b83d7 Compare July 28, 2024 08:33

DianQK mentioned this pull request Jul 28, 2024

Simplify the canonical clone method and the copy-like forms to copy #128299

Merged

DianQK force-pushed the instsimplify-before-inline branch from 95b83d7 to ac1c81b Compare July 28, 2024 22:14

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 28, 2024

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 29, 2024

Perform instsimplify before inline to eliminate some trivial calls

ae681c9

DianQK force-pushed the instsimplify-before-inline branch from ac1c81b to ae681c9 Compare July 29, 2024 10:14

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 29, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 29, 2024

bors merged commit 4db3d12 into rust-lang:master Jul 29, 2024
7 checks passed

rustbot added this to the 1.82.0 milestone Jul 29, 2024

DianQK deleted the instsimplify-before-inline branch July 30, 2024 00:21

rustbot added the perf-regression-triaged The performance regression has been triaged. label Jul 31, 2024
