[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796

vineethk · 2024-09-29T20:41:22Z

Description

Note: this PR will not be merged until appropriate comparison testing is carried out on this branch.

In this PR, we implement a peephole optimization for addressing inefficient loads generated (load constant that is immediately stored, and then later moved back to the stack). This optimization subsumes a previous optimization, which has been removed.

With this optimization, when compiling the aptos-framework, the difference between number of instructions generated by v1 vs. v2 is down to an increase of 0.7%.

This PR also includes some refactoring in the peephole optimization traits, and some small changes to the output of the instruction count comparison script.

How Has This Been Tested?

Existing tests pass, baselines have been verified to confirm that there is no regression in terms of number of instructions generated, only improvements.
A few tests have been added to show case improvements due to this optimization, which can be quite significant in some cases (eg., in opt_load_01.move, test03 goes from 34 instructions to 14).

Key Areas to Review

Correctness of the optimization.

Type of Change

Performance improvement

Which Components or Systems Does This Change Impact?

Move Compiler

trunk-io · 2024-09-29T20:41:26Z

⏱️ 1h 29m total CI duration on this PR

Job	Cumulative Duration	Recent Runs
rust-move-unit-coverage	17m	🟩
rust-move-unit-coverage	15m	🟩
rust-move-unit-coverage	11m	🟩
rust-move-tests	10m	🟩
rust-move-tests	9m	🟩
rust-move-tests	9m	🟩
check-dynamic-deps	7m	🟩 🟩 🟩
rust-cargo-deny	6m	🟩 🟩 🟩
general-lints	2m	🟩 🟩 🟩
semgrep/ci	1m	🟩 🟩 🟩
file_change_determinator	35s	🟩 🟩 🟩
permission-check	12s	🟩 🟩 🟩
permission-check	8s	🟩 🟩 🟩
permission-check	7s	🟩 🟩 🟩

_{settings ⋅ feedback ⋅ docs ⋅ learn more about trunk.io}

vineethk · 2024-09-29T20:41:36Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @vineethk and the rest of your teammates on Graphite

codecov · 2024-09-29T20:58:38Z

Codecov Report

Attention: Patch coverage is 97.77778% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.1%. Comparing base (55e368c) to head (a48efa4).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...at_generator/peephole_optimizer/reducible_pairs.rs	92.3%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #14796   +/-   ##
=======================================
  Coverage    60.1%    60.1%           
=======================================
  Files         856      856           
  Lines      211013   211019    +6     
=======================================
+ Hits       126824   126830    +6     
  Misses      84189    84189

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

brmataptos

Need a comment, otherwise, good quick solution.

brmataptos · 2024-09-30T17:34:38Z

...arty/move/move-compiler-v2/src/file_format_generator/peephole_optimizer/inefficient_loads.rs

+
+impl InefficientLoads {
+    // We need at least 3 instructions, corresponding to points 1, 2, and 4 in the pattern
+    // described in the module documentation.


module documentation --> file documentation, above.

It is indeed the module documentation (made using //! at the beginning of the file).

I'll add saying "at the top of the file" to clarify, but we have used this terminology elsewhere in the code base in the same sense as here.

brmataptos · 2024-09-30T17:41:38Z

...arty/move/move-compiler-v2/src/file_format_generator/peephole_optimizer/inefficient_loads.rs

+}
+
+impl WindowOptimizer for InefficientLoads {
+    fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)> {


Document this function. The Vec seems to be a new copy of the window which is optimized. What is the usize?

Note that this is an implementation of a trait method (WindowOptimizer::optimize_window), which contains the documentation:

/// Given a `window` of bytecode, return a tuple containing: /// 1. an optimized version of a non-empty prefix of the `window`. /// 2. size of this prefix (should be non-zero). /// If `None` is returned, the `window` is not optimized.

I don't think it makes sense to copy paste this documentation of the trait method into every implementation: this is not the style followed either in our repo or generally in Rust or other languages I have worked with. So I am leaving this as is.

brmataptos · 2024-09-30T17:45:12Z

third_party/move/move-compiler-v2/src/file_format_generator/peephole_optimizer/optimizers.rs


-impl<T: FixedWindowOptimizer> BasicBlockOptimizer for FixedWindowProcessor<T> {
+impl<T: WindowOptimizer> BasicBlockOptimizer for WindowProcessor<T> {
    fn optimize(&self, block: &[Bytecode]) -> Vec<Bytecode> {
        let mut old_block = block.to_vec();
        // Run single passes until code stops changing.


It seems likely this can be optimized, but we can wait and see how slow it is first.

SG. Right now, there does not seem to be any statistical difference between compilation with this on and off, when compiling the aptos framework and its dependencies.

brmataptos · 2024-09-30T17:55:09Z

third_party/move/move-compiler-v2/tests/file-format-generator/opt_load_02.opt.exp

+	0: LdU64(1)
+	1: LdU64(2)
+	2: LdU64(3)
+	3: CopyLoc[0](Arg0: &S)


We really should fix the bytecode generator. This kind of thing doesn't look easily amenable to peephole opts.

I added this test to showcase a case where just the peephole won't be sufficient, as you point out. I have a follow up PR where I have reduced a bunch of remaining cases from the aptos-framework where we have optimizable code-gen behavior.

github-actions · 2024-10-12T00:47:00Z

✅ Forge suite `compat` success on `7eeba4cd15892717741a614add1afde004c7855f` ==> `a48efa472ae83f7a4133ee47c22f094c616d8dcd`

Compatibility test results for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd (PR)
1. Check liveness of validators at old version: 7eeba4cd15892717741a614add1afde004c7855f
compatibility::simple-validator-upgrade::liveness-check : committed: 13126.88 txn/s, latency: 2199.34 ms, (p50: 1800 ms, p70: 2000, p90: 2300 ms, p99: 15700 ms), latency samples: 513840
2. Upgrading first Validator to new version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7166.57 txn/s, latency: 3962.48 ms, (p50: 4500 ms, p70: 4700, p90: 4900 ms, p99: 4900 ms), latency samples: 133880
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6077.90 txn/s, latency: 5172.32 ms, (p50: 5400 ms, p70: 5500, p90: 6900 ms, p99: 7200 ms), latency samples: 228560
3. Upgrading rest of first batch to new version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7126.95 txn/s, latency: 3921.06 ms, (p50: 4300 ms, p70: 4600, p90: 4900 ms, p99: 5000 ms), latency samples: 132300
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7177.69 txn/s, latency: 4458.73 ms, (p50: 4500 ms, p70: 5000, p90: 6100 ms, p99: 6300 ms), latency samples: 244180
4. upgrading second batch to new version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11384.09 txn/s, latency: 2386.74 ms, (p50: 2500 ms, p70: 2700, p90: 3300 ms, p99: 3700 ms), latency samples: 199320
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11411.41 txn/s, latency: 2703.00 ms, (p50: 2600 ms, p70: 2800, p90: 3800 ms, p99: 5200 ms), latency samples: 370280
5. check swarm health
Compatibility test for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd passed
Test Ok

github-actions · 2024-10-12T00:47:22Z

✅ Forge suite `realistic_env_max_load` success on `a48efa472ae83f7a4133ee47c22f094c616d8dcd`

two traffics test: inner traffic : committed: 13621.66 txn/s, latency: 2921.54 ms, (p50: 2700 ms, p70: 3000, p90: 3200 ms, p99: 3600 ms), latency samples: 5179220
two traffics test : committed: 100.00 txn/s, latency: 2592.31 ms, (p50: 2500 ms, p70: 2600, p90: 2900 ms, p99: 7100 ms), latency samples: 1760
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.233, avg: 0.218", "QsPosToProposal: max: 0.309, avg: 0.252", "ConsensusProposalToOrdered: max: 0.318, avg: 0.299", "ConsensusOrderedToCommit: max: 0.514, avg: 0.474", "ConsensusProposalToCommit: max: 0.815, avg: 0.773"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.84s no progress at version 2668000 (avg 0.21s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.52s no progress at version 2667998 (avg 8.52s) [limit 15].
Test Ok

github-actions · 2024-10-12T00:48:46Z

✅ Forge suite `framework_upgrade` success on `7eeba4cd15892717741a614add1afde004c7855f` ==> `a48efa472ae83f7a4133ee47c22f094c616d8dcd`

Compatibility test results for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd (PR)
Upgrade the nodes to version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1206.86 txn/s, submitted: 1209.82 txn/s, failed submission: 2.96 txn/s, expired: 2.96 txn/s, latency: 2752.91 ms, (p50: 2100 ms, p70: 2700, p90: 5400 ms, p99: 7000 ms), latency samples: 105920
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1031.45 txn/s, submitted: 1033.01 txn/s, failed submission: 1.56 txn/s, expired: 1.56 txn/s, latency: 2862.59 ms, (p50: 2300 ms, p70: 3000, p90: 5200 ms, p99: 8200 ms), latency samples: 92620
5. check swarm health
Compatibility test for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd passed
Upgrade the remaining nodes to version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1055.79 txn/s, submitted: 1058.23 txn/s, failed submission: 2.43 txn/s, expired: 2.43 txn/s, latency: 2779.46 ms, (p50: 2400 ms, p70: 3000, p90: 4800 ms, p99: 6700 ms), latency samples: 95400
Test Ok

vineethk changed the title ~~[compiler-v2] Variable window peephole optimization~~ [compiler-v2] Inefficient loads: window peephole optimization Sep 29, 2024

vineethk marked this pull request as ready for review September 29, 2024 21:38

vineethk requested review from brmataptos, fEst1ck, rahxephon89 and wrwg September 29, 2024 21:41

vineethk force-pushed the vk/variable-window-peephole branch from b9e661a to 17a143c Compare September 30, 2024 15:02

vineethk mentioned this pull request Sep 30, 2024

[compiler-v2] Test cases reduced from the framework showcasing need for stack optimizations #14800

Merged

2 tasks

brmataptos approved these changes Sep 30, 2024

View reviewed changes

[compiler-v2] Variable window peephole optimization

3c9937d

vineethk force-pushed the vk/variable-window-peephole branch from 17a143c to 3c9937d Compare October 1, 2024 20:56

rahxephon89 approved these changes Oct 1, 2024

View reviewed changes

vineethk changed the title ~~[compiler-v2] Inefficient loads: window peephole optimization~~ [compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization Oct 2, 2024

Merge branch 'main' into vk/variable-window-peephole

a48efa4

vineethk enabled auto-merge (squash) October 12, 2024 00:19

This comment has been minimized.

Sign in to view

vineethk merged commit 8199026 into main Oct 12, 2024
89 of 95 checks passed

vineethk deleted the vk/variable-window-peephole branch October 12, 2024 00:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796

[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796

vineethk commented Sep 29, 2024 •

edited

Loading

trunk-io bot commented Sep 29, 2024 •

edited

Loading

vineethk commented Sep 29, 2024 •

edited

Loading

codecov bot commented Sep 29, 2024 •

edited

Loading

brmataptos left a comment

brmataptos Sep 30, 2024

vineethk Oct 1, 2024

brmataptos Sep 30, 2024

vineethk Oct 1, 2024

brmataptos Sep 30, 2024

vineethk Oct 1, 2024

brmataptos Sep 30, 2024

vineethk Oct 1, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Oct 12, 2024

github-actions bot commented Oct 12, 2024

github-actions bot commented Oct 12, 2024

[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796

[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796

Conversation

vineethk commented Sep 29, 2024 • edited Loading

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

Which Components or Systems Does This Change Impact?

trunk-io bot commented Sep 29, 2024 • edited Loading

vineethk commented Sep 29, 2024 • edited Loading

codecov bot commented Sep 29, 2024 • edited Loading

Codecov Report

brmataptos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Oct 12, 2024

✅ Forge suite compat success on 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd

github-actions bot commented Oct 12, 2024

✅ Forge suite realistic_env_max_load success on a48efa472ae83f7a4133ee47c22f094c616d8dcd

github-actions bot commented Oct 12, 2024

✅ Forge suite framework_upgrade success on 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd

vineethk commented Sep 29, 2024 •

edited

Loading

trunk-io bot commented Sep 29, 2024 •

edited

Loading

vineethk commented Sep 29, 2024 •

edited

Loading

codecov bot commented Sep 29, 2024 •

edited

Loading

✅ Forge suite `compat` success on `7eeba4cd15892717741a614add1afde004c7855f` ==> `a48efa472ae83f7a4133ee47c22f094c616d8dcd`

✅ Forge suite `realistic_env_max_load` success on `a48efa472ae83f7a4133ee47c22f094c616d8dcd`

✅ Forge suite `framework_upgrade` success on `7eeba4cd15892717741a614add1afde004c7855f` ==> `a48efa472ae83f7a4133ee47c22f094c616d8dcd`