Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796

Merged
merged 2 commits into from
Oct 12, 2024

Conversation

vineethk
Copy link
Contributor

@vineethk vineethk commented Sep 29, 2024

Description

Closes #14169.

Note: this PR will not be merged until appropriate comparison testing is carried out on this branch.

In this PR, we implement a peephole optimization for addressing inefficient loads generated (load constant that is immediately stored, and then later moved back to the stack). This optimization subsumes a previous optimization, which has been removed.

With this optimization, when compiling the aptos-framework, the difference between number of instructions generated by v1 vs. v2 is down to an increase of 0.7%.

This PR also includes some refactoring in the peephole optimization traits, and some small changes to the output of the instruction count comparison script.

How Has This Been Tested?

  • Existing tests pass, baselines have been verified to confirm that there is no regression in terms of number of instructions generated, only improvements.
  • A few tests have been added to show case improvements due to this optimization, which can be quite significant in some cases (eg., in opt_load_01.move, test03 goes from 34 instructions to 14).

Key Areas to Review

  • Correctness of the optimization.

Type of Change

  • Performance improvement

Which Components or Systems Does This Change Impact?

  • Move Compiler

Copy link

trunk-io bot commented Sep 29, 2024

⏱️ 1h 29m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-move-unit-coverage 17m 🟩
rust-move-unit-coverage 15m 🟩
rust-move-unit-coverage 11m 🟩
rust-move-tests 10m 🟩
rust-move-tests 9m 🟩
rust-move-tests 9m 🟩
check-dynamic-deps 7m 🟩🟩🟩
rust-cargo-deny 6m 🟩🟩🟩
general-lints 2m 🟩🟩🟩
semgrep/ci 1m 🟩🟩🟩
file_change_determinator 35s 🟩🟩🟩
permission-check 12s 🟩🟩🟩
permission-check 8s 🟩🟩🟩
permission-check 7s 🟩🟩🟩

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link
Contributor Author

vineethk commented Sep 29, 2024

Copy link

codecov bot commented Sep 29, 2024

Codecov Report

Attention: Patch coverage is 97.77778% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.1%. Comparing base (55e368c) to head (a48efa4).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...at_generator/peephole_optimizer/reducible_pairs.rs 92.3% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14796   +/-   ##
=======================================
  Coverage    60.1%    60.1%           
=======================================
  Files         856      856           
  Lines      211013   211019    +6     
=======================================
+ Hits       126824   126830    +6     
  Misses      84189    84189           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vineethk vineethk changed the title [compiler-v2] Variable window peephole optimization [compiler-v2] Inefficient loads: window peephole optimization Sep 29, 2024
@vineethk vineethk marked this pull request as ready for review September 29, 2024 21:38
@vineethk vineethk force-pushed the vk/variable-window-peephole branch from b9e661a to 17a143c Compare September 30, 2024 15:02
Copy link
Contributor

@brmataptos brmataptos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a comment, otherwise, good quick solution.


impl InefficientLoads {
// We need at least 3 instructions, corresponding to points 1, 2, and 4 in the pattern
// described in the module documentation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

module documentation --> file documentation, above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed the module documentation (made using //! at the beginning of the file).

I'll add saying "at the top of the file" to clarify, but we have used this terminology elsewhere in the code base in the same sense as here.

}

impl WindowOptimizer for InefficientLoads {
fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this function. The Vec seems to be a new copy of the window which is optimized. What is the usize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is an implementation of a trait method (WindowOptimizer::optimize_window), which contains the documentation:

    /// Given a `window` of bytecode, return a tuple containing:
    ///   1. an optimized version of a non-empty prefix of the `window`.
    ///   2. size of this prefix (should be non-zero).
    /// If `None` is returned, the `window` is not optimized.

I don't think it makes sense to copy paste this documentation of the trait method into every implementation: this is not the style followed either in our repo or generally in Rust or other languages I have worked with. So I am leaving this as is.


impl<T: FixedWindowOptimizer> BasicBlockOptimizer for FixedWindowProcessor<T> {
impl<T: WindowOptimizer> BasicBlockOptimizer for WindowProcessor<T> {
fn optimize(&self, block: &[Bytecode]) -> Vec<Bytecode> {
let mut old_block = block.to_vec();
// Run single passes until code stops changing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems likely this can be optimized, but we can wait and see how slow it is first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG. Right now, there does not seem to be any statistical difference between compilation with this on and off, when compiling the aptos framework and its dependencies.

0: LdU64(1)
1: LdU64(2)
2: LdU64(3)
3: CopyLoc[0](Arg0: &S)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should fix the bytecode generator. This kind of thing doesn't look easily amenable to peephole opts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this test to showcase a case where just the peephole won't be sufficient, as you point out. I have a follow up PR where I have reduced a bunch of remaining cases from the aptos-framework where we have optimizable code-gen behavior.

@vineethk vineethk force-pushed the vk/variable-window-peephole branch from 17a143c to 3c9937d Compare October 1, 2024 20:56
@vineethk vineethk changed the title [compiler-v2] Inefficient loads: window peephole optimization [compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization Oct 2, 2024
@vineethk vineethk enabled auto-merge (squash) October 12, 2024 00:19

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd

Compatibility test results for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd (PR)
1. Check liveness of validators at old version: 7eeba4cd15892717741a614add1afde004c7855f
compatibility::simple-validator-upgrade::liveness-check : committed: 13126.88 txn/s, latency: 2199.34 ms, (p50: 1800 ms, p70: 2000, p90: 2300 ms, p99: 15700 ms), latency samples: 513840
2. Upgrading first Validator to new version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7166.57 txn/s, latency: 3962.48 ms, (p50: 4500 ms, p70: 4700, p90: 4900 ms, p99: 4900 ms), latency samples: 133880
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6077.90 txn/s, latency: 5172.32 ms, (p50: 5400 ms, p70: 5500, p90: 6900 ms, p99: 7200 ms), latency samples: 228560
3. Upgrading rest of first batch to new version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7126.95 txn/s, latency: 3921.06 ms, (p50: 4300 ms, p70: 4600, p90: 4900 ms, p99: 5000 ms), latency samples: 132300
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7177.69 txn/s, latency: 4458.73 ms, (p50: 4500 ms, p70: 5000, p90: 6100 ms, p99: 6300 ms), latency samples: 244180
4. upgrading second batch to new version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11384.09 txn/s, latency: 2386.74 ms, (p50: 2500 ms, p70: 2700, p90: 3300 ms, p99: 3700 ms), latency samples: 199320
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11411.41 txn/s, latency: 2703.00 ms, (p50: 2600 ms, p70: 2800, p90: 3800 ms, p99: 5200 ms), latency samples: 370280
5. check swarm health
Compatibility test for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on a48efa472ae83f7a4133ee47c22f094c616d8dcd

two traffics test: inner traffic : committed: 13621.66 txn/s, latency: 2921.54 ms, (p50: 2700 ms, p70: 3000, p90: 3200 ms, p99: 3600 ms), latency samples: 5179220
two traffics test : committed: 100.00 txn/s, latency: 2592.31 ms, (p50: 2500 ms, p70: 2600, p90: 2900 ms, p99: 7100 ms), latency samples: 1760
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.233, avg: 0.218", "QsPosToProposal: max: 0.309, avg: 0.252", "ConsensusProposalToOrdered: max: 0.318, avg: 0.299", "ConsensusOrderedToCommit: max: 0.514, avg: 0.474", "ConsensusProposalToCommit: max: 0.815, avg: 0.773"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.84s no progress at version 2668000 (avg 0.21s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.52s no progress at version 2667998 (avg 8.52s) [limit 15].
Test Ok

Copy link
Contributor

✅ Forge suite framework_upgrade success on 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd

Compatibility test results for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd (PR)
Upgrade the nodes to version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1206.86 txn/s, submitted: 1209.82 txn/s, failed submission: 2.96 txn/s, expired: 2.96 txn/s, latency: 2752.91 ms, (p50: 2100 ms, p70: 2700, p90: 5400 ms, p99: 7000 ms), latency samples: 105920
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1031.45 txn/s, submitted: 1033.01 txn/s, failed submission: 1.56 txn/s, expired: 1.56 txn/s, latency: 2862.59 ms, (p50: 2300 ms, p70: 3000, p90: 5200 ms, p99: 8200 ms), latency samples: 92620
5. check swarm health
Compatibility test for 7eeba4cd15892717741a614add1afde004c7855f ==> a48efa472ae83f7a4133ee47c22f094c616d8dcd passed
Upgrade the remaining nodes to version: a48efa472ae83f7a4133ee47c22f094c616d8dcd
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1055.79 txn/s, submitted: 1058.23 txn/s, failed submission: 2.43 txn/s, expired: 2.43 txn/s, latency: 2779.46 ms, (p50: 2400 ms, p70: 3000, p90: 4800 ms, p99: 6700 ms), latency samples: 95400
Test Ok

@vineethk vineethk merged commit 8199026 into main Oct 12, 2024
89 of 95 checks passed
@vineethk vineethk deleted the vk/variable-window-peephole branch October 12, 2024 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request][compiler-v2] Extend peephole optimizations from fixed window to arbitrary length
3 participants