[compiler-v2] [waiting for comparison testing before merging] Inefficient loads: window peephole optimization #14796
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #14796 +/- ##
  Coverage   60.1%    60.1%
  Files      856      856
  Lines      211013   211019   +6
+ Hits       126824   126830   +6
  Misses     84189    84189
b9e661a to 17a143c
Need a comment, otherwise, good quick solution.
impl InefficientLoads {
    // We need at least 3 instructions, corresponding to points 1, 2, and 4 in the pattern
    // described in the module documentation.
module documentation --> file documentation, above.
It is indeed the module documentation (made using //! at the beginning of the file).
I'll add "at the top of the file" to clarify, but we have used this terminology elsewhere in the code base in the same sense as here.
}

impl WindowOptimizer for InefficientLoads {
    fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)> {
Document this function. The Vec seems to be a new copy of the window which is optimized. What is the usize?
Note that this is an implementation of a trait method (WindowOptimizer::optimize_window), which contains the documentation:
/// Given a `window` of bytecode, return a tuple containing:
/// 1. an optimized version of a non-empty prefix of the `window`.
/// 2. size of this prefix (should be non-zero).
/// If `None` is returned, the `window` is not optimized.
I don't think it makes sense to copy-paste the trait method's documentation into every implementation: that is not the style followed in our repo, in Rust generally, or in other languages I have worked with. So I am leaving this as is.
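For readers following the thread, here is a minimal sketch of how the documented return value can be read. The trait signature and doc comment are the ones quoted above; the `Bytecode` enum is a simplified stand-in and `DropLoadPop` is a hypothetical toy optimizer, neither of which is taken from this PR.

```rust
// Hypothetical, simplified stand-in for the compiler's bytecode type.
#[derive(Clone, Debug, PartialEq)]
enum Bytecode {
    LdU64(u64),
    Pop,
    Ret,
}

// Same signature and documentation as quoted in the review thread.
trait WindowOptimizer {
    /// Given a `window` of bytecode, return a tuple containing:
    /// 1. an optimized version of a non-empty prefix of the `window`.
    /// 2. size of this prefix (should be non-zero).
    /// If `None` is returned, the `window` is not optimized.
    fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)>;
}

// Toy optimizer (an assumption for illustration): a constant load that is
// immediately popped has no effect, so the two-instruction prefix is
// replaced by nothing.
struct DropLoadPop;

impl WindowOptimizer for DropLoadPop {
    fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)> {
        match window {
            // The `Vec` is the replacement code; the `usize` says how many
            // original instructions at the start of the window it replaces.
            [Bytecode::LdU64(_), Bytecode::Pop, ..] => Some((vec![], 2)),
            _ => None,
        }
    }
}
```

In this sketch the `usize` is what lets a caller skip past the replaced prefix and keep scanning from the first instruction that was not consumed.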
- impl<T: FixedWindowOptimizer> BasicBlockOptimizer for FixedWindowProcessor<T> {
+ impl<T: WindowOptimizer> BasicBlockOptimizer for WindowProcessor<T> {
      fn optimize(&self, block: &[Bytecode]) -> Vec<Bytecode> {
          let mut old_block = block.to_vec();
          // Run single passes until code stops changing.
It seems likely this can be optimized, but we can wait and see how slow it is first.
Sounds good. Right now, there does not seem to be any statistically significant difference in compilation time with this optimization on versus off, when compiling the aptos framework and its dependencies.
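To make the cost being discussed concrete, here is a rough sketch of a "run single passes until code stops changing" driver. Only the names `WindowOptimizer`, `WindowProcessor`, `optimize`, and `old_block` come from the diff; the single-pass helper, the toy `Bytecode` enum, the toy `DropLoadPop` optimizer, and the choice of passing the rest of the block as the window are all assumptions, not the PR's code.

```rust
// Hypothetical, simplified stand-in for the compiler's bytecode type.
#[derive(Clone, Debug, PartialEq)]
enum Bytecode {
    LdU64(u64),
    Pop,
    Ret,
}

trait WindowOptimizer {
    fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)>;
}

struct WindowProcessor<T: WindowOptimizer>(T);

impl<T: WindowOptimizer> WindowProcessor<T> {
    // One left-to-right pass (assumed shape). Returns the new code and
    // whether any window was rewritten.
    fn optimize_single_pass(&self, block: &[Bytecode]) -> (Vec<Bytecode>, bool) {
        let mut new_block = Vec::with_capacity(block.len());
        let mut changed = false;
        let mut pos = 0;
        while pos < block.len() {
            match self.0.optimize_window(&block[pos..]) {
                Some((optimized, consumed)) => {
                    // The trait contract asks for a non-empty prefix.
                    debug_assert!(consumed > 0 && consumed <= block.len() - pos);
                    new_block.extend(optimized);
                    pos += consumed;
                    changed = true;
                },
                None => {
                    new_block.push(block[pos].clone());
                    pos += 1;
                },
            }
        }
        (new_block, changed)
    }

    fn optimize(&self, block: &[Bytecode]) -> Vec<Bytecode> {
        let mut old_block = block.to_vec();
        // Run single passes until code stops changing. This terminates only
        // if the optimizer eventually stops reporting rewrites.
        loop {
            let (new_block, changed) = self.optimize_single_pass(&old_block);
            if !changed {
                return new_block;
            }
            old_block = new_block;
        }
    }
}

// Toy optimizer (an assumption): drop a constant load that is immediately popped.
struct DropLoadPop;

impl WindowOptimizer for DropLoadPop {
    fn optimize_window(&self, window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)> {
        match window {
            [Bytecode::LdU64(_), Bytecode::Pop, ..] => Some((vec![], 2)),
            _ => None,
        }
    }
}
```

With the toy optimizer, the block `LdU64(1), LdU64(2), Pop, Pop, Ret` needs two rewriting passes: the first removes the inner load/pop pair, which only then exposes the outer pair. Each pass re-walks and re-allocates the whole block, which is the part that could likely be optimized if it ever shows up in profiles.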
0: LdU64(1)
1: LdU64(2)
2: LdU64(3)
3: CopyLoc[0](Arg0: &S)
We really should fix the bytecode generator. This kind of thing doesn't look easily amenable to peephole opts.
I added this test to showcase a case where the peephole optimization alone won't be sufficient, as you point out. I have a follow-up PR where I have reduced a bunch of remaining cases from the aptos-framework where the code generation can still be optimized.
17a143c to 3c9937d
✅ Forge suite
Description
Closes #14169.
Note: this PR will not be merged until appropriate comparison testing is carried out on this branch.
In this PR, we implement a peephole optimization that addresses inefficient generated loads (a constant load that is immediately stored, and then later moved back to the stack). This optimization subsumes a previous optimization, which has been removed.
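As a rough illustration of the pattern described above, the sketch below rewrites "load constant, store to a local, some intervening instructions, move the local back" into "intervening instructions, load constant". It is one reading of the description, not the PR's implementation: the `Bytecode` enum, `touches_local`, and `optimize_inefficient_load` are hypothetical names, and the sketch assumes straight-line code in which the local is not needed after the move. A real implementation has to check more conditions.

```rust
// Hypothetical, simplified stand-in for the compiler's bytecode type.
#[derive(Clone, Debug, PartialEq)]
enum Bytecode {
    LdU64(u64),
    StLoc(u8),
    MoveLoc(u8),
    CopyLoc(u8),
    Add,
}

// True if `instr` reads or writes the given local (toy instruction set only).
fn touches_local(instr: &Bytecode, local: u8) -> bool {
    matches!(
        instr,
        Bytecode::StLoc(l) | Bytecode::MoveLoc(l) | Bytecode::CopyLoc(l) if *l == local
    )
}

// Returns the optimized prefix and the number of original instructions it replaces.
fn optimize_inefficient_load(window: &[Bytecode]) -> Option<(Vec<Bytecode>, usize)> {
    // A constant load that is immediately stored into a local.
    let (constant, local) = match window {
        [Bytecode::LdU64(c), Bytecode::StLoc(l), ..] => (*c, *l),
        _ => return None,
    };
    // Find the first later instruction that touches the same local.
    let rest = &window[2..];
    let idx = rest.iter().position(|i| touches_local(i, local))?;
    // It must be the move back to the stack; anything else blocks the rewrite.
    if rest[idx] != Bytecode::MoveLoc(local) {
        return None;
    }
    // Keep the intervening instructions and load the constant where the move
    // was, so the stack contents at that point are unchanged.
    let mut optimized = rest[..idx].to_vec();
    optimized.push(Bytecode::LdU64(constant));
    // Consumed: load + store + intervening instructions + move.
    Some((optimized, idx + 3))
}
```

In the shortest case the window is exactly three instructions (load, store, move) and collapses to a single load.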
With this optimization, when compiling the aptos-framework, the difference between the number of instructions generated by v1 vs. v2 is down to an increase of 0.7%.
This PR also includes some refactoring in the peephole optimization traits, and some small changes to the output of the instruction count comparison script.
How Has This Been Tested?
opt_load_01.move, test03 goes from 34 instructions to 14).
Key Areas to Review
Type of Change
Which Components or Systems Does This Change Impact?