[Executor] Merge sequential & parallel execution flow #4683
Conversation
Just a quick comment as I am reviewing this, and it will be slow for me as it touches a lot of core stuff I have not seen before. Though a wonderful exercise, thanks! Also, please look into the errors that seem legitimate?
@dariorussi - yes, if we like it, we should roll this in after main-net. Nothing should be breaking: more tests, a common flow, and it should facilitate lots of future improvements (e.g. even just the virtue of having a single executor object for callbacks or whatnot). Will definitely fix all the linter and related errors. And (TODO to self): I will also experiment with DashSet/DashMap configuration settings (num shards mainly); we currently use the defaults and maybe there is some perf to squeeze.
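As a hedged illustration of that tuning knob (dashmap's public constructors, but the types and shard count here are hypothetical examples, not the actual executor configuration):

```rust
use dashmap::DashMap;

fn main() {
    // Default construction: dashmap picks a shard count based on available parallelism.
    let default_map: DashMap<u64, u64> = DashMap::new();

    // Explicit shard count (must be a power of two). More shards can reduce lock
    // contention under many threads at the cost of extra memory; worth benchmarking.
    let tuned_map: DashMap<u64, u64> = DashMap::with_shard_amount(64);

    default_map.insert(1, 10);
    tuned_map.insert(1, 10);
    assert_eq!(*tuned_map.get(&1).unwrap(), 10);
}
```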
early comments, need to do another pass
I think you are splitting this up into multiple PRs, which is a very good idea, so I love it; maybe you want to mark it somehow.
Just commenting since I have a small comment I think you may enjoy.
Will do ASAP. To document here: the plan is to do the proptest changes in a separate diff. I will make sure to stack the proptest PR on top though, and not land the first PR without ensuring the second one (with new and stronger tests) passes.
Split the PR; this is the first one, which refactors the aptos-vm execution flow, removing unused status and merging sequential & parallel execution. Renamed parallel_executor -> block_executor as per @dariorussi's suggestion, but since that touches a lot of lines, I made it a separate commit for ease of reviewing. @runtian-zhou can you have a look now?
Let's give some more time for review from Move team members. (I know you filed this a while ago, so sorry for the lack of earlier attention.)
I was never going to land this without @runtian-zhou's approval, but I appreciate more eyes. For context: it all starts here, and once we have the whole executor flow nicely in the block_executor, it opens up all the extension and gas hooks that we need for various outstanding tasks (can explain offline).
This generally seems to go in the right direction (that is, unifying sequential/parallel block execution). We need a larger refactoring of the adapter (see comment below), but this PR can be an incremental step toward it. Leaving detailed review to folks more acquainted with PE.
// Wrapper to avoid orphan rule
pub(crate) struct AptosTransactionOutput(TransactionOutputExt);
Not a big fan of avoiding the orphan rule, which is there for a reason (and not a technical one). It perhaps makes notation a bit more convenient, but generally it makes the code harder to understand. Not saying this needs to be done differently; just an opinion.
I really dislike this as well and tried to get rid of it. I can solve some of the issues, but the problem I couldn't easily overcome is a potential circular dependency between the aptos-aggregator crate (where TransactionOutputExt is defined) and the block-executor crate (which defines the traits and also uses aggregators). Hopefully we will get to a place where we restructure this wrapper, or more generally restructure the crates and factor out the common types (like we currently do at the aptos-core level; maybe we could do something similar at the aptos-vm or aptos-move level).
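For illustration, a minimal single-file sketch of the newtype workaround being discussed (the trait and the wrapped type are hypothetical stand-ins; in the real code they live in separate crates, which is what triggers the orphan rule):

```rust
// Stand-in for a trait defined in the block-executor crate.
pub trait TransactionOutputTrait {
    fn skip_rest(&self) -> bool;
}

// Stand-in for TransactionOutputExt, defined in the aptos-aggregator crate.
pub struct TransactionOutputExt {
    skip_rest: bool,
}

// In aptos-vm, neither the trait nor the type would be local, so a direct
// `impl TransactionOutputTrait for TransactionOutputExt` would violate the
// orphan rule. Wrapping the foreign type in a local newtype makes the impl legal.
pub(crate) struct AptosTransactionOutput(TransactionOutputExt);

impl TransactionOutputTrait for AptosTransactionOutput {
    fn skip_rest(&self) -> bool {
        // Delegate to the wrapped value.
        self.0.skip_rest
    }
}
```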
pub(crate) struct AptosVMWrapper<'a, S> {
    vm: AptosVM,
    base_view: &'a S,
This S is always a StateView, right? Then this is really strange, and the need for this wrapper type just demonstrates how broken the AptosVM is (and that it requires a larger overhaul). Because if you look at AptosVM, you see that it wraps AptosVMImpl, which in turn is created from a state view. That state view is then hidden inside of the storage adapter. The need to be able to get hold of a StateView has already created acrobatic code in other places.
This is probably another set of PRs after this one, but I really wish we could drastically simplify the architecture here:
- Only one AptosVM (no VMImpl, BlockVM, MoveVmExt, and other complications).
- That AptosVM implements all the parallel and sequential execution logic. Multiple impl AptosVM blocks split over files can help to tame the complexity (see the sketch after this comment).
Because parallel execution is built into our Move adapter, there is really no need to maintain a code layer underneath without PE.
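A minimal sketch of the "multiple impl blocks split over files" idea (the module layout and method names are hypothetical, not the actual aptos-core structure):

```rust
// vm.rs: the single VM type.
pub struct AptosVM;

// sequential.rs: sequential execution logic in its own impl block / file.
impl AptosVM {
    pub fn execute_block_sequential(&self, txns: &[u64]) -> Vec<u64> {
        // Placeholder logic standing in for per-transaction execution.
        txns.iter().map(|t| t + 1).collect()
    }
}

// parallel.rs: parallel execution logic in another impl block / file.
impl AptosVM {
    pub fn execute_block_parallel(&self, txns: &[u64], _concurrency_level: usize) -> Vec<u64> {
        // In the real code this would dispatch to the block executor; here it
        // simply reuses the sequential path to keep the sketch self-contained.
        self.execute_block_sequential(txns)
    }
}
```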
I think that would be a great place to get to.
The next PR will help a bit with the StateView wrapping business, but it won't fully solve it; it will rather unify what we currently use (StateViewCache and VersionedView) to represent the storage state view plus some writes from the block, and isolate that within the block_executor as an implementation detail (together with making delta resolution an implementation detail as well), as a better temporary place. But I fully subscribe to revamping these boundaries as soon as possible and will try to make that more feasible with the current queue of changes.
Totally agreed, that's precisely the intention. There are some specific fixes, but regarding the overall refactoring aspect, my rationale at the moment is to try to make some incremental local simplifications / improvements while hopefully moving in the right global direction (allowing a single "executor" that can be cleanly connected to other parts of aptos-vm, re-used across blocks, etc.). Hopefully with some iteration (and there are indeed 2-3 PRs coming right after), we will also start seeing the big picture better.
One heuristic for me for now is to push some pieces of logic to the block_executor side (e.g. in the next PR, the StateView wrapper business currently done in different ways in aptos-vm), since that code is newer and should have a more fixed structure / flow. However, we should then absolutely look into precisely the question of how the block executor is incorporated into aptos_vm. For some context, I believe the current state of affairs is due to two things - @runtian-zhou can confirm or deny as the author and the ultimate expert on the wrapper flow:
The way the parallel executor is structured is exactly what @gelash suggests; the whole intention of this...
The testing aspect is probably more important in this context because it's generally quite hard to test parallel code like this.
Almost looks good to me! Agreed with @wrwg that we need more refactoring here, but this is a great step moving forward!
@@ -3,9 +3,6 @@
#[derive(Debug, PartialEq, Eq)]
pub enum Error<E> {
    /// Invariant violation that happens internally inside of scheduler, usually an indication of
    /// implementation error.
    InvariantViolation,
Why is this error removed?
It was never used; happy to bring it back whenever needed.
) -> Result<
    (
        Vec<E::Output>,
        OutputDeltaResolver<<T as Transaction>::Key, <T as Transaction>::Value>,
    ),
    E::Error,
> {
    assert!(self.concurrency_level > 1, "Must use sequential execution");
I was wondering whether these assertions could fail in production, as they are a good source of network availability attacks. Would it be better to return an error when such a condition is violated?
Currently this must be true due to
https://github.com/aptos-labs/aptos-core/blob/e197a64f990b349b888eec4624c1adf945d0ef67/aptos-move/aptos-vm/src/block_executor/mod.rs#L150, and the second follow-up PR moves this dispatching inside the block_executor crate:
let mut ret = if self.concurrency_level > 1 {
If you really think we should return an error here, let me know what kind of error and what should happen in that case: do we skip the whole block? It would complicate some places, and we should worry about determinism; it's probably not worth it for an invariant that trivially should never happen.
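If we did want an error instead of a panic, a hedged sketch could look roughly like this (hypothetical names, loosely modeled on the removed InvariantViolation variant rather than the actual crate code):

```rust
// Hypothetical error type, mirroring the removed variant.
#[derive(Debug, PartialEq, Eq)]
pub enum Error<E> {
    /// Internal invariant violation, e.g. entering the parallel path with a
    /// concurrency level that should have selected sequential execution.
    InvariantViolation,
    /// Error produced by the user-provided execution task.
    UserError(E),
}

fn check_parallel_precondition<E>(concurrency_level: usize) -> Result<(), Error<E>> {
    // Instead of `assert!(concurrency_level > 1, ...)`, surface an error that the
    // caller can handle deterministically (e.g. fall back to the sequential path).
    if concurrency_level > 1 {
        Ok(())
    } else {
        Err(Error::InvariantViolation)
    }
}
```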
* Merge sequential and parallel flows
* Rename parallel to block
De-spaghettify the aptos-vm execution flow
Benefits:
EDIT: These are coming in a different diff.
Additional improvements to tests:
- Test over/underflows based on storage values (close to 0 or close to u128::MAX); see the sketch after this list.
- Test different numbers of cores.
- Refactors: ModulePath enum, isolating BaselineState for generating the baseline with different configurations (i.e. aggregator values materialized or not), checking the delta sequence, etc.
- Make sure StorageError takes precedence over DeltaApplicationFailure in speculative executions when the aggregator is deleted but deltas on top also fail. Not relevant for the current use-case (no under/overflows and deletes), and the proper algorithm for handling the general case may not have the issue anyway.
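As a hedged illustration of the over/underflow cases above (a hypothetical helper written for this sketch, not the aptos-aggregator API), applying a delta to a stored u128 with checked arithmetic:

```rust
/// Hypothetical delta shape: additions and subtractions against a stored u128.
#[derive(Clone, Copy)]
enum Delta {
    Add(u128),
    Sub(u128),
}

/// Apply a delta to a storage value, failing on overflow past u128::MAX
/// or underflow below zero.
fn apply_delta(storage_value: u128, delta: Delta) -> Result<u128, &'static str> {
    match delta {
        Delta::Add(v) => storage_value.checked_add(v).ok_or("overflow"),
        Delta::Sub(v) => storage_value.checked_sub(v).ok_or("underflow"),
    }
}

fn main() {
    // Storage values close to the boundaries, as in the tests described above.
    assert_eq!(apply_delta(u128::MAX - 1, Delta::Add(2)), Err("overflow"));
    assert_eq!(apply_delta(1, Delta::Sub(2)), Err("underflow"));
    assert_eq!(apply_delta(5, Delta::Add(3)), Ok(8));
}
```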
After the testing PR, @perryjrandall, we should start running on different platforms too.