Executor benchmark revamps #15127

Merged: 3 commits, Nov 7, 2024

Conversation

igor-aptos
Contributor

Description

  • Report the signature_verification and ledger_update stages separately.
  • Change "block execution time" from the VM_EXECUTE_BLOCK counter to BLOCK_EXECUTOR_EXECUTE_BLOCK, since the latter counts BlockSTM + VM instead of just the VM. Add BLOCK_EXECUTOR_INNER_EXECUTE_BLOCK for cases that need finer granularity.
  • Decouple AptosVM from BlockSTM: AptosVM no longer implements TransactionBlockExecutor; a new AptosVMBlockExecutor does instead. That allows for creating NativeVMBlockExecutor in a following PR. TransactionBlockExecutor can now hold state if needed, via new() and a &self argument.
  • Fix split_stages to split all pipeline stages, and make the initial delay only create transactions without starting the pipeline (i.e. verification) beforehand.

Followup PR will introduce different native executors.
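
To illustrate the decoupling described above, here is a minimal sketch. It assumes a heavily simplified trait: the stand-in types `Txn` and `TxnOutput`, the method name `execute_block`, and the doubling "execution" are hypothetical and only mirror the shape (a `new()` constructor plus `&self` methods), not the real Aptos signatures.

```rust
// Stand-in types; the real transaction and output types are far richer.
#[derive(Debug, Clone, PartialEq)]
pub struct Txn(pub u64);
#[derive(Debug, Clone, PartialEq)]
pub struct TxnOutput(pub u64);

/// Hypothetical, simplified block-executor trait: built with `new()` and
/// called through `&self`, so an implementation may hold state.
pub trait TransactionBlockExecutor {
    fn new() -> Self;
    fn execute_block(&self, txns: &[Txn]) -> Vec<TxnOutput>;
}

/// Stand-in for the VM. Note that it does not implement the trait itself.
pub struct AptosVM;

impl AptosVM {
    // Placeholder "execution": doubles the payload.
    fn execute_one(txn: &Txn) -> TxnOutput {
        TxnOutput(txn.0 * 2)
    }
}

/// Separate type that plugs the VM into block execution.
pub struct AptosVMBlockExecutor;

impl TransactionBlockExecutor for AptosVMBlockExecutor {
    fn new() -> Self {
        AptosVMBlockExecutor
    }

    fn execute_block(&self, txns: &[Txn]) -> Vec<TxnOutput> {
        txns.iter().map(AptosVM::execute_one).collect()
    }
}

/// Generic driver, parameterized by the block executor rather than the VM.
pub struct BlockExecutor<V: TransactionBlockExecutor> {
    inner: V,
}

impl<V: TransactionBlockExecutor> BlockExecutor<V> {
    pub fn new() -> Self {
        Self { inner: V::new() }
    }

    pub fn execute(&self, txns: &[Txn]) -> Vec<TxnOutput> {
        self.inner.execute_block(txns)
    }
}
```

With this shape, swapping in a different executor (for example a native one) would only mean implementing the trait for another type and changing the type parameter, as in `BlockExecutor::<AptosVMBlockExecutor>::new()`.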

How Has This Been Tested?

performance benchmark

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


trunk-io bot commented Oct 30, 2024

⏱️ 9h 47m total CI duration on this PR
Slowest 15 Jobs | Cumulative Duration | Recent Runs
execution-performance / single-node-performance | 7h 48m | 🟥🟥🟥🟥🟥 (+2 more)
execution-performance / test-target-determinator | 26m | 🟩🟩🟩🟩🟩 (+2 more)
test-target-determinator | 19m | 🟩🟩🟩🟩🟩
check-dynamic-deps | 11m | 🟩🟩🟩🟩🟩 (+3 more)
rust-move-tests | 10m | 🟩
rust-move-tests | 9m | 🟩
rust-cargo-deny | 9m | 🟩🟩🟩🟩🟩
rust-move-tests | 9m | 🟩
rust-move-tests | 9m | 🟩
rust-move-tests | 9m | 🟩
general-lints | 3m | 🟩🟩🟩🟩🟩
semgrep/ci | 2m | 🟩🟩🟩🟩🟩 (+2 more)
file_change_determinator | 59s | 🟩🟩🟩🟩🟩
file_change_determinator | 54s | 🟩🟩🟩🟩🟩
permission-check | 24s | 🟩🟩🟩🟩🟩 (+2 more)

🚨 1 job on the last run was significantly faster/slower than expected

Job | Duration | vs 7d avg | Delta
execution-performance / single-node-performance | 1h 37m | 16m | +517%


@@ -204,7 +207,7 @@ pub fn new_test_context_inner(
rng,
root_key,
validator_owner,
- Box::new(BlockExecutor::<AptosVM>::new(db_rw)),
+ Box::new(BlockExecutor::<AptosVMBlockExecutor>::new(db_rw)),
Contributor

ignore me: I like AptosVm, AptosVmBlockExecutor better, can't help saying it.

)),
));
vm_executor
.execute_and_state_checkpoint(
Contributor

Maybe open and use this instead so we can know the txn succeeds and it works for the entire execution workflow?

Contributor

okay I see you had the code to check the results commented out..

#[default]
AptosVMWithBlockSTM,
NativeLooseSpeculative,
PtxExecutor,
Contributor

does it even still work, btw? :trollface:

transactions: ExecutableTransactions,
state_view: CachedStateView,
onchain_config: BlockExecutorConfigFromOnchain,
append_state_checkpoint_to_block: Option<HashValue>,
) -> Result<ExecutionOutput> {
let _timer = BLOCK_EXECUTOR_INNER_EXECUTE_BLOCK.start_timer();
Contributor

Looks like this measures almost exactly the same thing as BLOCK_EXECUTOR_EXECUTE_BLOCK, except it measures only when the VM is AptosVM?

Contributor Author

for native VMs this is very different, but that is in a separate PR, so motivation here is not as clear

Contributor

Without seeing your follow up PR, I feel you might want to measure at an even inner place?

-- DoGetExecutionOutput::* parses the VM raw output and will be doing the speculative state (not the smt) update soon.
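
To make the outer/inner timer question in this thread concrete, here is a small sketch of two nested scoped timers. It uses only the standard library; `Recorder`, `TimerGuard`, and the two functions are hypothetical stand-ins for the real histogram counters and executor entry points, and they only assume that `start_timer()` returns a guard that records on drop.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// Minimal stand-in for a histogram: it just stores observed durations.
#[derive(Default)]
pub struct Recorder {
    samples: RefCell<Vec<Duration>>,
}

/// Guard that records the elapsed time into its recorder when dropped.
pub struct TimerGuard<'a> {
    recorder: &'a Recorder,
    start: Instant,
}

impl Recorder {
    pub fn start_timer(&self) -> TimerGuard<'_> {
        TimerGuard { recorder: self, start: Instant::now() }
    }

    /// Sum of all recorded durations.
    pub fn total(&self) -> Duration {
        self.samples.borrow().iter().sum()
    }
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        self.recorder.samples.borrow_mut().push(self.start.elapsed());
    }
}

/// Hypothetical outer entry point: `outer` spans the whole call,
/// including the work done before the inner function is entered.
pub fn execute_block(outer: &Recorder, inner: &Recorder) -> u64 {
    let _timer = outer.start_timer();
    let prepared: u64 = (1..=1_000).sum(); // work outside the inner scope
    inner_execute_block(inner, prepared)
}

/// Hypothetical inner function: `inner` spans only this body.
fn inner_execute_block(inner: &Recorder, input: u64) -> u64 {
    let _timer = inner.start_timer();
    input * 2
}
```

In this sketch the outer total is never smaller than the inner total; how far apart the two are depends entirely on how much work sits between the two `start_timer()` calls, which is the point raised about native executors.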

enum BlockExecutorTypeOpt {
#[default]
AptosVMWithBlockSTM,
NativeLooseSpeculative,
Contributor

I have to say the meaning of "speculative" isn't clear here, and I don't have a better suggestion for now -_-

Contributor Author

I'll add comments to all variants here, but I was surprised to see that the current implementation was replacing BlockSTM as well, not just AptosVM, so I wanted the name to be clearer.

confusing name is better than name that suggests wrong thing - as folks will ask around if they don't understand it :)

but open to all name suggestions. These are all the names:

     AptosVMWithBlockSTM,
     NativeVMWithBlockSTM,
     NativeLooseSpeculative, 
     NativeValueCacheLooseSpeculative,
     NativeNoStorageLooseSpeculative,

Contributor

Please please please add documentation to all variants here: it is not clear at all without context what are those. Also, as a nit:
BlockSTMWith...VM reads better? Block executor which uses some particular VM impl?

Contributor Author

already in #15152, adding here as well
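
As a rough idea of what per-variant documentation could look like, here is a sketch of the variant list quoted above with doc comments. The descriptions are assumptions: only the statement that the "loose speculative" implementation replaces BlockSTM as well as AptosVM comes from this thread, the rest is read off the names. The helper `uses_block_stm` is hypothetical.

```rust
/// Sketch of the executor-type option with per-variant docs.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum BlockExecutorTypeOpt {
    /// AptosVM running under BlockSTM (the default).
    #[default]
    AptosVMWithBlockSTM,
    /// A native VM running under BlockSTM (assumption based on the name).
    NativeVMWithBlockSTM,
    /// Native executor that replaces BlockSTM as well as AptosVM
    /// (as stated in the thread).
    NativeLooseSpeculative,
    /// Flavor of the loose speculative executor; its exact semantics are
    /// not described in this thread.
    NativeValueCacheLooseSpeculative,
    /// Flavor of the loose speculative executor; its exact semantics are
    /// not described in this thread.
    NativeNoStorageLooseSpeculative,
}

impl BlockExecutorTypeOpt {
    /// Whether the variant still runs under BlockSTM, judged from its name only.
    pub fn uses_block_stm(self) -> bool {
        matches!(self, Self::AptosVMWithBlockSTM | Self::NativeVMWithBlockSTM)
    }
}
```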

@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from b6f1484 to e350fa7 Compare October 31, 2024 06:41
@@ -138,7 +138,7 @@ impl AptosTransactionOutput {
self.committed_output.get().unwrap()
}

- fn take_output(mut self) -> TransactionOutput {
+ pub fn take_output(mut self) -> TransactionOutput {
Contributor

nit: maybe we should document these if we make this interface public, because we panic here if the output is not set?

Contributor Author

I'll move this to a PR that uses it, and we can see there
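
For reference, documenting the panic could look roughly like the sketch below. It is not the real `AptosTransactionOutput`: `OutputHolder`, `set_committed`, and `try_take_output` are hypothetical, the output type is a stand-in, and `std::sync::OnceLock` is assumed in place of whatever cell type the real struct uses.

```rust
use std::sync::OnceLock;

/// Stand-in for the real `TransactionOutput` type.
#[derive(Debug, PartialEq)]
pub struct TransactionOutput(pub String);

/// Simplified holder: the committed output can be set at most once.
#[derive(Default)]
pub struct OutputHolder {
    committed_output: OnceLock<TransactionOutput>,
}

impl OutputHolder {
    /// Stores the output; returns `false` if one was already set.
    pub fn set_committed(&self, output: TransactionOutput) -> bool {
        self.committed_output.set(output).is_ok()
    }

    /// Consumes the holder and returns the committed output.
    ///
    /// # Panics
    ///
    /// Panics if no output has been committed yet.
    pub fn take_output(mut self) -> TransactionOutput {
        self.committed_output
            .take()
            .expect("take_output called before the output was committed")
    }

    /// Non-panicking alternative (hypothetical): `None` if nothing was committed.
    pub fn try_take_output(mut self) -> Option<TransactionOutput> {
        self.committed_output.take()
    }
}
```

A `# Panics` section is the conventional rustdoc place for this kind of contract once a method becomes `pub`.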

@@ -13,6 +13,7 @@ const BLOCK_EXECUTION_TIME_BUCKETS: [f64; 16] = [
0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.0, 1.25, 1.5, 1.75, 2.0, 3.0, 4.0, 5.0,
];

// TODO - disambiguate against BLOCK_EXECUTOR_EXECUTE_BLOCK
Contributor

Add more context? Issue number? Github username for person to fix this?

// metric name
"aptos_executor_block_executor_inner_execute_block_seconds",
// metric description
"The time spent in seconds of BlockExecutor inner block execution in Aptos executor",
Contributor

This sentence just doesn't make sense?😂

}

impl BlockPreparationStage {
- pub fn new(num_shards: usize, partitioner_config: &dyn PartitionerConfig) -> Self {
+ pub fn new(
+ sig_verify_num_threads: usize,
Contributor

super nit: num_sig_verify_threads?

@@ -30,24 +30,24 @@ use std::{
#[derive(Debug, Derivative)]
#[derivative(Default)]
pub struct PipelineConfig {
- pub delay_execution_start: bool,
+ pub delay_pipeline_start: bool,
Contributor

I really think it would be great to have comments about these and below?
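
A possible shape for such field comments is sketched below. The wording paraphrases the PR description; treating `split_stages` as a field of this struct is an assumption, the real struct has more fields, and plain `Default` is used here instead of the `derivative` crate.

```rust
/// Sketch of documented pipeline flags (subset only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PipelineConfig {
    /// When set, the initial delay only creates transactions; no pipeline
    /// stage (including signature verification) is started beforehand.
    pub delay_pipeline_start: bool,
    /// When set, all pipeline stages are split from each other
    /// (assumed field; the name comes from the PR description).
    pub split_stages: bool,
}
```

A benchmark could then opt in explicitly, for example `PipelineConfig { delay_pipeline_start: true, ..Default::default() }`.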

// metric description
- "The time spent in seconds of vm block execution in Aptos executor",
+ "The time spent in seconds of BlockExecutor block execution in Aptos executor",
Contributor

This seems to duplicate what we have in counters.rs? The comment surely does (and is also not clear).

execution/executor/src/block_executor/mod.rs (resolved)
pub fn get_concurrency_level() -> usize {
match NATIVE_EXECUTOR_CONCURRENCY_LEVEL.get() {
Some(concurrency_level) => *concurrency_level,
None => 32,
Contributor

We probably want to cap it to number of cores?

Contributor Author

changed default to 1. but otherwise - test can run with higher number of threads than cores, if useful
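
The two positions in this exchange can be sketched side by side. This assumes `std::sync::OnceLock` for the global (the real code may use a different cell type); `set_concurrency_level_once` and `capped_concurrency_level` are hypothetical helpers, and the default of 1 follows the author's reply.

```rust
use std::sync::OnceLock;
use std::thread;

static NATIVE_EXECUTOR_CONCURRENCY_LEVEL: OnceLock<usize> = OnceLock::new();

/// Sets the level once; returns `false` if it was already set.
pub fn set_concurrency_level_once(level: usize) -> bool {
    NATIVE_EXECUTOR_CONCURRENCY_LEVEL.set(level).is_ok()
}

/// Returns the configured level, defaulting to 1 when unset.
/// The value is deliberately not capped, so a test can run with
/// more threads than cores.
pub fn get_concurrency_level() -> usize {
    match NATIVE_EXECUTOR_CONCURRENCY_LEVEL.get() {
        Some(concurrency_level) => *concurrency_level,
        None => 1,
    }
}

/// Hypothetical variant that caps the level at the number of cores.
pub fn capped_concurrency_level() -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    get_concurrency_level().min(cores)
}
```

Keeping the cap in a separate helper leaves the choice to the caller: benchmarks that want oversubscription call the plain getter, others call the capped one.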

execution/executor-benchmark/src/lib.rs (resolved)
@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from e350fa7 to 6bc2844 Compare November 5, 2024 21:09
@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 6bc2844 to 5855472 Compare November 5, 2024 21:23
Base automatically changed from igor/event_v2_rust to main November 6, 2024 01:51
@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 5855472 to 7486aa1 Compare November 6, 2024 06:19

@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 122a6ca to cdf9d3a Compare November 7, 2024 20:12
@igor-aptos igor-aptos disabled auto-merge November 7, 2024 20:21

@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch 3 times, most recently from 449b401 to 88332b1 Compare November 7, 2024 21:41
@igor-aptos igor-aptos enabled auto-merge (squash) November 7, 2024 21:41

@igor-aptos igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 88332b1 to 75cdcc5 Compare November 7, 2024 22:17

github-actions bot commented Nov 7, 2024

✅ Forge suite realistic_env_max_load success on 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69

two traffics test: inner traffic : committed: 14451.02 txn/s, latency: 2749.31 ms, (p50: 2700 ms, p70: 2700, p90: 2900 ms, p99: 3300 ms), latency samples: 5494880
two traffics test : committed: 100.05 txn/s, latency: 1464.84 ms, (p50: 1400 ms, p70: 1400, p90: 1500 ms, p99: 8600 ms), latency samples: 1780
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.002, avg: 1.565", "ConsensusProposalToOrdered: max: 0.324, avg: 0.292", "ConsensusOrderedToCommit: max: 0.369, avg: 0.358", "ConsensusProposalToCommit: max: 0.658, avg: 0.650"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.90s no progress at version 2833450 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.61s no progress at version 2833448 (avg 8.61s) [limit 15].
Test Ok

github-actions bot commented Nov 7, 2024

✅ Forge suite framework_upgrade success on 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69

Compatibility test results for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 (PR)
Upgrade the nodes to version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1476.00 txn/s, submitted: 1481.07 txn/s, failed submission: 5.07 txn/s, expired: 5.07 txn/s, latency: 2068.08 ms, (p50: 1800 ms, p70: 2100, p90: 3000 ms, p99: 4400 ms), latency samples: 128020
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1315.68 txn/s, submitted: 1318.41 txn/s, failed submission: 2.72 txn/s, expired: 2.72 txn/s, latency: 2298.70 ms, (p50: 2100 ms, p70: 2400, p90: 3700 ms, p99: 5400 ms), latency samples: 115940
5. check swarm health
Compatibility test for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 passed
Upgrade the remaining nodes to version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1261.34 txn/s, submitted: 1263.17 txn/s, failed submission: 1.84 txn/s, expired: 1.84 txn/s, latency: 2471.02 ms, (p50: 2200 ms, p70: 2700, p90: 3900 ms, p99: 5100 ms), latency samples: 109960
Test Ok

github-actions bot commented Nov 7, 2024

✅ Forge suite compat success on 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69

Compatibility test results for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 (PR)
1. Check liveness of validators at old version: 1086a5e00d773704731ab84fb4ed3538613b2250
compatibility::simple-validator-upgrade::liveness-check : committed: 17406.25 txn/s, latency: 1933.23 ms, (p50: 1900 ms, p70: 2100, p90: 2200 ms, p99: 2300 ms), latency samples: 558720
2. Upgrading first Validator to new version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6668.44 txn/s, latency: 4283.37 ms, (p50: 5000 ms, p70: 5200, p90: 5300 ms, p99: 5400 ms), latency samples: 120860
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5752.37 txn/s, latency: 5319.77 ms, (p50: 5700 ms, p70: 5900, p90: 6300 ms, p99: 7200 ms), latency samples: 217620
3. Upgrading rest of first batch to new version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6093.09 txn/s, latency: 4634.30 ms, (p50: 5200 ms, p70: 5300, p90: 6000 ms, p99: 6200 ms), latency samples: 120940
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6439.96 txn/s, latency: 5005.09 ms, (p50: 5300 ms, p70: 5500, p90: 6900 ms, p99: 7200 ms), latency samples: 217640
4. upgrading second batch to new version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10456.43 txn/s, latency: 2643.84 ms, (p50: 2800 ms, p70: 3000, p90: 3500 ms, p99: 3900 ms), latency samples: 182640
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9294.04 txn/s, latency: 3412.61 ms, (p50: 3000 ms, p70: 3300, p90: 6800 ms, p99: 7800 ms), latency samples: 351400
5. check swarm health
Compatibility test for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 passed
Test Ok

@igor-aptos igor-aptos merged commit 0d53727 into main Nov 7, 2024
49 checks passed
@igor-aptos igor-aptos deleted the igor/executor_benchmark_setup_improvement branch November 7, 2024 23:01
Labels
CICD:run-execution-performance-full-test: Run execution performance test (full version)
CICD:run-execution-performance-test: Run execution performance test

3 participants