Executor benchmark revamps #15127

igor-aptos · 2024-10-30T20:56:03Z

Description

Separately report the signature_verification and ledger_update stages.
Change "block execution time" from the VM_EXECUTE_BLOCK counter to BLOCK_EXECUTOR_EXECUTE_BLOCK, since the latter counts BlockSTM + VM instead of just the VM. Add BLOCK_EXECUTOR_INNER_EXECUTE_BLOCK for cases that need finer granularity.
Decouple AptosVM from BlockSTM: AptosVM no longer implements TransactionBlockExecutor; the new AptosVMBlockExecutor does instead. This allows creating NativeVMBlockExecutor in a following PR. TransactionBlockExecutor can now hold state if needed, through new() and a &self argument.
Fix split_stages to split all pipeline stages, and make the initial delay only create transactions, without starting the pipeline (i.e. verification) beforehand.

Followup PR will introduce different native executors.
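
A minimal sketch of the decoupling described above, assuming heavily simplified stand-in types. Only the names TransactionBlockExecutor, AptosVMBlockExecutor, NativeVMBlockExecutor and BlockExecutor come from this PR; the `Txn`/`Output` aliases, the method bodies, the `inner()` accessor and the `blocks_executed` counter are hypothetical and exist only to show a stateful executor constructed with new() and called through &self.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical placeholder types; the real transaction and output types are far richer.
pub type Txn = u64;
pub type Output = u64;

/// Executor trait that can carry state: built with `new()`, invoked via `&self`.
pub trait TransactionBlockExecutor {
    fn new() -> Self
    where
        Self: Sized;
    fn execute_block(&self, txns: &[Txn]) -> Vec<Output>;
}

/// Stand-in for the VM-backed executor (stateless in this sketch).
pub struct AptosVMBlockExecutor;

impl TransactionBlockExecutor for AptosVMBlockExecutor {
    fn new() -> Self {
        Self
    }

    fn execute_block(&self, txns: &[Txn]) -> Vec<Output> {
        // Illustrative "execution": add one to every transaction value.
        txns.iter().map(|t| t + 1).collect()
    }
}

/// Stand-in for a native executor that keeps state between blocks.
pub struct NativeVMBlockExecutor {
    blocks_executed: AtomicUsize,
}

impl NativeVMBlockExecutor {
    pub fn blocks_executed(&self) -> usize {
        self.blocks_executed.load(Ordering::Relaxed)
    }
}

impl TransactionBlockExecutor for NativeVMBlockExecutor {
    fn new() -> Self {
        Self { blocks_executed: AtomicUsize::new(0) }
    }

    fn execute_block(&self, txns: &[Txn]) -> Vec<Output> {
        // State is updated through `&self`, hence the atomic counter.
        self.blocks_executed.fetch_add(1, Ordering::Relaxed);
        txns.to_vec()
    }
}

/// Generic wrapper, mirroring the `BlockExecutor::<AptosVMBlockExecutor>` call shape.
pub struct BlockExecutor<V: TransactionBlockExecutor> {
    inner: V,
}

impl<V: TransactionBlockExecutor> BlockExecutor<V> {
    pub fn new() -> Self {
        Self { inner: V::new() }
    }

    pub fn inner(&self) -> &V {
        &self.inner
    }

    pub fn execute(&self, txns: &[Txn]) -> Vec<Output> {
        self.inner.execute_block(txns)
    }
}
```

Swapping `BlockExecutor::<AptosVMBlockExecutor>::new()` for `BlockExecutor::<NativeVMBlockExecutor>::new()` changes the executor without touching the wrapper, which is the property the description relies on for the follow-up PR.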

How Has This Been Tested?

performance benchmark

Key Areas to Review

Type of Change

Which Components or Systems Does This Change Impact?

Checklist

I have read and followed the CONTRIBUTING doc
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I identified and added all stakeholders and component owners affected by this change as reviewers
I tested both happy and unhappy path of the functionality
I have made corresponding changes to the documentation

trunk-io · 2024-10-30T20:56:07Z

⏱️ 9h 47m total CI duration on this PR

Slowest 15 Jobs	Cumulative Duration	Recent Runs
execution-performance / single-node-performance	7h 48m	🟥 🟥 🟥 🟥 🟥 (+2 more)
execution-performance / test-target-determinator	26m	🟩 🟩 🟩 🟩 🟩 (+2 more)
test-target-determinator	19m	🟩 🟩 🟩 🟩 🟩
check-dynamic-deps	11m	🟩 🟩 🟩 🟩 🟩 (+3 more)
rust-move-tests	10m	🟩
rust-move-tests	9m	🟩
rust-cargo-deny	9m	🟩 🟩 🟩 🟩 🟩
rust-move-tests	9m	🟩
rust-move-tests	9m	🟩
rust-move-tests	9m	🟩
general-lints	3m	🟩 🟩 🟩 🟩 🟩
semgrep/ci	2m	🟩 🟩 🟩 🟩 🟩 (+2 more)
file_change_determinator	59s	🟩 🟩 🟩 🟩 🟩
file_change_determinator	54s	🟩 🟩 🟩 🟩 🟩
permission-check	24s	🟩 🟩 🟩 🟩 🟩 (+2 more)

🚨 1 job on the last run was significantly faster/slower than expected

Job	Duration	vs 7d avg	Delta
execution-performance / single-node-performance	1h 37m	16m

msmouse · 2024-10-30T21:24:14Z

api/test-context/src/test_context.rs

@@ -204,7 +207,7 @@ pub fn new_test_context_inner(
        rng,
        root_key,
        validator_owner,
-        Box::new(BlockExecutor::<AptosVM>::new(db_rw)),
+        Box::new(BlockExecutor::<AptosVMBlockExecutor>::new(db_rw)),


ignore me: I like AptosVm, AptosVmBlockExecutor better, can't help saying it.

msmouse · 2024-10-30T21:38:41Z

execution/executor-benchmark/src/lib.rs

+                )),
+            ));
+            vm_executor
+                .execute_and_state_checkpoint(


Maybe open and use this instead so we can know the txn succeeds and it works for the entire execution workflow?

aptos-core/execution/executor-types/src/lib.rs

Line 130 in c44f1a3

fn execute_block(

okay I see you had the code to check the results commented out..

msmouse · 2024-10-30T21:41:41Z

execution/executor-benchmark/src/main.rs

+    #[default]
+    AptosVMWithBlockSTM,
+    NativeLooseSpeculative,
+    PtxExecutor,


does it even still work, btw?

msmouse · 2024-10-30T21:58:31Z

execution/executor/src/block_executor/mod.rs

        transactions: ExecutableTransactions,
        state_view: CachedStateView,
        onchain_config: BlockExecutorConfigFromOnchain,
        append_state_checkpoint_to_block: Option<HashValue>,
    ) -> Result<ExecutionOutput> {
+        let _timer = BLOCK_EXECUTOR_INNER_EXECUTE_BLOCK.start_timer();


Looks like this measures almost exactly the same thing as BLOCK_EXECUTOR_EXECUTE_BLOCK, except it measures only when the VM is AptosVM?

for native VMs this is very different, but that is in a separate PR, so motivation here is not as clear

Without seeing your follow up PR, I feel you might want to measure at an even inner place?

-- DoGetExecutionOutput::* parses the VM raw output and will be doing the speculative state (not the smt) update soon.
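
To make the nesting question concrete, here is a std-only sketch of two scope-guard timers, one wrapping the other. It is not the metrics library used in the PR: `ScopedTimer`, `OUTER_NANOS`, `INNER_NANOS` and both functions are hypothetical stand-ins. It only illustrates that a timer started in the outer function always covers the inner one plus whatever runs around it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

// Hypothetical accumulators standing in for the outer and inner counters.
pub static OUTER_NANOS: AtomicU64 = AtomicU64::new(0);
pub static INNER_NANOS: AtomicU64 = AtomicU64::new(0);

/// Guard that records elapsed time into `sink` when it goes out of scope.
pub struct ScopedTimer {
    start: Instant,
    sink: &'static AtomicU64,
}

impl ScopedTimer {
    pub fn start(sink: &'static AtomicU64) -> Self {
        Self { start: Instant::now(), sink }
    }
}

impl Drop for ScopedTimer {
    fn drop(&mut self) {
        let nanos = self.start.elapsed().as_nanos() as u64;
        self.sink.fetch_add(nanos, Ordering::Relaxed);
    }
}

/// Inner stage: only the core work is timed here.
fn inner_execute(txns: &[u64]) -> u64 {
    let _timer = ScopedTimer::start(&INNER_NANOS);
    txns.iter().sum()
}

/// Outer stage: times the inner stage plus the post-processing around it.
pub fn execute(txns: &[u64]) -> u64 {
    let _timer = ScopedTimer::start(&OUTER_NANOS);
    let sum = inner_execute(txns);
    // Illustrative post-processing that only the outer timer observes.
    sum.wrapping_mul(2)
}
```

If the two readings are nearly identical in practice, the surrounding work is negligible, which is the situation the comment above describes for the AptosVM path.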

msmouse · 2024-10-30T22:03:58Z

execution/executor-benchmark/src/main.rs

+enum BlockExecutorTypeOpt {
+    #[default]
+    AptosVMWithBlockSTM,
+    NativeLooseSpeculative,


I have to say the meaning of "speculative" isn't clear here, and I don't have a better suggestion for now -_-

I'll add comments to all variants here, but I was surprised to see that the current implementation was replacing BlockSTM as well, not just AptosVM, so I wanted the name to be clearer.

A confusing name is better than a name that suggests the wrong thing, as folks will ask around if they don't understand it :)

but open to all name suggestions. These are all the names:

AptosVMWithBlockSTM, NativeVMWithBlockSTM, NativeLooseSpeculative, NativeValueCacheLooseSpeculative, NativeNoStorageLooseSpeculative,

Please please please add documentation to all variants here: it is not clear at all without context what are those. Also, as a nit:
BlockSTMWith...VM reads better? Block executor which uses some particular VM impl?

already in #15152, adding here as well
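
As an illustration of what per-variant documentation could look like, here is a hedged sketch. The variant names are the five listed in this thread (PtxExecutor from the diff is left out); the doc comments are a reader's inference from the names and from the remark that the loose-speculative implementation replaces BlockSTM as well as AptosVM. The `uses_block_stm` helper is hypothetical, and the real enum presumably carries CLI-parsing derives that are omitted here.

```rust
/// Which executor the benchmark runs.
/// NOTE: doc comments below are inferred, not taken from the actual source.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum BlockExecutorTypeOpt {
    /// AptosVM executes transactions; BlockSTM schedules them (the default).
    #[default]
    AptosVMWithBlockSTM,
    /// Native transaction logic in place of AptosVM; BlockSTM still schedules.
    NativeVMWithBlockSTM,
    /// Native logic that replaces BlockSTM as well as AptosVM.
    NativeLooseSpeculative,
    /// Assumed: like NativeLooseSpeculative, with a value cache.
    NativeValueCacheLooseSpeculative,
    /// Assumed: like NativeLooseSpeculative, without going to storage.
    NativeNoStorageLooseSpeculative,
}

impl BlockExecutorTypeOpt {
    /// Hypothetical helper: true when BlockSTM is still part of the execution path.
    pub fn uses_block_stm(self) -> bool {
        matches!(
            self,
            Self::AptosVMWithBlockSTM | Self::NativeVMWithBlockSTM
        )
    }
}
```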

georgemitenkov · 2024-11-02T12:51:44Z

aptos-move/aptos-vm/src/block_executor/mod.rs

@@ -138,7 +138,7 @@ impl AptosTransactionOutput {
        self.committed_output.get().unwrap()
    }

-    fn take_output(mut self) -> TransactionOutput {
+    pub fn take_output(mut self) -> TransactionOutput {


nit: maybe we should document these if we make this interface public, because we panic here if the output is not set?

I'll move this to a PR that uses it, and we can see there
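
One way the documentation asked for above could look, as a sketch only. The struct is reduced to a single field backed by `std::sync::OnceLock` (the real field type is not shown in the diff), `TransactionOutput` is a placeholder, and `set_committed` and `try_take_output` are hypothetical additions that show a non-panicking alternative.

```rust
use std::sync::OnceLock;

/// Placeholder for the real output type.
pub struct TransactionOutput(pub u64);

/// Simplified stand-in: only the committed output slot is modelled.
pub struct AptosTransactionOutput {
    committed_output: OnceLock<TransactionOutput>,
}

impl AptosTransactionOutput {
    pub fn new() -> Self {
        Self { committed_output: OnceLock::new() }
    }

    /// Hypothetical setter; returns false if an output was already set.
    pub fn set_committed(&self, output: TransactionOutput) -> bool {
        self.committed_output.set(output).is_ok()
    }

    /// Consumes `self` and returns the committed output.
    ///
    /// # Panics
    ///
    /// Panics if no committed output has been set yet.
    pub fn take_output(mut self) -> TransactionOutput {
        self.committed_output
            .take()
            .expect("committed output must be set before take_output")
    }

    /// Hypothetical non-panicking variant for callers that cannot guarantee
    /// the output was set.
    pub fn try_take_output(mut self) -> Option<TransactionOutput> {
        self.committed_output.take()
    }
}
```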

georgemitenkov · 2024-11-02T12:52:13Z

aptos-move/aptos-vm/src/counters.rs

@@ -13,6 +13,7 @@ const BLOCK_EXECUTION_TIME_BUCKETS: [f64; 16] = [
    0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.0, 1.25, 1.5, 1.75, 2.0, 3.0, 4.0, 5.0,
 ];

+// TODO - disambiguate against BLOCK_EXECUTOR_EXECUTE_BLOCK


Add more context? Issue number? Github username for person to fix this?

georgemitenkov · 2024-11-02T12:52:59Z

aptos-move/block-executor/src/counters.rs

+        // metric name
+        "aptos_executor_block_executor_inner_execute_block_seconds",
+        // metric description
+        "The time spent in seconds of BlockExecutor inner block execution in Aptos executor",


This sentence just doesn't make sense?😂

execution/executor-benchmark/src/block_preparation.rs

georgemitenkov · 2024-11-02T12:56:34Z

execution/executor-benchmark/src/block_preparation.rs

 }

 impl BlockPreparationStage {
-    pub fn new(num_shards: usize, partitioner_config: &dyn PartitionerConfig) -> Self {
+    pub fn new(
+        sig_verify_num_threads: usize,


super nit: num_sig_verify_threads?

georgemitenkov · 2024-11-02T13:03:48Z

execution/executor-benchmark/src/pipeline.rs

@@ -30,24 +30,24 @@ use std::{
 #[derive(Debug, Derivative)]
 #[derivative(Default)]
 pub struct PipelineConfig {
-    pub delay_execution_start: bool,
+    pub delay_pipeline_start: bool,


I really think it would be great to have comments about these and below?
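
A sketch of the kind of field comments requested here, based only on what the PR description says about the two behaviours. Placing `split_stages` on this struct is an assumption, the wording of the comments is a reader's paraphrase, and the plain `Default` derive replaces the `Derivative` setup shown in the diff.

```rust
/// Simplified stand-in for the benchmark's pipeline configuration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PipelineConfig {
    /// When true, transactions are created first and the pipeline
    /// (beginning with signature verification) is not started beforehand.
    pub delay_pipeline_start: bool,
    /// When true, all pipeline stages are split from each other
    /// (assumed to live on this struct; the description only names the flag).
    pub split_stages: bool,
}
```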

georgemitenkov · 2024-11-02T13:05:17Z

execution/executor/src/metrics.rs

        // metric description
-        "The time spent in seconds of vm block execution in Aptos executor",
+        "The time spent in seconds of BlockExecutor block execution in Aptos executor",


This seems to duplicate what we have in counters.rs? The comment surely does (and is also not clear).

execution/executor/src/block_executor/mod.rs

georgemitenkov · 2024-11-02T13:09:00Z

execution/executor-benchmark/src/native/native_config.rs

+    pub fn get_concurrency_level() -> usize {
+        match NATIVE_EXECUTOR_CONCURRENCY_LEVEL.get() {
+            Some(concurrency_level) => *concurrency_level,
+            None => 32,


We probably want to cap it to number of cores?

changed default to 1. but otherwise - test can run with higher number of threads than cores, if useful
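
A sketch of both positions in this exchange, assuming the level is stored in a `std::sync::OnceLock` (the actual storage type is not visible in the diff). `get_concurrency_level` follows the reply (default 1, no cap, so a test may oversubscribe cores); `set_concurrency_level` and `get_capped_concurrency_level` are hypothetical helpers showing the capped variant the reviewer suggested.

```rust
use std::sync::OnceLock;
use std::thread::available_parallelism;

// Assumed storage for the configured level; set at most once.
static NATIVE_EXECUTOR_CONCURRENCY_LEVEL: OnceLock<usize> = OnceLock::new();

/// Hypothetical setter; returns false if a level was already configured.
pub fn set_concurrency_level(level: usize) -> bool {
    NATIVE_EXECUTOR_CONCURRENCY_LEVEL.set(level).is_ok()
}

/// Configured level, or 1 when nothing was set. Not capped.
pub fn get_concurrency_level() -> usize {
    match NATIVE_EXECUTOR_CONCURRENCY_LEVEL.get() {
        Some(concurrency_level) => *concurrency_level,
        None => 1,
    }
}

/// Capped alternative: never more threads than the cores reported by the OS.
pub fn get_capped_concurrency_level() -> usize {
    // Fall back to a single core if the OS cannot report parallelism.
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    get_concurrency_level().min(cores)
}
```

Keeping the cap in a separate function leaves the choice to the caller: benchmarks that deliberately oversubscribe can call the uncapped getter.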

execution/executor-benchmark/src/lib.rs

github-actions · 2024-11-07T22:43:58Z

✅ Forge suite `realistic_env_max_load` success on `75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69`

two traffics test: inner traffic : committed: 14451.02 txn/s, latency: 2749.31 ms, (p50: 2700 ms, p70: 2700, p90: 2900 ms, p99: 3300 ms), latency samples: 5494880
two traffics test : committed: 100.05 txn/s, latency: 1464.84 ms, (p50: 1400 ms, p70: 1400, p90: 1500 ms, p99: 8600 ms), latency samples: 1780
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.002, avg: 1.565", "ConsensusProposalToOrdered: max: 0.324, avg: 0.292", "ConsensusOrderedToCommit: max: 0.369, avg: 0.358", "ConsensusProposalToCommit: max: 0.658, avg: 0.650"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.90s no progress at version 2833450 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.61s no progress at version 2833448 (avg 8.61s) [limit 15].
Test Ok

github-actions · 2024-11-07T22:45:30Z

✅ Forge suite `framework_upgrade` success on `1086a5e00d773704731ab84fb4ed3538613b2250` ==> `75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69`

Compatibility test results for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 (PR)
Upgrade the nodes to version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1476.00 txn/s, submitted: 1481.07 txn/s, failed submission: 5.07 txn/s, expired: 5.07 txn/s, latency: 2068.08 ms, (p50: 1800 ms, p70: 2100, p90: 3000 ms, p99: 4400 ms), latency samples: 128020
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1315.68 txn/s, submitted: 1318.41 txn/s, failed submission: 2.72 txn/s, expired: 2.72 txn/s, latency: 2298.70 ms, (p50: 2100 ms, p70: 2400, p90: 3700 ms, p99: 5400 ms), latency samples: 115940
5. check swarm health
Compatibility test for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 passed
Upgrade the remaining nodes to version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1261.34 txn/s, submitted: 1263.17 txn/s, failed submission: 1.84 txn/s, expired: 1.84 txn/s, latency: 2471.02 ms, (p50: 2200 ms, p70: 2700, p90: 3900 ms, p99: 5100 ms), latency samples: 109960
Test Ok

github-actions · 2024-11-07T22:46:45Z

✅ Forge suite `compat` success on `1086a5e00d773704731ab84fb4ed3538613b2250` ==> `75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69`

Compatibility test results for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 (PR)
1. Check liveness of validators at old version: 1086a5e00d773704731ab84fb4ed3538613b2250
compatibility::simple-validator-upgrade::liveness-check : committed: 17406.25 txn/s, latency: 1933.23 ms, (p50: 1900 ms, p70: 2100, p90: 2200 ms, p99: 2300 ms), latency samples: 558720
2. Upgrading first Validator to new version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6668.44 txn/s, latency: 4283.37 ms, (p50: 5000 ms, p70: 5200, p90: 5300 ms, p99: 5400 ms), latency samples: 120860
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5752.37 txn/s, latency: 5319.77 ms, (p50: 5700 ms, p70: 5900, p90: 6300 ms, p99: 7200 ms), latency samples: 217620
3. Upgrading rest of first batch to new version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6093.09 txn/s, latency: 4634.30 ms, (p50: 5200 ms, p70: 5300, p90: 6000 ms, p99: 6200 ms), latency samples: 120940
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6439.96 txn/s, latency: 5005.09 ms, (p50: 5300 ms, p70: 5500, p90: 6900 ms, p99: 7200 ms), latency samples: 217640
4. upgrading second batch to new version: 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10456.43 txn/s, latency: 2643.84 ms, (p50: 2800 ms, p70: 3000, p90: 3500 ms, p99: 3900 ms), latency samples: 182640
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9294.04 txn/s, latency: 3412.61 ms, (p50: 3000 ms, p70: 3300, p90: 6800 ms, p99: 7800 ms), latency samples: 351400
5. check swarm health
Compatibility test for 1086a5e00d773704731ab84fb4ed3538613b2250 ==> 75cdcc5e9d340b8ca863e7c6b5efc3efeb4b4c69 passed
Test Ok

igor-aptos added the CICD:run-execution-performance-test and CICD:run-execution-performance-full-test labels Oct 30, 2024

igor-aptos requested review from msmouse, grao1991, manudhundi and sitalkedia October 30, 2024 20:56

igor-aptos requested review from lightmark, zekun000, sasha8, ibalajiarun, davidiw, wrwg, vgao1996, georgemitenkov, banool, gregnazario and 0xmaayan as code owners October 30, 2024 20:56

msmouse approved these changes Oct 30, 2024

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from b6f1484 to e350fa7 Compare October 31, 2024 06:41

igor-aptos requested review from gelash and danielxiangzl as code owners October 31, 2024 06:41

georgemitenkov reviewed Nov 2, 2024

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from e350fa7 to 6bc2844 Compare November 5, 2024 21:09

igor-aptos force-pushed the igor/event_v2_rust branch from eded498 to cc94b3b Compare November 5, 2024 21:23

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 6bc2844 to 5855472 Compare November 5, 2024 21:23

igor-aptos force-pushed the igor/event_v2_rust branch from cc94b3b to b905dd7 Compare November 6, 2024 01:18

Base automatically changed from igor/event_v2_rust to main November 6, 2024 01:51

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 5855472 to 7486aa1 Compare November 6, 2024 06:19

igor-aptos requested a review from georgemitenkov November 6, 2024 06:20

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 122a6ca to cdf9d3a Compare November 7, 2024 20:12

igor-aptos disabled auto-merge November 7, 2024 20:21

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch 3 times, most recently from 449b401 to 88332b1 Compare November 7, 2024 21:41

igor-aptos enabled auto-merge (squash) November 7, 2024 21:41

update recalibration

75cdcc5

igor-aptos force-pushed the igor/executor_benchmark_setup_improvement branch from 88332b1 to 75cdcc5 Compare November 7, 2024 22:17

igor-aptos merged commit 0d53727 into main Nov 7, 2024
49 checks passed

igor-aptos deleted the igor/executor_benchmark_setup_improvement branch November 7, 2024 23:01
