[Parallel Executor] Rolling Commit #6079

danielxiangzl · 2023-01-05T19:21:14Z

This PR implements rolling commit, a method for accurately tracking the prefix of committed transactions during parallel execution. It adds little overhead compared to the current lazy commit approach, which can only commit all the transactions in a block together.

Here are some p2p benchmark numbers comparing rolling commit with the previous commit method (lazy commit of the whole block) on my MBP with 10 cores/threads. Times are in milliseconds. The numbers can be noisy, and better experiments can be done if needed. The two rolling commit variants protect the txn status with a Mutex and an RwLock, respectively.

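| (# accounts, # txns) | (100, 1k) | (100, 10k) | (100, 50k) | (1k, 1k) | (1k, 10k) | (1k, 50k) |
|---|---|---|---|---|---|---|
| Lazy commit | 57 | 301 | 1318 | 52 | 257 | 1145 |
| Rolling commit w. Mutex | 60 | 327 | 1480 | 55 | 299 | 1284 |
| Rolling commit w. RwLock | 52 | 284 | 1271 | 50 | 249 | 1098 |
| PR #5948 | 60 | 335 | 1525 | 56 | 314 | 1412 |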

UPDATE
p2p benchmark numbers compared to the previous commit method (lazy commit of the whole block) on an AWS Ubuntu c5a.16xlarge instance. Times are in milliseconds.

With 8 threads

| (# accounts, # txns) | (100, 1k) | (100, 10k) | (1k, 1k) | (1k, 10k) |
|---|---|---|---|---|
| Lazy commit | 81 | 562 | 76 | 496 |
| Rolling commit | 82 | 558 | 75 | 494 |

With 16 threads

| (# accounts, # txns) | (100, 1k) | (100, 10k) | (1k, 1k) | (1k, 10k) |
|---|---|---|---|---|
| Lazy commit | 64 | 375 | 57 | 291 |
| Rolling commit | 64 | 372 | 57 | 289 |

Broader Context
Rolling commit is needed for the per-block limit and the aggregator. We have a dedicated-thread solution implemented in PR #5948 (which has additional validation overhead), but this PR is an attempt to remove the trade-offs and get rolling commit almost for free. The idea is to track the waves of validations and know when the last one happens.
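
To make the wave idea concrete, here is a minimal single-threaded sketch. It is not the code in this PR: the `CommitTracker` and `ValidationStatus` types and their fields are hypothetical, and the rule "a txn can commit once it has passed validation in a wave at least as recent as the one required of it" is a simplifying assumption about how the tracking works.

```rust
/// Validation bookkeeping for one transaction (field names are assumptions).
#[derive(Clone, Default)]
struct ValidationStatus {
    /// Latest wave in which a re-validation of this txn was requested.
    required_wave: u32,
    /// Latest wave in which this txn passed validation, if any.
    validated_wave: Option<u32>,
}

/// Tracks the committed prefix of a block.
struct CommitTracker {
    statuses: Vec<ValidationStatus>,
    /// Index of the next txn to commit; every txn below it is committed.
    commit_idx: usize,
    /// Earliest wave in which the next txn must have passed validation.
    commit_wave: u32,
}

impl CommitTracker {
    fn new(num_txns: usize) -> Self {
        Self {
            statuses: vec![ValidationStatus::default(); num_txns],
            commit_idx: 0,
            commit_wave: 0,
        }
    }

    /// Commits the next txn if its latest successful validation is recent
    /// enough, and returns the index that was committed.
    fn try_commit(&mut self) -> Option<usize> {
        let status = self.statuses.get(self.commit_idx)?;
        // The required wave never decreases along the committed prefix.
        let needed = self.commit_wave.max(status.required_wave);
        match status.validated_wave {
            Some(wave) if wave >= needed => {
                self.commit_wave = needed;
                self.commit_idx += 1;
                Some(self.commit_idx - 1)
            }
            _ => None,
        }
    }
}
```

In this sketch a txn that was validated only in an older wave stays uncommitted until it is validated again, which is what lets the prefix advance one txn at a time instead of waiting for the whole block.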

gelash · 2023-01-11T02:02:43Z

Awesome!

Before a full review, a few points we may want to consider before landing:

- whether to yield (std::hint), and when
- whether a committing thread that has no other tasks can starve other threads of locks for a non-negligible time (fairness)
- whether to use try_lock (likely yes, why not)
- whether required_wave can be simplified or removed (I think not)

Otherwise, as noted, let's try to add a few extra unit tests / proptests that stress the committing logic.
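
As an illustration of the yielding question, here is a minimal sketch of one common policy: spin briefly with `std::hint::spin_loop`, then fall back to `std::thread::yield_now`. The function name `wait_until_set`, the `spin_limit` parameter, and the policy itself are assumptions for illustration; the thread does not say which approach the PR ends up using.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::{hint, thread};

/// Waits until `flag` is set. The first `spin_limit` failed checks only emit
/// a spin-loop hint to the CPU; later ones yield to the OS scheduler so that
/// other threads get a chance to run. Returns the number of failed checks.
fn wait_until_set(flag: &AtomicBool, spin_limit: u32) -> u32 {
    let mut failed_checks = 0u32;
    while !flag.load(Ordering::Acquire) {
        if failed_checks < spin_limit {
            hint::spin_loop();
        } else {
            thread::yield_now();
        }
        failed_checks = failed_checks.saturating_add(1);
    }
    failed_checks
}
```

Spinning keeps latency low when the wait is short, while yielding limits how long an idle thread can occupy a core that another worker could use.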

gelash · 2023-01-20T15:01:16Z

Let's make sure try_commit does not starve other threads. We can use the parking_lot RwLock for the execution and validation statuses; please adjust the uses based on whether we only require a read or also update the respective status. Another way to use it is to grab an upgradable read lock first and upgrade it only if writing becomes necessary. I suggested this flow for the try_commit execution status as an example; I'm not sure whether it's useful elsewhere.

We can then also revert the changes to the mutex (no need to wrap try_lock, since we won't be using it on aptos-infallible).

And let's regenerate those numbers to make sure we don't lose performance somehow (we shouldn't).
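
A minimal sketch of the non-blocking flow described above, using only the standard library. Note the substitution: `std::sync::RwLock` has no upgradable read, so this sketch takes `try_write` directly where the suggestion uses a parking_lot upgradable read followed by an upgrade. The `TxnStatus` and `ExecutionStatus` types are hypothetical and much simpler than the scheduler's real statuses.

```rust
use std::sync::RwLock;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ExecutionStatus {
    Executed,
    Committed,
}

/// Hypothetical per-transaction state: one lock per status.
struct TxnStatus {
    execution: RwLock<ExecutionStatus>,
    validated: RwLock<bool>,
}

/// Attempts to commit without ever blocking on a lock. Returns `false` if
/// either lock is contended or the txn is not ready, so the caller can go
/// back to other work instead of waiting on (and starving) other threads.
fn try_commit(txn: &TxnStatus) -> bool {
    // Validation status only needs a read.
    let validated = match txn.validated.try_read() {
        Ok(guard) => guard,
        Err(_) => return false,
    };
    if !*validated {
        return false;
    }
    // Execution status is updated, so a write lock is needed here.
    let mut execution = match txn.execution.try_write() {
        Ok(guard) => guard,
        Err(_) => return false,
    };
    if *execution != ExecutionStatus::Executed {
        return false;
    }
    *execution = ExecutionStatus::Committed;
    true
}
```

With an upgradable read, the check of the execution status could happen under a shared lock and the exclusive lock would only be taken when the commit actually goes ahead.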

zekun000

beautiful

zekun000 · 2023-01-27T14:22:08Z

aptos-move/block-executor/src/scheduler.rs

        Arc, Condvar,
    },
 };

+const TXN_IDX_MASK: u64 = (1 << 32) - 1;
+
 // Type aliases.
 pub type TxnIndex = usize;


Should we change this to u32? It seems cleaner for the validation index use.

Since TxnIndex is used in many places besides validation_idx, I will keep the current definition.

I am merging the definitions in my diff (MVHashMap redefines it and has a TODO anyway), so I can try to redefine it there.
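
For context on the mask in the diff above: `TXN_IDX_MASK` keeps the low 32 bits of a `u64`, which suggests the transaction index shares a 64-bit word with another value. Here is a minimal sketch, assuming the high 32 bits hold the validation wave; that layout and the `pack` and `unpack` helpers are assumptions, not taken from the diff.

```rust
/// Low 32 bits hold the transaction index (constant as in the diff).
const TXN_IDX_MASK: u64 = (1 << 32) - 1;

/// Packs a wave and a transaction index into one u64.
/// Putting the wave in the high 32 bits is an assumption of this sketch.
fn pack(wave: u32, idx: u32) -> u64 {
    ((wave as u64) << 32) | idx as u64
}

/// Splits a packed value back into (wave, idx).
fn unpack(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, (packed & TXN_IDX_MASK) as u32)
}
```

Under this layout, adding 1 to the packed value advances the index without touching the wave, and both halves can live in a single atomic word. The sketch uses a 32-bit index purely so it fits in the mask.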

sasha8 · 2023-01-27T22:34:46Z

aptos-move/block-executor/src/scheduler.rs

+
+        if let Some(validation_status) = self.txn_status[*commit_idx].1.try_read() {
+            // Acquired the validation status lock, now try the status lock.
+            if let Some(status) = self.txn_status[*commit_idx].0.try_upgradable_read() {


Nit: similar to Zekun's comment, .0 and .1 are not very readable.

Added comments.

github-actions · 2023-01-28T16:08:29Z

✅ Forge suite `compat` success on `testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b` ==> `15c7a147ac567f39763553b12e9fdeee7e43eabf`

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 15c7a147ac567f39763553b12e9fdeee7e43eabf (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 8060 TPS, 4738 ms latency, 6900 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 15c7a147ac567f39763553b12e9fdeee7e43eabf
compatibility::simple-validator-upgrade::single-validator-upgrade : 4604 TPS, 8958 ms latency, 11800 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 15c7a147ac567f39763553b12e9fdeee7e43eabf
compatibility::simple-validator-upgrade::half-validator-upgrade : 4638 TPS, 8773 ms latency, 11500 ms p99 latency,no expired txns
4. upgrading second batch to new version: 15c7a147ac567f39763553b12e9fdeee7e43eabf
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7069 TPS, 5342 ms latency, 9200 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 15c7a147ac567f39763553b12e9fdeee7e43eabf passed
Test Ok

github-actions · 2023-01-28T18:29:57Z

✅ Forge suite `land_blocking` success on `15c7a147ac567f39763553b12e9fdeee7e43eabf`

performance benchmark with full nodes : 6260 TPS, 6315 ms latency, 11100 ms p99 latency,(!) expired 600 out of 2673880 txns
Test Ok

gelash and others added 6 commits January 2, 2023 17:55

[WIP] Rolling commit

22144e7

Merge branch 'main' into rollingcommit

f0cb53e

fix liveness

28af84c

debug unit test

06c4517

committed is executed, sometimes

d7611aa

fix unit tests and cleanup

594e3b9

danielxiangzl added the CICD:run-e2e-tests label (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR) Jan 6, 2023

danielxiangzl and others added 2 commits January 6, 2023 15:23

lint

c9f9424

Merge branch 'main' into daniel-rolling-commit-debug

c5dc7c5

add comments

9b3955e

danielxiangzl requested review from gelash, sasha8 and zekun000 January 7, 2023 00:22

danielxiangzl marked this pull request as ready for review January 7, 2023 00:24

danielxiangzl requested a review from grao1991 January 9, 2023 19:43

gelash requested a review from runtian-zhou January 11, 2023 00:40

danielxiangzl and others added 3 commits January 11, 2023 15:18

Merge branch 'daniel-rolling-commit-debug' of github.com:aptos-labs/a…

4009405

…ptos-core into daniel-rolling-commit-debug

add thread yielding

9aa3225

Merge branch 'main' into daniel-rolling-commit-debug

d6dface

gelash reviewed Jan 20, 2023

zekun000 approved these changes Jan 27, 2023

sasha8 approved these changes Jan 27, 2023

danielxiangzl and others added 3 commits January 27, 2023 15:06

address comments

906b08a

Merge branch 'main' into daniel-rolling-commit-debug

84c34fa

Merge branch 'main' into daniel-rolling-commit-debug

db2867d

Merge branch 'main' into daniel-rolling-commit-debug

15c7a14

gelash merged commit f8283fb into main Jan 28, 2023

gelash deleted the daniel-rolling-commit-debug branch January 28, 2023 18:31

danielxiangzl mentioned this pull request Mar 6, 2023

[Parallel Executor] Halt parallel execution when module r/w intersects #5850

Closed

danielxiangzl mentioned this pull request Apr 12, 2023

[BlockSTM] Per-block Gas Limit #7488

Merged

2 tasks

thepomeranian mentioned this pull request May 8, 2023

[AIP-33][Block Gas Limit] aptos-foundation/AIPs#132

Closed

St0nersdash mentioned this pull request Dec 27, 2023

[Snyk] Security upgrade axios from 1.6.0 to 1.6.3 St0nersdash/aptos-core#44

Merged

MuMianliu mentioned this pull request Dec 27, 2023

[Snyk] Security upgrade axios from 0.27.2 to 1.6.3 MuMianliu/aptos-core#14

Open

MuMianliu mentioned this pull request Jan 5, 2024

[Snyk] Security upgrade axios from 0.27.2 to 1.6.4 MuMianliu/aptos-core#17

Open

Abuchtela mentioned this pull request Jan 5, 2024

[Snyk] Security upgrade axios from 0.27.2 to 1.6.4 Abuchtela/aptos-core#27

Closed
