[BlockSTM] Lazy (potentially iterable) txns input to BlockSTM #14568

manudhundi · 2024-09-09T18:44:39Z

Currently BlockSTM takes in a block (vec) of txns and executes them.
This commit adds a capability where we don't need to provide all the
txns in the block upfront; instead they can be provided according to
any desired logic in the system.

The commit has a default implementation 'DefaultTxnProvider' where all
txns are provided upfront as per current logic, and also a reference
implementation of 'BlockingTxnsProvider' where txns can be provided
after BlockSTM starts execution.

Note: One should be careful while using 'BlockingTxnsProvider' because
if BlockSTM chooses to execute a txn that is not yet provided, then that
thread gets blocked until such a txn is provided. This could lead to
performance degradation.
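
To make the provider idea concrete, here is a minimal sketch of what such an interface could look like. The trait name `TxnProvider`, the methods `num_txns` and `get_txn`, and the helper `execute_all` are assumptions for illustration; they are not taken from the PR diff. Only the name `DefaultTxnProvider` comes from the description above, and its body here is hypothetical.

```rust
/// Hypothetical provider interface: the executor asks for a transaction by
/// index instead of owning the whole block up front.
pub trait TxnProvider<T> {
    /// Number of transactions in the block (known up front in this sketch).
    fn num_txns(&self) -> usize;
    /// Returns the transaction at `idx`; a lazy implementation may block here.
    fn get_txn(&self, idx: usize) -> &T;
}

/// Eager provider: every transaction is supplied at construction time,
/// which mirrors the existing "pass a vec" behaviour.
pub struct DefaultTxnProvider<T> {
    txns: Vec<T>,
}

impl<T> DefaultTxnProvider<T> {
    pub fn new(txns: Vec<T>) -> Self {
        Self { txns }
    }
}

impl<T> TxnProvider<T> for DefaultTxnProvider<T> {
    fn num_txns(&self) -> usize {
        self.txns.len()
    }

    fn get_txn(&self, idx: usize) -> &T {
        &self.txns[idx]
    }
}

/// Stand-in for an executor loop that only sees the provider interface.
pub fn execute_all<T: Clone, P: TxnProvider<T>>(provider: &P) -> Vec<T> {
    (0..provider.num_txns())
        .map(|i| provider.get_txn(i).clone())
        .collect()
}
```

Because the executor stand-in depends only on the trait, an eager provider and a lazy one can be swapped without touching the execution loop.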

Checklist

I have read and followed the CONTRIBUTING doc
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I identified and added all stakeholders and component owners affected by this change as reviewers
I tested both happy and unhappy path of the functionality
I have made corresponding changes to the documentation

trunk-io · 2024-09-09T18:44:43Z

⏱️ 1h 40m total CI duration on this PR

Slowest 15 Jobs	Cumulative Duration	Recent Runs
forge-e2e-test / forge	29m	🟩 🟩
test-target-determinator	12m	🟩 🟩
rust-move-tests	10m	🟩
rust-move-tests	9m	🟩
rust-move-tests	9m	🟩
rust-move-tests	8m	🟩
general-lints	7m	🟩 🟩 🟩 🟩
rust-cargo-deny	7m	🟩 🟩 🟩 🟩
check-dynamic-deps	5m	🟩 🟩 🟩 🟩
semgrep/ci	1m	🟩 🟩 🟩 🟩
file_change_determinator	48s	🟩 🟩 🟩 🟩
file_change_determinator	47s	🟩 🟩 🟩 🟩
file_change_determinator	20s	🟩 🟩
permission-check	11s	🟩 🟩 🟩 🟩
permission-check	10s	🟩 🟩

aptos-move/block-executor/src/txn_provider/mod.rs

gelash · 2024-09-26T07:05:01Z

aptos-move/block-executor/src/txn_provider/blocking_txns_provider.rs

+        match *status {
+            BlockingTransactionStatus::Ready(ref txn) => txn.clone(),
+            BlockingTransactionStatus::Waiting => {
+                status = txn.cvar.wait(status).unwrap();


From the documentation of `Condvar::wait`:

Note that this function is susceptible to spurious wakeups. Condition variables normally have a boolean predicate associated with them, and the predicate must always be checked each time this function returns to protect against spurious wakeups.

This can lead to a panic on Line 70 below. A typical pattern is to wait in a loop; we have an example here:

aptos-core/aptos-move/block-executor/src/view.rs

Line 427 in 5ad601c

let (lock, cvar) = &*dep_condition;

DependencyCondvar defined here:

aptos-core/aptos-move/block-executor/src/scheduler.rs

Line 63 in 5ad601c

type DependencyCondvar = Arc<(Mutex<DependencyStatus>, Condvar)>;

I think what we can do here is have a separate condvar with a status lock, used in a loop, and keep the transaction separately as an Option (or an ExplicitSyncWrapper of an Option if it needs Sync bounds), so that we can have an interface that returns a naked &.
An Arc isn't really a big issue, but in general I want us to reduce reliance on them in cases where they aren't actually needed (lifetime guaranteed and reference counting not needed), and this seems like a good example of such a case.
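
A rough sketch of the pattern described in this comment, using only the standard library: a status flag behind a `Mutex` with a `Condvar`, re-checked in a loop so that a spurious wakeup cannot fall through, and the value stored separately so readers get a plain `&T` without an `Arc`. The type `TxnSlot` and its methods are hypothetical, and `std::sync::OnceLock` stands in for the "Option of transaction" storage; none of this is the code from the PR.

```rust
use std::sync::{Condvar, Mutex, OnceLock};

/// Hypothetical slot for one lazily provided transaction.
pub struct TxnSlot<T> {
    /// Status flag guarded by the lock; this is the condvar's predicate.
    ready: Mutex<bool>,
    cvar: Condvar,
    /// Written at most once, then read through a plain reference.
    txn: OnceLock<T>,
}

impl<T> TxnSlot<T> {
    pub fn new() -> Self {
        Self {
            ready: Mutex::new(false),
            cvar: Condvar::new(),
            txn: OnceLock::new(),
        }
    }

    /// Publishes the transaction and wakes all waiters.
    /// Returns false if a transaction was already published.
    pub fn set(&self, txn: T) -> bool {
        if self.txn.set(txn).is_err() {
            return false;
        }
        // The value is stored before the flag flips, so a waiter that
        // observes `ready == true` will always find the value.
        let mut ready = self.ready.lock().unwrap();
        *ready = true;
        self.cvar.notify_all();
        true
    }

    /// Blocks until the transaction is published, then returns a reference.
    pub fn get(&self) -> &T {
        let mut ready = self.ready.lock().unwrap();
        // Re-check the predicate after every wakeup: `wait` may return
        // spuriously, without any call to `notify_all`.
        while !*ready {
            ready = self.cvar.wait(ready).unwrap();
        }
        drop(ready);
        self.txn.get().expect("value is stored before `ready` is set")
    }
}
```

The loop is the important part: a single `wait` followed by an unconditional read is exactly the shape that can panic after a spurious wakeup.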

@ziaptos: minor question here out of curiosity, if it's not too easy to change to non-arc, would it be better to return &Arc? (or triomphe arc?)

bchocho

LGTM

github-actions · 2024-11-15T04:25:17Z

✅ Forge suite `realistic_env_max_load` success on `28156410d71e66e06876d4f16d00ee4b0aee8158`

two traffics test: inner traffic : committed: 14424.19 txn/s, latency: 2757.24 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5484400
two traffics test : committed: 99.91 txn/s, latency: 1545.01 ms, (p50: 1400 ms, p70: 1500, p90: 1500 ms, p99: 11600 ms), latency samples: 1700
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.018, avg: 1.571", "ConsensusProposalToOrdered: max: 0.319, avg: 0.291", "ConsensusOrderedToCommit: max: 0.380, avg: 0.368", "ConsensusProposalToCommit: max: 0.671, avg: 0.659"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.38s no progress at version 66012 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.57s no progress at version 2261641 (avg 8.57s) [limit 15].
Test Ok

github-actions · 2024-11-15T04:26:28Z

✅ Forge suite `framework_upgrade` success on `2bb2d43037a93d883729869d65c7c6c75b028fa1` ==> `28156410d71e66e06876d4f16d00ee4b0aee8158`

Compatibility test results for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 (PR)
Upgrade the nodes to version: 28156410d71e66e06876d4f16d00ee4b0aee8158
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1167.32 txn/s, submitted: 1171.63 txn/s, failed submission: 4.31 txn/s, expired: 4.31 txn/s, latency: 2809.63 ms, (p50: 2100 ms, p70: 3000, p90: 5400 ms, p99: 6200 ms), latency samples: 102840
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1222.16 txn/s, submitted: 1225.33 txn/s, failed submission: 3.18 txn/s, expired: 3.18 txn/s, latency: 2523.34 ms, (p50: 1800 ms, p70: 2400, p90: 5400 ms, p99: 7000 ms), latency samples: 107740
5. check swarm health
Compatibility test for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 passed
Upgrade the remaining nodes to version: 28156410d71e66e06876d4f16d00ee4b0aee8158
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1434.50 txn/s, submitted: 1436.98 txn/s, failed submission: 2.47 txn/s, expired: 2.47 txn/s, latency: 2267.31 ms, (p50: 2100 ms, p70: 2400, p90: 3300 ms, p99: 4900 ms), latency samples: 127580
Test Ok

github-actions · 2024-11-15T04:28:15Z

✅ Forge suite `compat` success on `2bb2d43037a93d883729869d65c7c6c75b028fa1` ==> `28156410d71e66e06876d4f16d00ee4b0aee8158`

Compatibility test results for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 (PR)
1. Check liveness of validators at old version: 2bb2d43037a93d883729869d65c7c6c75b028fa1
compatibility::simple-validator-upgrade::liveness-check : committed: 14512.60 txn/s, latency: 2348.78 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 6500 ms), latency samples: 466020
2. Upgrading first Validator to new version: 28156410d71e66e06876d4f16d00ee4b0aee8158
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7033.41 txn/s, latency: 3882.74 ms, (p50: 4300 ms, p70: 4800, p90: 5200 ms, p99: 5500 ms), latency samples: 128200
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7196.80 txn/s, latency: 4380.92 ms, (p50: 4600 ms, p70: 4700, p90: 6500 ms, p99: 6800 ms), latency samples: 240260
3. Upgrading rest of first batch to new version: 28156410d71e66e06876d4f16d00ee4b0aee8158
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7187.49 txn/s, latency: 3853.46 ms, (p50: 4200 ms, p70: 4400, p90: 5100 ms, p99: 5400 ms), latency samples: 136080
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7730.23 txn/s, latency: 4148.62 ms, (p50: 4200 ms, p70: 4400, p90: 6300 ms, p99: 6600 ms), latency samples: 255940
4. upgrading second batch to new version: 28156410d71e66e06876d4f16d00ee4b0aee8158
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11286.17 txn/s, latency: 2484.20 ms, (p50: 2600 ms, p70: 2800, p90: 3200 ms, p99: 3400 ms), latency samples: 197500
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11045.00 txn/s, latency: 2854.75 ms, (p50: 2700 ms, p70: 3000, p90: 4500 ms, p99: 5400 ms), latency samples: 355840
5. check swarm health
Compatibility test for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 passed
Test Ok

* [BlockSTM] Iterable txns input to BlockSTM

  Currently BlockSTM takes in a block (vec) of txns and executes them. This commit adds a capability where we don't need to provide all the txns in the block upfront; instead they can be provided according to any desired logic in the system.

  The commit has a default implementation 'DefaultTxnProvider' where all txns are provided upfront as per current logic, and also a reference implementation of 'BlockingTxnsProvider' where txns can be provided after BlockSTM starts execution. This is done by Rust's OnceCell<>.

  Note: One should be careful while using 'BlockingTxnsProvider' because if BlockSTM chooses to execute a txn that is not yet provided, then that thread gets blocked until such a txn is provided. This could lead to performance degradation.

* Address issues that can arise from spurious wakeups in Condvar
* In BlockingTransaction use OnceCell<> instead of Mutex and Cvar
* Keep BlockingTransaction internal to BlockingTxnProvider
* Remove BlockingTransaction struct; instead use OnceCell directly
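
As an illustration of the write-once-cell design these commits converge on, the sketch below keeps one cell per transaction index. It is not the merged code. `std::sync::OnceLock` is used as a stand-in for the `OnceCell` named in the commit message, the method names are hypothetical, and the blocking read is approximated with a yield loop so that the example relies only on long-stable standard-library calls; a real implementation would presumably use a blocking wait on the cell instead.

```rust
use std::sync::OnceLock;
use std::thread;

/// Hypothetical lazy provider: one write-once cell per transaction index.
pub struct BlockingTxnProvider<T> {
    txns: Vec<OnceLock<T>>,
}

impl<T> BlockingTxnProvider<T> {
    /// Creates a provider for a block of `num_txns` not-yet-supplied txns.
    pub fn new(num_txns: usize) -> Self {
        Self {
            txns: (0..num_txns).map(|_| OnceLock::new()).collect(),
        }
    }

    pub fn num_txns(&self) -> usize {
        self.txns.len()
    }

    /// Supplies the transaction at `idx`; returns false if already supplied.
    pub fn set_txn(&self, idx: usize, txn: T) -> bool {
        self.txns[idx].set(txn).is_ok()
    }

    /// Returns the transaction at `idx`, waiting until it has been supplied.
    /// The yield loop is a simplification standing in for a blocking wait.
    pub fn get_txn(&self, idx: usize) -> &T {
        loop {
            if let Some(txn) = self.txns[idx].get() {
                return txn;
            }
            thread::yield_now();
        }
    }
}
```

A reader that asks for an index before it is supplied stalls inside `get_txn`, which is the performance hazard the note in the first commit warns about.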

manudhundi marked this pull request as ready for review September 9, 2024 18:46

manudhundi requested review from msmouse, lightmark, grao1991, gelash, zekun000, sasha8, danielxiangzl, davidiw, wrwg, vgao1996 and georgemitenkov as code owners September 9, 2024 18:46

manudhundi requested review from bchocho, sitalkedia and zjma September 9, 2024 18:46

manudhundi force-pushed the manu/BlockSTM_iterator branch from f8ed8c1 to 8ba7e89 Compare September 10, 2024 20:26

bchocho added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Sep 12, 2024

manudhundi force-pushed the manu/BlockSTM_iterator branch from 8ba7e89 to ba24673 Compare September 13, 2024 00:59

brmataptos mentioned this pull request Sep 16, 2024

[Bug][move-compiler-v2] constants should not be able to refer to other constants #14648

Closed

manudhundi changed the title ~~[BlockSTM] Iterable txns input to BlockSTM~~ [BlockSTM] Lazy (potentially iterable) txns input to BlockSTM Sep 16, 2024

gelash reviewed Sep 25, 2024

aptos-move/block-executor/src/txn_provider/mod.rs (outdated)

gelash reviewed Sep 26, 2024

gelash requested a review from ziaptos September 26, 2024 07:20

bchocho approved these changes Nov 10, 2024

manudhundi enabled auto-merge (squash) November 10, 2024 23:58

manudhundi force-pushed the manu/BlockSTM_iterator branch from 01d817c to 22df6f2 Compare November 12, 2024 16:14

manudhundi added 5 commits November 14, 2024 19:43

Address issues that can arise from spurious wakeups in Condvar

deb50e5

In BlockingTransaction use OnceCell<> instead of Mutex and Cvar

c272025

Keep BlockingTransaction internal to BlockingTxnProvider

7520c59

Remove BlockingTransaction struct; instead use OnceCell directly

2815641

manudhundi force-pushed the manu/BlockSTM_iterator branch from 22df6f2 to 2815641 Compare November 15, 2024 03:56

manudhundi merged commit 23db584 into main Nov 15, 2024
48 checks passed

manudhundi deleted the manu/BlockSTM_iterator branch November 15, 2024 04:34
