[BlockSTM] Lazy (potentially iterable) txns input to BlockSTM #14568

Merged: 5 commits into main from manu/BlockSTM_iterator, Nov 15, 2024


@manudhundi manudhundi commented Sep 9, 2024

Currently BlockSTM takes in a block (vec) of txns and executes them.
This commit adds a capability where we don't need to provide all the
txns in the block upfront, but can instead provide them as per any
desired logic in the system.

The commit has a default implementation 'DefaultTxnProvider' where all
txns are provided upfront as per current logic, and also a reference
implementation of 'BlockingTxnsProvider' where txns can be provided
after BlockSTM starts execution.

Note: One should be careful while using 'BlockingTxnsProvider' because
if BlockSTM chooses to execute a txn that is not yet provided, then the
executing thread blocks until that txn is provided. This could lead to
performance degradation.
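
As a rough illustration of the two providers described above (not the code in this PR), a minimal sketch might look like the following. The trait name `TxnProvider`, the methods `num_txns`, `get_txn` and `set_txn`, the helper `demo`, and the per-slot Mutex/Condvar layout are all assumptions made for this example.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical interface: hands out the txn at a given index of the block.
pub trait TxnProvider<T> {
    fn num_txns(&self) -> usize;
    fn get_txn(&self, idx: usize) -> T;
}

/// All txns are known upfront, so lookups never block.
pub struct DefaultTxnProvider<T> {
    txns: Vec<T>,
}

impl<T: Clone> DefaultTxnProvider<T> {
    pub fn new(txns: Vec<T>) -> Self {
        Self { txns }
    }
}

impl<T: Clone> TxnProvider<T> for DefaultTxnProvider<T> {
    fn num_txns(&self) -> usize {
        self.txns.len()
    }
    fn get_txn(&self, idx: usize) -> T {
        self.txns[idx].clone()
    }
}

/// Txns may arrive after execution starts; one (slot, condvar) pair per index.
pub struct BlockingTxnsProvider<T> {
    slots: Vec<(Mutex<Option<T>>, Condvar)>,
}

impl<T: Clone> BlockingTxnsProvider<T> {
    pub fn new(num_txns: usize) -> Self {
        let slots = (0..num_txns)
            .map(|_| (Mutex::new(None), Condvar::new()))
            .collect();
        Self { slots }
    }

    /// Called by whoever produces txns; wakes any thread waiting on this index.
    pub fn set_txn(&self, idx: usize, txn: T) {
        let (lock, cvar) = &self.slots[idx];
        *lock.lock().unwrap() = Some(txn);
        cvar.notify_all();
    }
}

impl<T: Clone> TxnProvider<T> for BlockingTxnsProvider<T> {
    fn num_txns(&self) -> usize {
        self.slots.len()
    }

    /// Blocks the calling thread until the txn at `idx` has been provided.
    fn get_txn(&self, idx: usize) -> T {
        let (lock, cvar) = &self.slots[idx];
        // wait_while re-checks the predicate after every wakeup.
        let guard = cvar
            .wait_while(lock.lock().unwrap(), |slot| slot.is_none())
            .unwrap();
        (*guard).clone().expect("slot is filled once wait_while returns")
    }
}

/// One consumer thread reads txns in order while this thread supplies them.
pub fn demo() -> Vec<u64> {
    let provider: BlockingTxnsProvider<u64> = BlockingTxnsProvider::new(3);
    thread::scope(|s| {
        let consumer = s.spawn(|| {
            (0..provider.num_txns())
                .map(|i| provider.get_txn(i))
                .collect::<Vec<u64>>()
        });
        for i in 0..3 {
            provider.set_txn(i, (i as u64 + 1) * 10);
        }
        consumer.join().unwrap()
    })
}
```

In this sketch the consumer stalls inside `get_txn` whenever it asks for an index that has not been set yet, which is the blocking behavior the note above warns about.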

trunk-io bot commented Sep 9, 2024

⏱️ 1h 40m total CI duration on this PR
Slowest 15 jobs            Cumulative duration  Recent runs
forge-e2e-test / forge     29m                  🟩🟩
test-target-determinator   12m                  🟩🟩
rust-move-tests            10m                  🟩
rust-move-tests            9m                   🟩
rust-move-tests            9m                   🟩
rust-move-tests            8m                   🟩
general-lints              7m                   🟩🟩🟩🟩
rust-cargo-deny            7m                   🟩🟩🟩🟩
check-dynamic-deps         5m                   🟩🟩🟩🟩
semgrep/ci                 1m                   🟩🟩🟩🟩
file_change_determinator   48s                  🟩🟩🟩🟩
file_change_determinator   47s                  🟩🟩🟩🟩
file_change_determinator   20s                  🟩🟩
permission-check           11s                  🟩🟩🟩🟩
permission-check           10s                  🟩🟩

@manudhundi manudhundi changed the title [BlockSTM] Iterable txns input to BlockSTM [BlockSTM] Lazy (potentially iterable) txns input to BlockSTM Sep 16, 2024
match *status {
BlockingTransactionStatus::Ready(ref txn) => txn.clone(),
BlockingTransactionStatus::Waiting => {
status = txn.cvar.wait(status).unwrap();
@gelash gelash Sep 26, 2024

From the documentation of Condvar::wait:

Note that this function is susceptible to spurious wakeups. Condition variables normally have a boolean predicate associated with them, and the predicate must always be checked each time this function returns to protect against spurious wakeups.

This can lead to a panic on Line 70 below. A typical pattern is to wait in a loop; we have an example here:

let (lock, cvar) = &*dep_condition;

DependencyCondvar defined here:
type DependencyCondvar = Arc<(Mutex<DependencyStatus>, Condvar)>;

I think what we can do here is have a separate condvar with a status lock, used in a loop, and store the (Option of the) transaction separately (or an ExplicitSyncWrapper of Option if it needs Sync bounds), so that the interface can return a naked &.
An Arc isn't really a big issue, but in general I want us to reduce reliance on them in cases where they aren't actually needed (lifetime guaranteed and reference counting not needed), and this seems like a good example of such a case.
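
The loop pattern suggested above could look roughly like this. Only the `BlockingTransactionStatus` variants come from the diff under review; the `BlockingTransaction` struct layout, the `new`, `set` and `get` methods, and the `demo` helper are hypothetical names for this sketch.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

pub enum BlockingTransactionStatus<T> {
    Ready(T),
    Waiting,
}

/// Hypothetical wrapper: one status lock plus the condvar that guards it.
pub struct BlockingTransaction<T> {
    status: Mutex<BlockingTransactionStatus<T>>,
    cvar: Condvar,
}

impl<T: Clone> BlockingTransaction<T> {
    pub fn new() -> Self {
        Self {
            status: Mutex::new(BlockingTransactionStatus::Waiting),
            cvar: Condvar::new(),
        }
    }

    /// Publishes the txn and wakes every waiter.
    pub fn set(&self, txn: T) {
        let mut status = self.status.lock().unwrap();
        *status = BlockingTransactionStatus::Ready(txn);
        self.cvar.notify_all();
    }

    /// Blocks until the txn is ready, tolerating spurious wakeups.
    pub fn get(&self) -> T {
        let mut status = self.status.lock().unwrap();
        // Re-check the predicate after every return from wait(): a spurious
        // wakeup leaves the status at Waiting, so we simply wait again.
        while matches!(*status, BlockingTransactionStatus::Waiting) {
            status = self.cvar.wait(status).unwrap();
        }
        match &*status {
            BlockingTransactionStatus::Ready(txn) => txn.clone(),
            BlockingTransactionStatus::Waiting => unreachable!("loop exits only on Ready"),
        }
    }
}

/// A reader thread blocks in get() until this thread calls set().
pub fn demo() -> u64 {
    let txn: BlockingTransaction<u64> = BlockingTransaction::new();
    thread::scope(|s| {
        let reader = s.spawn(|| txn.get());
        txn.set(42);
        reader.join().unwrap()
    })
}
```

Because the scoped threads in `demo` cannot outlive `txn`, a plain `&` is enough and no Arc is needed in this sketch.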

@ziaptos: minor question here out of curiosity: if it's not too easy to change to non-Arc, would it be better to return &Arc (or a triomphe Arc)?

@gelash gelash requested a review from ziaptos September 26, 2024 07:20

@bchocho bchocho left a comment

LGTM

@manudhundi manudhundi enabled auto-merge (squash) November 10, 2024 23:58

✅ Forge suite realistic_env_max_load success on 28156410d71e66e06876d4f16d00ee4b0aee8158

two traffics test: inner traffic : committed: 14424.19 txn/s, latency: 2757.24 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5484400
two traffics test : committed: 99.91 txn/s, latency: 1545.01 ms, (p50: 1400 ms, p70: 1500, p90: 1500 ms, p99: 11600 ms), latency samples: 1700
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.018, avg: 1.571", "ConsensusProposalToOrdered: max: 0.319, avg: 0.291", "ConsensusOrderedToCommit: max: 0.380, avg: 0.368", "ConsensusProposalToCommit: max: 0.671, avg: 0.659"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.38s no progress at version 66012 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.57s no progress at version 2261641 (avg 8.57s) [limit 15].
Test Ok


✅ Forge suite framework_upgrade success on 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158

Compatibility test results for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 (PR)
Upgrade the nodes to version: 28156410d71e66e06876d4f16d00ee4b0aee8158
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1167.32 txn/s, submitted: 1171.63 txn/s, failed submission: 4.31 txn/s, expired: 4.31 txn/s, latency: 2809.63 ms, (p50: 2100 ms, p70: 3000, p90: 5400 ms, p99: 6200 ms), latency samples: 102840
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1222.16 txn/s, submitted: 1225.33 txn/s, failed submission: 3.18 txn/s, expired: 3.18 txn/s, latency: 2523.34 ms, (p50: 1800 ms, p70: 2400, p90: 5400 ms, p99: 7000 ms), latency samples: 107740
5. check swarm health
Compatibility test for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 passed
Upgrade the remaining nodes to version: 28156410d71e66e06876d4f16d00ee4b0aee8158
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1434.50 txn/s, submitted: 1436.98 txn/s, failed submission: 2.47 txn/s, expired: 2.47 txn/s, latency: 2267.31 ms, (p50: 2100 ms, p70: 2400, p90: 3300 ms, p99: 4900 ms), latency samples: 127580
Test Ok


✅ Forge suite compat success on 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158

Compatibility test results for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 (PR)
1. Check liveness of validators at old version: 2bb2d43037a93d883729869d65c7c6c75b028fa1
compatibility::simple-validator-upgrade::liveness-check : committed: 14512.60 txn/s, latency: 2348.78 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 6500 ms), latency samples: 466020
2. Upgrading first Validator to new version: 28156410d71e66e06876d4f16d00ee4b0aee8158
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7033.41 txn/s, latency: 3882.74 ms, (p50: 4300 ms, p70: 4800, p90: 5200 ms, p99: 5500 ms), latency samples: 128200
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7196.80 txn/s, latency: 4380.92 ms, (p50: 4600 ms, p70: 4700, p90: 6500 ms, p99: 6800 ms), latency samples: 240260
3. Upgrading rest of first batch to new version: 28156410d71e66e06876d4f16d00ee4b0aee8158
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7187.49 txn/s, latency: 3853.46 ms, (p50: 4200 ms, p70: 4400, p90: 5100 ms, p99: 5400 ms), latency samples: 136080
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7730.23 txn/s, latency: 4148.62 ms, (p50: 4200 ms, p70: 4400, p90: 6300 ms, p99: 6600 ms), latency samples: 255940
4. upgrading second batch to new version: 28156410d71e66e06876d4f16d00ee4b0aee8158
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11286.17 txn/s, latency: 2484.20 ms, (p50: 2600 ms, p70: 2800, p90: 3200 ms, p99: 3400 ms), latency samples: 197500
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11045.00 txn/s, latency: 2854.75 ms, (p50: 2700 ms, p70: 3000, p90: 4500 ms, p99: 5400 ms), latency samples: 355840
5. check swarm health
Compatibility test for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 28156410d71e66e06876d4f16d00ee4b0aee8158 passed
Test Ok

@manudhundi manudhundi merged commit 23db584 into main Nov 15, 2024
48 checks passed
@manudhundi manudhundi deleted the manu/BlockSTM_iterator branch November 15, 2024 04:34
rahxephon89 pushed a commit that referenced this pull request Nov 15, 2024
* [BlockSTM] Iterable txns input to BlockSTM

Currently BlockSTM takes in a block (vec) of txns and executes them.
This commit adds a capability where we don't need to provide all the
txns in the block upfront, but can instead provide them as per any
desired logic in the system.

The commit has a default implementation 'DefaultTxnProvider' where all
txns are provided upfront as per current logic, and also a reference
implementation of 'BlockingTxnsProvider' where txns can be provided
after BlockSTM starts execution. This is done with Rust's OnceCell<>.

Note: One should be careful while using 'BlockingTxnsProvider' because
if BlockSTM chooses to execute a txn that is not yet provided, then the
executing thread blocks until that txn is provided. This could lead to
performance degradation.

* Address issues that can arise from spurious wakeups in Condvar

* In BlockingTransaction use OnceCell<> instead of Mutex and Cvar

* Keep BlockingTransaction internal to BlockingTxnProvider

* Remove BlockingTransaction struct; instead use OnceCell directly
Labels
CICD:run-forge-e2e-perf Run the e2e perf forge only
3 participants