[loader] Global caches for execution #15192

Merged: 21 commits, Nov 13, 2024

@georgemitenkov (Contributor) commented Nov 5, 2024

Description

This PR refactors the code caching logic (both L1 and L2) in the block executor (Block-STM) and implements a code cache manager for global (L2) caches. It also includes small cleanups of the surrounding code (see below).

Detailed list of changes:

  • Introduced test-only MockStateView in aptos-types that can be re-used across tests. Hence, there are some additional changes around MockVM and some unit tests.

  • Introduced test-only MockExtension in move-vm-types to mock extension stored in cached entries. This is better than () because we can search it by name and implement trait bounds for it in the future.

  • Split the WithBytes trait into WithSize and WithBytes so that size and bytes can be queried separately. This allows us to mock sized entries in the global module cache, which makes testing easier.

  • Removed the version from ModuleCode and introduced VersionedModuleCode, which carries the version. This allows Block-STM caches to store versioned code while global caches use the non-versioned variant. For code read from storage, a "default" version is used, which aligns with Option<TxnIndex>. As a result, the Sync/UnsyncModuleCache APIs, tests, etc. had to change.

  • Renamed ImmutableModuleCache to GlobalModuleCache for clarity. Removed ugly versioning from there. Added size tracking (in bytes) for serialised modules.

  • Implemented Aptos framework prefetching in case the global module caches are empty.

  • Introduced BlockExecutorModuleCacheLocalConfig in aptos-types which is stored in BlockExecutorLocalConfig and stores info about module caches. Added a few default constructors to make code cleaner and less verbose, also in some other places. This way, any module caching logic can be based on this config.

  • Introduced a GlobalModuleCacheManager struct, which is stored in AptosVMBlockExecutor and contains the global module cache and the environment. The manager can be in one of three states: READY, EXECUTING, or DONE. Users can perform state transitions.

  • BlockAptosVM now takes an additional, optional GlobalModuleCacheManager. If it is not set, empty caches are used. BlockAptosVM transitions the manager from READY to EXECUTING for block execution, and from EXECUTING to DONE when execution finishes. The DONE to READY transition is added to the actual block executor.

  • (NEW) Propagated parent/current block hashes throughout and down to execute_block APIs. We can clean up a bit later to avoid ugly Nones.
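
The WithSize/WithBytes split and the byte-size tracking in the global module cache can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the method names (`size_in_bytes`, `bytes`), the `SerializedModule`, `MockSizedEntry` and `SizeTrackingCache` types, and the single-threaded `HashMap` storage are hypothetical and are not taken from the PR. Whether the real traits are related by a supertrait bound is not stated here, so the sketch keeps them independent.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical size-only trait: implementors report a size in bytes.
trait WithSize {
    fn size_in_bytes(&self) -> usize;
}

/// Hypothetical bytes trait: implementors expose their serialized bytes.
trait WithBytes {
    fn bytes(&self) -> &[u8];
}

/// An entry backed by real serialized bytes (illustrative type).
struct SerializedModule(Vec<u8>);

impl WithSize for SerializedModule {
    fn size_in_bytes(&self) -> usize {
        self.0.len()
    }
}

impl WithBytes for SerializedModule {
    fn bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Test-only entry that claims a size without holding any bytes.
/// This is possible only because size is queried through its own trait.
struct MockSizedEntry(usize);

impl WithSize for MockSizedEntry {
    fn size_in_bytes(&self) -> usize {
        self.0
    }
}

/// Simplified, single-threaded cache that tracks the total size of its entries.
struct SizeTrackingCache<K, V> {
    entries: HashMap<K, V>,
    total_size_in_bytes: usize,
}

impl<K: Hash + Eq, V: WithSize> SizeTrackingCache<K, V> {
    fn new() -> Self {
        Self {
            entries: HashMap::new(),
            total_size_in_bytes: 0,
        }
    }

    /// Inserts an entry, replacing any previous one and adjusting the total size.
    fn insert(&mut self, key: K, value: V) {
        let added = value.size_in_bytes();
        if let Some(old) = self.entries.insert(key, value) {
            self.total_size_in_bytes -= old.size_in_bytes();
        }
        self.total_size_in_bytes += added;
    }

    fn size_in_bytes(&self) -> usize {
        self.total_size_in_bytes
    }

    fn num_entries(&self) -> usize {
        self.entries.len()
    }

    /// Drops all entries and resets the tracked size.
    fn flush(&mut self) {
        self.entries.clear();
        self.total_size_in_bytes = 0;
    }
}
```

Because the cache is bounded only by `WithSize`, a test can fill it with `MockSizedEntry` values of arbitrary claimed sizes and check size-based behavior without building real modules.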

How Has This Been Tested?

  • Existing tests.
  • CI re-uses executor, so runs the tests.
  • New unit tests.

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Nov 5, 2024

⏱️ 23h 36m total CI duration on this PR
Slowest 15 Jobs | Cumulative Duration | Recent Runs
execution-performance / single-node-performance | 7h 20m | 🟥🟥🟥🟩🟩 (+13 more)
single-node-performance | 3h 28m | 🟥🟥🟥🟥🟥 (+1 more)
execution-performance / test-target-determinator | 1h 12m | 🟩🟩🟩🟩🟩 (+13 more)
test-target-determinator | 1h 9m | 🟩🟩🟩🟩🟩 (+12 more)
rust-cargo-deny | 1h 3m | 🟩🟩🟩🟩🟩 (+31 more)
check | 48m | 🟩🟩🟩🟩🟩 (+8 more)
check-dynamic-deps | 43m | 🟩🟩🟩🟩🟩 (+32 more)
rust-move-tests | 41m |
fetch-last-released-docker-image-tag | 21m | 🟩🟩🟩🟩🟩 (+8 more)
test-target-determinator | 21m | 🟩🟩🟩🟩🟩 (+1 more)
general-lints | 18m | 🟩🟩🟩🟩🟩 (+31 more)
semgrep/ci | 15m | 🟩🟩🟩🟩🟩 (+32 more)
rust-targeted-unit-tests | 11m |
rust-move-tests | 10m | 🟩
rust-move-tests | 10m | 🟩

Contributor Author

georgemitenkov commented Nov 5, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

@georgemitenkov changed the title from "[refactoring] Move explicit sync wrapper to aptos-types" to "[loader] Global caches for execution" Nov 5, 2024
@georgemitenkov added the labels CICD:run-execution-performance-test (Run execution performance test) and CICD:run-execution-performance-full-test (Run execution performance test (full version)) Nov 5, 2024
@georgemitenkov force-pushed the george/parent-block-checks branch 3 times, most recently from 6ab79b7 to 813cb54 November 6, 2024 09:05
@georgemitenkov georgemitenkov marked this pull request as ready for review November 6, 2024 09:19
  - Fixed a bug where the environment was re-created after the manager was successfully marked as executing
  - Removed the optional manager in BlockAptosVM; a new instance can be provided instead
  - Removed the clean state; instead, the manager is initialized to Done(None) and optional values
    need to be provided.
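
The READY, EXECUTING and DONE states from the description, together with the Done(None) initial state mentioned in these notes, can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the manager from this PR: the `State` and `ManagerSketch` types, the `u64` block identifier, the `mark_*` method names, and the `String` errors are all hypothetical, and the real manager also owns the global module cache and environment, which are omitted here.

```rust
/// Hypothetical states; the optional payload stands in for a block identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Ready(Option<u64>),
    Executing(Option<u64>),
    Done(Option<u64>),
}

/// Illustrative manager that only tracks the state (no caches, no environment).
#[derive(Debug)]
struct ManagerSketch {
    state: State,
}

impl ManagerSketch {
    /// Starts in Done(None), so the first transition must be DONE -> READY.
    fn new() -> Self {
        Self {
            state: State::Done(None),
        }
    }

    /// DONE -> READY, performed before a block is handed to the executor.
    fn mark_ready(&mut self, current: Option<u64>) -> Result<(), String> {
        match self.state {
            State::Done(_) => {
                self.state = State::Ready(current);
                Ok(())
            }
            other => Err(format!("cannot mark ready from {:?}", other)),
        }
    }

    /// READY -> EXECUTING, performed when block execution starts.
    fn mark_executing(&mut self) -> Result<(), String> {
        match self.state {
            State::Ready(id) => {
                self.state = State::Executing(id);
                Ok(())
            }
            other => Err(format!("cannot mark executing from {:?}", other)),
        }
    }

    /// EXECUTING -> DONE, performed when block execution finishes.
    fn mark_done(&mut self) -> Result<(), String> {
        match self.state {
            State::Executing(id) => {
                self.state = State::Done(id);
                Ok(())
            }
            other => Err(format!("cannot mark done from {:?}", other)),
        }
    }
}
```

Rejecting out-of-order transitions with an error, rather than silently resetting, keeps misuse visible to the caller; how the real manager reacts to an invalid transition is not described in this PR text.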
@georgemitenkov force-pushed the george/parent-block-checks branch from 1a54836 to 4f1b9f9 November 13, 2024 22:54

Contributor

✅ Forge suite realistic_env_max_load success on 4f1b9f9ba077661a8fa29bee322d35909e906b4b

two traffics test: inner traffic : committed: 14558.17 txn/s, latency: 2732.29 ms, (p50: 2700 ms, p70: 2700, p90: 2900 ms, p99: 3000 ms), latency samples: 5535340
two traffics test : committed: 99.96 txn/s, latency: 1586.65 ms, (p50: 1400 ms, p70: 1400, p90: 1600 ms, p99: 8400 ms), latency samples: 1840
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.996, avg: 1.560", "ConsensusProposalToOrdered: max: 0.336, avg: 0.294", "ConsensusOrderedToCommit: max: 0.360, avg: 0.347", "ConsensusProposalToCommit: max: 0.648, avg: 0.641"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.82s no progress at version 2359491 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.51s no progress at version 2359489 (avg 8.51s) [limit 15].
Test Ok

Contributor

✅ Forge suite framework_upgrade success on ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 4f1b9f9ba077661a8fa29bee322d35909e906b4b

Compatibility test results for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 4f1b9f9ba077661a8fa29bee322d35909e906b4b (PR)
Upgrade the nodes to version: 4f1b9f9ba077661a8fa29bee322d35909e906b4b
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1294.78 txn/s, submitted: 1298.15 txn/s, failed submission: 3.38 txn/s, expired: 3.38 txn/s, latency: 2383.33 ms, (p50: 2100 ms, p70: 2400, p90: 4200 ms, p99: 5500 ms), latency samples: 115080
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1261.83 txn/s, submitted: 1265.12 txn/s, failed submission: 3.29 txn/s, expired: 3.29 txn/s, latency: 2319.82 ms, (p50: 2100 ms, p70: 2400, p90: 3400 ms, p99: 4800 ms), latency samples: 115180
5. check swarm health
Compatibility test for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 4f1b9f9ba077661a8fa29bee322d35909e906b4b passed
Upgrade the remaining nodes to version: 4f1b9f9ba077661a8fa29bee322d35909e906b4b
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1335.00 txn/s, submitted: 1338.59 txn/s, failed submission: 3.59 txn/s, expired: 3.59 txn/s, latency: 2311.52 ms, (p50: 2100 ms, p70: 2400, p90: 3300 ms, p99: 5200 ms), latency samples: 118900
Test Ok

Contributor

✅ Forge suite compat success on ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 4f1b9f9ba077661a8fa29bee322d35909e906b4b

Compatibility test results for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 4f1b9f9ba077661a8fa29bee322d35909e906b4b (PR)
1. Check liveness of validators at old version: ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70
compatibility::simple-validator-upgrade::liveness-check : committed: 18348.72 txn/s, latency: 1874.78 ms, (p50: 1900 ms, p70: 2000, p90: 2100 ms, p99: 2200 ms), latency samples: 585700
2. Upgrading first Validator to new version: 4f1b9f9ba077661a8fa29bee322d35909e906b4b
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 8021.49 txn/s, latency: 3599.16 ms, (p50: 4000 ms, p70: 4200, p90: 4300 ms, p99: 4400 ms), latency samples: 148820
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 3015.91 txn/s, submitted: 3016.03 txn/s, expired: 0.12 txn/s, latency: 4137.46 ms, (p50: 4300 ms, p70: 4300, p90: 4400 ms, p99: 6100 ms), latency samples: 268509
3. Upgrading rest of first batch to new version: 4f1b9f9ba077661a8fa29bee322d35909e906b4b
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 8005.76 txn/s, latency: 3595.61 ms, (p50: 4000 ms, p70: 4200, p90: 4300 ms, p99: 4400 ms), latency samples: 146540
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 8153.09 txn/s, latency: 3946.38 ms, (p50: 4200 ms, p70: 4300, p90: 5100 ms, p99: 5600 ms), latency samples: 269340
4. upgrading second batch to new version: 4f1b9f9ba077661a8fa29bee322d35909e906b4b
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 12647.81 txn/s, latency: 2170.24 ms, (p50: 2400 ms, p70: 2500, p90: 2600 ms, p99: 2800 ms), latency samples: 215900
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10835.91 txn/s, latency: 2890.22 ms, (p50: 2600 ms, p70: 2700, p90: 5800 ms, p99: 7400 ms), latency samples: 356760
5. check swarm health
Compatibility test for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 4f1b9f9ba077661a8fa29bee322d35909e906b4b passed
Test Ok

@georgemitenkov georgemitenkov merged commit cb4dd96 into main Nov 13, 2024
48 checks passed
@georgemitenkov georgemitenkov deleted the george/parent-block-checks branch November 13, 2024 23:29
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
6 participants