[Storage][Sharding] Sharded state merkle pruner. #7857

Merged 1 commit on Jun 21, 2023

@grao1991 grao1991 commented Apr 19, 2023

Description

Test Plan

Tested in executor-benchmark.

@grao1991 grao1991 force-pushed the grao_pruner_sharding branch from 8a50c73 to 3258416 Compare May 22, 2023 22:29
@grao1991 grao1991 requested a review from areshand May 22, 2023 22:31
@grao1991 grao1991 marked this pull request as ready for review May 22, 2023 22:31
@@ -27,16 +28,25 @@ mod test;

pub const STATE_MERKLE_PRUNER_NAME: &str = "state_merkle_pruner";

static POOL: Lazy<rayon::ThreadPool> = Lazy::new(|| {

Contributor

nit: consider a more specific name like TREE_PRUNER_THREAD_POOL?

Contributor Author

Done

// to the DB.
fn prune_state_merkle_shard(
    &self,
    db: &DB,

Contributor

nit: could you give this DB a more specific name? There are too many DBs, and it is sometimes hard to tell which one this refers to.

Contributor Author

Done

let shard_min_readable_version = self.get_shard_progress(shard_id);
if shard_min_readable_version != target_version {
    assert_lt!(shard_min_readable_version, target_version);
    self.update_shard_progress(shard_id, target_version);

Contributor

Ideally, do we want to update shard progress once the prune is done?

Contributor Author

The DB progress is updated afterwards; the in-memory one is updated beforehand, to tell the upper layer that the data is not readable.

Contributor Author

@grao1991 grao1991 Jun 14, 2023

What I said above has changed (after #8532). We now manage the "min_readable_version" in the pruner manager (updated before the work; this is the value we serve externally), and we manage the "progress" in the pruner structs (updated after the work is done).
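
The ordering described in this reply can be sketched as follows. This is a minimal illustration, not the code from this PR or from #8532: `PrunerManager`, `Pruner`, `prune_to`, and `is_readable` are hypothetical stand-ins, and the actual deletion work is elided.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

type Version = u64;

/// Hypothetical stand-in for the pruner manager: owns the bound served to readers.
struct PrunerManager {
    min_readable_version: AtomicU64,
}

/// Hypothetical stand-in for a pruner struct: records how far deletion has got.
struct Pruner {
    progress: AtomicU64,
}

impl Pruner {
    fn new() -> Self {
        Self { progress: AtomicU64::new(0) }
    }

    fn progress(&self) -> Version {
        self.progress.load(Ordering::SeqCst)
    }

    /// Deletes everything below `target` (elided here), then records the progress.
    fn prune(&self, target: Version) {
        // ... the real deletion work would go here ...
        self.progress.store(target, Ordering::SeqCst);
    }
}

impl PrunerManager {
    fn new() -> Self {
        Self { min_readable_version: AtomicU64::new(0) }
    }

    /// Readers consult this before touching data at `version`.
    fn is_readable(&self, version: Version) -> bool {
        version >= self.min_readable_version.load(Ordering::SeqCst)
    }

    fn prune_to(&self, pruner: &Pruner, target: Version) {
        // Step 1: raise the advertised bound BEFORE any data is deleted.
        self.min_readable_version.store(target, Ordering::SeqCst);
        // Step 2: do the work; the pruner bumps its own progress afterwards.
        pruner.prune(target);
    }
}
```

With this ordering, a reader never sees a version reported as readable while its data is being deleted, and the recorded progress never runs ahead of work that has actually completed.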

) -> Result<Option<Version>> {
    let batch = SchemaBatch::new();
    let next_version = self.prune_state_merkle_shard(
        self.state_merkle_db.metadata_db(),

Contributor

Is the top-level SMT stored in metadata_db?

Contributor Author

Yes

@grao1991 grao1991 force-pushed the grao_pruner_sharding branch 2 times, most recently from f1d8b64 to 78c349d Compare May 23, 2023 00:31
@grao1991 grao1991 force-pushed the grao_pruner_sharding branch 3 times, most recently from def5a60 to a438cea Compare May 30, 2023 20:36
@grao1991 grao1991 force-pushed the grao_pruner_sharding branch 8 times, most recently from bf133f8 to b1bef72 Compare June 16, 2023 00:40
@@ -102,17 +104,19 @@ where

// used only by blanket `initialize()`, use the underlying implementation instead elsewhere.

Contributor

no longer relevant?

Contributor Author

Removed.

}

pub(in crate::pruner) fn progress(&self) -> Result<Version> {
    Ok(get_progress(&self.metadata_db, &S::tag(None))?.unwrap_or(0))

Contributor

This comes from a previous PR, but in get_progress() we should call DbMetadataValue::expect_version().

Contributor Author

Done
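
A minimal sketch of the shape this suggestion points at, under stated assumptions: the enum below is a simplified stand-in for `DbMetadataValue`, the `Other` variant is hypothetical and exists only so the match has a failure branch, and `progress_or_zero` is an illustrative helper mirroring the `unwrap_or(0)` in the hunk above.

```rust
type Version = u64;

/// Simplified stand-in for a metadata value; only `Version` matters here.
#[derive(Debug, Clone, PartialEq)]
enum DbMetadataValue {
    Version(Version),
    // Hypothetical extra variant, present only to make the match non-trivial.
    Other(String),
}

impl DbMetadataValue {
    /// Unwraps the version, panicking if the value holds anything else.
    fn expect_version(self) -> Version {
        match self {
            DbMetadataValue::Version(v) => v,
            other => panic!("expected a version, got {:?}", other),
        }
    }
}

/// Illustrative helper: a missing entry means nothing has been pruned yet.
fn progress_or_zero(stored: Option<DbMetadataValue>) -> Version {
    match stored {
        Some(value) => value.expect_version(),
        None => 0,
    }
}
```

Centralising the unwrap in one method keeps call sites from pattern-matching on the enum themselves, so a value of the wrong kind fails loudly in a single place.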

    fn name() -> &'static str;
}

impl StaleNodeIndexSchemaTrait for StaleNodeIndexSchema {
-   fn tag() -> DbMetadataKey {
-       DbMetadataKey::StateMerklePrunerProgress
+   fn tag(shard_id: Option<u8>) -> DbMetadataKey {

Contributor

My fault, but please rename this, maybe to progress_metadata_key().

Contributor Author

Done
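
For illustration, a renamed function taking `Option<u8>` could look like the sketch below. It is written as a free function rather than a trait method for brevity, and the per-shard variant name `StateMerkleShardPrunerProgress` is an assumption made for this example; only `StateMerklePrunerProgress` appears in the hunk above.

```rust
/// Simplified stand-in for the metadata key enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum DbMetadataKey {
    /// Progress key for the unsharded (top-level) pruner.
    StateMerklePrunerProgress,
    /// Assumed per-shard progress key; the variant name is illustrative.
    StateMerkleShardPrunerProgress(u8),
}

/// Maps an optional shard id to the key under which pruner progress is stored.
fn progress_metadata_key(shard_id: Option<u8>) -> DbMetadataKey {
    match shard_id {
        Some(id) => DbMetadataKey::StateMerkleShardPrunerProgress(id),
        None => DbMetadataKey::StateMerklePrunerProgress,
    }
}
```

Giving each shard its own key lets every shard persist its progress independently, while `None` keeps addressing the single existing key.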

pub(in crate::pruner) fn new(metadata_db: Arc<DB>) -> Self {
    Self {
        metadata_db,
        next_version: AtomicVersion::new(0),

Contributor

Man, I have to say this field is useless. Looking at the code, it's always overwritten by current_progress.

Contributor Author

Not really. Most of the time it is larger than current_progress, and it is overwritten either by the second-smallest version that is larger than current_progress (the smallest version >= current_progress gets pruned), or by the target_version (if we have already reached the end of the stale index in this round).

Contributor

okay.. needed a vc session to understand what you are saying, but seems legit 😞
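
One plausible reading of the author's explanation, as a simplified model rather than the pruner's actual code: stale index entries are modelled as a sorted slice of versions, the batch limit counts entries, and `prune_round` is a hypothetical name.

```rust
type Version = u64;

/// Prunes up to `batch_size` stale entries in [current_progress, target_version]
/// and returns (pruned versions, version to resume from next round).
/// `stale_since_versions` must be sorted in ascending order.
fn prune_round(
    stale_since_versions: &[Version],
    current_progress: Version,
    target_version: Version,
    batch_size: usize,
) -> (Vec<Version>, Version) {
    let mut candidates = stale_since_versions
        .iter()
        .copied()
        .filter(|v| *v >= current_progress && *v <= target_version);

    // Take one batch without consuming the rest of the iterator.
    let pruned: Vec<Version> = candidates.by_ref().take(batch_size).collect();

    // If an eligible entry is left over, resume from its version; otherwise the
    // round reached the end of the stale index, so jump to the target.
    let next_version = candidates.next().unwrap_or(target_version);
    (pruned, next_version)
}
```

In this model the returned next version is normally ahead of the starting progress: it is either the version of the first entry left unpruned or the target itself, which is why the field is not simply a copy of `current_progress`.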

@grao1991 grao1991 force-pushed the grao_pruner_sharding branch from b1bef72 to e645ab5 Compare June 17, 2023 00:40

None
},
)
Ok(if let Some(v) = db.get::<DbMetadataSchema>(progress_key)? {

Contributor

use .map()?

Contributor Author

Done
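
The suggested change can be sketched on a toy lookup. `FakeDb` and `StoredValue` are hypothetical in-memory stand-ins, not the storage API used in this PR; the point is only that the `if let ... else { None }` form and the `.map()` form are equivalent.

```rust
use std::collections::HashMap;

type Version = u64;

/// Hypothetical stored value; stands in for a metadata value wrapping a version.
#[derive(Clone, Copy)]
struct StoredValue(Version);

impl StoredValue {
    fn expect_version(self) -> Version {
        self.0
    }
}

/// Hypothetical in-memory stand-in for a DB with a fallible lookup.
struct FakeDb {
    entries: HashMap<String, StoredValue>,
}

impl FakeDb {
    /// Returns `Ok(None)` when the key is absent.
    fn get(&self, key: &str) -> Result<Option<StoredValue>, String> {
        Ok(self.entries.get(key).copied())
    }
}

/// The `if let` shape, as in the hunk under review.
fn get_progress_if_let(db: &FakeDb, key: &str) -> Result<Option<Version>, String> {
    Ok(if let Some(v) = db.get(key)? {
        Some(v.expect_version())
    } else {
        None
    })
}

/// The same logic written with `Option::map`.
fn get_progress_map(db: &FakeDb, key: &str) -> Result<Option<Version>, String> {
    Ok(db.get(key)?.map(|v| v.expect_version()))
}
```

Both functions propagate lookup errors with `?` and return `None` for a missing key; the `.map()` version just drops the explicit `else { None }` branch.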

@grao1991 grao1991 force-pushed the grao_pruner_sharding branch from e645ab5 to 2d6740f Compare June 20, 2023 21:43
@grao1991 grao1991 enabled auto-merge (squash) June 20, 2023 21:44
@grao1991 grao1991 force-pushed the grao_pruner_sharding branch from 2d6740f to d7ff448 Compare June 21, 2023 01:32
@github-actions

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> d7ff448772cf5bcaefef6ba2e6575cf66764d842

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> d7ff448772cf5bcaefef6ba2e6575cf66764d842 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : committed: 8991 txn/s, latency: 3669 ms, (p50: 3600 ms, p90: 5100 ms, p99: 6000 ms), latency samples: 305720
2. Upgrading first Validator to new version: d7ff448772cf5bcaefef6ba2e6575cf66764d842
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4553 txn/s, latency: 6848 ms, (p50: 7500 ms, p90: 8300 ms, p99: 9300 ms), latency samples: 177600
3. Upgrading rest of first batch to new version: d7ff448772cf5bcaefef6ba2e6575cf66764d842
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4429 txn/s, latency: 7320 ms, (p50: 8400 ms, p90: 9000 ms, p99: 9400 ms), latency samples: 163880
4. upgrading second batch to new version: d7ff448772cf5bcaefef6ba2e6575cf66764d842
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 6944 txn/s, latency: 4720 ms, (p50: 4800 ms, p90: 6500 ms, p99: 7900 ms), latency samples: 243060
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> d7ff448772cf5bcaefef6ba2e6575cf66764d842 passed
Test Ok

@github-actions

✅ Forge suite land_blocking success on d7ff448772cf5bcaefef6ba2e6575cf66764d842

performance benchmark : committed: 5536 txn/s, submitted: 5537 txn/s, latency: 7199 ms, (p50: 6000 ms, p90: 9900 ms, p99: 27100 ms), latency samples: 2364200
Max round gap was 1 [limit 4] at version 640704. Max no progress secs was 3.710849 [limit 10] at version 1472738.
Test Ok

@github-actions

✅ Forge suite framework_upgrade success on aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> d7ff448772cf5bcaefef6ba2e6575cf66764d842

Compatibility test results for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> d7ff448772cf5bcaefef6ba2e6575cf66764d842 (PR)
Upgrade the nodes to version: d7ff448772cf5bcaefef6ba2e6575cf66764d842
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 4471 txn/s, latency: 7159 ms, (p50: 7500 ms, p90: 9600 ms, p99: 13000 ms), latency samples: 169900
5. check swarm health
Compatibility test for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> d7ff448772cf5bcaefef6ba2e6575cf66764d842 passed
Test Ok

@grao1991 grao1991 merged commit 63722ca into main Jun 21, 2023
@grao1991 grao1991 deleted the grao_pruner_sharding branch June 21, 2023 03:35
xbtmatt pushed a commit to xbtmatt/aptos-core that referenced this pull request Jul 25, 2023
xbtmatt pushed a commit to xbtmatt/aptos-core that referenced this pull request Jul 25, 2023