Re-enable dedup and shuffling as defaults (for test networks) #9397

Merged
merged 6 commits into from
Aug 2, 2023

Contributor

@bchocho bchocho commented Jul 31, 2023

Description

Introduce a new Missing version for OnChainExecutionConfig that preserves the behavior from before the config was registered, for backwards compatibility. It is used when the config is missing from the epoch manager.

Otherwise, use default_for_genesis for all new networks, e.g., Forge and devnet.

Test Plan

Observe that e2e tests use both shuffle and dedup.

@bchocho bchocho added the CICD:run-e2e-tests label (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR) Jul 31, 2023
@bchocho bchocho requested a review from sitalkedia July 31, 2023 21:41
Contributor

@sitalkedia sitalkedia left a comment

LGTM

@bchocho bchocho marked this pull request as ready for review July 31, 2023 22:16
@bchocho bchocho requested review from igor-aptos and zekun000 July 31, 2023 22:17
@bchocho bchocho requested a review from JoshLind August 1, 2023 00:38
@bchocho bchocho changed the title from "Re-enable dedup in forge, and cleanup turning on shuffling for tests" to "Re-enable dedup and shuffling as defaults (for test networks)" Aug 2, 2023
@bchocho bchocho requested a review from sitalkedia August 2, 2023 00:14
@@ -106,7 +106,7 @@ pub struct ExecutionConfigV3 {
 impl Default for ExecutionConfigV3 {
     fn default() -> Self {
         Self {
-            transaction_shuffler_type: TransactionShufflerType::NoShuffling,
+            transaction_shuffler_type: TransactionShufflerType::SenderAwareV2(32),
             block_gas_limit: None,
Contributor

@danielxiangzl - Should we enable the block gas limit by default for Forge and other tests?

Contributor

Yes, but only after the execution on-chain config registry is fixed. I still find it hacky to set the config values via default for test networks. Maybe do what Igor suggested and set the block gas limit to be 35000 for genesis?

Contributor

This will be used only for Forge and existing integration tests, so I say we enable the gas limit so that our tests exercise the block gas limit before rolling this out. For testnet/mainnet, this is a no-op as they are still using V1.

Contributor Author

Actually I checked and we are on V3 in mainnet and testnet. But it still shouldn't matter, right?

Contributor

If the execution on-chain config registry is not fixed, and main/testnet reads config via default (V3), then it will be a problem. So just to double check, is it fixed already?

Contributor

How are we on V3 in mainnet? If so, I think this is a dangerous change to make.

Contributor Author

The governance proposal for v1.5 was V3

Contributor

Since the registry fix is not rolled out yet, I think we should not enable shuffling by default in V3 if we are already on V3 in mainnet. The reason is that if by any chance we roll out this change without the registry fix, we break the network.

Contributor

Let's add V4 below Missing, and return no shuffling in V3? Then we don't need a new shuffling enum either.
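
A rough sketch of what this suggestion could look like, under stated assumptions: the structs are simplified stand-ins that carry only the shuffler field, V1 and V2 are omitted for brevity, and `ExecutionConfigV4`, `v3_default`, `v4_default`, and `shuffler` are hypothetical names that do not appear in the diff under review. The idea is that V3 keeps resolving to no shuffling (so networks already on V3 see no change), Missing also resolves to no shuffling, and only the new V4 turns shuffling on.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum TransactionShufflerType {
    NoShuffling,
    SenderAwareV2(u32),
}

// Simplified stand-in: only the field discussed in this thread.
#[derive(Clone, Debug, PartialEq)]
pub struct ExecutionConfigV3 {
    pub transaction_shuffler_type: TransactionShufflerType,
}

// Hypothetical V4 (an assumption for illustration, not part of the diff).
#[derive(Clone, Debug, PartialEq)]
pub struct ExecutionConfigV4 {
    pub transaction_shuffler_type: TransactionShufflerType,
}

#[derive(Clone, Debug, PartialEq)]
pub enum OnChainExecutionConfig {
    // V1 and V2 omitted for brevity.
    V3(ExecutionConfigV3),
    Missing,
    // New versions are appended below Missing.
    V4(ExecutionConfigV4),
}

/// V3 keeps its old value, so networks already on V3 are unaffected.
pub fn v3_default() -> ExecutionConfigV3 {
    ExecutionConfigV3 {
        transaction_shuffler_type: TransactionShufflerType::NoShuffling,
    }
}

/// Only the new version enables shuffling.
pub fn v4_default() -> ExecutionConfigV4 {
    ExecutionConfigV4 {
        transaction_shuffler_type: TransactionShufflerType::SenderAwareV2(32),
    }
}

impl OnChainExecutionConfig {
    /// Resolve the shuffler for any variant; Missing maps to the old behavior.
    pub fn shuffler(&self) -> TransactionShufflerType {
        match self {
            OnChainExecutionConfig::V3(c) => c.transaction_shuffler_type.clone(),
            OnChainExecutionConfig::Missing => TransactionShufflerType::NoShuffling,
            OnChainExecutionConfig::V4(c) => c.transaction_shuffler_type.clone(),
        }
    }
}
```

With this shape, a binary that ships the new code but still reads a V3 (or missing) config behaves exactly as before.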

@@ -106,7 +106,7 @@ pub struct ExecutionConfigV3 {
impl Default for ExecutionConfigV3 {
Contributor

If we want to permanently have two different defaults, I would have:
default_for_genesis
default_if_missing
Basically, default_for_genesis is used for any new network. default_if_missing is the backward-compatible one (needed for replay, for example), which keeps the old behavior if the flag is missing.

I would also remove Default completely, so that people have to explicitly pick the one they want.

Currently this is not critical, but if at any point in the future we add anything to the execution config that affects transaction output (not just the order, or which transactions get on chain), we would need this separation, because replay will start with a missing config.
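
The split described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was merged: the types are simplified stand-ins, V1 and V2 are omitted, the genesis values are copied from the diff hunk under review, and `resolve_config` is a hypothetical helper standing in for wherever the config is read at epoch start. Note that nothing here implements `Default`, so every call site has to name the constructor it wants.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum TransactionShufflerType {
    NoShuffling,
    SenderAwareV2(u32),
}

#[derive(Clone, Debug, PartialEq)]
pub struct ExecutionConfigV3 {
    pub transaction_shuffler_type: TransactionShufflerType,
    pub block_gas_limit: Option<u64>,
}

// Deliberately no `Default` impl on either type.
#[derive(Clone, Debug, PartialEq)]
pub enum OnChainExecutionConfig {
    // V1 and V2 omitted for brevity.
    V3(ExecutionConfigV3),
    // Stands for "no config was registered on chain"; must keep old behavior.
    Missing,
}

impl OnChainExecutionConfig {
    /// For brand-new networks (e.g. Forge, devnet): new features enabled.
    pub fn default_for_genesis() -> Self {
        OnChainExecutionConfig::V3(ExecutionConfigV3 {
            transaction_shuffler_type: TransactionShufflerType::SenderAwareV2(32),
            block_gas_limit: None,
        })
    }

    /// For replay and older networks where the config cannot be found.
    pub fn default_if_missing() -> Self {
        OnChainExecutionConfig::Missing
    }
}

/// Hypothetical helper: `on_chain` is `None` when the config is not registered.
pub fn resolve_config(on_chain: Option<OnChainExecutionConfig>) -> OnChainExecutionConfig {
    on_chain.unwrap_or_else(OnChainExecutionConfig::default_if_missing)
}
```

Because there is no `Default`, a call such as `unwrap_or_default()` would not compile, which is the point of the suggestion: the choice between the genesis value and the backward-compatible value is always visible at the call site.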

@@ -11,13 +11,17 @@ pub enum OnChainExecutionConfig {
     V1(ExecutionConfigV1),
     V2(ExecutionConfigV2),
     V3(ExecutionConfigV3),
+    /// To maintain backwards compatibility on replay, we must ensure that any new features resolve
+    /// to previous behavior (before OnChainExecutionConfig was registered) in case of Missing.
+    Missing,
Contributor

It is pretty easy to make a mistake and add the next version, V4, after V3 and before Missing, and thus break backward compatibility. Can you add a comment here saying that V4 must be added after Missing?
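
One way to see the concern, presumably the one the reviewer has in mind: serde-based binary formats such as BCS identify an enum variant by its position in the declaration, so inserting a variant in the middle shifts the position of everything after it. The sketch below uses a fieldless stand-in enum (the name `ConfigVariant` and the `variant_index` helper are assumptions for illustration, and V4 is hypothetical) so that the declaration-order position can be read with a plain cast.

```rust
// Fieldless stand-in for OnChainExecutionConfig: payloads are dropped so the
// declaration-order position of each variant can be read with an `as` cast.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ConfigVariant {
    V1,
    V2,
    V3,
    Missing,
    // Hypothetical future version, appended after Missing so that the
    // positions of all existing variants stay unchanged.
    V4,
}

/// Zero-based position of the variant in declaration order.
pub fn variant_index(v: ConfigVariant) -> u32 {
    v as u32
}
```

Here `Missing` sits at position 3 and `V4` at position 4. Had `V4` been inserted between `V3` and `Missing`, `Missing` would move to position 4, and binaries built before and after that change would disagree on what position 3 means.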

@@ -106,8 +118,8 @@ pub struct ExecutionConfigV3 {
 impl Default for ExecutionConfigV3 {
     fn default() -> Self {
Contributor

Is this being used anywhere? If so, it would be better to change those uses to default_for_genesis or default_if_missing and get rid of this.

Contributor

Never mind, I just realized this is called from the default_for_genesis function. Still, it may be less confusing to remove the Default trait for ExecutionConfigV3 and return the desired value from the default_for_genesis function.

Contributor

Agreed with this, maybe even just inline it there.

Contributor

@igor-aptos igor-aptos left a comment

Thanks!

@bchocho bchocho requested a review from sitalkedia August 2, 2023 21:44
Contributor

@sitalkedia sitalkedia left a comment

Thanks for fixing the config mess @bchocho 🙌

@bchocho bchocho enabled auto-merge (squash) August 2, 2023 21:47
github-actions bot commented Aug 2, 2023

✅ Forge suite compat success on aptos-node-v1.5.1 ==> c85d9849bd99882d69e15e437881867a51490dee

Compatibility test results for aptos-node-v1.5.1 ==> c85d9849bd99882d69e15e437881867a51490dee (PR)
1. Check liveness of validators at old version: aptos-node-v1.5.1
compatibility::simple-validator-upgrade::liveness-check : committed: 4911 txn/s, latency: 6586 ms, (p50: 6300 ms, p90: 9700 ms, p99: 12400 ms), latency samples: 181720
2. Upgrading first Validator to new version: c85d9849bd99882d69e15e437881867a51490dee
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1817 txn/s, latency: 15907 ms, (p50: 18700 ms, p90: 22300 ms, p99: 22600 ms), latency samples: 92680
3. Upgrading rest of first batch to new version: c85d9849bd99882d69e15e437881867a51490dee
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1858 txn/s, latency: 15799 ms, (p50: 18700 ms, p90: 22200 ms, p99: 22400 ms), latency samples: 92900
4. upgrading second batch to new version: c85d9849bd99882d69e15e437881867a51490dee
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3160 txn/s, latency: 9540 ms, (p50: 10200 ms, p90: 12700 ms, p99: 13600 ms), latency samples: 135900
5. check swarm health
Compatibility test for aptos-node-v1.5.1 ==> c85d9849bd99882d69e15e437881867a51490dee passed
Test Ok

github-actions bot commented Aug 2, 2023

✅ Forge suite realistic_env_max_load success on c85d9849bd99882d69e15e437881867a51490dee

two traffics test: inner traffic : committed: 6404 txn/s, latency: 6127 ms, (p50: 5900 ms, p90: 7600 ms, p99: 12500 ms), latency samples: 2766800
two traffics test : committed: 100 txn/s, latency: 3002 ms, (p50: 2900 ms, p90: 3700 ms, p99: 4300 ms), latency samples: 1840
Max round gap was 1 [limit 4] at version 1442381. Max no progress secs was 3.9159079 [limit 10] at version 1442381.
Test Ok

github-actions bot commented Aug 2, 2023

✅ Forge suite framework_upgrade success on aptos-node-v1.5.1 ==> c85d9849bd99882d69e15e437881867a51490dee

Compatibility test results for aptos-node-v1.5.1 ==> c85d9849bd99882d69e15e437881867a51490dee (PR)
Upgrade the nodes to version: c85d9849bd99882d69e15e437881867a51490dee
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 4461 txn/s, latency: 7312 ms, (p50: 7800 ms, p90: 10500 ms, p99: 11000 ms), latency samples: 165060
5. check swarm health
Compatibility test for aptos-node-v1.5.1 ==> c85d9849bd99882d69e15e437881867a51490dee passed
Test Ok

@bchocho bchocho merged commit 285a61c into main Aug 2, 2023
@bchocho bchocho deleted the brian/enable-dedup-in-forge branch August 2, 2023 22:27
xbtmatt pushed a commit that referenced this pull request Aug 13, 2023