Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve txn-emitter stability and visibility on Forge #8697

Merged
merged 1 commit into from
Jun 17, 2023

Conversation

igor-aptos
Copy link
Contributor

@igor-aptos igor-aptos commented Jun 15, 2023

  • visibility - add few more log lines, and print txn-emitter stats (every 1m) as well
  • make txn-emitter work more precisely. With large number of accounts/submission_workes (and graceful test has 180000 of them), there were a few issues. First creating workers is slow, and so they start with some divergence to begin with - fixing by creating them first before starting, and passing the same "start" instant to all. second divergence wasn't visible because it logged only after 240s divergence (changed to log on 5s divergence)
  • fix db flag that got changed recently for test tuned for throughput

Description

Test Plan

@igor-aptos igor-aptos added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Jun 15, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@igor-aptos igor-aptos force-pushed the igor/stats_during_txn_emitter branch from 1580203 to 917d076 Compare June 16, 2023 00:36
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@igor-aptos igor-aptos force-pushed the igor/stats_during_txn_emitter branch from 917d076 to a9dea3f Compare June 16, 2023 05:20
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@igor-aptos igor-aptos force-pushed the igor/stats_during_txn_emitter branch from a9dea3f to ff8636e Compare June 16, 2023 18:17
@github-actions

This comment has been minimized.

We have a few odd cases, and need to figure out if emitter is causing issues, or nodes.
@igor-aptos igor-aptos force-pushed the igor/stats_during_txn_emitter branch from ff8636e to 1c59c35 Compare June 16, 2023 18:22
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@igor-aptos igor-aptos changed the title Add so that Forge tests print (less frequently - every 1m) emit stats. Improve txn-emitter stability and visibility on Forge Jun 16, 2023
@github-actions

This comment has been minimized.

@igor-aptos igor-aptos requested a review from sionescu June 16, 2023 18:47
@github-actions
Copy link
Contributor

✅ Forge suite land_blocking success on 1c59c35762e6f19f53dec1496717d26fb133eea5

performance benchmark : committed: 5661 txn/s, latency: 7012 ms, (p50: 6300 ms, p90: 9000 ms, p99: 26000 ms), latency samples: 2417327
Max round gap was 1 [limit 4] at version 1404300. Max no progress secs was 3.4734979 [limit 10] at version 1531484.
Test Ok

@github-actions
Copy link
Contributor

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 1c59c35762e6f19f53dec1496717d26fb133eea5

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 1c59c35762e6f19f53dec1496717d26fb133eea5 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : committed: 9211 txn/s, latency: 3611 ms, (p50: 3500 ms, p90: 5000 ms, p99: 5800 ms), latency samples: 313180
2. Upgrading first Validator to new version: 1c59c35762e6f19f53dec1496717d26fb133eea5
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5217 txn/s, latency: 6128 ms, (p50: 6500 ms, p90: 8100 ms, p99: 8700 ms), latency samples: 198260
3. Upgrading rest of first batch to new version: 1c59c35762e6f19f53dec1496717d26fb133eea5
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4160 txn/s, latency: 7558 ms, (p50: 8300 ms, p90: 9700 ms, p99: 10400 ms), latency samples: 158100
4. upgrading second batch to new version: 1c59c35762e6f19f53dec1496717d26fb133eea5
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 6912 txn/s, latency: 4767 ms, (p50: 4600 ms, p90: 6900 ms, p99: 8400 ms), latency samples: 241920
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 1c59c35762e6f19f53dec1496717d26fb133eea5 passed
Test Ok

@github-actions
Copy link
Contributor

✅ Forge suite framework_upgrade success on aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 1c59c35762e6f19f53dec1496717d26fb133eea5

Compatibility test results for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 1c59c35762e6f19f53dec1496717d26fb133eea5 (PR)
Upgrade the nodes to version: 1c59c35762e6f19f53dec1496717d26fb133eea5
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 4747 txn/s, latency: 6810 ms, (p50: 7200 ms, p90: 9300 ms, p99: 9800 ms), latency samples: 175640
5. check swarm health
Compatibility test for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> 1c59c35762e6f19f53dec1496717d26fb133eea5 passed
Test Ok

Comment on lines +651 to +653
// Creating workers is slow with many workers (TODO check why)
// so we create them all first, before starting them - so they start at the right time for
// traffic pattern to be correct.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it takes 5 minutes to go through the loop below, in the graceful overload case (https://aptoslabs.grafana.net/goto/GaWX1nu4g?orgId=1) with 180k workers to create, which makes no sense - but at least with this change it doesn't affect the measurements.

Copy link
Contributor

@bchocho bchocho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix db flag that got changed recently for test tuned for throughput

what was this change?

@igor-aptos igor-aptos merged commit de3a664 into main Jun 17, 2023
@igor-aptos igor-aptos deleted the igor/stats_during_txn_emitter branch June 17, 2023 07:09
vusirikala pushed a commit that referenced this pull request Jun 21, 2023
#8697)

We have a few odd cases, and need to figure out if emitter is causing issues, or nodes.
banool pushed a commit that referenced this pull request Jul 7, 2023
#8697)

We have a few odd cases, and need to figure out if emitter is causing issues, or nodes.
xbtmatt pushed a commit to xbtmatt/aptos-core that referenced this pull request Jul 25, 2023
aptos-labs#8697)

We have a few odd cases, and need to figure out if emitter is causing issues, or nodes.
xbtmatt pushed a commit to xbtmatt/aptos-core that referenced this pull request Jul 25, 2023
aptos-labs#8697)

We have a few odd cases, and need to figure out if emitter is causing issues, or nodes.
gedigi pushed a commit that referenced this pull request Aug 2, 2023
#8697)

We have a few odd cases, and need to figure out if emitter is causing issues, or nodes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants