Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

randomness #4: RandManager update from randomnet #12224

Merged
merged 3 commits into from
Feb 28, 2024

Conversation

zjma
Copy link
Contributor

@zjma zjma commented Feb 25, 2024

Description

Test Plan

Copy link

trunk-io bot commented Feb 25, 2024

⏱️ 16h 56m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-coverage 4h 35m 🟩
rust-smoke-coverage 3h 12m 🟩
windows-build 1h 45m 🟩🟩🟩🟩🟩
rust-unit-tests 1h 21m 🟩🟩
rust-smoke-tests 1h 16m 🟩🟩
execution-performance / single-node-performance 1h 15m 🟩🟩🟩🟩
rust-images / rust-all 46m 🟩🟩🟩
forge-e2e-test / forge 32m 🟩🟩
forge-compat-test / forge 29m 🟩🟩
rust-lints 23m 🟩🟩🟩
run-tests-main-branch 20m 🟥🟥🟥
cli-e2e-tests / run-cli-tests 16m 🟩🟩
check 15m 🟩🟩🟩
check-dynamic-deps 11m 🟩🟩🟩🟩🟩
general-lints 7m 🟩🟩🟩
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose 7m 🟩🟩🟩
node-api-compatibility-tests / node-api-compatibility-tests 3m 🟩🟩🟩
semgrep/ci 2m 🟩🟩🟩🟩🟩
file_change_determinator 47s 🟩🟩🟩🟩🟩
file_change_determinator 46s 🟩🟩🟩🟩
execution-performance / file_change_determinator 41s 🟩🟩🟩🟩
file_change_determinator 40s 🟩🟩🟩🟩
permission-check 18s 🟩🟩🟩🟩🟩
permission-check 15s 🟩🟩🟩🟩🟩
permission-check 13s 🟩🟩🟩🟩🟩
upload-to-codecov 13s 🟩
permission-check 11s 🟩🟩🟩🟩🟩
determine-docker-build-metadata 10s 🟩🟩🟩🟩
permission-check 9s 🟩🟩🟩🟩

🚨 2 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
run-tests-main-branch 6m 4m +66%
rust-images / rust-all 19m 14m +35%

settingsfeedbackdocs ⋅ learn more about trunk.io

@zjma zjma marked this pull request as ready for review February 25, 2024 03:05
@zjma zjma added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Feb 25, 2024
@zjma zjma requested review from danielxiangzl and removed request for ibalajiarun and sasha8 February 25, 2024 03:05

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link

codecov bot commented Feb 25, 2024

Codecov Report

Attention: Patch coverage is 37.36730% with 295 lines in your changes are missing coverage. Please review.

Project coverage is 70.0%. Comparing base (c4fcbb9) to head (ac14b75).
Report is 22 commits behind head on main.

❗ Current head ac14b75 differs from pull request most recent head 4f992f3. Consider uploading reports for the commit 4f992f3 to get more accurate results

Files Patch % Lines
consensus/src/rand/rand_gen/types.rs 15.2% 144 Missing ⚠️
consensus/src/rand/rand_gen/rand_manager.rs 0.0% 64 Missing ⚠️
consensus/src/rand/rand_gen/storage/db.rs 0.0% 21 Missing ⚠️
consensus/src/counters.rs 0.0% 14 Missing ⚠️
...nsus/src/rand/rand_gen/reliable_broadcast_state.rs 0.0% 12 Missing ⚠️
consensus/src/rand/rand_gen/storage/schema.rs 0.0% 12 Missing ⚠️
consensus/src/rand/rand_gen/aug_data_store.rs 0.0% 10 Missing ⚠️
consensus/src/rand/rand_gen/storage/in_memory.rs 0.0% 10 Missing ⚠️
consensus/src/rand/rand_gen/block_queue.rs 0.0% 6 Missing ⚠️
consensus/src/rand/rand_gen/rand_store.rs 98.5% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #12224    +/-   ##
========================================
  Coverage    70.0%    70.0%            
========================================
  Files        2240     2240            
  Lines      422197   422544   +347     
========================================
+ Hits       295551   295878   +327     
- Misses     126646   126666    +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pub fn observe_queue(&self) {
let queue = &self.block_queue.queue;
RAND_QUEUE_SIZE.set(queue.len() as i64);
info!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downgrade to debug? Or maybe we only print if the queue is too large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

@@ -85,9 +86,20 @@ impl QueueItem {

/// Maintain ordered blocks that have pending randomness
pub struct BlockQueue {
queue: BTreeMap<Round, QueueItem>,
pub queue: BTreeMap<Round, QueueItem>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this pub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

let _guard = self.broadcast_certified_aug_data(certified_data);

let _guard = self.broadcast_aug_data().await;
let mut interval = tokio::time::interval(Duration::from_millis(5000));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just use from_secs

This comment has been minimized.

This comment has been minimized.

@zjma zjma enabled auto-merge (squash) February 28, 2024 02:26

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.9.5 ==> 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2

Compatibility test results for aptos-node-v1.9.5 ==> 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2 (PR)
1. Check liveness of validators at old version: aptos-node-v1.9.5
compatibility::simple-validator-upgrade::liveness-check : committed: 6045 txn/s, latency: 4891 ms, (p50: 4800 ms, p90: 7800 ms, p99: 8700 ms), latency samples: 241800
2. Upgrading first Validator to new version: 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1530 txn/s, latency: 18067 ms, (p50: 18500 ms, p90: 23200 ms, p99: 24100 ms), latency samples: 78040
3. Upgrading rest of first batch to new version: 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 47 txn/s, submitted: 597 txn/s, expired: 550 txn/s, latency: 21644 ms, (p50: 22900 ms, p90: 22900 ms, p99: 39200 ms), latency samples: 3187
4. upgrading second batch to new version: 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 2353 txn/s, latency: 12807 ms, (p50: 13900 ms, p90: 17200 ms, p99: 17800 ms), latency samples: 105920
5. check swarm health
Compatibility test for aptos-node-v1.9.5 ==> 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2 passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 4f992f31c1bdc4e264990f65ef3e9c187c7eedc2

two traffics test: inner traffic : committed: 6779 txn/s, latency: 5771 ms, (p50: 5400 ms, p90: 7500 ms, p99: 13200 ms), latency samples: 2935600
two traffics test : committed: 100 txn/s, latency: 2102 ms, (p50: 1900 ms, p90: 2400 ms, p99: 9000 ms), latency samples: 1760
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.297, avg: 0.211", "QsPosToProposal: max: 0.368, avg: 0.274", "ConsensusProposalToOrdered: max: 0.617, avg: 0.564", "ConsensusOrderedToCommit: max: 0.401, avg: 0.365", "ConsensusProposalToCommit: max: 0.954, avg: 0.930"]
Max round gap was 1 [limit 4] at version 1367664. Max no progress secs was 3.810017 [limit 15] at version 1367666.
Test Ok

@zjma zjma merged commit 6f0109f into main Feb 28, 2024
68 of 83 checks passed
@zjma zjma deleted the zjma/rand_manager_update_from_randomnet branch February 28, 2024 03:01
zjma added a commit that referenced this pull request Mar 1, 2024
* RandManager update from randomnet

* lint

* lint
zjma added a commit that referenced this pull request Mar 2, 2024
* RandManager update from randomnet

* lint

* lint
zjma added a commit that referenced this pull request Mar 3, 2024
* randomness type update 3 (#12202)

* randomness #4: RandManager update from randomnet (#12224)

* RandManager update from randomnet

* lint

* lint

* randomness #5: consensus update from randomnet (#12225)

* consensus update from randomnet

* update

* randomness #6: dkg manager update from randomnet (#12226)

* consensus update from randomnet

* update

* update execution client api

* dkg manager update from randomnet

* avoid panic

* make api, indexer, fake aptos db aware of block metadata ext txns (#12227)

* randomness #8: framework update from randomnet (#12228)

* framework update from randomnet

Squashed commit of the following to fix jwk smoke tests:

commit 3bd0154
Author: zhoujun.ma <[email protected]>
Date:   Tue Feb 27 02:47:57 2024 -0800

    update

commit 2eb6add
Author: zhoujun.ma <[email protected]>
Date:   Tue Feb 27 02:12:27 2024 -0800

    update

commit 9d82151
Author: zhoujun.ma <[email protected]>
Date:   Tue Feb 27 01:51:08 2024 -0800

    debug

fix doc test

fix spec

fix doc

update initialization in genesis

update features.move

initialize randomness in genesis

update golden files

private entry fun check and vm updates

* postpone release builder changes

* update goldenfiles

* fix is_safe_call spec

* randomness #9: smoke tests from randomnet (#12282)

* smoke test deps and 1st case from randomnet

* update

* more smoke tests

* randomness #10: randomness API update from randomnet (#12335)

* [move] fixes to `randomness.move` (#12250)

* [move] fixes to randomness.move

* Fixed the Prover spec

Fixed the spec to unblock the PR.

Need to prove the introduced assumptions with proper loop invariants, which should be provable.

* lint

---------

Co-authored-by: Junkil Park <[email protected]>
Co-authored-by: danielxiangzl <[email protected]>

* fix specs

---------

Co-authored-by: Alin Tomescu <[email protected]>
Co-authored-by: Junkil Park <[email protected]>
Co-authored-by: danielxiangzl <[email protected]>

* lint

* update genesis

* on-chain resources to indicate dkg/randomness failure injection (#12345)

* dkg/randomness failure injection

* update

* smoke test

* update

* update

* fix scripts

---------

Co-authored-by: Alin Tomescu <[email protected]>
Co-authored-by: Junkil Park <[email protected]>
Co-authored-by: danielxiangzl <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants