[consensus] verify proposals in parallel, add counters #12209

Merged
merged 3 commits into from
Feb 26, 2024

bchocho
Contributor

@bchocho bchocho commented Feb 23, 2024

Description

For blocks with many batches, using par_iter significantly improves proposal processing time.

The counters were used to find this issue in prep for previewnet.

Test Plan

Observe that the new counters change in Forge runs. Rely on the existing Forge tests.

trunk-io bot commented Feb 23, 2024

@bchocho bchocho marked this pull request as ready for review February 23, 2024 22:23
@bchocho bchocho requested a review from sitalkedia February 23, 2024 22:23
proof_with_status
.proofs
.par_iter()
.try_for_each(|proof| proof.verify(validator))?;
Contributor

We may want to verify them in chunks instead of spawning a task for every single proof; spawning a task is non-trivial as well.

Contributor

+1, rayon spawn is very expensive, especially when done on small items. Use a minimum length on the parallel iterator.
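The chunking idea raised in the two comments above can be sketched without rayon, using only the standard library. This is an illustration under stated assumptions, not the change that was merged: `Proof`, its `verify` method, and `verify_chunked` are hypothetical stand-ins, and `std::thread::scope` is swapped in for rayon's thread pool. The point it shows is that each worker receives a slice of at least `min_chunk` proofs, so the per-task overhead is paid once per chunk rather than once per proof.

```rust
use std::thread;

/// Hypothetical stand-in for a proof; the real type lives in the consensus code.
#[derive(Debug)]
struct Proof {
    id: u64,
    valid: bool,
}

impl Proof {
    /// Placeholder check; real verification would validate signatures.
    fn verify(&self) -> Result<(), String> {
        if self.valid {
            Ok(())
        } else {
            Err(format!("proof {} failed verification", self.id))
        }
    }
}

/// Verify proofs in parallel, giving each worker a chunk of up to
/// `min_chunk` proofs instead of spawning one task per proof.
/// Assumption: one OS thread per chunk is acceptable for a sketch;
/// production code would use a bounded thread pool.
fn verify_chunked(proofs: &[Proof], min_chunk: usize) -> Result<(), String> {
    let chunk_size = min_chunk.max(1); // `chunks` panics on a size of 0
    thread::scope(|scope| {
        let handles: Vec<_> = proofs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().try_for_each(Proof::verify)))
            .collect();
        // Return the first error in chunk order; the scope joins any
        // remaining threads when it exits.
        handles
            .into_iter()
            .try_for_each(|handle| handle.join().expect("verifier thread panicked"))
    })
}
```

Calling `verify_chunked(&proofs, 4)` on ten proofs spawns three workers rather than ten. With rayon itself, the comparable knob is `with_min_len` on an indexed parallel iterator, which sets a lower bound on how many items each job processes sequentially.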

@@ -85,17 +85,24 @@ impl UnverifiedEvent {
max_num_batches: usize,
max_batch_expiry_gap_usecs: u64,
) -> Result<VerifiedEvent, VerifyError> {
let start_time = Instant::now();
Contributor

Can we use the monitor! macro instead?

Contributor Author

Not sure we want to create a separate one for each event. Isn't it a bit cleaner to have a separate HistogramVec?
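The trade-off in this exchange, one labeled metric versus a separate timer per event, can be sketched with the standard library alone. The names below (`LabeledTimings`, `timed`, and the label strings) are hypothetical and only mimic the shape of a label-keyed histogram; they are not the project's counters or the Prometheus `HistogramVec` API. The sketch times a closure with `Instant::now()` and records the elapsed duration under a per-event label in a single shared structure.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal stand-in for a labeled histogram: raw observations grouped by label.
#[derive(Default)]
struct LabeledTimings {
    samples: HashMap<&'static str, Vec<Duration>>,
}

impl LabeledTimings {
    /// Record one observation under `label`.
    fn observe(&mut self, label: &'static str, elapsed: Duration) {
        self.samples.entry(label).or_default().push(elapsed);
    }

    /// Number of observations recorded under `label` (0 if none).
    fn count(&self, label: &str) -> usize {
        self.samples.get(label).map_or(0, Vec::len)
    }
}

/// Run `work`, record how long it took under `label`, and return its result.
fn timed<T>(
    timings: &mut LabeledTimings,
    label: &'static str,
    work: impl FnOnce() -> T,
) -> T {
    let start_time = Instant::now();
    let result = work();
    timings.observe(label, start_time.elapsed());
    result
}
```

A call such as `timed(&mut timings, "proposal", || verify_proposal())`, where `verify_proposal` is a placeholder, keeps every event type in one metric distinguished by label, which is the design the author argues for here.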

@bchocho bchocho enabled auto-merge (squash) February 26, 2024 22:25

Contributor

✅ Forge suite realistic_env_max_load success on 5569cc43e269465e3217523d35756fdf482655bd

two traffics test: inner traffic : committed: 7535 txn/s, latency: 5188 ms, (p50: 4800 ms, p90: 6600 ms, p99: 10200 ms), latency samples: 3270580
two traffics test : committed: 100 txn/s, latency: 1904 ms, (p50: 1900 ms, p90: 2200 ms, p99: 2500 ms), latency samples: 1920
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.300, avg: 0.211", "QsPosToProposal: max: 0.216, avg: 0.196", "ConsensusProposalToOrdered: max: 0.556, avg: 0.512", "ConsensusOrderedToCommit: max: 0.363, avg: 0.334", "ConsensusProposalToCommit: max: 0.871, avg: 0.847"]
Max round gap was 1 [limit 4] at version 1146349. Max no progress secs was 3.960878 [limit 15] at version 1146349.
Test Ok

Contributor

✅ Forge suite compat success on aptos-node-v1.9.5 ==> 5569cc43e269465e3217523d35756fdf482655bd

Compatibility test results for aptos-node-v1.9.5 ==> 5569cc43e269465e3217523d35756fdf482655bd (PR)
1. Check liveness of validators at old version: aptos-node-v1.9.5
compatibility::simple-validator-upgrade::liveness-check : committed: 6359 txn/s, latency: 5028 ms, (p50: 4800 ms, p90: 8700 ms, p99: 10800 ms), latency samples: 235300
2. Upgrading first Validator to new version: 5569cc43e269465e3217523d35756fdf482655bd
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1232 txn/s, latency: 23285 ms, (p50: 25100 ms, p90: 31200 ms, p99: 34400 ms), latency samples: 64080
3. Upgrading rest of first batch to new version: 5569cc43e269465e3217523d35756fdf482655bd
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 296 txn/s, submitted: 574 txn/s, expired: 277 txn/s, latency: 36970 ms, (p50: 42700 ms, p90: 59600 ms, p99: 60700 ms), latency samples: 21084
4. upgrading second batch to new version: 5569cc43e269465e3217523d35756fdf482655bd
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 2188 txn/s, latency: 13580 ms, (p50: 15700 ms, p90: 17800 ms, p99: 18600 ms), latency samples: 100680
5. check swarm health
Compatibility test for aptos-node-v1.9.5 ==> 5569cc43e269465e3217523d35756fdf482655bd passed
Test Ok

@bchocho bchocho merged commit e80291b into main Feb 26, 2024
80 checks passed
@bchocho bchocho deleted the brian/par-iter branch February 26, 2024 22:56