Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jwk #4: jwk update quorum certification #11857

Merged
merged 25 commits into from
Feb 5, 2024
Merged

jwk #4: jwk update quorum certification #11857

merged 25 commits into from
Feb 5, 2024

Conversation

zjma
Copy link
Contributor

@zjma zjma commented Feb 1, 2024

Description

Test Plan

Copy link

trunk-io bot commented Feb 1, 2024

⏱️ 39h 4m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-tests 14h 39m 🟩🟩 (+12 more)
windows-build 6h 1m 🟩🟩🟩🟩🟩 (+13 more)
rust-unit-coverage 4h 38m 🟩🟥
rust-smoke-coverage 3h 28m 🟩🟥
execution-performance / single-node-performance 2h 37m 🟩🟩🟩🟩🟩 (+4 more)
rust-smoke-tests 1h 12m 🟩🟩 (+3 more)
rust-lints 1h 9m 🟩🟩 (+12 more)
run-tests-main-branch 1h 8m 🟩🟩🟩🟩 (+13 more)
check 52m 🟩🟩 (+13 more)
rust-images / rust-all 35m 🟩🟩 (+3 more)
general-lints 34m 🟩🟩🟩🟩 (+12 more)
check-dynamic-deps 33m 🟩🟩🟩🟩🟩 (+13 more)
forge-e2e-test / forge 28m 🟩🟩
forge-compat-test / forge 27m 🟩🟩
cli-e2e-tests / run-cli-tests 16m 🟩🟩
semgrep/ci 6m 🟩🟩🟩🟩🟩 (+13 more)
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose 3m 🟩🟩
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+13 more)
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+13 more)
node-api-compatibility-tests / node-api-compatibility-tests 2m 🟩🟩
execution-performance / file_change_determinator 1m 🟩🟩🟩🟩🟩 (+4 more)
file_change_determinator 1m 🟩🟩🟩🟩🟩 (+3 more)
permission-check 55s 🟩🟩🟩🟩🟩 (+13 more)
permission-check 49s 🟩🟩🟩🟩🟩 (+13 more)
permission-check 48s 🟩🟩🟩🟩🟩 (+13 more)
permission-check 47s 🟩🟩🟩🟩🟩 (+13 more)
permission-check 24s 🟩🟩🟩🟩🟩 (+3 more)
determine-docker-build-metadata 18s 🟩🟩🟩🟩🟩 (+3 more)
upload-to-codecov 16s 🟩

🚨 2 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
rust-lints 12m 8m +49%
windows-build 27m 18m +47%

settingsfeedbackdocs ⋅ learn more about trunk.io

@zjma zjma force-pushed the zjma/jwk_update_qc branch from 1c7a9b9 to b019753 Compare February 1, 2024 15:37
@zjma zjma changed the title jwk update quorum certification jwk #4: jwk update quorum certification Feb 1, 2024
@zjma zjma added the JWK label Feb 1, 2024
Copy link

codecov bot commented Feb 1, 2024

Codecov Report

Attention: 379 lines in your changes are missing coverage. Please review.

Comparison is base (793563f) 70.1% compared to head (b019753) 69.9%.
Report is 2 commits behind head on main.

❗ Current head b019753 differs from pull request most recent head dd03efb. Consider uploading reports for the commit dd03efb to get more accurate results

Files Patch % Lines
aptos-move/aptos-vm/src/validator_txns/jwk.rs 0.0% 93 Missing ⚠️
crates/aptos-jwk-consensus/src/network.rs 0.0% 86 Missing ⚠️
types/src/jwks/mod.rs 19.6% 41 Missing ⚠️
...tos-jwk-consensus/src/certified_update_producer.rs 0.0% 27 Missing ⚠️
types/src/jwks/jwk/mod.rs 28.5% 20 Missing ⚠️
types/src/jwks/rsa/mod.rs 9.5% 19 Missing ⚠️
types/src/jwks/unsupported/mod.rs 55.5% 16 Missing ⚠️
...rates/aptos-jwk-consensus/src/network_interface.rs 0.0% 15 Missing ⚠️
crates/aptos-jwk-consensus/src/types.rs 0.0% 14 Missing ⚠️
consensus/safety-rules/src/safety_rules_manager.rs 0.0% 10 Missing ⚠️
... and 11 more
Additional details and impacted files
@@            Coverage Diff            @@
##             main   #11857     +/-   ##
=========================================
- Coverage    70.1%    69.9%   -0.3%     
=========================================
  Files        2182     2187      +5     
  Lines      414855   413937    -918     
=========================================
- Hits       291207   289676   -1531     
- Misses     123648   124261    +613     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

) -> AbortHandle;
}

pub struct RealCertifiedUpdateProducer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update the naming convention, trait starts with T, and struct removes Real

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/// Once invoked by `JWKConsensusManager` to `start_produce`,
/// it starts producing a `QuorumCertifiedUpdate` and returns an abort handle.
/// Once an `QuorumCertifiedUpdate` is available, it is sent back via a channel given earlier.
pub trait CertifiedUpdateProducer: Send + Sync {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just call it UpdateCertifier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

&self,
epoch_state: Arc<EpochState>,
payload: ProviderJWKs,
qc_update_tx: Option<aptos_channel::Sender<(), QuorumCertifiedUpdate>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is Option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't have to, fixed

"adding peer observation failed with mismatched view"
);

// Verify the quorum-cert.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to verify the signature not the quorum cert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought they are the same thing..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quorum cert is a 2f+1 aggregate signature, here's a single signature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


// All checks passed. Aggregating.
aggregator.contributors.insert(sender);
let new_multi_sig = if let Some(existing) = aggregator.multi_sig.take() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks weird, we have a PartialSignatures for building it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

type Message = ObservedUpdateRequest;
type Response = ObservedUpdateResponse;

fn add(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reliable broadcast may panic, if a validator has a different observation than anyone else? If the rb receives all response but does not aggregate it will panic.

Copy link
Contributor

@danielxiangzl danielxiangzl Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fixes, either rb keeps fetching if the response is different than mine, or allow aggregation to fail and return None. In the later case the validator needs to retry rb to fetch again. Let's discuss what is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NVM, I saw you already did the first one.

Base automatically changed from zjma/jwk_network_msg_defs to main February 5, 2024 04:05
@zjma zjma marked this pull request as ready for review February 5, 2024 04:11
@zjma zjma added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Feb 5, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

{
return Ok(None);
}
let multi_sig = Signature::aggregate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a verifier.aggregate_signature function

@zjma zjma enabled auto-merge (squash) February 5, 2024 13:56

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Feb 5, 2024

✅ Forge suite realistic_env_max_load success on dd03efbe8045e70482cde6c72fc94ada72f1efd7

two traffics test: inner traffic : committed: 7076 txn/s, latency: 5392 ms, (p50: 4800 ms, p90: 7200 ms, p99: 14400 ms), latency samples: 3064180
two traffics test : committed: 100 txn/s, latency: 2345 ms, (p50: 2100 ms, p90: 2500 ms, p99: 9400 ms), latency samples: 1820
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.276, avg: 0.207", "QsPosToProposal: max: 0.166, avg: 0.145", "ConsensusProposalToOrdered: max: 0.611, avg: 0.548", "ConsensusOrderedToCommit: max: 0.463, avg: 0.442", "ConsensusProposalToCommit: max: 1.023, avg: 0.990"]
Max round gap was 1 [limit 4] at version 1360489. Max no progress secs was 7.546461 [limit 15] at version 1360489.
Test Ok

Copy link
Contributor

github-actions bot commented Feb 5, 2024

✅ Forge suite compat success on aptos-node-v1.8.3 ==> dd03efbe8045e70482cde6c72fc94ada72f1efd7

Compatibility test results for aptos-node-v1.8.3 ==> dd03efbe8045e70482cde6c72fc94ada72f1efd7 (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 4790 txn/s, latency: 6682 ms, (p50: 6500 ms, p90: 9500 ms, p99: 16800 ms), latency samples: 182020
2. Upgrading first Validator to new version: dd03efbe8045e70482cde6c72fc94ada72f1efd7
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1869 txn/s, latency: 15650 ms, (p50: 19500 ms, p90: 22200 ms, p99: 22500 ms), latency samples: 91620
3. Upgrading rest of first batch to new version: dd03efbe8045e70482cde6c72fc94ada72f1efd7
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1770 txn/s, latency: 16563 ms, (p50: 19000 ms, p90: 22300 ms, p99: 22600 ms), latency samples: 92080
4. upgrading second batch to new version: dd03efbe8045e70482cde6c72fc94ada72f1efd7
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3167 txn/s, latency: 9842 ms, (p50: 9900 ms, p90: 20200 ms, p99: 20600 ms), latency samples: 126680
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> dd03efbe8045e70482cde6c72fc94ada72f1efd7 passed
Test Ok

@zjma zjma merged commit c2a3453 into main Feb 5, 2024
69 of 83 checks passed
@zjma zjma deleted the zjma/jwk_update_qc branch February 5, 2024 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR JWK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants