
[Quorum Store] Implementation of quorum store components #6055

Merged
merged 395 commits on Feb 9, 2023

Conversation


@bchocho bchocho commented Jan 3, 2023

Description

Implementation of Quorum Store. See component diagram at https://drive.google.com/file/d/1Vu3G_z6zOueljBnnPLZp4VIZp4Oo_AwQ/view?usp=sharing

Quorum store is still disabled by default (via onchain config).

Test Plan

All tests pass without quorum store enabled.

With quorum store enabled (by hard-coding the onchain config; see the sketch after the lists below):

  • All unit tests pass, except the twins tests. It's tricky to make twins run in both modes; we can clean this up when quorum store becomes the default.
  • All smoke tests pass, except test_upgrade_flow. This needs some investigation.
  • Land-blocking forge tests pass.
  • Consensus-only forge tests pass.

Before enabling quorum store, we additionally need to:

  • Pass all non-flaky forge tests with quorum store = true
  • Add new tests:
    • Smoke test for onchain config flip
    • Forge test for onchain config flip
    • New failpoint tests
    • Smoke/forge test for a malicious node running a different quorum store mode
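For context on "hard-coding the onchain config" above: a minimal, hypothetical sketch of what flipping the flag in a test build could look like. The struct and field names below are illustrative placeholders, not the exact aptos-core types.

```rust
// Hypothetical sketch only: the real flag lives in the onchain consensus
// config; these names are placeholders for illustration.
#[derive(Clone, Debug)]
pub struct OnChainConsensusConfig {
    pub quorum_store_enabled: bool,
}

impl Default for OnChainConsensusConfig {
    fn default() -> Self {
        // Quorum store stays disabled by default in this PR.
        Self { quorum_store_enabled: false }
    }
}

fn main() {
    // Hard-code the flag to exercise the quorum store code paths in tests.
    let config = OnChainConsensusConfig { quorum_store_enabled: true };
    assert!(config.quorum_store_enabled);
}
```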


github-actions bot commented Feb 7, 2023

✅ Forge suite land_blocking success on fc7cfc26d2e2e7204fbbb65e2769a08558168242

performance benchmark with full nodes: 5888 TPS, 6709 ms latency, 10500 ms p99 latency, (!) expired 700 out of 2514960 txns
Test Ok


github-actions bot commented Feb 7, 2023

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> fc7cfc26d2e2e7204fbbb65e2769a08558168242

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> fc7cfc26d2e2e7204fbbb65e2769a08558168242 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7783 TPS, 4948 ms latency, 7200 ms p99 latency, no expired txns
2. Upgrading first Validator to new version: fc7cfc26d2e2e7204fbbb65e2769a08558168242
compatibility::simple-validator-upgrade::single-validator-upgrade : 5227 TPS, 7602 ms latency, 9300 ms p99 latency, no expired txns
3. Upgrading rest of first batch to new version: fc7cfc26d2e2e7204fbbb65e2769a08558168242
compatibility::simple-validator-upgrade::half-validator-upgrade : 4520 TPS, 8894 ms latency, 11100 ms p99 latency, no expired txns
4. Upgrading second batch to new version: fc7cfc26d2e2e7204fbbb65e2769a08558168242
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6873 TPS, 5628 ms latency, 11300 ms p99 latency, no expired txns
5. Check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> fc7cfc26d2e2e7204fbbb65e2769a08558168242 passed
Test Ok


github-actions bot commented Feb 7, 2023

✅ Forge suite consensus_only_perf_benchmark success on consensus_only_perf_test_fc7cfc26d2e2e7204fbbb65e2769a08558168242 ==> fc7cfc26d2e2e7204fbbb65e2769a08558168242

Test Ok

@igor-aptos igor-aptos self-assigned this Feb 8, 2023
Contributor

@igor-aptos igor-aptos left a comment


Overall, looks good! I've made a bunch of comments/questions/suggestions inline; even if you want to address some of them, you can do so in a follow-up PR. I'm accepting since this one is huge and there is nothing important that is off.

@@ -48,10 +49,10 @@ pub struct ChainHealthBackoffValues {
 impl Default for ConsensusConfig {
     fn default() -> ConsensusConfig {
         ConsensusConfig {
-            max_sending_block_txns: 2500,
+            max_sending_block_txns: 4000,
Contributor

Yeah, we cannot really change these now.

Either:

  • keep them as is, and then increase them in the next release once QS is enabled in main
  • or have new fields (qs_max_sending_block_txns, and others) in the interim, before the cleanup (see the sketch below)
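A rough sketch of the second option, with interim qs_-prefixed fields. The values come from this diff; the field set and names are otherwise illustrative, not the actual ConsensusConfig.

```rust
// Sketch of interim qs_-prefixed config fields; not the actual ConsensusConfig.
pub struct ConsensusConfig {
    pub max_sending_block_txns: u64,    // unchanged while QS is off
    pub qs_max_sending_block_txns: u64, // picked up once quorum store is enabled
}

impl Default for ConsensusConfig {
    fn default() -> Self {
        Self {
            max_sending_block_txns: 2500,
            qs_max_sending_block_txns: 4000,
        }
    }
}
```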


#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
#[serde(default, deny_unknown_fields)]
pub struct QuorumStoreConfig {
Contributor

Can you add some more comments explaining what these constants are and what they refer to? For example, which channel is "channel_size" here?
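For example, the kind of documentation being asked for might look like this. Only channel_size comes from this thread; the second field, the Default derive, and the exact wording are illustrative, not the actual config.

```rust
use serde::{Deserialize, Serialize};

// Default added so #[serde(default)] compiles in this sketch.
#[derive(Clone, Debug, Default, Deserialize, PartialEq, Serialize)]
#[serde(default, deny_unknown_fields)]
pub struct QuorumStoreConfig {
    /// Capacity of the command channel between the quorum store and its
    /// worker tasks; this is the channel that `channel_size` sizes.
    pub channel_size: usize,
    /// Illustrative second constant, showing the level of detail requested.
    pub max_batch_bytes: u64,
}
```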

@@ -10,8 +10,7 @@ use aptos_crypto::HashValue;
 use futures::channel::oneshot;
 use std::{fmt, fmt::Formatter};

 /// Message sent from Consensus to QuorumStore.
-pub enum PayloadRequest {
+pub enum BlockProposalCommand {
Contributor

Should this be "GetBlockProposalCommand"?

The proposal generator issues this command and then does the proposing itself; this command only prepares the proposal?
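For reference, the request/response shape under discussion looks roughly like this. The doc comment, enum name, and oneshot import come from the diff; the variant name and payload types are illustrative.

```rust
use futures::channel::oneshot;

/// Message sent from Consensus to QuorumStore.
pub enum BlockProposalCommand {
    /// The proposal generator sends this to request a payload and receives it
    /// back over the oneshot sender; the actual proposing happens in the
    /// proposal generator, not here, hence the suggestion to use a "Get..."
    /// name: the command only prepares the payload.
    GetBlockRequest(u64 /* round */, oneshot::Sender<Vec<u8>> /* payload */),
}
```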

@@ -302,6 +304,12 @@ impl NetworkSender {

#[async_trait::async_trait]
impl QuorumStoreSender for NetworkSender {
    async fn send_batch_request(&self, request: BatchRequest, recipients: Vec<Author>) {
        fail_point!("consensus::send_batch_request", |_| ());
Contributor

Use the same convention:
consensus::send_batch_request => consensus::send::batch_request

and likewise for consensus::send_batch and consensus::send_signed_digest below.

Contributor Author

Good catch! #6650
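For anyone following along, a minimal sketch of the naming convention being adopted. The fail_point! macro is from the fail crate; the surrounding function stub is illustrative.

```rust
use fail::fail_point;

fn send_batch_request_stub() {
    // Group all send-side failpoints under a common "consensus::send::"
    // prefix, matching consensus::send::batch and consensus::send::signed_digest.
    fail_point!("consensus::send::batch_request", |_| ());
    // ... the real network send would follow here ...
}
```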


pub enum QuorumStoreBuilder {
    DirectMempool(DirectMempoolInnerBuilder),
    InQuorumStore(InnerBuilder),
Contributor

What does "In" here refer to?

{
    let batch_coordinator = BatchCoordinator::new(
        self.epoch,
        self.author,
Contributor

I may be missing something, but shouldn't the remote batch coordinator be creating batches authored by a remote node, not the local peer id? Or is this used for something else?

for (i, remote_batch_coordinator_cmd_rx) in
    self.remote_batch_coordinator_cmd_rx.into_iter().enumerate()
{
    let batch_coordinator = BatchCoordinator::new(
Contributor

What's the reason for having a separate batch coordinator for each "batch author"?

Is it because we want to process each stream independently (i.e., have 100 receiver loops), or because it makes the BatchCoordinator code cleaner to only deal with a single author? If the latter, should we have a single channel and a dispatcher that hands each message to the appropriate coordinator, instead of having 100 loops?

Contributor Author

We create batch coordinator workers == num_workers_for_remote_fragments (which can be smaller than the number of peers).

The reason is closer to the former. Most importantly, we want the remote fragments not to block the local fragments (which is accomplished even with == 1). Beyond that, we want the remote fragments not to block each other.
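A rough tokio-based sketch of that layout (channel payload types and sizes are placeholders): one independent receive loop per worker, so a slow remote peer only stalls its own worker and never the local loop.

```rust
use tokio::sync::mpsc;

// One receive loop per worker; fragments from remote peers are spread across
// num_workers_for_remote_fragments of these, separate from the local loop.
async fn run_batch_coordinator(mut rx: mpsc::Receiver<Vec<u8>>) {
    while let Some(fragment) = rx.recv().await {
        // Process the fragment; blocking here only delays this worker.
        drop(fragment);
    }
}

fn spawn_remote_workers(num_workers_for_remote_fragments: usize) -> Vec<mpsc::Sender<Vec<u8>>> {
    (0..num_workers_for_remote_fragments)
        .map(|_| {
            let (tx, rx) = mpsc::channel(100);
            tokio::spawn(run_batch_coordinator(rx));
            tx
        })
        .collect()
}
```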

digest_to_proof: HashMap<HashValue, IncrementalProofState>,
digest_to_time: HashMap<HashValue, u64>, // to record the batch creation time
timeouts: DigestTimeouts,
Contributor

This confused me: this is not recording what has timed out, but at what point in the future something will expire.

Maybe "expirations" is a better name?


fn expire(&mut self) {
    for digest in self.timeouts.expire() {
        if let Some(state) = self.digest_to_proof.remove(&digest) {
Contributor

Add a comment:

// check if proof hasn't completed already

since that is the reason for there not being a value.
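Putting the two comments above together, a small illustrative sketch (names and types are placeholders, not the actual aptos-core code) of recording future expirations and, on expiry, only acting when the proof hasn't already completed:

```rust
use std::collections::{BTreeMap, HashMap};

// Entries are keyed by the future time at which a digest expires (hence the
// suggestion to call this "expirations" rather than "timeouts").
#[derive(Default)]
struct Expirations {
    by_time: BTreeMap<u64, Vec<String>>,
}

impl Expirations {
    fn add(&mut self, expiry_time: u64, digest: String) {
        self.by_time.entry(expiry_time).or_default().push(digest);
    }

    // Pop everything whose expiry time is <= now.
    fn expire(&mut self, now: u64) -> Vec<String> {
        let later = self.by_time.split_off(&(now + 1));
        let expired = std::mem::replace(&mut self.by_time, later);
        expired.into_values().flatten().collect()
    }
}

fn drop_expired(exp: &mut Expirations, digest_to_proof: &mut HashMap<String, ()>, now: u64) {
    for digest in exp.expire(now) {
        // check if proof hasn't completed already (completed proofs were
        // already removed from digest_to_proof)
        if let Some(_state) = digest_to_proof.remove(&digest) {
            // An incomplete proof expired; count or log it here.
        }
    }
}
```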

self.remote_batch_coordinator_tx.len(),
idx
);
self.remote_batch_coordinator_tx[idx]
Contributor

Where does the remote_batch_coordinator receive EndBatch?

Also, if processing remote and local fragments is different, why don't we have two classes, LocalBatchCoordinator and RemoteBatchCoordinator, given they have no overlap?

Contributor Author

@zekun000 had the same comment. I'll create an item to work on this.
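For reference, the indexing in the snippet above amounts to something like the following (illustrative, not the exact code): hash the fragment's author to pick a worker index, so all fragments from one peer land on the same coordinator, i.e. idx = remote_worker_index(&author, self.remote_batch_coordinator_tx.len()).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Map an author to one of the remote batch coordinator workers.
fn remote_worker_index<A: Hash>(author: &A, num_workers: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    author.hash(&mut hasher);
    (hasher.finish() as usize) % num_workers
}
```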

Labels
CICD:build-consensus-only-image
CICD:run-consensus-only-perf-test (builds consensus-only aptos-node image and uses it to run forge)
CICD:run-e2e-tests (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR)