-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Quorum Store] networking integration #5779
Conversation
a36cc36
to
4c22759
Compare
} | ||
} | ||
|
||
// TODO: implement properly (and proper place) w.o. public fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Fields in SignedDigestInfo and SignedDigest are public. @gelash ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's okay, but I guess we wanted to revisit whether the fields should be pub or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks very weird especially one field is private and has getter and others are public, we should be consistent
consensus/src/quorum_store/types.rs
Outdated
pub(crate) batch_info: BatchInfo, | ||
} | ||
|
||
// TODO: make epoch, source, signature fields treatment consistent across structs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Mostly around quorum_store_enabled verification.
consensus/src/network.rs
Outdated
let msg = ConsensusMsg::ProofOfStoreMsg(Box::new(proof_of_store)); | ||
self.broadcast_without_self(msg).await | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move the two above methods to QuorumStoreSender trait?
consensus/src/network_interface.rs
Outdated
/// Quorum Store: Send a signed batch digest. This is a vote for the batch and a promise that | ||
/// the batch of transactions was received and will be persisted until batch expiration. | ||
SignedDigestMsg(Box<SignedDigest>), | ||
/// Quorum Store: Broadcast a completed proof of store (a digest that received 2f+1 votes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed -> certified or valid?
validator: &ValidatorVerifier, | ||
quorum_store_enabled: bool, | ||
) -> anyhow::Result<()> { | ||
match (quorum_store_enabled, self) { | ||
(false, Payload::DirectMempool(_)) => Ok(()), | ||
(true, Payload::InQuorumStore(proof_with_status)) => { | ||
for proof in proof_with_status.proofs.iter() { | ||
proof.verify(validator)?; | ||
proof.verify(peer_id, validator, quorum_store_enabled)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to pass quorum_store_enabled here? We know it is true from the match so why check again inside the verify?
) -> anyhow::Result<()> { | ||
if !quorum_store_enabled { | ||
return Err(anyhow::anyhow!( | ||
"Quorum store is not enabled locally. Sender: {}, epoch: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we need this as this verify will always be called with quorum_store_enabled = true. See the comment above.
} | ||
} | ||
|
||
// TODO: implement properly (and proper place) w.o. public fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Fields in SignedDigestInfo and SignedDigest are public. @gelash ?
consensus/src/round_manager.rs
Outdated
@@ -91,6 +98,23 @@ impl UnverifiedEvent { | |||
cd.verify(validator)?; | |||
VerifiedEvent::CommitDecision(cd) | |||
} | |||
UnverifiedEvent::FragmentMsg(f) => { | |||
f.verify(peer_id, quorum_store_enabled)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be cleaner to check quorum_store_enabled before? We know already here that if it is false we should not even got quorum store messages.
Maybe check it in process_message() in epoch_manager?
consensus/src/quorum_store/types.rs
Outdated
"Quorum store is not enabled locally. Sender: {}, epoch: {}", | ||
peer_id, | ||
self.epoch(), | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. I would check quorum_store_enabled when getting the message.
- Add messages for Fragment, Batch, SignedDigest, ProofOfStore - Add sender-only verification for some messages - Add a separate FIFO local channel for quorum store messages - Add and use broadcast without sending to self
19adda4
to
cc72c96
Compare
consensus/src/epoch_manager.rs
Outdated
@@ -836,6 +846,29 @@ impl EpochManager { | |||
Ok(None) | |||
} | |||
|
|||
async fn check_quorum_store_enabled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is confusing. Maybe check_unverified_event_type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not an async function, also isn't this redundant since we already pass self.quorum_store_enabled
to verify function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally did the check in the verify function, but changed it to do the separate check based on Sasha's feedback (#5779 (comment)). I think this is a bit clearer; it avoids dirtying the verify function which should really verify the correctness of the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we testing backwards compatibility?
} | ||
} | ||
|
||
// TODO: implement properly (and proper place) w.o. public fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks very weird especially one field is private and has getter and others are public, we should be consistent
consensus/src/epoch_manager.rs
Outdated
@@ -836,6 +846,29 @@ impl EpochManager { | |||
Ok(None) | |||
} | |||
|
|||
async fn check_quorum_store_enabled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not an async function, also isn't this redundant since we already pass self.quorum_store_enabled
to verify function
consensus/src/network.rs
Outdated
@@ -289,6 +344,11 @@ impl NetworkTask { | |||
) -> (NetworkTask, NetworkReceivers) { | |||
let (consensus_messages_tx, consensus_messages) = | |||
aptos_channel::new(QueueStyle::LIFO, 1, Some(&counters::CONSENSUS_CHANNEL_MSGS)); | |||
let (quorum_store_messages_tx, quorum_store_messages) = aptos_channel::new( | |||
QueueStyle::FIFO, | |||
1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a lot of messages, why do we need to buffer so many?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really shouldn't need to buffer this many, especially if back pressure is working well. However, it looks like we'll hve to redo back pressure, so I'm inclined to just keep this for now and revisit it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may consume significant amount of memory if not dequeue fast enough, imagine a fragment message with maximum message size (~64MB) * 1000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's probably risky even with a single bad validator and quorum store off. Reduced to 50 for now.
consensus/src/network.rs
Outdated
BlockStage::NETWORK_RECEIVED, | ||
); | ||
} | ||
if let Err(e) = self.consensus_messages_tx.push( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this piece of code seems repetitive
consensus/src/quorum_store/types.rs
Outdated
} | ||
} | ||
|
||
pub(crate) fn take_transactions(self) -> Vec<SerializedTransaction> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is typically called into_transactions
, take mostly refers to a &mut self function
consensus/src/quorum_store/types.rs
Outdated
#[derive(Clone, Debug, Deserialize, Serialize, PartialEq, Eq)] | ||
pub struct Batch { | ||
pub(crate) source: PeerId, | ||
// None is a request, Some(payload) is a response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems hacky, we should just have two types
- Private for all members of "message" types. We will likely have to implement getters/setters when integrating QS implementation - Public for all members of *Info types. These are pure containers. - Remove (crate) where the mod is already pub(crate)
@davidiw for these integration PRs the "compat" e2e test should be enough. For quorum store as a whole, it will be enabled via onchain configs, and we plan to write forge tests that test the transition when the config is flipped. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@zekun000 I iterated on your comments and also merged the latest changes in main. Please review! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
consensus/src/network.rs
Outdated
@@ -289,6 +344,11 @@ impl NetworkTask { | |||
) -> (NetworkTask, NetworkReceivers) { | |||
let (consensus_messages_tx, consensus_messages) = | |||
aptos_channel::new(QueueStyle::LIFO, 1, Some(&counters::CONSENSUS_CHANNEL_MSGS)); | |||
let (quorum_store_messages_tx, quorum_store_messages) = aptos_channel::new( | |||
QueueStyle::FIFO, | |||
1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may consume significant amount of memory if not dequeue fast enough, imagine a fragment message with maximum message size (~64MB) * 1000
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
Description
Quorum Store related changes to consensus networking.
Test Plan
Existing tests. The logic is coming in future PRs.