This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

Introduce block_synchronizer_handler and wire to block waiter (part 2) #240

Merged

akichidis merged 1 commit into main from feat/block-waiter-sync-missing-payload on May 17, 2022

Conversation

akichidis
Contributor

Resolves: #183

This is the second part of #183 and a follow-up to #227.

It extends the block synchronizer handler to synchronize missing payloads. The functionality is also wired into the block_waiter, so that when we are missing the payload (batches) for a certificate we can sync it on the fly from other peers and serve it back.

Comment on lines +446 to +449
let sync_result = self
.block_synchronizer_handler
.synchronize_block_payloads(found_certificates)
.await;
Contributor Author

Not a super optimised strategy, as the whole process will block and wait until we have synchronized all the missing payloads.

Ideally, we could follow a more stream-based approach here - that is, use the block_synchronizer directly, which emits a message for each successfully synchronized batch (best case when we do have them in a near cache). However, doing this would require quite some refactoring, so I made the decision not to spend the time and to prioritise delivery. Once we have some stress tests around this covering worst-case scenarios, we can refactor and improve.
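For illustration, a minimal sketch of what the stream-based alternative could look like (all names here are hypothetical, not the block_synchronizer's actual API):

```rust
use futures::{Stream, StreamExt};

// Hypothetical sketch: handle each batch as soon as the synchronizer
// reports it, instead of blocking until the whole payload set is synced.
async fn consume_batches<B, E>(
    mut batch_results: impl Stream<Item = Result<B, E>> + Unpin,
    mut on_batch: impl FnMut(B),
    mut on_error: impl FnMut(E),
) {
    while let Some(result) = batch_results.next().await {
        match result {
            // Best case: the batch is in a near cache and arrives quickly.
            Ok(batch) => on_batch(batch),
            // A failed batch is recorded, but we keep draining the stream.
            Err(error) => on_error(error),
        }
    }
}
```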

codecov bot commented May 13, 2022

Codecov Report

Merging #240 (0ad2d6b) into main (064c55b) will increase coverage by 0.40%.
The diff coverage is 87.10%.

@@            Coverage Diff             @@
##             main     MystenLabs/narwhal#240      +/-   ##
==========================================
+ Coverage   84.49%   84.89%   +0.40%     
==========================================
  Files          95       95              
  Lines       13199    13418     +219     
==========================================
+ Hits        11152    11391     +239     
+ Misses       2047     2027      -20     
| Impacted Files | Coverage Δ |
| --- | --- |
| primary/src/block_waiter.rs | 83.24% <67.24%> (-3.21%) ⬇️ |
| primary/src/block_synchronizer/mock.rs | 73.78% <67.74%> (-3.55%) ⬇️ |
| primary/tests/integration_tests.rs | 97.26% <94.28%> (-0.16%) ⬇️ |
| primary/src/block_synchronizer/handler.rs | 89.13% <96.77%> (+2.09%) ⬆️ |
| ...mary/src/block_synchronizer/tests/handler_tests.rs | 97.16% <98.03%> (+0.27%) ⬆️ |
| primary/src/tests/block_waiter_tests.rs | 97.02% <100.00%> (+0.19%) ⬆️ |
| types/src/proto.rs | 96.55% <100.00%> (+1.31%) ⬆️ |
| ... and 8 more | |

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 064c55b...0ad2d6b.

Contributor

@huitseeker huitseeker left a comment

Looks good! I think we could gain a lot by embedding a SyncError in the PayloadSyncError.

@@ -38,14 +38,18 @@ pub enum Error {

#[error("Timed out while waiting for {block_id} to become available after submitting for processing")]
BlockDeliveryTimeout { block_id: CertificateDigest },

#[error("Payload for block with {block_id} couldn't be synchronized")]
PayloadSyncError { block_id: CertificateDigest },
Contributor

We could be a lot more precise about diagnosing what went wrong here simply by embedding the inner SyncError in the PayloadSyncError constructor.
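A sketch of the suggested change, reusing the existing CertificateDigest and SyncError types (the `error` field name is an assumption):

```rust
use thiserror::Error;

// Sketch: embed the inner SyncError so the message reports the root cause.
#[derive(Debug, Error)]
pub enum Error {
    #[error("Payload for block with {block_id} couldn't be synchronized: {error}")]
    PayloadSyncError {
        block_id: CertificateDigest,
        error: SyncError,
    },
    // ... other variants as in the existing enum
}
```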

@@ -247,4 +258,46 @@ impl<PublicKey: VerifyingKey> Handler<PublicKey> for BlockSynchronizerHandler<Pu

results
}

#[instrument(level = "debug", skip_all)]
Contributor

Do we want to emit an event in case of err?

Contributor

Bump

Contributor Author

@akichidis akichidis May 17, 2022

break;
}
Some(result) => {
println!("Received result {:?}", result.clone());
Contributor

Is the println to stdout the best approach to making this observable?

Contributor Author

Apologies, will remove this - it's not needed.
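If the value did need to stay observable, a tracing event would be the more conventional route; a tiny sketch (not the PR's code):

```rust
use tracing::debug;

// Sketch: route the message through the tracing subscriber instead of
// stdout, so it respects the configured log-level filtering.
fn log_result<T: std::fmt::Debug>(result: &T) {
    debug!(?result, "received synchronization result");
}
```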

loop {
match rx.recv().await {
None => {
debug!("Channel closed when getting results, no more messages to get");
Contributor

  1. Could this be interpreted by someone reading logs as an error case, even though it's the normal termination case for this loop?
  2. Is debug the appropriate level here, or would trace be more fitting? (Come to think of it, this probably applies to get_block_headers above.)
  3. We're probably going to want to run this task under a timeout. I know we have issues open for this; is there one to which we could tack on taking a look at payload synchronization?

Contributor Author

I've added the debug statement purely for testing/debugging purposes - I wouldn't expect someone to rely on this log in prod (or at least I wouldn't want someone to have to worry about something going wrong here in production - we should make this robust enough that they won't have to). That being said, to increase robustness a timeout would probably make sense here. The downstream component - block_synchronizer - is already covered by timeout semantics, but it's true that we can't rely on that alone.
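For illustration, a minimal sketch of running the receive loop under a timeout (the duration constant and function name are assumptions):

```rust
use std::time::Duration;
use tokio::{sync::mpsc, time::timeout};
use tracing::trace;

// Hypothetical bound; the real value would need tuning.
const PAYLOAD_SYNC_TIMEOUT: Duration = Duration::from_secs(30);

// Sketch: drain the results channel, but bound how long we wait for
// each message instead of awaiting indefinitely.
async fn drain_results<T>(rx: &mut mpsc::Receiver<T>) -> Vec<T> {
    let mut results = Vec::new();
    loop {
        match timeout(PAYLOAD_SYNC_TIMEOUT, rx.recv()).await {
            Ok(Some(result)) => results.push(result),
            Ok(None) => {
                // Normal termination: all senders have been dropped.
                trace!("Channel closed when getting results, no more messages to get");
                break;
            }
            Err(_elapsed) => {
                // Deadline hit: give up rather than block forever.
                trace!("Timed out waiting for a synchronization result");
                break;
            }
        }
    }
    results
}
```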

Contributor Author

Also agree, this can be downgraded to trace


@akichidis akichidis force-pushed the feat/block-waiter-sync-missing-payload branch from ccde5a2 to 0dd2788 Compare May 16, 2022 16:04
@akichidis akichidis requested a review from huitseeker May 16, 2022 16:05
Contributor

@huitseeker huitseeker left a comment

Thanks a bunch @akichidis !

@@ -247,4 +258,46 @@ impl<PublicKey: VerifyingKey> Handler<PublicKey> for BlockSynchronizerHandler<Pu

results
}

#[instrument(level = "debug", skip_all)]
Contributor

Bump

@akichidis akichidis force-pushed the feat/block-waiter-sync-missing-payload branch from 0dd2788 to 3e19829 Compare May 17, 2022 08:25
@akichidis akichidis merged commit 5cfb317 into main May 17, 2022
@akichidis akichidis deleted the feat/block-waiter-sync-missing-payload branch May 17, 2022 08:57
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 30, 2022
feat: Introduce block_synchronizer_handler and wire to block waiter (part 2) (MystenLabs/narwhal#240)

This commit extends the block synchronizer handler to allow payload synchronization in a synchronous way. The block_waiter is also extended to use the block synchronizer handler to sync the (potentially) missing payloads before trying to retrieve them.
Development

Successfully merging this pull request may close these issues.

[get_collections] wire block_synchronizer in block_waiter component