Introduce block_synchronizer_handler and wire to block waiter (part 2) #240
Conversation
let sync_result = self
    .block_synchronizer_handler
    .synchronize_block_payloads(found_certificates)
    .await;
Not a super optimised strategy, as the whole process will block and wait until we have synchronized all the missing payloads.
Ideally we could follow a more stream-based approach here - i.e. use the block_synchronizer directly, which emits a message for each successfully synchronized batch (best case when we already have them in a near cache). However, doing this would require quite some refactoring, so I made the decision to not spend the time and to prioritise delivery. Once we have some stress tests around this covering worst-case scenarios, we can refactor and improve.
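Roughly, the trade-off looks like the two shapes below - a minimal sketch that assumes a tokio mpsc channel as a stand-in for the synchronizer's output; the real narwhal types and APIs differ:

use tokio::sync::mpsc;

// Hypothetical sketch of the two strategies; the real handler and
// block_synchronizer types in narwhal differ, names here are illustrative.
#[derive(Debug)]
struct SyncedBatch(u64);

// Blocking strategy: collect every synchronized payload before continuing.
async fn sync_all_then_process(mut rx: mpsc::Receiver<SyncedBatch>) {
    let mut synced = Vec::new();
    while let Some(batch) = rx.recv().await {
        synced.push(batch);
    }
    println!("all {} batches synchronized, caller can now proceed", synced.len());
}

// Stream-based strategy: handle each batch as soon as it is emitted,
// which pays off when most batches are already in a near cache.
async fn process_as_they_arrive(mut rx: mpsc::Receiver<SyncedBatch>) {
    while let Some(batch) = rx.recv().await {
        println!("{batch:?} ready, can be served immediately");
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    tokio::spawn(async move {
        for i in 0..3u64 {
            tx.send(SyncedBatch(i)).await.unwrap();
        }
    });
    sync_all_then_process(rx).await;

    let (tx, rx) = mpsc::channel(8);
    tokio::spawn(async move {
        for i in 0..3u64 {
            tx.send(SyncedBatch(i)).await.unwrap();
        }
    });
    process_as_they_arrive(rx).await;
}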
Codecov Report
@@                    Coverage Diff                    @@
##              main   MystenLabs/narwhal#240      +/-  ##
===========================================================
+ Coverage     84.49%                   84.89%   +0.40%
===========================================================
  Files            95                       95
  Lines         13199                    13418     +219
===========================================================
+ Hits          11152                    11391     +239
+ Misses         2047                     2027      -20
Continue to review full report at Codecov.
Looks good! I think we could gain a lot by embedding a SyncError in the PayloadSyncError.
@@ -38,14 +38,18 @@ pub enum Error {

    #[error("Timed out while waiting for {block_id} to become available after submitting for processing")]
    BlockDeliveryTimeout { block_id: CertificateDigest },

    #[error("Payload for block with {block_id} couldn't be synchronized")]
    PayloadSyncError { block_id: CertificateDigest },
We could be a lot more precise about diagnosing what went wrong here simply by embedding the inner SyncError in the PayloadSyncError constructor.
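A minimal sketch of that suggestion, assuming thiserror and using placeholder stand-ins for the real CertificateDigest and SyncError types (which live elsewhere in narwhal):

use thiserror::Error;

// Placeholder stand-ins for illustration only; the real types come from the narwhal crates.
type CertificateDigest = [u8; 32];

#[derive(Debug, Error)]
#[error("no peer had the requested payload")]
pub struct SyncError;

#[derive(Debug, Error)]
pub enum Error {
    // Embedding the inner SyncError preserves the root cause in the error chain,
    // so callers can report why synchronization failed, not just that it did.
    #[error("Payload for block with {block_id:?} couldn't be synchronized: {error}")]
    PayloadSyncError {
        block_id: CertificateDigest,
        #[source]
        error: SyncError,
    },
}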
@@ -247,4 +258,46 @@ impl<PublicKey: VerifyingKey> Handler<PublicKey> for BlockSynchronizerHandler<Pu

        results
    }

    #[instrument(level = "debug", skip_all)]
Do we want to emit an event in case of err?
Bump
I've added error events inside the method, see
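For reference, a minimal sketch (assuming tracing; the real method body and types differ) of emitting error events from inside the instrumented method:

use tracing::{error, instrument};

// Illustrative stand-ins; not the actual narwhal types.
#[derive(Debug)]
struct Payload;

#[derive(Debug)]
struct SyncError(String);

#[instrument(level = "debug", skip_all)]
async fn synchronize_block_payloads(results: Vec<Result<Payload, SyncError>>) -> Vec<Payload> {
    let mut synced = Vec::new();
    for result in results {
        match result {
            Ok(payload) => synced.push(payload),
            // Emit an error event so failures are visible to operators without
            // requiring the caller to log them again.
            Err(err) => error!("Payload synchronization failed: {:?}", err),
        }
    }
    synced
}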
            break;
        }
        Some(result) => {
            println!("Received result {:?}", result.clone());
Is the println to stdout the best approach to making this observable?
Apologies, will remove this, not needed.
loop {
    match rx.recv().await {
        None => {
            debug!("Channel closed when getting results, no more messages to get");
- could this be interpreted by someone reading logs as an error case, even when it's the normal termination case for this loop?
- is the debug scope or trace more appropriate? (come to think of it, this probably applies to get_block_headers above)
- we're probably going to want to run this task under a timeout. I know we have issues open for this, is there one to which we could tack on taking a look at payload synchronization?
I've added the debug statement purely for testing/debugging purposes - I wouldn't expect someone to rely on this log in production (or at least I wouldn't want anyone to have to worry about something going wrong here in production - we should make this robust enough that they don't have to). That being said, to increase robustness a timeout would probably make sense here. The downstream component - block_synchronizer - is already covered by timeout semantics, but it's true that we can't rely on that alone.
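A minimal sketch of such a timeout, assuming tokio and a plain mpsc receiver as a stand-in for the real results channel:

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;
use tracing::trace;

// Hypothetical sketch: bound the result-gathering loop with its own timeout
// instead of relying only on the downstream block_synchronizer's timeouts.
async fn gather_results(mut rx: mpsc::Receiver<u64>, budget: Duration) -> Vec<u64> {
    let mut results = Vec::new();
    loop {
        match timeout(budget, rx.recv()).await {
            Ok(Some(result)) => results.push(result),
            Ok(None) => {
                // Normal termination: the sender side closed the channel.
                trace!("Channel closed when getting results, no more messages to get");
                break;
            }
            Err(_) => {
                // Timed out waiting for the next result; give up instead of hanging.
                trace!("Timed out while waiting for synchronization results");
                break;
            }
        }
    }
    results
}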
Also agree, this can be downgraded to trace
Force-pushed from ccde5a2 to 0dd2788
Thanks a bunch @akichidis!
Force-pushed from 0dd2788 to 3e19829
feat: Introduce block_synchronizer_handler and wire to block waiter (part 2) (MystenLabs/narwhal#240)

This commit extends the block synchronizer handler to allow payload synchronization in a synchronous way. The block_waiter is also extended to use the block synchronizer handler to sync the (potentially) missing payloads before trying to retrieve them.
Resolves: #183
This is the second part of #183 and a follow-up of #227. It extends the block synchronizer handler in order to synchronize missing payloads. The functionality is also wired into the block_waiter, so that if we are missing the payload (batches) for a certificate we can sync it on the fly from other peers and serve it back.
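As a rough illustration of that ordering (names and signatures are stand-ins; the real block_waiter and block_synchronizer_handler are async and typed differently):

// Hypothetical sketch only: sync missing payloads first, then read and serve the batches.

#[derive(Clone, Copy, Debug)]
struct CertificateDigest(u64);

#[derive(Debug)]
struct Batch(Vec<u8>);

// Stand-in for the handler: make sure payloads for these certificates exist locally,
// pulling missing ones from peers, and return the certificates that are now complete.
fn synchronize_block_payloads(certs: Vec<CertificateDigest>) -> Vec<CertificateDigest> {
    certs // in this sketch, assume everything synchronized successfully
}

// Stand-in for the waiter: synchronize first, then read the (now present) batches.
fn get_block_batches(
    found_certificates: Vec<CertificateDigest>,
    read_local_batch: impl Fn(CertificateDigest) -> Option<Batch>,
) -> Vec<(CertificateDigest, Option<Batch>)> {
    let synchronized = synchronize_block_payloads(found_certificates);
    synchronized
        .into_iter()
        .map(|digest| (digest, read_local_batch(digest)))
        .collect()
}

fn main() {
    let certs = vec![CertificateDigest(1), CertificateDigest(2)];
    let batches = get_block_batches(certs, |_| Some(Batch(vec![0u8; 4])));
    println!("{batches:?}");
}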