Introduce block_synchronizer_handler and wire to block waiter (part 2) #240
Conversation
let sync_result = self
    .block_synchronizer_handler
    .synchronize_block_payloads(found_certificates)
    .await;
Not a super optimised strategy, as the whole process will block and wait until we have synchronized all the missing payloads.
Ideally we could follow a more stream-based approach here - i.e. use the block_synchronizer directly, which emits a message for each successfully synchronized batch (best case when we already have them in a near cache). However, doing this would require quite some refactoring, so I made the decision to not spend the time and to prioritise delivery. Once we have some stress tests around this covering worst-case scenarios, we can refactor and improve.
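Roughly, the trade-off looks like the two shapes below - a minimal sketch that assumes a tokio mpsc channel as a stand-in for the synchronizer's output; the real narwhal types and APIs differ:

use tokio::sync::mpsc;

// Hypothetical sketch of the two strategies; the real handler and
// block_synchronizer types in narwhal differ, names here are illustrative.
#[derive(Debug)]
struct SyncedBatch(u64);

// Blocking strategy: collect every synchronized payload before continuing.
async fn sync_all_then_process(mut rx: mpsc::Receiver<SyncedBatch>) {
    let mut synced = Vec::new();
    while let Some(batch) = rx.recv().await {
        synced.push(batch);
    }
    println!("all {} batches synchronized, caller can now proceed", synced.len());
}

// Stream-based strategy: handle each batch as soon as it is emitted,
// which pays off when most batches are already in a near cache.
async fn process_as_they_arrive(mut rx: mpsc::Receiver<SyncedBatch>) {
    while let Some(batch) = rx.recv().await {
        println!("{batch:?} ready, can be served immediately");
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    tokio::spawn(async move {
        for i in 0..3u64 {
            tx.send(SyncedBatch(i)).await.unwrap();
        }
    });
    sync_all_then_process(rx).await;

    let (tx, rx) = mpsc::channel(8);
    tokio::spawn(async move {
        for i in 0..3u64 {
            tx.send(SyncedBatch(i)).await.unwrap();
        }
    });
    process_as_they_arrive(rx).await;
}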
Codecov Report
@@                    Coverage Diff                    @@
##              main   MystenLabs/narwhal#240      +/-  ##
===========================================================
+ Coverage     84.49%                   84.89%   +0.40%
===========================================================
  Files            95                       95
  Lines         13199                    13418     +219
===========================================================
+ Hits          11152                    11391     +239
+ Misses         2047                     2027      -20
Continue to review full report at Codecov.
Looks good! I think we could gain a lot by embedding a SyncError in the PayloadSyncError.
@@ -38,14 +38,18 @@ pub enum Error {

    #[error("Timed out while waiting for {block_id} to become available after submitting for processing")]
    BlockDeliveryTimeout { block_id: CertificateDigest },

    #[error("Payload for block with {block_id} couldn't be synchronized")]
    PayloadSyncError { block_id: CertificateDigest },
We could be a lot more precise about diagnosing what went wrong here simply by embedding the inner SyncError in the PayloadSyncError constructor.
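A minimal sketch of that suggestion, assuming thiserror and using placeholder stand-ins for the real CertificateDigest and SyncError types (which live elsewhere in narwhal):

use thiserror::Error;

// Placeholder stand-ins for illustration only; the real types come from the narwhal crates.
type CertificateDigest = [u8; 32];

#[derive(Debug, Error)]
#[error("no peer had the requested payload")]
pub struct SyncError;

#[derive(Debug, Error)]
pub enum Error {
    // Embedding the inner SyncError preserves the root cause in the error chain,
    // so callers can report why synchronization failed, not just that it did.
    #[error("Payload for block with {block_id:?} couldn't be synchronized: {error}")]
    PayloadSyncError {
        block_id: CertificateDigest,
        #[source]
        error: SyncError,
    },
}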
@@ -247,4 +258,46 @@ impl<PublicKey: VerifyingKey> Handler<PublicKey> for BlockSynchronizerHandler<Pu

        results
    }

    #[instrument(level = "debug", skip_all)]
Do we want to emit an event in case of err?
Bump
I've added error events inside the method, see
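For reference, a minimal sketch (assuming tracing; the real method body and types differ) of emitting error events from inside the instrumented method:

use tracing::{error, instrument};

// Illustrative stand-ins; not the actual narwhal types.
#[derive(Debug)]
struct Payload;

#[derive(Debug)]
struct SyncError(String);

#[instrument(level = "debug", skip_all)]
async fn synchronize_block_payloads(results: Vec<Result<Payload, SyncError>>) -> Vec<Payload> {
    let mut synced = Vec::new();
    for result in results {
        match result {
            Ok(payload) => synced.push(payload),
            // Emit an error event so failures are visible to operators without
            // requiring the caller to log them again.
            Err(err) => error!("Payload synchronization failed: {:?}", err),
        }
    }
    synced
}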
            break;
        }
        Some(result) => {
            println!("Received result {:?}", result.clone());
Is the println to stdout the best approach to making this observable?
Apologies, will remove this, not needed.
loop {
    match rx.recv().await {
        None => {
            debug!("Channel closed when getting results, no more messages to get");
- could this be interpreted by someone reading logs as an error case, even when it's the normal termination case for this loop?
- is the debug scope or trace more appropriate? (come to think of it, this probably applies to get_block_headers above)
- we're probably going to want to run this task under a timeout. I know we have issues open for this, is there one to which we could tack on taking a look at payload synchronization?
I've added the debug statement purely for testing/debugging purposes - I wouldn't expect someone to rely on this log in production (or at least I wouldn't want anyone to have to worry about something going wrong here in production - we should make this robust enough that they don't have to). That being said, to increase robustness a timeout would probably make sense here. The downstream component - block_synchronizer - is already covered by timeout semantics, but it's true that we can't rely on that alone.
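A minimal sketch of such a timeout, assuming tokio and a plain mpsc receiver as a stand-in for the real results channel:

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;
use tracing::trace;

// Hypothetical sketch: bound the result-gathering loop with its own timeout
// instead of relying only on the downstream block_synchronizer's timeouts.
async fn gather_results(mut rx: mpsc::Receiver<u64>, budget: Duration) -> Vec<u64> {
    let mut results = Vec::new();
    loop {
        match timeout(budget, rx.recv()).await {
            Ok(Some(result)) => results.push(result),
            Ok(None) => {
                // Normal termination: the sender side closed the channel.
                trace!("Channel closed when getting results, no more messages to get");
                break;
            }
            Err(_) => {
                // Timed out waiting for the next result; give up instead of hanging.
                trace!("Timed out while waiting for synchronization results");
                break;
            }
        }
    }
    results
}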
Also agree, this can be downgraded to trace
Force-pushed from ccde5a2 to 0dd2788
Thanks a bunch @akichidis!
Force-pushed from 0dd2788 to 3e19829
feat: Introduce block_synchronizer_handler and wire to block waiter (part 2) (MystenLabs/narwhal#240)

This commit extends the block synchronizer handler to allow payload synchronization in a synchronous way. The block_waiter is also extended to use the block synchronizer handler to sync the (potentially) missing payloads before trying to retrieve them.
Resolves: #183
This is the second part of #183 and a follow-up of #227. It extends the block synchronizer handler in order to synchronize missing payloads. The functionality is also wired into the block_waiter, so that if we are missing the payload (batches) for a certificate we can sync it on the fly from other peers and serve it back.
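As a rough illustration of that ordering (names and signatures are stand-ins; the real block_waiter and block_synchronizer_handler are async and typed differently):

// Hypothetical sketch only: sync missing payloads first, then read and serve the batches.

#[derive(Clone, Copy, Debug)]
struct CertificateDigest(u64);

#[derive(Debug)]
struct Batch(Vec<u8>);

// Stand-in for the handler: make sure payloads for these certificates exist locally,
// pulling missing ones from peers, and return the certificates that are now complete.
fn synchronize_block_payloads(certs: Vec<CertificateDigest>) -> Vec<CertificateDigest> {
    certs // in this sketch, assume everything synchronized successfully
}

// Stand-in for the waiter: synchronize first, then read the (now present) batches.
fn get_block_batches(
    found_certificates: Vec<CertificateDigest>,
    read_local_batch: impl Fn(CertificateDigest) -> Option<Batch>,
) -> Vec<(CertificateDigest, Option<Batch>)> {
    let synchronized = synchronize_block_payloads(found_certificates);
    synchronized
        .into_iter()
        .map(|digest| (digest, read_local_batch(digest)))
        .collect()
}

fn main() {
    let certs = vec![CertificateDigest(1), CertificateDigest(2)];
    let batches = get_block_batches(certs, |_| Some(Batch(vec![0u8; 4])));
    println!("{batches:?}");
}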