-
Notifications
You must be signed in to change notification settings - Fork 68
[fix] block_waiter serve blocks with batches of similar digest ids #746
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great fix, thanks a lot for the explicit comments, context, and tests, too!
primary/src/block_waiter.rs
Outdated
if let Err(err) = s.send(result.clone()) { | ||
error!("Couldn't send batch result {} message to channel [{:?}] for block_id {}", batch_id, err, id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Here and in a few of the lines above you're using a lot of if let Err(err) = xxx
and logging coupled.
Nothing wrong with that, but some may find it tedious to read. You might want to look at tap_err and possibly use captured identifiers ({id}
) to make this a bit lighter. YMMV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this specific case I am only consuming the error just to print it. Using the tap_err
in the end is making me (because of clippy) consume the result anyways and do a let _ = ....
. Maybe the tap_err
would be much better fit if I wanted to do something further with the result it self. That being said though, I like the proposal of introducing tap as I can definitely see cases where we can use it - thanks for the recommendation! I've applied the recommendation.
43120bb
to
de26792
Compare
When created the
block_waiter
the assumption was that each produced batch will resolve to a unique digest id. Given the current design this is not guaranteed. If someone posts the same transactions we could end up producing batches with the same batch id. That became more apparent from the tests on the reconfigure.rs while working on the task to swap thebatch_loader
and use theblock_waiter
#738 . The tests are producing transactions of the exact same payload ending producing batches of same batch ids. Consequently theblock_waiter
was failing to successfully respond to concurrent requests of different certificates with common batch ids leading to receiving error messages like this:which is produced when trying to send a batch reply to the aggregator which waits for a certificate's batches to be received. Since
tx_pending_batch
was holding only the latest request's one shot sender, we couldn't basically notify multiple aggregators for the receipt of the batch.This PR is fixing this. Also a small optimisation is performed to make sure that only
one network request
will be made to fetch a batch from a worker.Merging this PR is essential in order to unblock the #738 as the reconfigure.rs will keep failing.