Skip to content
This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

[fix] block_waiter serve blocks with batches of similar digest ids #746

Merged
merged 7 commits into from
Aug 15, 2022

Conversation

akichidis
Copy link
Contributor

@akichidis akichidis commented Aug 11, 2022

When created the block_waiter the assumption was that each produced batch will resolve to a unique digest id. Given the current design this is not guaranteed. If someone posts the same transactions we could end up producing batches with the same batch id. That became more apparent from the tests on the reconfigure.rs while working on the task to swap the batch_loader and use the block_waiter #738 . The tests are producing transactions of the exact same payload ending producing batches of same batch ids. Consequently the block_waiter was failing to successfully respond to concurrent requests of different certificates with common batch ids leading to receiving error messages like this:

Couldn't find pending batch with id ....

which is produced when trying to send a batch reply to the aggregator which waits for a certificate's batches to be received. Since tx_pending_batch was holding only the latest request's one shot sender, we couldn't basically notify multiple aggregators for the receipt of the batch.

This PR is fixing this. Also a small optimisation is performed to make sure that only one network request will be made to fetch a batch from a worker.

Merging this PR is essential in order to unblock the #738 as the reconfigure.rs will keep failing.

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great fix, thanks a lot for the explicit comments, context, and tests, too!

Comment on lines 772 to 773
if let Err(err) = s.send(result.clone()) {
error!("Couldn't send batch result {} message to channel [{:?}] for block_id {}", batch_id, err, id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Here and in a few of the lines above you're using a lot of if let Err(err) = xxx and logging coupled.

Nothing wrong with that, but some may find it tedious to read. You might want to look at tap_err and possibly use captured identifiers ({id}) to make this a bit lighter. YMMV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case I am only consuming the error just to print it. Using the tap_err in the end is making me (because of clippy) consume the result anyways and do a let _ = .... . Maybe the tap_err would be much better fit if I wanted to do something further with the result it self. That being said though, I like the proposal of introducing tap as I can definitely see cases where we can use it - thanks for the recommendation! I've applied the recommendation.

@akichidis akichidis force-pushed the fix-block-waiter-batches branch from 43120bb to de26792 Compare August 15, 2022 13:51
@akichidis akichidis merged commit edaab03 into main Aug 15, 2022
@akichidis akichidis deleted the fix-block-waiter-batches branch August 15, 2022 16:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants