This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

[refactor] use block_waiter instead of batch_loader #738

Merged: @akichidis merged 3 commits into main from feat/use-block-waiter-in-executor on Aug 15, 2022

Conversation

@akichidis (Contributor) commented on Aug 10, 2022:

Resolves #706

This PR is:

  • deleting the batch_loader in the executor
  • introducing the channels to use the block_waiter to fetch the blocks/payloads (a rough sketch of the wiring is below)
  • introducing the temp_batch_store instead of reusing the batch_store, since the latter is meant to be used by the worker nodes. The temp_batch_store also gives us the ability to store the unserialised Batch directly, which is a simplification.
  • adding retry logic to fetch the blocks until they are successfully fetched (or permanently fail because of an irrecoverable error)
  • adding an end-to-end test to confirm the consensus output is successfully received
  • removing the code that replaced the worker addresses, part of the earlier fix "fix: Mitigate issue #706" (#715)

NOTE: Please also review #746, ideally first.
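A rough sketch of the new wiring, for orientation only: the Subscriber sends a command to the BlockWaiter over a channel and writes the returned, already-deserialised batches into the temp_batch_store. The GetBlock and Batch types below are hypothetical stand-ins for the PR's real ones; only the channel name tx_get_block_commands comes from the diff.

```rust
use tokio::sync::{mpsc, oneshot};

// Hypothetical stand-ins for the PR's real command and payload types.
struct Batch {
    id: [u8; 32],
    transactions: Vec<Vec<u8>>,
}
struct GetBlock {
    digest: [u8; 32],
    reply: oneshot::Sender<Vec<Batch>>,
}

// Subscriber side: ask the BlockWaiter for a block's payload.
async fn fetch_payload(
    tx_get_block_commands: &mpsc::Sender<GetBlock>,
    digest: [u8; 32],
) -> Vec<Batch> {
    let (reply, response) = oneshot::channel();
    tx_get_block_commands
        .send(GetBlock { digest, reply })
        .await
        .expect("BlockWaiter is down");
    // The batches come back deserialised, so they can go straight into
    // temp_batch_store without another (de)serialisation round-trip.
    response.await.expect("BlockWaiter dropped the reply channel")
}
```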

On this snippet:

```rust
Ok(block) => {
    // we successfully received the payload. Now let's add to store
    store
        .write_all(block.batches.iter().map(|b| (b.id, b.transactions.clone())))
```

A reviewer (Contributor) suggested a change:

```diff
-        .write_all(block.batches.iter().map(|b| (b.id, b.transactions.clone())))
+        .write_all(block.batches.into_iter().map(|b| (b.id, b.transactions)))
```
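The suggested version consumes block.batches with into_iter and moves each transactions field out by value, avoiding a clone of every batch's transaction list (at the cost that block.batches can no longer be used afterwards).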

@akichidis force-pushed the feat/use-block-waiter-in-executor branch from e182598 to 2cfb29b on August 11, 2022 17:43
@akichidis marked this pull request as ready for review on August 11, 2022 18:09
@akichidis requested a review from asonnino as a code owner on August 11, 2022 18:09
On this hunk:

```diff
@@ -1,95 +1,103 @@
 // Copyright (c) 2022, Mysten Labs, Inc.
```

@akichidis (author) commented:
Theoretically I could eliminate the Subscriber as well and embed the functionality in the executor::Core. However, the separation with the Subscriber offers us buffering capabilities, i.e., if the Core gets stuck for some reason, we can still continue downloading the certificates' batches and outputting them on the tx_executor channel, effectively saving time. Of course, if tx_executor fills up we'll stall as well, but that's desired.
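The separation described above can be pictured as a bounded-channel pipeline. A minimal sketch, assuming tokio and a hypothetical ConsensusOutput type (the real channel is tx_executor):

```rust
use tokio::sync::mpsc;

// Hypothetical stand-in for the fetched consensus output.
struct ConsensusOutput(u64);

#[tokio::main]
async fn main() {
    // The bounded channel is the buffer between Subscriber and Core.
    let (tx_executor, mut rx_executor) = mpsc::channel::<ConsensusOutput>(100);

    // Subscriber task: keeps fetching certificates' batches and queueing
    // them even while the Core is busy. Once the channel holds 100 items,
    // `send` waits, so the subscriber stalls too (the desired backpressure).
    tokio::spawn(async move {
        for seq in 0u64.. {
            let output = ConsensusOutput(seq); // imagine batch fetching here
            if tx_executor.send(output).await.is_err() {
                break; // the Core dropped its receiver; shut down
            }
        }
    });

    // Core: consumes outputs at its own pace.
    while let Some(ConsensusOutput(seq)) = rx_executor.recv().await {
        println!("executing consensus output {seq}");
        if seq == 5 {
            break; // end the demo
        }
    }
}
```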

A reviewer (Contributor) commented:
we may not need a separate thread for the subscriber but I would keep the logic in the core as simple as possible (so Subscriber may remain but perhaps not spawn its own tokio task?)

@akichidis (author) replied:

Hmm, what do you mean by not spawning its own tokio task? If we don't want the subscriber to block, we need to run our loop in a task. Unless you mean something different?

The reviewer (Contributor) replied:
We can either keep the subscriber as it is (in this PR) or make it a collection of functions that are called by the Executor Core

A review thread on executor/src/subscriber.rs was marked resolved.
Comment on lines +79 to +84:

```rust
// It's important to have the futures in ordered fashion as we want
// to guarantee that we deliver the certificates to the executor in
// the same order we received them from rx_consensus. So it doesn't
// matter if we somehow managed to fetch the batches from a later
// certificate first. Unless the earlier certificate's payload has
// been fetched, no later certificate will be delivered.
```
A reviewer (Contributor) commented:

good comment

@akichidis (author) replied:

🙏
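The ordering guarantee that the quoted comment describes is exactly what futures::stream::FuturesOrdered provides: futures may complete in any order, but results are yielded in push order. A self-contained illustration (not the PR's actual code), assuming the futures (0.3.25+, for push_back) and tokio crates:

```rust
use std::time::Duration;
use futures::stream::{FuturesOrdered, StreamExt};

// Stand-in for fetching one certificate's batches.
async fn fetch_payload(index: u64, delay_ms: u64) -> String {
    tokio::time::sleep(Duration::from_millis(delay_ms)).await;
    format!("certificate {index} payload")
}

#[tokio::main]
async fn main() {
    let mut waiting = FuturesOrdered::new();
    // Certificate 1's fetch is slower than certificate 2's...
    waiting.push_back(fetch_payload(1, 50));
    waiting.push_back(fetch_payload(2, 0));

    // ...but results still arrive in push order: certificate 1, then 2.
    while let Some(payload) = waiting.next().await {
        println!("{payload}");
    }
}
```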

```rust
    tx_executor,
    tx_get_block_commands,
    get_block_retry_policy,
}
.run()
.await
.expect("Failed to run subscriber")
```
A reviewer (Contributor) commented:

Ha, cool: if an unrecoverable error happens, we shut down here.

@akichidis (author) replied:

Exactly. Please also see the test here, where I explicitly check that the subscriber will crash in this case.

You can also see the recoverable/irrecoverable cases here. Errors that are mapped to Transient will make the retrier try again; errors that are marked Permanent will make the retrier stop and eventually make the subscriber crash.
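The transient/permanent split can be sketched as a small retry loop. This is a hand-rolled illustration, not the PR's actual retrier (which is driven by the get_block_retry_policy passed in above); the FetchError variants are hypothetical:

```rust
use std::time::Duration;

// Hypothetical classification; the PR maps its real errors this way.
enum FetchError {
    Transient(String), // e.g. a request timed out; worth retrying
    Permanent(String), // irrecoverable; retrying cannot help
}

async fn fetch_with_retry(
    mut attempt: impl FnMut() -> Result<Vec<u8>, FetchError>,
) -> Result<Vec<u8>, String> {
    let mut delay = Duration::from_millis(100);
    loop {
        match attempt() {
            Ok(block) => return Ok(block),
            // Transient: back off and try again.
            Err(FetchError::Transient(e)) => {
                eprintln!("transient error, retrying: {e}");
                tokio::time::sleep(delay).await;
                delay = delay.saturating_mul(2); // exponential backoff
            }
            // Permanent: give up, letting the caller (the Subscriber) crash.
            Err(FetchError::Permanent(e)) => return Err(e),
        }
    }
}
```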

A reviewer (Contributor) replied:

#707 should also help with the crashing :)

@huitseeker (Contributor) left a review:

Excellent work, thanks!

```diff
@@ -69,7 +69,7 @@ pub struct PrimaryChannelMetrics {
     pub tx_primary_messages: IntGauge,
     /// occupancy of the channel from the `primary::PrimaryReceiverHandler` to the `primary::Helper`
     pub tx_helper_requests: IntGauge,
-    /// occupancy of the channel from the `primary::ConsensusAPIGrpc` to the `primary::BlockWaiter`
+    /// occupancy of the channel from the `primary::ConsensusAPIGrpc` & `executor::Subscriber` to the `primary::BlockWaiter`
```
A reviewer (Contributor) commented:

I'd add one line saying that in contexts with an external consensus it will be the former, and when the consensus is internal it will be the latter. This should make clear that we don't actually expect 2 simultaneous senders yet.
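One possible wording for that extra line, as a sketch rather than the committed text:

```rust
/// occupancy of the channel from the `primary::ConsensusAPIGrpc` & `executor::Subscriber` to the `primary::BlockWaiter`.
/// With an external consensus the sender is `ConsensusAPIGrpc`; when consensus is internal it is the `Subscriber`:
/// we don't expect two simultaneous senders yet.
```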

@akichidis force-pushed the feat/use-block-waiter-in-executor branch from 2cfb29b to da5b9f4 on August 15, 2022 16:19
@akichidis enabled auto-merge (squash) on August 15, 2022 17:21
@akichidis merged commit babd3d8 into main on Aug 15, 2022
@akichidis deleted the feat/use-block-waiter-in-executor branch on August 15, 2022 17:31
huitseeker pushed a commit to huitseeker/narwhal that referenced this pull request on Aug 16, 2022:

This commit swaps the batch_loader for the block_waiter in the executor. This allows us to use a common solution for fetching batches and to take advantage of the existing payload sync capabilities. The earlier fix that swapped external worker addresses for internal ones has also been removed.
huitseeker pushed a commit that referenced this pull request on Aug 16, 2022 (same commit message as above).
huitseeker added a commit that referenced this pull request Sep 6, 2022
huitseeker added a commit that referenced this pull request Sep 6, 2022
@huitseeker mentioned this pull request on Sep 6, 2022
huitseeker added a commit to huitseeker/narwhal that referenced this pull request Sep 7, 2022
huitseeker added a commit that referenced this pull request Sep 7, 2022
mwtian pushed a commit to mwtian/sui that referenced this pull request on Sep 30, 2022: [refactor] use block_waiter instead of batch_loader (narwhal#738), with the same commit message as above.
Development

Successfully merging this pull request may close these issues:

[executor] Route the executor requests without the worker_to_worker interface
4 participants