
Remove slow operations from critical path #3788

Merged
merged 10 commits into from
Oct 28, 2024
Conversation

Collaborator

@bfish713 bfish713 commented Oct 23, 2024

Closes #<ISSUE_NUMBER>

This PR:

Moves two things off the critical path:

  • Storing the proposal
  • Fetching a previous proposal

For storing the proposal, we now do it at the end of the voting procedure, just before sending a vote, so it only blocks sending that vote.
For fetching a past proposal, there was already logic to do this in both the voting and proposing code paths once the other dependencies are fulfilled. We still spawn a task to do the fetch so we can vote on future proposals referencing that view; this is critical for cases where all nodes restart.

I want to make a larger fix where handle_quorum_proposal_recv is moved to a separate task; I'll follow up with that. In the meantime, this should significantly improve the Decaf situation.

This PR does not:

Key places to review:

@bfish713 bfish713 changed the title Move storage to end of proposal processing Remove slow operations from critical path Oct 23, 2024
@bfish713 bfish713 marked this pull request as ready for review October 23, 2024 02:57
Comment on lines +111 to +135
fn spawn_fetch_proposal<TYPES: NodeType, V: Versions>(
view: TYPES::View,
event_sender: Sender<Arc<HotShotEvent<TYPES>>>,
event_receiver: Receiver<Arc<HotShotEvent<TYPES>>>,
membership: Arc<TYPES::Membership>,
consensus: OuterConsensus<TYPES>,
sender_public_key: TYPES::SignatureKey,
sender_private_key: <TYPES::SignatureKey as SignatureKey>::PrivateKey,
upgrade_lock: UpgradeLock<TYPES, V>,
) {
async_spawn(async move {
let lock = upgrade_lock;

let _ = fetch_proposal(
view,
event_sender,
event_receiver,
membership,
consensus,
sender_public_key,
sender_private_key,
&lock,
)
.await;
});
Collaborator

@rob-maron rob-maron Oct 23, 2024


Does this get included with the [task] cancellation logic [on new views] somehow? Do we want it to be?

Member

I think it's probably OK to not cancel this, because it is bounded anyway: it will exit as soon as it either gets the proposal or fails.

Collaborator


Sure

Collaborator Author


Yeah, it's bounded at 500ms, and it doesn't do anything heavy after sending the request.

Contributor


Would it be bad to add an async_timeout() on the entire fetch_proposal call? I guess the question is whether it's okay for fetch_proposal to abort partway through execution.

I agree that it's unlikely fetch_proposal will ever get stuck outside the part where we already have an async_timeout, but it makes me a bit uncomfortable to drop the handle of tasks that need both a read lock and a write lock (at the very least, I think this might obscure a deadlock if we run into one)

Collaborator Author


There is already a timeout in fetch_proposal, and I don't think we'd prevent a deadlock by cancelling this task either, though I'm not sure. Personally, I feel it's better to time out the actual request for the proposal, because it seems bad to actually get the proposal and then time out waiting for the write lock to put it into our internal state.

@bfish713 bfish713 requested review from rob-maron and jbearer October 28, 2024 17:52
Contributor

@ss-es ss-es left a comment


Overall this looks straightforward to me; just one question about spawning the fetch_proposal.


@bfish713 bfish713 merged commit 25c907e into main Oct 28, 2024
24 checks passed
@bfish713 bfish713 deleted the bf/spawn-validation branch October 28, 2024 21:15