-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove slow operations from critical path #3788
Conversation
fn spawn_fetch_proposal<TYPES: NodeType, V: Versions>( | ||
view: TYPES::View, | ||
event_sender: Sender<Arc<HotShotEvent<TYPES>>>, | ||
event_receiver: Receiver<Arc<HotShotEvent<TYPES>>>, | ||
membership: Arc<TYPES::Membership>, | ||
consensus: OuterConsensus<TYPES>, | ||
sender_public_key: TYPES::SignatureKey, | ||
sender_private_key: <TYPES::SignatureKey as SignatureKey>::PrivateKey, | ||
upgrade_lock: UpgradeLock<TYPES, V>, | ||
) { | ||
async_spawn(async move { | ||
let lock = upgrade_lock; | ||
|
||
let _ = fetch_proposal( | ||
view, | ||
event_sender, | ||
event_receiver, | ||
membership, | ||
consensus, | ||
sender_public_key, | ||
sender_private_key, | ||
&lock, | ||
) | ||
.await; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this get included with the [task] cancellation logic [on new views] somehow? Do we want it to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably ok to not cancel this because it is bounded anyways: it will exact as soon as it either gets the proposal or fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah bounded at 500ms, and doesn't do anything heavy after sending the request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be bad to add a async_timeout()
on the entire fetch_proposal
call? I guess the question is whether it's okay for fetch_proposal
to abort partway through execution
I agree that it's unlikely fetch_proposal
will ever get stuck outside the part where we already have an async_timeout
, but it makes me a bit uncomfortable to drop the handle of tasks that need both a read lock and a write lock (at the very least, I think this might obscure a deadlock if we run into one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a timeout in fetch_proposal
I don't think we prevent a deadlock by cancelling this task either, though I'm not sure. Personally I feel it's better to either timeout the actual request for proposal because it seems bad to actually get the proposal then timeout waiting for the write lock to put it into our internal state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks straightforward to me, just one question about spawning the fetch_proposal
fn spawn_fetch_proposal<TYPES: NodeType, V: Versions>( | ||
view: TYPES::View, | ||
event_sender: Sender<Arc<HotShotEvent<TYPES>>>, | ||
event_receiver: Receiver<Arc<HotShotEvent<TYPES>>>, | ||
membership: Arc<TYPES::Membership>, | ||
consensus: OuterConsensus<TYPES>, | ||
sender_public_key: TYPES::SignatureKey, | ||
sender_private_key: <TYPES::SignatureKey as SignatureKey>::PrivateKey, | ||
upgrade_lock: UpgradeLock<TYPES, V>, | ||
) { | ||
async_spawn(async move { | ||
let lock = upgrade_lock; | ||
|
||
let _ = fetch_proposal( | ||
view, | ||
event_sender, | ||
event_receiver, | ||
membership, | ||
consensus, | ||
sender_public_key, | ||
sender_private_key, | ||
&lock, | ||
) | ||
.await; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be bad to add a async_timeout()
on the entire fetch_proposal
call? I guess the question is whether it's okay for fetch_proposal
to abort partway through execution
I agree that it's unlikely fetch_proposal
will ever get stuck outside the part where we already have an async_timeout
, but it makes me a bit uncomfortable to drop the handle of tasks that need both a read lock and a write lock (at the very least, I think this might obscure a deadlock if we run into one)
Closes #<ISSUE_NUMBER>
This PR:
Moves 2 things off the critical path
For storing the proposal we just do this at the end of the voting procedure before sending a vote. This only blocks sending a vote
For fetching past proposal there was already logic to do this in both the voting and proposing code path after the other dependencies are fulfilled. We still spawn a task to do the fetch so we can vote on future proposals referencing that view, this is critical for cases where all nodes restart.
I want to make a larger fix where
handle_quorum_proposal_recv
is moved to a separate task I'll follow up with that. In the meantime this should significantly improve the Decaf situationThis PR does not:
Key places to review: