tapchannel: improve aux signer signal handling #1118
Nice bug hunting and fix!
tapchannel/aux_leaf_signer.go
Outdated
    log.Tracef("Processing %d aux sig jobs", len(sigJobs))

    for idx := range sigJobs {
        sigJob := sigJobs[idx]
        cancelAndErr := func(err error) {
            log.Errorf("Error processing aux sig job: %v", err)

            close(sigJob.Cancel)
            // Check that the cancel signal was not already sent
Since we're expected to be the only ones closing the `sigJob.Cancel` channel, we could also do this with a `sync.Once`. But I guess this works too and would also work in case someone else closes the channel.
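As a minimal sketch of the `sync.Once` alternative mentioned above: the `cancelGuard` type and its methods are hypothetical names invented for illustration and are not part of `tapd` or `lnd`. The sketch only protects against a double close when every closer goes through the guard.

```go
package main

import (
	"fmt"
	"sync"
)

// cancelGuard is a hypothetical wrapper (not a tapd or lnd type) that makes
// closing a cancel channel idempotent.
type cancelGuard struct {
	once   sync.Once
	cancel chan struct{}
}

func newCancelGuard() *cancelGuard {
	return &cancelGuard{cancel: make(chan struct{})}
}

// Cancel closes the channel on the first call and is a no-op afterwards.
func (g *cancelGuard) Cancel() {
	g.once.Do(func() { close(g.cancel) })
}

// Cancelled reports, without blocking, whether the channel has been closed.
func (g *cancelGuard) Cancelled() bool {
	select {
	case <-g.cancel:
		return true
	default:
		return false
	}
}

func main() {
	g := newCancelGuard()
	g.Cancel()
	g.Cancel() // Safe: the second call does not close the channel again.
	fmt.Println("cancelled:", g.Cancelled())
}
```

If another party holds the raw channel and closes it directly, the guard no longer helps, which matches the caveat in the comment above.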
Yeah I think we need to straighten out exactly who/what can close the quit channel, otherwise we'll invariably run into a double close panic.
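For reference, the double close panic is easy to reproduce in isolation. The helper `closePanics` below is a hypothetical name used only for this illustration; it recovers from the panic so that the program can report it.

```go
package main

import "fmt"

// closePanics closes ch and reports whether the close panicked.
// It is an illustrative helper, not code from tapd or lnd.
func closePanics(ch chan struct{}) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	close(ch)
	return false
}

func main() {
	cancel := make(chan struct{})
	fmt.Println("first close panicked:", closePanics(cancel))
	fmt.Println("second close panicked:", closePanics(cancel))
}
```

Recovering like this is only useful for demonstration; having a single, well-defined owner that closes the channel avoids the problem entirely.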
`lnd` can close the channel; it is shared amongst the 'classic' sig jobs and these aux sig jobs.
Updated so that `lnd` is the only one that will be closing the channel (simpler flow; having `tapd` close it was not really needed).
Nice job digging into this!
I think the changes re the quit/cancel operations make sense, and one of the added tests does demonstrate a problem in this area.
The one thing I'm not sure if we've figured out is: why was the goroutine blocking on a channel that should always be buffered?
Ok, updated to address initial comments. I think this should get merged as-is and then have a follow-up PR, following the sketch here:
1c6f95e to 4b03773
In this commit, we add checks of the aux signer cancel and quit signals at all points during aux sig batch processing when a response may be sent. This mirrors the signal handling used in the lnwallet sigpool worker goroutine. We also update the early exit logic to not close the cancel channel; only the caller, lnd, should mutate that channel.
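A rough sketch of the pattern this commit message describes, under stated assumptions: `resp` is a hypothetical stand-in for `lnwallet.AuxSigJobResp`, and `sendResp` is an invented helper name, not the function used in the PR. The send is attempted alongside receives on the cancel and quit signals, and neither signal channel is ever closed by the sender.

```go
package main

import "fmt"

// resp is a hypothetical stand-in for lnwallet.AuxSigJobResp.
type resp struct{ err error }

// sendResp tries to deliver r, but gives up as soon as cancel or quit is
// closed. It only reads the signal channels and never closes them.
func sendResp(respChan chan<- resp, cancel, quit <-chan struct{}, r resp) bool {
	select {
	case respChan <- r:
		return true
	case <-cancel:
		return false
	case <-quit:
		return false
	}
}

func main() {
	cancel := make(chan struct{})
	quit := make(chan struct{})

	// Unbuffered and without a reader, so only a signal can unblock us.
	respChan := make(chan resp)

	close(cancel) // In the PR's final design, only the caller does this.
	fmt.Println("delivered:", sendResp(respChan, cancel, quit, resp{}))
}
```

Because the helper only receives from `cancel` and `quit`, ownership of closing them stays with the caller.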
4b03773 to 4eb1aeb
    log.Errorf("Error processing aux sig job: %v", err)

    close(sigJob.Cancel)
    sigJob.Resp <- lnwallet.AuxSigJobResp{
Wherever we send to a channel, we should also read on the `Quit` channel to make sure we never block on a send (even if it's buffered).
Hmm, so if we receive `Quit` we abandon sending the err? That may then block `lnd`, IIUC.
Also, if we select on `Quit`, and we entered this function from a `Quit` case statement, we would never send the error, since we're reading an already closed channel.
Hmm, yeah, you're right. We have to assume the `Resp` channel is buffered here though (which it is), then this should never block.
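As a small illustration of the buffering assumption (not an explanation of the original stack trace): a send on a buffered channel proceeds immediately only while the buffer has room. The helper `sendWouldBlock` is a hypothetical name, and the capacity of 1 is an assumption made for the example rather than a statement about how `lnd` allocates `Resp`.

```go
package main

import "fmt"

// sendWouldBlock reports whether a send on ch cannot proceed right now.
// If the send can proceed, the value is actually sent.
func sendWouldBlock(ch chan int, v int) bool {
	select {
	case ch <- v:
		return false
	default:
		return true
	}
}

func main() {
	// Capacity 1 is assumed here purely for illustration.
	respChan := make(chan int, 1)

	fmt.Println("first send would block:", sendWouldBlock(respChan, 1))
	fmt.Println("second send would block:", sendWouldBlock(respChan, 2))
}
```

The first send fills the buffer and the second would block until a reader drains it, so "buffered" only guarantees a non-blocking send for as many responses as the capacity allows.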
> We have to assume the `Resp` channel is buffered here though (which it is), then this should never block.

Yeah, this is what was puzzling w.r.t. the original stack trace that led us down this line of inquiry: unless some weird mutation happened, why would `tapd` block on the channel send?
LGTM
Fixes #1114.
This PR updates the aux signer to handle cancel and quit signals in a similar fashion to the 'primary' sig job handler in `lnwallet/sigpool`. This also improves safety as the cancel channel can now be closed by both `tapd` and `lnd`.