-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
morph: Fix subscription deadlock #2720
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #2720 +/- ##
==========================================
- Coverage 28.81% 28.80% -0.01%
==========================================
Files 415 415
Lines 32397 32397
==========================================
- Hits 9335 9333 -2
- Misses 22228 22229 +1
- Partials 834 835 +1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i havent dived deeply to the problem yet, but...
to me its ok that subscribing ops acquire subs lock. And they must not cause any deadlocks. If a deadlock happens - it's almost definitely the routing/handling fault and should be resolve there
pkg/morph/client/notifications.go
Outdated
|
||
if c.subs.subscribedToAllNotaryEvents { | ||
if _, ok := c.subs.subscribedNotaryEvents[txSigner]; ok || c.subs.subscribedToAllNotaryEvents { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to touch subscribedNotaryEvents
if subscribedToAllNotaryEvents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it critical? do not want more lines for not really important things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not critical if this were code from scratch, but it is not. Extra lines are not a problem if they have a reason
I would say 5655887 was not necessary. We can add the third lock but that is extremely hard and not necessary for the client, it would do too many things then. I do not know what to say here, is it the 5th or the 10th time I am fixing a deadlock caused by the impossibility of doing any RPC calls if nothing reads subs channel? neo-go is changing, neofs is changing but still, we face an intuitive regular code that deadlocks. |
let's get rid of this minefield once and for all. Half actions don't work temporary unlocked subscription is not a solution to me. Nothing prevents the listener from discarding events that are not yet in demand |
What do you suggest? We need to hold a few connections and be able to switch b/w them so a lock is there. But neo-go requires to read subs. But making a new sub is an RPC that is impossible without reading the subs.
If you prohibit subscription change, you may experience a connection switch (and a subscription lock) while trying to make a new subscription (another subscription lock). |
You can make |
134e421
to
7716f9b
Compare
Legit. But I do not like it cause that will work only based on some info we know: no routines change the channels if no RPC failures happen (or they even happen but their handling is locked cause we are reading some internal channels). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do it in another way: acquire channels once, keep them until connLost
. But this works too.
Scenario: 0. at least one subscription has been performed 1. another subscription is being done 2. a notification from one of the `0.` point's subs is received If `2.` happens b/w `0.` and `1.` a deadlock appears since the notification routing process is locked on the subscription lock while the subscription lock cannot be unlocked since the subscription RPC cannot be done before the just-arrived notification is handled (read from the neo-go subscription channel). `switchLock` does the same thing to the `routeNotifications`: it ensures that no routine is doing/will be doing changes with the subscription channels, even though `subs`'s lock was created for this purpose initially. Relates #2559. Signed-off-by: Pavel Karpy <[email protected]>
7716f9b
to
d8beaf7
Compare
Scenario:
0.
point's subs is receivedIf
2.
happens b/w0.
and1.
a deadlock appears since the notification routing process is locked on the subscription lock while the subscription lock cannot be unlocked since the subscription RPC cannot be done before the just-arrived notification is handled (read from the neo-go subscription channel).Relates #2559.