Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

morph: Fix subscription deadlock #2720

Merged
merged 1 commit into from
Jan 24, 2024
Merged

Conversation

carpawell
Copy link
Member

@carpawell carpawell commented Jan 17, 2024

Scenario:

  1. at least one subscription has been performed
  2. another subscription is being done
  3. a notification from one of the 0. point's subs is received

If 2. happens b/w 0. and 1. a deadlock appears since the notification routing process is locked on the subscription lock while the subscription lock cannot be unlocked since the subscription RPC cannot be done before the just-arrived notification is handled (read from the neo-go subscription channel).
Relates #2559.

Copy link

codecov bot commented Jan 17, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (46f2a82) 28.81% compared to head (d8beaf7) 28.80%.
Report is 1 commits behind head on master.

Files Patch % Lines
pkg/morph/client/notifications.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2720      +/-   ##
==========================================
- Coverage   28.81%   28.80%   -0.01%     
==========================================
  Files         415      415              
  Lines       32397    32397              
==========================================
- Hits         9335     9333       -2     
- Misses      22228    22229       +1     
- Partials      834      835       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pkg/morph/client/notifications.go Outdated Show resolved Hide resolved
pkg/morph/client/notifications.go Outdated Show resolved Hide resolved
Copy link
Contributor

@cthulhu-rider cthulhu-rider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i havent dived deeply to the problem yet, but...

to me its ok that subscribing ops acquire subs lock. And they must not cause any deadlocks. If a deadlock happens - it's almost definitely the routing/handling fault and should be resolve there


if c.subs.subscribedToAllNotaryEvents {
if _, ok := c.subs.subscribedNotaryEvents[txSigner]; ok || c.subs.subscribedToAllNotaryEvents {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to touch subscribedNotaryEvents if subscribedToAllNotaryEvents

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it critical? do not want more lines for not really important things

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not critical if this were code from scratch, but it is not. Extra lines are not a problem if they have a reason

@carpawell
Copy link
Member Author

to me its ok that subscribing ops acquire subs lock. And they must not cause any deadlocks. If a deadlock happens - it's almost definitely the routing/handling fault and should be resolve there

I would say 5655887 was not necessary. We can add the third lock but that is extremely hard and not necessary for the client, it would do too many things then.

I do not know what to say here, is it the 5th or the 10th time I am fixing a deadlock caused by the impossibility of doing any RPC calls if nothing reads subs channel? neo-go is changing, neofs is changing but still, we face an intuitive regular code that deadlocks.

@cthulhu-rider
Copy link
Contributor

5th or the 10th time I am fixing a deadlock caused by the impossibility of doing any RPC calls if nothing reads subs channel

let's get rid of this minefield once and for all. Half actions don't work

temporary unlocked subscription is not a solution to me. Nothing prevents the listener from discarding events that are not yet in demand

@carpawell
Copy link
Member Author

let's get rid of this minefield once and for all

What do you suggest? We need to hold a few connections and be able to switch b/w them so a lock is there. But neo-go requires to read subs. But making a new sub is an RPC that is impossible without reading the subs.

temporary unlocked subscription is not a solution to me

If you prohibit subscription change, you may experience a connection switch (and a subscription lock) while trying to make a new subscription (another subscription lock).

@roman-khimov
Copy link
Member

You can make routeNotifications not require s.subs lock for regular (non-reconnecting) operation. Likely this will solve the problem. Locking here looks correct.

@carpawell carpawell force-pushed the fix/autodeploy-deadlock branch from 134e421 to 7716f9b Compare January 23, 2024 12:27
@carpawell
Copy link
Member Author

You can make routeNotifications not require s.subs lock for regular (non-reconnecting) operation. Likely this will solve the problem.

Legit. But I do not like it cause that will work only based on some info we know: no routines change the channels if no RPC failures happen (or they even happen but their handling is locked cause we are reading some internal channels).
Anyway, done. Small diff, fixes the problem.

Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do it in another way: acquire channels once, keep them until connLost. But this works too.

Scenario:
0. at least one subscription has been performed
1. another subscription is being done
2. a notification from one of the `0.` point's subs is received

If `2.` happens b/w `0.` and `1.` a deadlock appears since the notification
routing process is locked on the subscription lock while the subscription lock
cannot be unlocked since the subscription RPC cannot be done before the
just-arrived notification is handled (read from the neo-go subscription
channel).

`switchLock` does the same thing to the `routeNotifications`: it ensures that no
routine is doing/will be doing changes with the subscription channels, even
though `subs`'s lock was created for this purpose initially.

Relates #2559.

Signed-off-by: Pavel Karpy <[email protected]>
@carpawell carpawell force-pushed the fix/autodeploy-deadlock branch from 7716f9b to d8beaf7 Compare January 24, 2024 08:07
@roman-khimov roman-khimov merged commit 895559b into master Jan 24, 2024
7 of 10 checks passed
@roman-khimov roman-khimov deleted the fix/autodeploy-deadlock branch January 24, 2024 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants