epoch stale after mainchain rpc massive restart #2559
A container can't be created in this case either, until all nodes are restarted.
@532910, how long were the RPCs down? Seconds? Minutes?
@roman-khimov, which nodes exactly? IR? SN?
Let's start with minutes.
IR, I believe.
No progress for now: single-node local consensus does not allow reproducing it. The best I can do here is wait until the next update and check the logs/profiles. If that is unacceptable, I may try a local 4/7-node consensus.
Scenario:
0. at least one subscription has been performed
1. another subscription is being done
2. a notification from one of the `0.` point's subs is received

If `2.` happens between `0.` and `1.`, a deadlock appears: the notification routing process is blocked on the subscription lock, while the subscription lock cannot be released because the subscription RPC cannot finish before the just-arrived notification is handled (read from the neo-go subscription channel).

Relates #2559.

Signed-off-by: Pavel Karpy <[email protected]>
Scenario:
0. at least one subscription has been performed
1. another subscription is being done
2. a notification from one of the `0.` point's subs is received

If `2.` happens between `0.` and `1.`, a deadlock appears: the notification routing process is blocked on the subscription lock, while the subscription lock cannot be released because the subscription RPC cannot finish before the just-arrived notification is handled (read from the neo-go subscription channel).

`switchLock` does the same thing for `routeNotifications`: it ensures that no routine is changing (or about to change) the subscription channels, even though `subs`'s lock was originally created for this purpose.

Relates #2559.

Signed-off-by: Pavel Karpy <[email protected]>
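For illustration, here is a minimal, self-contained Go sketch of the wait cycle described above. All names (`wsClient`, `listener`, `routeNotifications`, `subscribe` steps) are hypothetical and simplified, not the actual morph or neo-go code; the sketch only assumes a single reader loop that delivers notifications and RPC responses in arrival order, which is the ordering constraint the commit message relies on.

```go
// Hypothetical sketch of the deadlock: the routing goroutine needs the
// subscription lock to read a notification, while a new subscription holds
// that lock across an RPC whose response is queued behind the notification.
package main

import (
	"fmt"
	"sync"
	"time"
)

// wsClient imitates a single-reader RPC client: one loop delivers both
// notifications and call responses strictly in arrival order, so a response
// cannot be delivered until the notification before it is consumed.
type wsClient struct {
	notifCh chan string   // subscription notifications
	respCh  chan struct{} // responses to RPC calls
	inbox   chan string   // simulated wire: "notif:*" or "resp"
}

func (c *wsClient) readerLoop() {
	for msg := range c.inbox {
		if msg == "resp" {
			c.respCh <- struct{}{}
		} else {
			c.notifCh <- msg // blocks until the notification is read
		}
	}
}

type listener struct {
	mtx sync.Mutex // guards the set of subscriptions
	cli *wsClient
}

// routeNotifications needs the subscription lock to dispatch every
// notification, so it cannot drain notifCh while the lock is held elsewhere.
func (l *listener) routeNotifications() {
	for {
		l.mtx.Lock()
		n := <-l.cli.notifCh
		fmt.Println("routed", n)
		l.mtx.Unlock()
	}
}

func main() {
	cli := &wsClient{
		notifCh: make(chan string),
		respCh:  make(chan struct{}),
		inbox:   make(chan string, 2),
	}
	l := &listener{cli: cli}
	go cli.readerLoop()

	// Step 1: a new subscription starts and takes the lock.
	l.mtx.Lock()

	// Step 0's routing loop is running and now blocks on mtx.
	go l.routeNotifications()
	time.Sleep(50 * time.Millisecond)

	// Step 2: a notification from an existing subscription arrives; readerLoop
	// blocks sending it to notifCh because its only consumer waits on mtx.
	cli.inbox <- "notif:NewEpoch"
	time.Sleep(50 * time.Millisecond)

	// The subscription RPC: its response is queued behind the stuck
	// notification, so the lock is never released. The Go runtime typically
	// ends this with "all goroutines are asleep - deadlock!".
	cli.inbox <- "resp"
	<-cli.respCh
	l.mtx.Unlock()
	fmt.Println("never reached")
}
```

The three waits form a cycle: routing waits for the lock, the subscriber waits for its RPC response, and the reader loop waits for the notification to be consumed, which only the routing goroutine can do.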
Can't be reproduced at this stage; waiting for another occurrence in some network.
Seems to be reproducible if 2/3 of the RPC nodes go offline for some time.
Got a stable reproduction during the new payment tests.
@evgeniiz321, as I understand it, in your case you want the test to run faster, so you change the epoch duration to 20 blocks, right? I do not see any epoch ticks after the change, so the epoch duration cannot take effect immediately: we do not have notifications about config changes and cannot recalculate the next block for epoch handling. As I understand it, you wait for a new epoch for no longer than 60 seconds; most likely a new epoch will not happen in that time if a 240-second epoch was in effect before (1-second blocks with the default epoch duration of 240). Can you either tick an epoch after tuning the network setting or increase the maximum wait to 240 seconds? And yes, it does not relate to the original flapping problem.
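A back-of-the-envelope sketch of the timing argument above; the helper and the concrete block numbers are hypothetical, only the assumptions stated in the comment (~1-second blocks, old 240-block epoch, 60-second test wait) are used:

```go
// Why a 60-second wait is not enough: the next NewEpoch tick was scheduled
// from the old 240-block duration, and changing the setting does not
// reschedule it.
package main

import "fmt"

func main() {
	const (
		blockTimeSec   = 1   // ~1-second blocks in the test network
		oldEpochBlocks = 240 // duration in effect when the tick was scheduled
		newEpochBlocks = 20  // value set by the test, applied only from the next tick
		maxWaitSec     = 60  // how long the test waits for a new epoch
	)

	lastTickBlock := 1000 // arbitrary example block of the previous NewEpoch
	nextTickBlock := lastTickBlock + oldEpochBlocks

	fmt.Printf("next NewEpoch at block %d, i.e. up to ~%d s away; the test waits only %d s\n",
		nextTickBlock, oldEpochBlocks*blockTimeSec, maxWaitSec)
	fmt.Printf("the %d-block duration takes effect only after that tick\n", newEpochBlocks)
}
```

Hence the two workarounds suggested: either force an epoch tick right after changing the setting, or wait at least the old epoch length (240 seconds).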
We have not seen this exact issue for a long time, so I am closing it; it can be reopened once it happens again.
It still was not the case. See #3007, which is more relevant to this situation.
To reproduce: stop all mainchain RPC nodes (for an upgrade, for example), then start them again.
Workaround: restart all IR nodes.