fix: avoid deadlock in publisher and subscriber #1749
Merged
There is a scenario in which badger can end up deadlocked. The complete goroutine dump can be found here: https://drive.google.com/file/d/1nIrYlrbwlGtvk4WGDCzM0z996eEydaNI/view?usp=sharing
The publisher sends messages over the subscriber channel, and the subscriber processes those messages one at a time. The subscriber channel has a capacity of 1000, so once the channel is completely full the publisher blocks indefinitely on the send. That is what happened here: while processing a message, the subscriber received an error after 15 minutes, and in the meantime the publisher was blocked waiting for the channel to clear. On receiving the error, the subscriber asked the publisher to delete the subscriber. This is a deadlock: the publisher holds a lock while it tries to send the message, and the subscriber's message processor waits on that same lock to delete the subscriber.
This PR addresses the issue by adding an atomic variable to the subscriber so that the publisher stops sending new updates to it, which allows the deadlock to be released eventually.