Logging and close race in RPC client #661
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We're regularly seeing errors like
panic: close of closed channel
in our application with a stack trace like:While I am not 100% sure of the root cause of this, inspecting the RPC client code suggests that there might be a race if someone is calling
client.Close()
on an RPC client at the same time a client might be closing itself due to an error. Since none of the handlers'Cleanup
methods are currently doing anything atomic, it's easy to see how multiple callers might beclose()
ing the channels at the same time.This PR attempts to fix the race condition by adding a mutex to the various RPC client handlers to protect access to
closed
and the response channels; it also adds a check to see if the handler is closed in theHandle()
methods.I noticed that there may be a similar race with the
init
members, so they got a similar but slightly different atomic treatment.Finally, I noticed that the
serf.Config
provides the ability to customize the logging by providing your own*log.Logger
, but the RPC client currently does not. So this PR also adds a*log.Logger
to the config for the RPC client. If not supplied, it uses the default logger (which should have the same effect as the current code).It doesn't seem like there are any tests for the client package, and the top-level tests failed for me before making any changes, so if someone can suggest how I can confirm my changes, I'm more than happy to do so.