Logging and close race in RPC client #661

Open
wants to merge 12 commits into
base: master

JHartman5

We're regularly seeing our application crash with panic: close of closed channel, with a stack trace like this:

goroutine 502383 [running]:
github.com/hashicorp/serf/client.(*queryHandler).Cleanup(0xc00198c060)
	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:629 +0x91
github.com/hashicorp/serf/client.(*RPCClient).deregisterHandler(0xc00034c070, 0x11bea00)
	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:804 +0xd8
github.com/hashicorp/serf/client.(*queryHandler).Handle(0xc00198c060, 0x11beb00)
	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:612 +0x1cb
github.com/hashicorp/serf/client.(*RPCClient).respondSeq(0xc00034c070, 0x11beb00, 0xc001cb3740)
	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:824 +0xc4
github.com/hashicorp/serf/client.(*RPCClient).listen(0xc00034c070)
	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:840 +0x86
created by github.com/hashicorp/serf/client.ClientFromConfig
	/go/src/github.com/boxcast/playlist_service/vendor/github.com/hashicorp/serf/client/rpc_client.go:148 +0x4bd

I'm not 100% sure of the root cause, but inspecting the RPC client code suggests a race when a caller invokes client.Close() on an RPC client at the same moment the client is closing itself because of an error. None of the handlers' Cleanup methods currently use any synchronization, so it's easy to see how two callers could both reach close() on the same channel.

This PR attempts to fix the race by adding a mutex to each of the RPC client handlers to protect access to closed and to the response channels. It also makes the Handle() methods check whether the handler is already closed.
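To illustrate the idea, here is a minimal sketch of a mutex-guarded handler. The `guardedHandler` type, its fields, and `newGuardedHandler` are hypothetical names made up for this example, not the actual serf types or the code in this PR; the point is only that the closed flag and the channel are touched under one lock, so close() runs at most once.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedHandler is a hypothetical stand-in for an RPC client handler.
// It is NOT the serf type; names and fields are assumptions for illustration.
type guardedHandler struct {
	mu     sync.Mutex
	closed bool
	respCh chan string
}

func newGuardedHandler() *guardedHandler {
	return &guardedHandler{respCh: make(chan string, 1)}
}

// Handle delivers a response unless the handler is already closed.
// It reports whether the response was accepted.
func (h *guardedHandler) Handle(resp string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.closed {
		return false
	}
	select {
	case h.respCh <- resp:
		return true
	default:
		// Drop the response rather than block while holding the lock.
		return false
	}
}

// Cleanup closes the channel exactly once, no matter how many
// goroutines call it concurrently.
func (h *guardedHandler) Cleanup() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.closed {
		return
	}
	h.closed = true
	close(h.respCh)
}

func main() {
	h := newGuardedHandler()

	// Simulate client.Close() racing with an error-triggered shutdown.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			h.Cleanup()
		}()
	}
	wg.Wait()

	fmt.Println("accepted after close:", h.Handle("late")) // prints "accepted after close: false"
}
```

One design note: in the stack trace above, Handle reaches Cleanup through deregisterHandler. sync.Mutex is not reentrant, so a real fix along these lines would need to release the lock before making that call. The sketch sidesteps this by never calling Cleanup from Handle.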

I noticed that there may be a similar race on the init members, so they got a similar, though slightly different, atomic treatment.

Finally, I noticed that serf.Config lets you customize logging by supplying your own *log.Logger, but the RPC client's config currently does not. So this PR also adds a *log.Logger to the config for the RPC client. If none is supplied, the client uses the default logger, which should have the same effect as the current code.

There don't seem to be any tests for the client package, and the top-level tests failed for me before I made any changes, so if someone can suggest how I can verify my changes, I'm more than happy to do so.
