Streaming RPC handlers are left running after client connections closed and server is stopped #6921
Comments
This would seem to be caused by this suggestion from that PR. I wonder if there was a different root cause for the failures that were the reason we removed the waitgroup. I agree this was a behavior change, and it would be good to bring the old behavior back (along with a test), since it was done that way before and is preferable.
(Still debugging...) It seems like there must have been a bug in (or some other exception to) the previous implementation that did allow this (Line 1038 in 953d12a).
So it does seem this would be a behavior change to block if handlers were still running. Maybe only …
I believe this (legacy) code in Lines 1908 to 1910 in 82df321
It's arguably most correct to block until they exit, and I can't think of any valid reason for handlers not to respect the …
What version of gRPC are you using?
v1.60.1
What version of Go are you using (`go version`)?
1.21.6
What operating system (Linux, Windows, …) and version?
Ubuntu 22.04.3, macOS Ventura
What did you do/what did you see?
See reproduction here: https://github.com/sunjayBhatia/grpc-go-issue-repro/tree/main/shutdown-cleanup
Clone the repo above, change to the `shutdown-cleanup` directory, and run `TEST_COUNT=10 make test` or `TEST_COUNT=10 make test-race`.
(Using a higher number of test runs ensures the issue shows up, though it usually takes no more than a couple of runs.)

The repro starts a simple server with a streaming RPC, starts a client connection, sends/receives a message, closes the client connection, stops the server, and exits. The server handler logs when connections are opened/closed. It is hooked up to the test logger, which panics if logs are written after a test ends. This test shows that the streaming handler is still running after the test ends: we see the panics with the log lines `StreamEcho error in Recv ...` and `StreamEcho ended`.

Leaving the repro code as-is, you should see a failure like:
It looks like the goroutine that spawns the stream handler is not waited on, so it may still be running even after we think the server has shut down:
grpc-go/server.go, line 1027 in 953d12a
Similarly, when using a fixed number of server stream workers (uncomment/comment the relevant lines in the repro code):
It looks like server workers are left running without full coordination on shutdown as well:
grpc-go/server.go, line 631 in 953d12a
This issue showed up in Contour CI after we bumped grpc-go from v1.59.0 to v1.60.0+.
Using git bisect, I've tracked this issue down to PR #6489, which looks like it changed how stream handlers are coordinated. It looks like connection draining/closing might be working as expected, but stream handlers are not.
What did you expect to see?
When the server is stopped (gracefully or otherwise), it should wait until all stream handlers have finished before exiting.