Deadlock on grpc transport #5644
Comments
@zasweq Adding defer on this line (grpc-go/internal/transport/http2_client.go, line 1237 in a094a10) solves the issue. Is that a good solution?
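One way to read the proposal above is that the close calls get postponed until the transport mutex is no longer held. The sketch below illustrates that general idea with an explicit slice rather than `defer`. It is not the grpc-go code or the maintainers' fix: the `client` type, its fields, and the simplified `closeStream` and `handleGoAway` signatures are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for the real transport type.
type client struct {
	mu     sync.Mutex // guards active
	active map[uint32]bool

	cbMu    sync.Mutex // stand-in for the controlbuf mutex
	cleanup []uint32   // guarded by cbMu
}

// closeStream must be called WITHOUT c.mu held, because it takes cbMu.
func (c *client) closeStream(id uint32) {
	c.mu.Lock()
	delete(c.active, id)
	c.mu.Unlock()

	c.cbMu.Lock()
	c.cleanup = append(c.cleanup, id)
	c.cbMu.Unlock()
}

// handleGoAway closes every stream whose ID is above lastID.
// Streams are only collected under c.mu; they are closed after the unlock,
// so c.mu is never held while cbMu is being acquired.
func (c *client) handleGoAway(lastID uint32) {
	var toClose []uint32
	c.mu.Lock()
	for id := range c.active {
		if id > lastID {
			toClose = append(toClose, id)
		}
	}
	c.mu.Unlock()

	for _, id := range toClose {
		c.closeStream(id)
	}
}

func main() {
	c := &client{active: map[uint32]bool{1: true, 3: true, 5: true, 7: true}}
	c.handleGoAway(3)
	fmt.Println(len(c.active), len(c.cleanup)) // prints "2 2"
}
```

The trade-off in this pattern is that the set of streams can change between the unlock and the close calls, so the close path has to tolerate a stream that is already gone.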
@wimdec what is your server (or proxy) running? It doesn't seem like this condition should be possible if it is following HTTP/2 best practices for GOAWAY: https://httpwg.org/specs/rfc7540.html#GOAWAY
We are using Kubernetes ingress-nginx, version v1.2.0, as a proxy.
The behavior seen here would come from the proxy (not the backing grpc-go server) if that's an L7/HTTP/2 proxy, which I'm pretty sure it is. So they are most likely not following the recommended way of gracefully terminating the connection, which could be raised as an issue with them. If they allowed enough time for us to stop creating streams on the connection, no new streams would need to be terminated this way, and we wouldn't hit this deadlock. (We still have a bug, which will be fixed ASAP.)
What version of gRPC are you using?
1.49
What version of Go are you using (go version)?
1.19
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
I do not have a reproducible scenario.
Since we switched to grpc-go 1.49, our application sometimes hangs on gRPC requests.
From the extracted stack traces, I could clearly see a deadlock in the transport, caused by the recent PR #5494.
In this PR this is mentioned on the http2_client mutex:
But that is exactly what is happening very infrequently in our case.
The cause is this line:
grpc-go/internal/transport/http2_client.go
Line 1237 in a094a10
This closeStream call is made while holding the http2_client mutex.
The implementation of closeStream, in turn, calls into controlBuf at this line:
grpc-go/internal/transport/http2_client.go
Line 870 in a094a10
This violates the assumption stated for the mutex: here the http2_client mutex is held while trying to acquire the controlbuf mutex.
In parallel, another goroutine holds the controlbuf mutex while trying to acquire the http2_client mutex.
The reason is the mutex lock added at this line:
grpc-go/internal/transport/http2_client.go
Line 723 in a094a10
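The lock-order inversion described above can be shown in isolation. This is a minimal sketch, not grpc-go code: the `transport` type and its `mu` and `cbMu` fields are hypothetical stand-ins for the http2_client mutex and the controlbuf mutex. It uses `sync.Mutex.TryLock` (available since Go 1.18) so the program reports the conflict instead of actually hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// transport is a hypothetical stand-in; mu and cbMu play the roles of the
// http2_client mutex and the controlbuf mutex.
type transport struct {
	mu   sync.Mutex
	cbMu sync.Mutex
}

// wouldDeadlock runs two goroutines that take the two locks in opposite
// order and reports whether each found its second lock already held.
// With a blocking Lock instead of TryLock, both would wait forever.
func wouldDeadlock() bool {
	t := &transport{}
	var holding, tried sync.WaitGroup
	holding.Add(2)
	tried.Add(2)
	blocked := make(chan bool, 2)

	attempt := func(first, second *sync.Mutex) {
		first.Lock()
		defer first.Unlock()
		holding.Done()
		holding.Wait() // both goroutines now hold their first lock

		free := second.TryLock()
		if free {
			second.Unlock()
		}
		blocked <- !free

		tried.Done()
		tried.Wait() // keep the first lock until both have tried
	}

	go attempt(&t.mu, &t.cbMu) // like closeStream called with mu held
	go attempt(&t.cbMu, &t.mu) // like the other goroutine in the traces

	a, b := <-blocked, <-blocked
	return a && b
}

func main() {
	fmt.Println(wouldDeadlock()) // prints "true"
}
```

The wait groups only force the unlucky interleaving every time; in the real transport it depends on timing, which matches how infrequently the hang is seen.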
Relevant stack traces:
What did you expect to see?
no deadlock
What did you see instead?
deadlock