Deadlock clientconn.getTransport #1605

Closed
ghost opened this issue Oct 20, 2017 · 5 comments

Comments

@ghost

ghost commented Oct 20, 2017

Please answer these questions before submitting your issue.

What version of gRPC are you using?

1.7.0

What version of Go are you using (go version)?

1.9.1

What operating system (Linux, Windows, …) and version?

Mac

What did you do?

We noticed this issue when the Google Cloud Datastore library started deadlocking on Get requests. Digging deeper, the call appears to block at: https://github.com/grpc/grpc-go/blob/master/clientconn.go#L800

We saw this issue first around 12:15PM PST on October 19.

Rolling back gRPC to an older version seems to have fixed this.

What did you expect to see?

Do not deadlock.

What did you see instead?

Deadlock.

@MakMukhi
Contributor

Hey, can you share more information to help us reproduce it?
What exactly did you do?

  1. Do you have a custom balancer?
  2. Is your code a mix of v1 and v2 balancer/resolver code?
  3. Is the dial blocking or non-blocking?

Please feel free to share any other relevant information that you think might help us dig deeper.

It'd be great if you could share some code. A small reproduction would be even better.
Thanks

@ghost
Author

ghost commented Oct 20, 2017

Hey MakMukhi,

  1. AFAIK, no, I do not have a custom balancer.
  2. No idea, but I don't think so.
  3. No idea.

As I mentioned, we saw this issue while attempting to use the Google Cloud Datastore client libraries. Reverting the last few commits from grpc-go seems to have fixed the issue for us.

The smallest possible repro is at: https://gist.github.com/dhavalcue/8fc6aba28aad59603fc01351a3dca6d6

The backtrace where it deadlocks is at:
https://gist.github.com/dhavalcue/95558572a4d9fd827ce0a029d52789e1
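
For anyone who needs to capture a similar backtrace, the standard library can dump the stacks of all goroutines. This is a minimal sketch rather than the method used to produce the gist above. `dumpGoroutines` is a hypothetical helper name, and the blocked goroutine in `main` is only a stand-in for a stuck call.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// dumpGoroutines returns the stack traces of all live goroutines.
// debug=2 prints them in the same form as an unrecovered panic.
func dumpGoroutines() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Hypothetical stand-in for a stuck call: a goroutine blocked
	// forever on a channel that nobody writes to.
	stuck := make(chan struct{})
	go func() { <-stuck }()
	time.Sleep(50 * time.Millisecond) // give the goroutine time to block

	dump, err := dumpGoroutines()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Print(dump)
}
```

On Unix-like systems, sending SIGQUIT to a running Go program (Ctrl-\ in a terminal) also prints all goroutine stacks before the program exits, which needs no code changes.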

@menghanl
Contributor

This could be caused by the switching of default resolver. Can you try #1606 and see if that fixes the problem?

@dfawley
Member

dfawley commented Oct 20, 2017

Also, are you setting DATASTORE_EMULATOR_HOST in your environment? If so, it looks like that specifies the target for grpc to connect to -- can you share its setting?

EDIT: Sorry, never mind that question. I see this is for debugging, and there is a different path in another package that calls grpc.Dial if it's not set.

@ghost
Author

ghost commented Oct 21, 2017

@menghanl: #1606 seems to have fixed the problem. It's working as expected now. Will close this issue.

@ghost ghost closed this as completed Oct 21, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Sep 26, 2018