Call stuck connecting when server is initially unavailable, but becomes available later #1434
Comments
I just published grpc-js version 1.0.4 with a change that may impact what you are seeing here. Can you try again with that version? If that doesn't fix it, can you run the client in the failing configuration with the environment variables |
Unfortunately it didn't help. I updated the repro project to use 1.0.4, so you can try it yourself, but here are the traces as requested: When I start the client first:
nothing else happens afterwards, even when I start the server. When the message is supposed to be sent, there is:
On the other hand, when the server is already running, it looks like this:
|
That first log is the complete output when you start the client, then the server? |
Yes, starting the server doesn't produce any reaction in the client logs. |
OK, there's definitely a bug. There should have been more to that log with or without a server. I will look into it. |
I modified your example code to work on Node, and when I ran the client I saw the trace output I expected: the client enters a continuous loop of trying and failing to connect. This might be an Electron-specific issue. |
Might be. Just a note that 0.7.4 works reliably (>=0.7.5 stopped working) and its trace looks quite different; it never even enters the TRANSIENT_FAILURE state:
I've seen that some refcounting bugs were fixed in that release, and some other bugs in subsequent releases. I wonder whether the reason it stalls in 1.0.4 is the same as for 0.7.5; maybe there are two problems. Any idea how to debug it further? |
The subchannel is supposed to go into the TRANSIENT_FAILURE state when it fails to connect, so that log from 0.7.4 is buggy in a different way. I have now managed to replicate the log you are seeing using Electron, with the exact same code that had the behavior I expected using Node. So, this is definitely Electron-specific behavior. |
I figured out the problem, and the fix is in #1446. It's not exactly Electron-specific behavior; the bug is a race condition that Electron triggers a lot more reliably than Node does. |
I had the same issue with a similar scenario to OP and had to manually check the server socket was open before connecting the client. Great to see a fix will be added for this 👍 |
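For anyone who needs the same stopgap, the "check the server socket is open before connecting" workaround mentioned above could look roughly like the sketch below. It uses only Node's built-in `net` module; the helper names `isPortOpen` and `waitForPort`, the default host, and the timing values are hypothetical choices for illustration, not part of grpc-js.

```javascript
const net = require("net");

// Hypothetical helper: attempt one TCP connection and report whether
// something accepted it. Resolves false on error or timeout.
function isPortOpen(port, host = "127.0.0.1", timeoutMs = 500) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy(); // always release the probe socket
      resolve(open);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

// Hypothetical helper: poll the port until it accepts a connection or
// the overall deadline passes. Resolves true if the port opened in time.
async function waitForPort(
  port,
  host = "127.0.0.1",
  { retryMs = 200, deadlineMs = 10000 } = {}
) {
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    if (await isPortOpen(port, host)) return true;
    await new Promise((r) => setTimeout(r, retryMs));
  }
  return false;
}
```

A caller would await `waitForPort(port)` with the server's port and only create the gRPC client once it resolves to `true`. This only hides the startup race on the application side; it does not replace the fix discussed in this thread.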
Do you think this would also apply if a client was making two (or more) unary requests to a server that doesn't exist? Error code 14 is returned for the first, and the connectivity status of the channel goes from 3 (idle) to 0 (connecting). The second request then hangs indefinitely for me. grpc 1.24.2. |
|
I have published |
Thanks, works fine now. |
I am on version ^1.1.7, and the bug is still showing up. The Google Pub/Sub version is 1.1.6. |
Problem description
I have a (C#) grpc server and a (grpc-js) client, which I start simultaneously and connect over loopback (127.0.0.1). Usually the server becomes ready after the client, so the client will often find nothing listening on the port initially.
Prior to grpc-js 0.7.5 this all worked fine: the client and some initial calls that I start waited for the server to become ready and connected silently. Apparently 0.7.5 introduced some fixes that made calls fail fast when the client is not connected, so I modified the code to pass waitForReady:true in metadata.
Unfortunately, this didn't restore the previous behavior. Now the client and its calls are stuck and never connect successfully.
Reproduction steps
(the calls are stuck on connecting)
Environment
Additional context
I poked around a bit and it seems that the TRANSIENT_FAILURE state is reached in channel.js (this piece of code is executed):
but nothing happens afterwards; the reconnection attempt never completes or produces any other result.