Connection process stopped handling requests #334
If this If you see nothing, then I suppose the process is really stuck in
Thanks, yes, {current_function, {gen, do_call, 4}} is always the same.
I couldn't get dbg to work. I tried to connect to my app with erl -remsh and open a trace port, but got an error:
The Erlang version on the server is 23. Also I did and got this: But this didn't give me any clue.
Just do:
dbg:tracer().
dbg:tpl(gen).
dbg:tpl(gen_statem).
dbg:tpl(gun).
dbg:p(<0.31175.48>, c).
It failed on the first command:
It seems that the dbg module is not present on the server.
I tried to load it:
Ah well you probably are running a release that doesn't include In any case if you are on 23 I would recommend upgrading. What you are seeing looks like a bug in OTP, which may have been fixed since. At the very least make sure you are on the latest 23 patch release (OTP-23.3.4.20). Then if you run into this again, with
Ok, thanks, currently I see this version: %% coding: utf-8
That's OTP-23.3.4.18. OTP-23.3.4.19 fixes an issue where processes that use OTP-23.3.4.20 doesn't seem to have a relevant fix but doesn't hurt to use it.
I'll put here the reply I got from Fred Hebert on erlangforums, based on my backtrace: https://erlangforums.com/t/gun-gen-statem-process-queue-messages-stoped-being-handled/3413/5
Your FSM is currently stuck in a call to another FSM. In this case it appears to be related to TLS connection handling, during the handshake. The pid of that FSM is <0.21289.50>.
Since that call has an infinity timeout, your own FSM is gonna be stuck until the other one returns; if it's hung (because it's waiting on your FSM or on a broken external connection), then you're also going to be stuck forever.
Your options are likely to be one of:
Yes I suppose you could configure Edit: Note that depending on what the issue is, setting a timeout will not help. It depends on which process gets stuck and why.
I had to restart the application because of complaints, so I can't investigate that process. Currently there are no queues in the gun connection processes; as I said, it happens occasionally under unknown conditions.
I use gun 2.0.0 as the HTTP client in my application and reuse open connections to handle multiple requests.
If gun:await(Connection, StreamRef, Timeout) returns {error, Reason} for a request, I close the connection and stop using it. The exception is {error, timeout}: in that case I consider the connection still valid and keep using it.
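The policy described above can be sketched as a small pure function. This is only an illustration: the module name await_policy and the function next_action/1 are hypothetical and not part of gun, and the only assumption about gun is that gun:await/3 can return {error, timeout} or {error, Reason}, as stated in this issue.

```erlang
-module(await_policy).
-export([next_action/1]).

%% Hypothetical helper (not part of gun): decide what to do with a pooled
%% connection given the value returned by gun:await/3.
-spec next_action(term()) -> keep | close.
next_action({error, timeout}) ->
    %% Only the wait timed out; the connection is assumed to be alive.
    keep;
next_action({error, _Reason}) ->
    %% Any other error: close the connection and drop it from the pool.
    close;
next_action(_Result) ->
    %% Any non-error result: keep using the connection.
    keep.
```

The caller would pass the result of gun:await(Connection, StreamRef, Timeout) to next_action/1 and call gun:close(Connection) when it returns close. Note that treating {error, timeout} as "still valid" also keeps a connection in the pool when its process has stopped handling messages altogether, which is the situation reported here.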
This works fine most of the time, but recently my application stopped handling requests. It turned out that the gun connection processes had lists of unhandled messages in their mailboxes, like this:
And every new request just adds a new message to that mailbox.
I've looked into gun's code and I see that it should handle such messages without any problem in the 'connected' state; here is the line from gun:
I tried to get the status of that connection process (with a large timeout, just in case):
sys:get_status(<0.31175.48>, 600000).
But it didn't work; I just found a new system message in the gun connection process's mailbox:
{system,{<0.22180.130>,#Ref<0.3234177338.4042522626.87030>}, get_status}
So I can't find out what state this gen_statem server is in, and I don't know what caused it to leave the 'connected' state.
The process_info looks like this:
The connection uses HTTP/2 with TLS enabled.
I can't figure out why that could happen. I'm sure that this process was in the 'connected' state because I stored its pid in my pool of connections when the connection was established.
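A sys:get_status request is itself delivered as a message, so it queues up behind everything else when the process is blocked inside a call. A minimal sketch of reading diagnostics straight from the VM instead, using erlang:process_info/2; the module name stuck_probe and the function snapshot/1 are hypothetical, and the pid must be local to the node where this runs:

```erlang
-module(stuck_probe).
-export([snapshot/1]).

%% Hypothetical helper: collect diagnostics for a local process without
%% sending it any message. Returns undefined if the process is not alive.
-spec snapshot(pid()) -> [{atom(), term()}] | undefined.
snapshot(Pid) when is_pid(Pid) ->
    erlang:process_info(Pid, [status,
                              current_function,
                              current_stacktrace,
                              message_queue_len]).
```

In a remote shell this could be called as stuck_probe:snapshot(pid(0,31175,48)). A current_function that stays the same across repeated snapshots while message_queue_len keeps growing suggests the process is blocked rather than merely busy.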