Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRASH gun_http2:parse in server logs #242

Closed
jainygaurav opened this issue Mar 1, 2021 · 26 comments
Closed

CRASH gun_http2:parse in server logs #242

jainygaurav opened this issue Mar 1, 2021 · 26 comments

Comments

@jainygaurav
Copy link

jainygaurav commented Mar 1, 2021

OTP: 19.3.6
apns4erl: from master

  • I have ejabberd running on aws EC2 instance.
  • I have an ejabberd module
  • I do the following inside ejabberd module my_apns_module.erl which is running as a gen_server started by ejabberd:
init([Host, _Opts]) ->
        apns:start(),
        case apns:connect(cert, ?APNS_CONNECTION) of
             {ok, ServerPid} -> ?ERROR_MSG("apns connection successful with PID ~p~n", [ServerPid]);
             {error, timeout} -> ?ERROR_MSG("apns connection unsuccessful reason timed out", [])
        end,
  • I am able to send push notifications with the code above.
  • But after few days, I observe the following in ejabberd logs
  • And then I can not send push notifications until I restart ejabberd
  • Then after few days, it happens again.

CRASH REPORT Process <0.491.0> with 0 neighbours crashed with reason: no function clause matching gun_http2:parse(<<>>, close) line 77

Please help me resolve this issue.

@jainygaurav jainygaurav changed the title gun_http2:parse in ejabbered logs CRASH gun_http2:parse in ejabbered logs Mar 1, 2021
@jainygaurav
Copy link
Author

jainygaurav commented Mar 5, 2021

OTP: 19.3.6
apns4erl: from master

Can you please help me with this issue:

Last push notification sent:

2021-03-02 21:27:36.030 [info] <0.506.0>@my_mod_apns:sendAPNS:290 Success in request. Reason was: 200 with <<"c2ca5csdsc5b0550a74fe799asda96d9049d3502c58fd5e4b27e98ed1b97e49c2d">>

Then after couple of days of non-use of apns_connection:

2021-03-05 15:51:38.055 [error] <0.530.0> CRASH REPORT Process <0.530.0> with 0 neighbours crashed with reason: no function clause matching gun_http2:parse(<<>>, close) line 77

@jainygaurav jainygaurav changed the title CRASH gun_http2:parse in ejabbered logs CRASH gun_http2:parse in server logs Mar 5, 2021
@jainygaurav
Copy link
Author

I found this doing some research if it helps:

ninenines/gun#105

@paulo-ferraz-oliveira
Copy link
Contributor

paulo-ferraz-oliveira commented Mar 5, 2021

@jainygaurav, did you try getting help next to the gun folks? I'm not sure (without digging further) that issue comes from apns4erl (not saying it comes from gun either, but since the crash is in an internal function, might be worth digging into what can cause it/how it can be avoided).

@jainygaurav
Copy link
Author

jainygaurav commented Mar 5, 2021

OK..let me try asking on https://github.com/ninenines/gun/issues.

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira as per your comment on ninenines/gun#261, can you please point me to build or code changes w.r.t gun-2.0.0-rc1 that I need to do to get it working on my end.

@paulo-ferraz-oliveira
Copy link
Contributor

Specifically for apns4erl I can't, because I haven't done it yet (so haven't gone through the upgrade pains), but as a general replacement you could try to add {gun, "2.0.0-rc.1"} to your rebar.config to lock that version (you might want to do rebar3 upgrade at this point). The only change I had to do, in another context (not apns4erl) was rename transport_opts as tls_opts, which I guess will have to be done in apns4erl some time in the future, too.

@jainygaurav
Copy link
Author

jainygaurav commented Mar 7, 2021

@paulo-ferraz-oliveira I get following warnings:

_build/default/lib/gun/src/gun_socks.erl:131: Warning: a term is constructed, but never used
_build/default/lib/gun/src/gun_socks.erl:136: Warning: a term is constructed, but never used

./rebar3 upgrade
===> Verifying dependencies...
===> Fetching base64url v1.0.1
===> Fetching gun v2.0.0-rc.1
===> Fetching jsx v3.0.0
===> Fetching cowlib v2.10.1
===> No upgrade needed for base64url
===> No upgrade needed for gun
===> No upgrade needed for jsx
===> No upgrade needed for cowlib

./rebar3 compile
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling cowlib
===> Compiling base64url
===> Compiling gun
_build/default/lib/gun/src/gun_socks.erl:131: Warning: a term is constructed, but never used
_build/default/lib/gun/src/gun_socks.erl:136: Warning: a term is constructed, but never used

===> Compiling jsx
===> Analyzing applications...
===> Compiling apns

@paulo-ferraz-oliveira
Copy link
Contributor

@jainygaurav, you don't need to double post here and in gun. 😄

That warning's supposed to be innocent, but you'll probably run into trouble when using apns4erl because of the tls_opts as I mentioned before (unless you're not doing TLS). You might need to do a fork and experiment with that.

@jainygaurav
Copy link
Author

ok...

@paulo-ferraz-oliveira
Copy link
Contributor

You can also add this repo. to your watchlist and when the next version is out just import it.

@jainygaurav
Copy link
Author

jainygaurav commented Mar 7, 2021

@paulo-ferraz-oliveira Just FYI...without changing transport_opts -> tls_opts:

I get the following output:

erl -pa _build/default/lib/*/ebin/ -config test/test.config
Erlang/OTP 19 [erts-8.3.5] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V8.3.5  (abort with ^G)
1> apns:start().
ok
2> apns:connect(cert, my_first_connection).
{ok,<0.87.0>}
3> 
=ERROR REPORT==== 7-Mar-2021::23:11:52 ===
** State machine my_first_connection terminating
** Last event = {internal,{"api.development.push.apple.com",443,
                           #{protocols => [http2],
                             retry => 0,
                             transport_opts =>
                                 [{certfile,"priv/apns-dev-cert.pem"},
                                  {keyfile,"priv/apns-dev-key-noenc.pem"}]}}}
** When server state  = {open_common,#{backoff => 1,backoff_ceiling => 10,
                                       client => <0.58.0>,
                                       connection =>
                                           #{apple_host =>
                                                 "api.development.push.apple.com",
                                             apple_port => 443,
                                             certfile =>
                                                 "priv/apns-dev-cert.pem",
                                             keyfile =>
                                                 "priv/apns-dev-key-noenc.pem",
                                             name => my_first_connection,
                                             timeout => 10000,type => cert}}}
** Reason for termination = error:{badmatch,
                                   {error,
                                    {options,
                                     {transport_opts,
                                      [{certfile,"priv/apns-dev-cert.pem"},
                                       {keyfile,
                                        "priv/apns-dev-key-noenc.pem"}]}}}}
** Callback mode = state_functions
** Stacktrace =
**  [{apns_connection,open_common,3,
                      [{file,"/home/ubuntu/tmpXYZ/apns4erl/src/apns_connection.erl"},
                       {line,254}]},
     {gen_statem,call_state_function,5,[{file,"gen_statem.erl"},{line,1223}]},
     {gen_statem,loop_event,6,[{file,"gen_statem.erl"},{line,995}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]

Then I did the changes like this:

      {next_event, internal, { Host
                             , Port
                             , #{ protocols      => [http2]
 -                              , transport_opts => TransportOpts
 +                              , tls_opts => TransportOpts
                                , retry          => 0
                                }}}}.
   

Then my output was as follows:

erl -pa _build/default/lib/*/ebin/ -config test/test.config
Erlang/OTP 19 [erts-8.3.5] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V8.3.5  (abort with ^G)
1> apns:start().
ok
2> apns:connect(cert, my_first_connection).
{ok,<0.87.0>}
3> 

I assume this is what you meant to get it working for this gun dependency.

{deps, [
  {gun, "2.0.0-rc.1"},
  {jsx, "3.0.0"},
  {base64url, "1.0.1"}
]}.

@paulo-ferraz-oliveira
Copy link
Contributor

Yes, that's it. So the only change was that one? Nice.

Now, just let it run and check if the same thing that was happening before still happens. If it does, it requires further digging around.

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira Thank you so much for being awesome!

I will update accordingly.

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira erlang/otp#4595

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira Would you have any idea on ninenines/gun#263?

@paulo-ferraz-oliveira
Copy link
Contributor

According to what @essen replied, there should be no problem. Is it because you're logging/tracing stuff and see the messages? What leads you to believe there is an issue?

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira I was just looking to make it work like gun-1.3.3. I replied there.

@paulo-ferraz-oliveira
Copy link
Contributor

Sure, but the intent (I guess) should be to "make it work", not "make it work like before" :-)

@jainygaurav
Copy link
Author

jainygaurav commented Mar 10, 2021

@paulo-ferraz-oliveira There were multiple reasons for me to make gun-2.0.0-rc.1 work like gun-1.3.3.

  • This issue (CRASH gun_http2:parse in server logs #242) was happening when apns4erl was not disconnecting and reconnecting at 2 hours interval. So, to make it potentially re-producible in gun-2.0.0-rc.1, I had to make gun-2.0.0-rc.1 work like gun-1.3.3.

  • Then I read https://github.com/inaka/apns4erl ---> Close Connections ---> Best Practices for Managing Connections.

  • So, I assumed apns4erl does not want frequent close connections and was relying on gun-1.3.3 to keep connections alive.

  • But since I, and not apns4erl, upgraded gun ---> 2.0.0-rc.1, so I wanted to remain consistent with apns4erl behavior with 1.3.3.

  • I assumed when apns4erl will officially migrate to gun-2.0.0, they will/should do what I did to remain consistent with prior behavior.

  • Lastly, having done many trials and errors and having limited understanding of various things, I was not sure whether I did something wrong or whether it was the expected behavior for gun-2.0.0-rc.1 as the behavior deviated from gun-1.3.3

  • So, to make myself clear and be sure, I just wanted to make gun-2.0.0-rc.1 work like gun-1.3.3

Please guide/point me to correct behavior.

@paulo-ferraz-oliveira
Copy link
Contributor

I assumed when apns4erl will officially migrate to gun-2.0.0, they will/should do what I did to remain consistent with prior behavior.

I invite you to open a pull request (in apns4erl) with the required changes, so when the time comes it would just be closer to updating the rebar.config pointer. What do you say? 😄

As per what I said in my last comment, if our goal is to keep current behaviour we should respect current requirements (having in mind that gun might not allow us to do that, I'm not sure). I'd prefer to not break anything for the consumer of the apns4erl API just by updating gun, but it might warrant some extra care that we should be verbose in stating in the release notes.

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira I have never done a pull request or contributed to external projects. I will learn how to do it and will do the same.

Thank you again for everything.

@paulo-ferraz-oliveira
Copy link
Contributor

It's fairly easy.

I'll give you an example for apns4erl.

  1. you fork the project (there's a Fork button at the top),
  2. you clone locally and do your stuff in a branch called (e.g.) feature/gun-2.0.0,
  3. you push locally and refresh the page: there should now be a button letting you know a pull request can be open.

And that's it!

@jainygaurav
Copy link
Author

Great...It will be a good first experience.

@jainygaurav
Copy link
Author

Looks like this issue is fixed with gun-2.0.0-rc.1. I sent last push notification on Mar 10 and kept the connection alive as with gun-1.3.3 (#245) before sending a push notification again on Mar 14 without a server restart. Push notification from Mar 14 went thru successfully.

2021-03-10 20:26:39.495 [error] <0.497.0>@mod_tymeit_apns:sendAPNS:320 Success in request. Reason was: 200 with <<"4c1f88da4e9c5ca87a6d186dad745b3b6817d9fd2ddea095fbfa608c0929d31d">>
2021-03-14 21:10:46.781 [error] <0.497.0>@mod_tymeit_apns:sendAPNS:320 Success in request. Reason was: 200 with <<"9e3d4650671f1c0557dfe751f6019b2ad0ec9d6afe5b822fb1d5a6a07fa22b4d">>
2021-03-14 21:15:03.368 [error] <0.497.0>@mod_tymeit_apns:sendAPNS:320 Success in request. Reason was: 200 with <<"022bc5aba4ed7aa9e964d3c47225c4528ba5dd1e82113bb08af255b5f1bad3b4">>

I will complete the pull request by EOW, Mar 19.

Thank you so much for helping me out with this issue.

@paulo-ferraz-oliveira
Copy link
Contributor

I will complete the pull request by EOW, Mar 19.

Cool. 👍

@jainygaurav
Copy link
Author

@paulo-ferraz-oliveira #246.

Please let me know if anything else needs to be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants