
spawn_link in hackney_pool issue #680

Closed
leonardb opened this issue Mar 17, 2021 · 19 comments · Fixed by #681

@leonardb
Contributor

With the change in #674 to spawn_link, the caller process will receive the exit message.

If the caller is trapping exits this can be quite unexpected and problematic.

This was quite a surprise when I upgraded, as I was only expecting exit signals from a potential crash in the RMQ connections handled in the same server, in which case the gen_server would re-initialize.

Since the caller has no access to the Pid of the spawned process, it's not possible to handle this cleanly in the caller, and the EXIT signal will always propagate.
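For illustration, a minimal way to see this (a sketch only, not hackney code; the URL is arbitrary):

%% Illustration: trap exits, make one pooled request, and look at the mailbox.
%% With the spawn_link from #674 an {'EXIT', Pid, Reason} message shows up even
%% though nothing in the caller's own code crashed.
demo() ->
  process_flag(trap_exit, true),
  {ok, _Status, _Headers, Ref} = hackney:request(get, <<"https://example.com">>, [], <<>>, []),
  _ = hackney:skip_body(Ref),
  receive
    {'EXIT', Pid, Reason} ->
      io:format("unexpected exit signal from ~p: ~p~n", [Pid, Reason])
  after 1000 ->
    io:format("no exit signal received~n")
  end.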

A possible solution may be using a monitor instead of a link:

checkout(Host, Port, Transport, Client) ->
  Requester = self(),
  Ref = make_ref(),
  Fun =
    fun() ->
      Result =
        try
          do_checkout(Requester, Host, Port, Transport, Client)
        catch _:_ ->
          {error, checkout_failure}
        end,
      exit({normal, {checkout, Ref, Result}})
    end,
  {ReqPid, ReqRef} = spawn_monitor(Fun),
  receive
    {'DOWN', ReqRef, process, ReqPid, {normal, {checkout, Ref, Result}}} ->
      Result;
    {'DOWN', ReqRef, process, ReqPid, _Error} ->
      {error, checkout_failure}
  end.
@benoitc
Owner

benoitc commented Mar 17, 2021

Using spawn_monitor will not solve the issue the change to spawn_link was intended to fix.

I think an appropriate change for this version would be to go back to using spawn and handle the race condition during the checkout. I will have a look ASAP.
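Roughly, the idea is a sketch like the following (not the final patch; do_checkout/5 and checkin/2 stand for hackney_pool's existing internals): spawn without a link and, if the requester has died by the time the checkout finishes, check the connection back in rather than leak it.

%% Rough sketch only: plain spawn, and if the requester died while the checkout
%% was in flight, give the connection back to the pool instead of delivering it.
checkout(Host, Port, Transport, Client) ->
  Requester = self(),
  Ref = make_ref(),
  Fun =
    fun() ->
      Result =
        try
          do_checkout(Requester, Host, Port, Transport, Client)
        catch _:_ ->
          {error, checkout_failure}
        end,
      case is_process_alive(Requester) of
        true ->
          Requester ! {checkout, Ref, Result};
        false ->
          case Result of
            {ok, SocketRef, Socket} -> checkin(SocketRef, Socket);
            _Error -> ok
          end
      end
    end,
  _Pid = spawn(Fun),
  receive
    {checkout, Ref, Result} -> Result
  end.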

@leonardb
Contributor Author

leonardb commented Mar 17, 2021

@benoitc Maybe I'm misunderstanding the entire issue.

Why would it not solve the problem? The receive statement should receive any exit where the spawned process crashes, as per the description of the issue in #675.

In the case of a non-normal exit, this should be matched by the second clause in the receive and allow a clean return of an error to the caller.

The only real difference is that you're not allowing a spurious exit message to propagate out of checkout.

Happy to be educated here.

Edit

I went and re-read the initial issue and it seems I totally misunderstood. The specific case was the request crashing, not an issue with checkout.

@benoitc
Owner

benoitc commented Mar 17, 2021

@leonardb correct, the spawn_link is intended to fix the case where the requester crashes. The linked process doing the checkout exits when that happens.

@benoitc
Owner

benoitc commented Mar 17, 2021

@leonardb #681 should fix your issue.

@ronaldwind can you also test this patch and let me know how it behaves on your side. This should also fix the issue you intended to solve using spawn_link in #674.

@benoitc
Owner

benoitc commented Mar 17, 2021

1.17.3 has been released.

@ronaldwind

ronaldwind commented Mar 18, 2021

@benoitc Too bad, this change reintroduces the problem I had. I've upgraded to 1.17.3 and ran the snippet from my issue at #675, and the in_use_count once again shows increasing numbers.

@benoitc
Owner

benoitc commented Mar 18, 2021 via email

@ronaldwind

The in_use_count does not go down as time passes. I've waited for 10+ minutes and the count is still the same.

I've also tried filling up the pool to the point where in_use_count equals max.
At that point no new requests are made and I simply receive a :checkout_timeout:

iex(1)> :hackney.request(:get, "https://github.com/benoitc/hackney", [], "", [])
{:error, :checkout_timeout}

In short, it does not seem that the counter is merely reporting wrong numbers, and it looks like the exact same behaviour as in 1.17.0.
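For anyone reproducing this, one way to check the counters from the Erlang shell (a sketch assuming hackney_pool:get_stats/1, which the README lists for pool stats, and the default pool name):

%% Sketch: read the pool counters directly. Assumes hackney_pool:get_stats/1
%% (per the hackney README) and the default pool.
pool_counts() ->
  Stats = hackney_pool:get_stats(default),
  {proplists:get_value(in_use_count, Stats),
   proplists:get_value(free_count, Stats)}.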

@benoitc
Owner

benoitc commented Mar 18, 2021

@ronaldwind can you try this patch?

diff --git src/hackney_pool.erl src/hackney_pool.erl
index c913998..ba3a739 100644
--- src/hackney_pool.erl
+++ src/hackney_pool.erl
@@ -75,8 +75,8 @@ checkout(Host, Port, Transport, Client) ->
           Requester ! {checkout, Ref, Result};
         false ->
           case Result of
-            {ok, SocketRef, Socket} ->
-              checkin(SocketRef, Socket);
+            {ok, {_Name, ConnRef, Connection, Owner, Transport}, Socket} ->
+              gen_server:call(Owner, {checkin, ConnRef, Connection, Socket, Transport}, infinity);
             _Error ->
               ok
           end

Also, can you send me a trace? You can send it to me privately by email if you want.

@benoitc
Owner

benoitc commented Mar 18, 2021

@ronaldwind I will do some tests later today, but if you have a chance to test it before then, please let me know :)

@leonardb
Contributor Author

leonardb commented Mar 18, 2021 via email

@ronaldwind

@benoitc I can also confirm the patch works fine! Thanks @leonardb for testing it too.

@aboroska
Contributor

aboroska commented Mar 18, 2021

@benoitc Is it an oversight that hackney_pool:monitor_client/3 does not actually call erlang:monitor/2 ?

monitor_client(Connection, Ref, State) ->
  Clients2 = dict:store(Ref, Connection, State#state.clients),
  State#state{clients = Clients2}.

Where is the monitor placed on the Requester?

@benoitc
Owner

benoitc commented Mar 18, 2021

thanks @ronaldwind @leonardb I will check it myself. Expect a new release later today.

@leonardb so there is a new mechanism to handle connections coming in the next major version. But right now the connections are indeed monitored via hackney_manager. The current patch is about fixing a race condition that happens when the requester dies before it's managed. All of this system is overly complicated and will change soon. One advantage of it, however, is that it lets you pass control of a connection to another process if needed.
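As an aside, a minimal sketch of that hand-off (assuming hackney:controlling_process/2 from the docs; handoff/1 and the worker are only for illustration):

%% Sketch only: start a request, then hand the request ref over to another
%% process so the original caller no longer owns it.
handoff(Url) ->
  {ok, _Status, _Headers, Ref} = hackney:request(get, Url, [], <<>>, []),
  Worker = spawn(fun() ->
    receive
      {take_over, R} ->
        {ok, Body} = hackney:body(R),
        io:format("worker read ~p bytes~n", [byte_size(Body)])
    end
  end),
  ok = hackney:controlling_process(Ref, Worker),
  Worker ! {take_over, Ref},
  ok.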

@benoitc
Owner

benoitc commented Mar 18, 2021

@benoitc Is it an oversight that hackney_pool:monitor_client/3 does not actually call erlang:monitor/2?

monitor_client(Connection, Ref, State) ->
  Clients2 = dict:store(Ref, Connection, State#state.clients),
  State#state{clients = Clients2}.

Where is the monitor placed on the Requester?

https://github.com/benoitc/hackney/blob/master/src/hackney_connect.erl#L224

@aboroska
Contributor

@benoitc You mean hackney_manager:update_state/1? That does not place any monitor. I am still confused.

I only found two places where a monitor is placed. Here:

_MRef = erlang:monitor(process, Owner),

and here:

_ = monitor_child(StreamPid),

@benoitc
Owner

benoitc commented Mar 18, 2021

I don't use a monitor; instead I link the processes and trap exits in the manager:

https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L570

@aboroska
Contributor

aboroska commented Mar 18, 2021

Oh, I see. Then in case of a Requester exit, the manager sends an explicit 'DOWN' message to the pool process:
https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L387
That is how the client is removed and in_use_count is eventually decremented in hackney_pool.

Thank you @benoitc. I'm adding this here so that others wondering about the same thing might find it useful.

@benoitc
Owner

benoitc commented Mar 18, 2021

1.17.4 has been published @aboroska @ronaldwind
