
spawn_link in hackney_pool issue #680

Closed
leonardb opened this issue Mar 17, 2021 · 19 comments · Fixed by #681

@leonardb
Contributor

With the change in #674 to spawn_link, the caller process will receive the exit message.

If the caller is trapping exits this can be quite unexpected and problematic.

This was quite a surprise when I upgraded, as I was only expecting exit signals from a potential crash in the RMQ connections handled in the same server, in which case the gen_server would re-initialize.

Since the caller has no access to the Pid of the spawned process, it's not possible to handle this cleanly in the caller, and the EXIT signal will always propagate.
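For illustration, a minimal way to see this (a sketch only, not hackney code; the URL is arbitrary):

%% Illustration: trap exits, make one pooled request, and look at the mailbox.
%% With the spawn_link from #674 an {'EXIT', Pid, Reason} message shows up even
%% though nothing in the caller's own code crashed.
demo() ->
  process_flag(trap_exit, true),
  {ok, _Status, _Headers, Ref} = hackney:request(get, <<"https://example.com">>, [], <<>>, []),
  _ = hackney:skip_body(Ref),
  receive
    {'EXIT', Pid, Reason} ->
      io:format("unexpected exit signal from ~p: ~p~n", [Pid, Reason])
  after 1000 ->
    io:format("no exit signal received~n")
  end.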

A possible solution may be using a monitor instead of a link:

checkout(Host, Port, Transport, Client) ->
  Requester = self(),
  Ref = make_ref(),
  Fun =
    fun() ->
      Result =
        try
          do_checkout(Requester, Host, Port, Transport, Client)
        catch _:_ ->
          {error, checkout_failure}
        end,
      exit({normal, {checkout, Ref, Result}})
    end,
  {ReqPid, ReqRef} = spawn_monitor(Fun),
  receive
    {'DOWN', ReqRef, process, ReqPid, {normal, {checkout, Ref, Result}}} ->
      Result;
    {'DOWN', ReqRef, process, ReqPid, _Error} ->
      {error, checkout_failure}
  end.
@benoitc
Owner

benoitc commented Mar 17, 2021

Using spawn_monitor will not solve the issue the change to spawn_link was intended to fix.

I think an appropriate change for this version would be to go back to using spawn and handle the race condition during the checkout. I will have a look ASAP.
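Roughly, the idea is a sketch like the following (not the final patch; do_checkout/5 and checkin/2 stand for hackney_pool's existing internals): spawn without a link and, if the requester has died by the time the checkout finishes, check the connection back in rather than leak it.

%% Rough sketch only: plain spawn, and if the requester died while the checkout
%% was in flight, give the connection back to the pool instead of delivering it.
checkout(Host, Port, Transport, Client) ->
  Requester = self(),
  Ref = make_ref(),
  Fun =
    fun() ->
      Result =
        try
          do_checkout(Requester, Host, Port, Transport, Client)
        catch _:_ ->
          {error, checkout_failure}
        end,
      case is_process_alive(Requester) of
        true ->
          Requester ! {checkout, Ref, Result};
        false ->
          case Result of
            {ok, SocketRef, Socket} -> checkin(SocketRef, Socket);
            _Error -> ok
          end
      end
    end,
  _Pid = spawn(Fun),
  receive
    {checkout, Ref, Result} -> Result
  end.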

@leonardb
Contributor Author

leonardb commented Mar 17, 2021

@benoitc Maybe I'm misunderstanding the entire issue.

Why would it not solve the problem? The receive statement should receive any exit where the spawned process crashes, as per the description of the issue in #675.

In the case of a non-normal exit, this should be matched by the second clause in the receive and allow a clean return of an error to the caller.

The only real difference is that you're not allowing a spurious exit message to propagate out of checkout.

Happy to be educated here.

Edit

I went and re-read the initial issue and it seems I totally misunderstood. The specific case was the request crashing, not an issue with checkout.

@benoitc
Owner

benoitc commented Mar 17, 2021

@leonardb correct, the spawn_link is intended to fix the case where the requester crashes. The linked process doing the checkout exits when that happens.

@benoitc
Owner

benoitc commented Mar 17, 2021

@leonardb #681 should fix your issue.

@ronaldwind can you also test this patch and let me know how it behaves on your side. This should also fix the issue you intended to solve using spawn_link in #674.

@benoitc
Owner

benoitc commented Mar 17, 2021

1.17.3 has been released.

@ronaldwind

ronaldwind commented Mar 18, 2021

@benoitc Too bad, this change reintroduces the problem I had. I've upgraded to 1.17.3 and ran the snippet from my issue at #675, and the in_use_count once again shows increasing numbers.

@benoitc
Owner

benoitc commented Mar 18, 2021 via email

@ronaldwind

The in_use_count does not go down as time passes. I've waited for 10+ minutes and the count is still the same.

I've also tried filling up the pool to the point where in_use_count equals max.
At that point no new requests are made and I simply receive a :checkout_timeout:

iex(1)> :hackney.request(:get, "https://github.com/benoitc/hackney", [], "", [])
{:error, :checkout_timeout}

In short, it does not seem that the counter is merely reporting wrong numbers, and it looks like the exact same behaviour as in 1.17.0.
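For anyone reproducing this, one way to check the counters from the Erlang shell (a sketch assuming hackney_pool:get_stats/1, which the README lists for pool stats, and the default pool name):

%% Sketch: read the pool counters directly. Assumes hackney_pool:get_stats/1
%% (per the hackney README) and the default pool.
pool_counts() ->
  Stats = hackney_pool:get_stats(default),
  {proplists:get_value(in_use_count, Stats),
   proplists:get_value(free_count, Stats)}.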

@benoitc
Owner

benoitc commented Mar 18, 2021

@ronaldwind can you try this patch?

diff --git src/hackney_pool.erl src/hackney_pool.erl
index c913998..ba3a739 100644
--- src/hackney_pool.erl
+++ src/hackney_pool.erl
@@ -75,8 +75,8 @@ checkout(Host, Port, Transport, Client) ->
           Requester ! {checkout, Ref, Result};
         false ->
           case Result of
-            {ok, SocketRef, Socket} ->
-              checkin(SocketRef, Socket);
+            {ok, {_Name, ConnRef, Connection, Owner, Transport}, Socket} ->
+              gen_server:call(Owner, {checkin, ConnRef, Connection, Socket, Transport}, infinity);
             _Error ->
               ok
           end

Also, can you send me a trace? You can send it to me privately by email if you want.

@benoitc
Owner

benoitc commented Mar 18, 2021

@ronaldwind I will do some tests later today, but if you have a chance to test it before then, please let me know :)

@leonardb
Contributor Author

leonardb commented Mar 18, 2021 via email

@ronaldwind

@benoitc I can also confirm the patch works fine! Thanks @leonardb for testing it too.

@aboroska
Contributor

aboroska commented Mar 18, 2021

@benoitc Is it an oversight that hackney_pool:monitor_client/3 does not actually call erlang:monitor/2 ?

monitor_client(Connection, Ref, State) ->
  Clients2 = dict:store(Ref, Connection, State#state.clients),
  State#state{clients = Clients2}.

Where is the monitor placed on the Requester?

@benoitc
Owner

benoitc commented Mar 18, 2021

thanks @ronaldwind @leonardb I will check it myself. Expect a new release later today.

@leonardb so there is a new mechanism to handle connections coming in the next major version. But right now the connections are indeed monitored via hackney_manager. The current patch is about fixing a race condition that happens when the requester dies before it's managed. All of this system is overly complicated and will change soon. One advantage of it, however, is that it lets you pass control of a connection to another process if needed.
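As an aside, a minimal sketch of that hand-off (assuming hackney:controlling_process/2 from the docs; handoff/1 and the worker are only for illustration):

%% Sketch only: start a request, then hand the request ref over to another
%% process so the original caller no longer owns it.
handoff(Url) ->
  {ok, _Status, _Headers, Ref} = hackney:request(get, Url, [], <<>>, []),
  Worker = spawn(fun() ->
    receive
      {take_over, R} ->
        {ok, Body} = hackney:body(R),
        io:format("worker read ~p bytes~n", [byte_size(Body)])
    end
  end),
  ok = hackney:controlling_process(Ref, Worker),
  Worker ! {take_over, Ref},
  ok.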

@benoitc
Owner

benoitc commented Mar 18, 2021

@benoitc Is it an oversight that hackney_pool:monitor_client/3 does not actually call erlang:monitor/2?

monitor_client(Connection, Ref, State) ->
  Clients2 = dict:store(Ref, Connection, State#state.clients),
  State#state{clients = Clients2}.

Where is the monitor placed on the Requester?

https://github.com/benoitc/hackney/blob/master/src/hackney_connect.erl#L224

@aboroska
Contributor

@benoitc You mean hackney_manager:update_state/1? That does not place any monitor. I am still confused.

I only found two places where a monitor is placed. Here:

_MRef = erlang:monitor(process, Owner),

and here:

_ = monitor_child(StreamPid),

@benoitc
Owner

benoitc commented Mar 18, 2021

I don't use a monitor; instead I link the processes and trap exits in the manager:

https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L570

@aboroska
Contributor

aboroska commented Mar 18, 2021

Oh, I see. Then in case of a Requester exit, the manager sends an explicit 'DOWN' message to the pool process:
https://github.com/benoitc/hackney/blob/master/src/hackney_manager.erl#L387
That is how the client is removed and in_use_count is eventually decremented in hackney_pool.

Thank you @benoitc. I'm adding this here so that others wondering about the same thing might find it useful.

@benoitc
Owner

benoitc commented Mar 18, 2021

1.17.4 has been published @aboroska @ronaldwind
