Binary memory leak in drv_binary associated with the dist port #5876
Erlang/OTP 23 also introduced `erlang:term_to_iovec/1` and started using it in the distribution, so that could be another source of this leak. I don't suppose it is possible for you to see if the issue can be recreated using the latest Erlang/OTP 22? Normally, when trying to find leaks of refc binaries, the core file does not help, as all it will show is a bunch of binaries that have their refc at 1, without any idea about which code missed the decrement it needed to do...
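For anyone unfamiliar with it, a rough sketch of what `erlang:term_to_iovec/1` does (a made-up shell example, not taken from this issue): it returns the external-term encoding as an iovec, so large off-heap binaries can be referenced rather than copied into the result, and each such reference then has to be released correctly.

```erlang
%% OTP 23+ shell; the term and size are made up for illustration.
1> Big = binary:copy(<<1>>, 100000).
2> IoVec = erlang:term_to_iovec({msg, Big}).
3> [byte_size(B) || B <- IoVec].
```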
Have you run …
Thanks for taking a look @garazdawi
Thanks for the idea. I'll try running with OTP 22.
We ran that across the cluster a few times to see what we could find but it didn't seem to reveal large differences between the nodes which leaked memory and the ones which didn't. I tried again a few times:
And one which doesn't leak binary memory
Then I thought of naively summing the binary sizes referenced in the `binary` info of all processes:

```erlang
%% Shell snippet: sum the sizes of all refc binaries referenced by any process.
f(SumAllBins),
SumAllBins = fun() ->
    lists:foldl(fun(Pid, SzAcc) ->
        %% process_info/2 returns undefined if the process has already exited
        Bins = case erlang:process_info(Pid, binary) of
                   undefined -> [];
                   {binary, BList} -> BList
               end,
        lists:foldl(fun({_Addr, Size, _Refs}, Acc) ->
                        Acc + Size
                    end, SzAcc, Bins)
    end, 0, processes())
end.
```

And noticed that even with the naive summing it seems the total sum of …
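(Presumably the number to compare that per-process sum against is the node-wide counter; a hypothetical check, not quoted from the issue:)

```erlang
%% Total binary-allocator usage for the node, for comparison with the
%% per-process sum computed above.
erlang:memory(binary).
```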
I was thinking a bit more about what you wrote here. This would suggest that the refc binaries are kept alive by something that is destroyed when the connection is dropped. I've written a small extension to etp that can dump information about all current distribution entries in a core dump. What do you get if you run that on your dump?
Thanks for the script @garazdawi. Here is the output from the node which had the largest binary memory leak at the time: https://gist.github.com/nickva/b428fbb52f93c8248c7be67361cdc03c

The node where the core was dumped is db11. db8-db12 are the peer nodes; most of the dist data goes through there. It is odd to see db3 in the disconnected nodes: that used to be an older (replaced) node, so something must have tried to connect to it. And also db11 has an entry for itself, but that may be expected?

EDIT: We suspected the search component (clouseau) initially and restarted it a few times, but it didn't seem to clear the binary memory.
hmm, that didn't tell us much...
yes, that is expected.
We managed to start the db11 node with OTP 22 and are monitoring it currently. Looking at other metrics, we have also noticed that whenever binary memory is bumped up it corresponds with a spike in traffic, and those spikes also coincide with an increase in our …

For example, in this graph the green is the binary memory on a "good" node (db12), where it stays constant over a few weeks. The blue one is the "bad" one (db11), which keeps increasing. The second set of metrics (the right Y axis) is the rate of …

Also I had tried a higher …
After running with OTP 22 for some time, unfortunately it seems to exhibit the same memory growth as OTP 23. When discussing the issue with a co-worker, we were wondering if fragmentation and re-assembly of fragments could leak memory if some fragments never arrive, and if `erlang:send(..., [nosuspend])` could contribute to that. Another idea we had, since we cannot "find" this memory in …
@garazdawi I think we can reproduce it locally with https://gist.github.com/nickva/0805d9e5865438220ae52fbf2e1a102d which, I believe, is one of your own scripts I found on a mailing list and altered a bit :-) With that script the memory leak can be seen on OTP 22, 23 and 24, but not on OTP 21. That might point to dist fragmentation?

OTP 23
Running OTP 22 and 24 looks about the same. Maybe on 24 it was a tiny bit slower. I had waited for 10-20 minutes and the memory was still allocated. Finally, to confirm it is tied to the dist connection, I explicitly disconnected from the node and the memory immediately got deallocated:
OTP 21
Thanks for the great reproducer! I was able to capture the fault in an rr session and I think I have figured out what is going on... though I don't have any solution to fix it yet. Each fragment sent has a sequence ID and a fragment ID. The sequence ID is basically the PID of the sending process and it is used as the key when re-assembling fragments on the receiving side. When a process receives an exit signal while suspended during a fragmented send, the send operation is aborted and the remaining fragments are not sent. This caused the receiving side to keep the incomplete message in its buffer until the connection was aborted. I could have sworn I had handled this scenario when adding the fragmented send functionality, but apparently not... As I see it we have a couple of options:
I'm experimenting a bit to see which approach is better, though I'm currently leaning towards #2.
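To make the failure mode above concrete, here is a rough sketch (module, names, and sizes are made up for illustration; this is not the reproducer from the gist, and the sender only gets suspended when the dist output buffer is actually busy):

```erlang
-module(frag_leak_sketch).
-export([leak_one/1]).

%% A process is killed while (possibly) suspended in the middle of a
%% fragmented dist send. Before the fix, the remaining fragments were never
%% sent, so the receiver kept the partial sequence buffered until the
%% connection went down.
leak_one(ReceiverNode) ->
    %% Well over 64 KiB, so the message is split into fragments on the wire.
    Big = binary:copy(<<0>>, 10 * 1024 * 1024),
    Sender = spawn(fun() ->
                           %% Plain send; suspends when the dist buffer is busy.
                           {some_registered_proc, ReceiverNode} ! {data, Big}
                   end),
    timer:sleep(1),
    %% The exit signal can arrive while the sender is still suspended in the send.
    exit(Sender, kill).
```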
If a process is suspended doing a fragmented send and then receives an exit signal, it was terminated before it could finish sending the message, leading to a memory leak on the receiving side. This change fixes that so that the message is allowed to finish being sent before the process exits. Closes erlang#5876
Please try #5892, which is an implementation of suggestion 2.
That's great news, thanks @garazdawi! Will give it a try |
@garazdawi we were curious why the release compat flag …
I don’t know, and I won’t have time to check until Tuesday at the earliest.
@garazdawi thanks for the quick fix! Confirmed with the reproducer locally on maint-23 + your patch that the memory doesn't increase, and when …

After …
Great! Then I just need to make sure that I did not break anything else. Handling of process state is quite complex and it is easy to break something while fixing something else... Regarding the …
Thanks for the explanation about the …

As for the fix, thanks for the heads up. For now we haven't deployed it further, but the cluster where it was deployed seems stable.
@garazdawi thank you for investigating and fixing the binary leak. I see it's already part of the new Erlang 23 patch release https://github.com/erlang/otp/releases/tag/OTP-23.3.4.14!
Happy to help! Thanks for providing such an easy to reproduce test case! Having a way to reproduce the problem makes things so much easier.
Describe the bug
After upgrading from OTP 20 to OTP 23 we noticed increasing binary memory usage associated with the tcp_inet drv_binary allocation type.
It is noticeable in the `instrument:allocations()` output:

Another server:
A well-behaved node in the same cluster:
(The increase on the graph corresponds with upgrading to OTP 23; the flat section before it is OTP 20)
I suspected the dist protocol, and possibly the 64kb fragmentation logic interacting with refc binaries, because when I had tried to dump the core with gdb, the process was suspended long enough to miss the dist tick timeout and it disconnected. When that happened the binary memory was released immediately.
(The core was dumped around 21:43 to 21:45)
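As a rough sketch (not the exact call used for the output above), the histogram can be pulled out of `instrument:allocations/1` like this. Note that it needs the VM started with `+Muatags true`, and which origin `drv_binary` shows up under (the `tcp_inet` driver in this report) depends on the system:

```erlang
%% Shell sketch: fetch drv_binary histograms per allocation origin.
%% histogram_start is raised so ~64 KiB blocks land in the first bucket.
{ok, {_HistStart, _Unscanned, Allocs}} =
    instrument:allocations(#{histogram_start => 65536, histogram_width => 4}),
[{Origin, maps:get(drv_binary, TypeMap)}
 || {Origin, TypeMap} <- maps:to_list(Allocs), is_map_key(drv_binary, TypeMap)].
```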
To Reproduce
Out of 6 or so clusters that were upgraded (each with 3 to 12 nodes or so), we observed this only on one cluster, and on that cluster only on 3 out of 6 nodes.
I couldn't reliably reproduce the issue on my laptop. I had tried simply sending large binaries between two local nodes. Sometimes I see spikes in the `drv_binary` 64kb histogram slot, but then it clears out back to 0.

Expected behavior
The binary memory wouldn't keep increasing and would either level off at a low enough level and stay there or get de-allocated.
Affected versions
Erlang 23 (erts-11.2.2.10)
Additional context
vm.args used:
I had suspected it was a simple refc binary memory leak and ran `recon:bin_leak(5)`, which runs a GC for every process, but that didn't affect the binary memory.

The other suspicion was that it may be memory fragmentation, and I had tried running with a smaller limit for binary MBC blocks, and removing `+MBacul 0` and `+MBas aobf`. But that didn't seem to make a difference.

A few more details:
Most of the messages are sent between nodes with the `erlang:send(Dest, Msg, [noconnect, nosuspend])` command. In case the messages cannot be sent (response =/= `ok`), there is some buffering with a single head-of-line sender attempting to re-connect: https://github.com/apache/couchdb/blob/3.x/src/rexi/src/rexi_buffer.erl#L73-L84. Perhaps after Erlang 21 improved `erlang:send/2,3` and made it more asynchronous, that buffering strategy is now counter-productive and may be involved in this binary leak issue.
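A rough sketch of that send-and-buffer pattern (assumed names; not the actual rexi_buffer code):

```erlang
-module(buffered_send_sketch).
-export([cast/2]).

%% erlang:send/3 with [noconnect, nosuspend] never blocks: it returns ok,
%% or noconnect/nosuspend when the connection is not yet set up or the dist
%% buffer is busy. In those cases the message is handed to a single buffering
%% process (hypothetical name) that retries and re-connects.
cast(Dest, Msg) ->
    case erlang:send(Dest, Msg, [noconnect, nosuspend]) of
        ok -> ok;
        _NoConnectOrNoSuspend ->
            buffer_server ! {buffer, Dest, Msg},
            ok
    end.
```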
A lot of traffic may be binary data being sent through a coordinator node on the cluster to be written to other nodes. So a common path might be `http socket <-> node1 <-> [dist] <-> node2 <-> disk`, where node1 would be a coordinator that writes to or reads from multiple cluster nodes and then responds to the http request.

I had tried to determine the exact size in bytes of the 64kb block by varying the start size of the histogram, and the size seems to be `65608`:
Suspecting it may have something to do with dist protocol fragmentation and wanting to mitigate the issue somehow, I had tried to use the release compatibility flag `+R 21` on one of the nodes, but that didn't seem to have any effect.

I have a gdb core dump and had been trying to inspect it but so far have not gotten very far: