-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitmonero not close connections #602
Comments
If you run just bitmonerod, and no wallet, do you see socket leaks like this if you run repeatedly: curl -X POST http://127.0.0.1:28081/json_rpc -d '{"jsonrpc":"2.0","id":"0","method":"get_connections","parms":{}}' -H 'Content-Type: application/json' Adapt the IP address and port to your RPC connection. I don't see any leaking sockets here. Which boost version are you using ? |
After updating libboost from 1.55 to 1.59 and recompiling bitmonerod from source - seems all OK. |
For whoever still has the bug, this is a patch that adds logs that hopefully can help. If you run with it, it needs starting the daemon with --log-level 1. |
This may not be related to the timing out handshakes, as first thought. I can now replicate the timeouts by forcing an error path in the protocol, so the connection gets dropped, and the peer times out. This does not end up in leaking sockets though. |
Can you please try again with the patch here: http://fpaste.org/330341/14565796/raw/ It replaces the previous patch. Run with --log-level 1 I will need to know the local port of the lingering connections (ie, as shown by netstat) |
Possible fix: http://fpaste.org/330766/66641701/raw/ |
The above was not thread safe, this should make it so: http://fpaste.org/330911/70084414/ |
Again, a new patch which replaces the previous ones: http://fpaste.org/334868/57290305/ I think I've found a couple issues. One is fixed in this patch, the other not yet. If the logs go my way, I will fix that second one too (fixing the second one now would hide whether the first is fixed). Please run with --log-level 1 again. |
If anyone else wants to test, the latest patch is now at https://github.com/moneromooo-monero/bitmonero/tree/leaking-sockets and includes hyc's race fixes for db errors. |
Hi @moneromooo-monero, has this been folded into core since your post or should I still pick up the patch from hyc's repo? |
Looks like the bug fix one git merged, but not the belt and braces one, since it doesn't address the root cause. |
In the version 'Wolfram Warptangent' (v0.10.0.0-80c5de9) |
@AJIekceu4 with @vtnerd's thread changes for 0.10.1, would you now try with 0.10.1 and let us know? |
My changes should have no affect on this (haven't completely changed the background tasking code). Although, if it somehow fixed it, celebrate. |
Hello all, i use new version 3 days and at this time no problem with connections, but may be it is just coincidence. I will test it few weeks more and make feedback here. |
I test it and unfortunally nothing change. Some connections still not closing. |
With current master, we can have a lot better information, as the old _info type logs are now going into the normal log. Can you try again with this setting please: --log-level 2,*net*:3 |
Along with a note saying which IP leaked at what time if known, as the logs may be pretty large :) |
OK, i will post information later, when (if ;) detect this bug again in new version. |
netstat -ntu | grep ESTAB | grep :13666 91.211.144.242 - my own IP and no active connection to rpc port between my pc and my remote node. Log file (not from start, but from moment, when problem connection detected, port 56812, ip 91.211.144.242): cat /root/public_node/bitmonero.log-2017-02-27-11-23-11 | grep 56812 daemon - Monero 'Wolfram Warptangent' (v0.10.2.1-release) |
Seems even less helpful than before. That bug's a real stinker :( |
I change server: How reproduce bug:
I log all network data between my pc and remote node (via tcpflow) and check this connection data transfer (to 38006 port and from 38006 port): This is from my pc to remote node
This is answer from remote node to my pc:
And this is ALL network activity for this connection (port 38006) for 5 hours, while my wallet-cli was active. Wallet-cli create separate connection only for this one getheight query and after few hours this connection became zombie on daemon side ;) |
Can you try this branch ? |
Hello. I am testing it now. I will post some info after few days of using this version ;) |
After some time, i get this error and monerod stopped: 2017-07-01 18:31:04.956 [P2P9] INFO net.p2p.msg src/cryptonote_protocol/cryptonote_protocol_handler.inl:1017 [92.135.206.9:60040 INC] Received NOTIFY_REQUEST_CHAIN (27 blocks This is last line,after this monerod stop work:
|
Can you try #2685 ? |
Testing it now. I will report info few days later. |
Nothing change. I still can reproduce bug with this:
|
+bug |
I find it strange that the wallet client can cleanly exit and yet the kernel on the monerod side still sees the connection in ESTABLISHED state instead of e.g. TIME_WAIT or some other pending-close state. I suggest we turn on KEEPALIVE on the sockets; that ought to get the kernel to eventually notice the connection is dead. |
I run public node:
/bitmonerod --p2p-bind-ip 188.166.61.194 --p2p-bind-port 18080 --rpc-bind-ip 188.166.61.194 --rpc-bind-port 13666...
version: Monero 'Hydrogen Helix'
http://monero.net/images/graph/node_week.png
Sun 12-00 - upgrade from version v0.8.8.7-e175205 to v0.9.0.0-release.
On this graph tcp connections to port 13666. All this connections is remain "established".
netstat -ntu | grep ESTAB | grep :13666 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
14 171.5.241.66
4 171.5.245.99
...
blablabla
171.5.241.66 - this is my own IP, i use my remote node from this IP 3 days ago. Use 'exit' command, when i quit from wallet.
171.5.245.99 - this is my own IP, i use my remote node from this IP 2 days ago. Use 'exit' command, when i quit from wallet.
netstat -ntu | grep ESTAB | grep :13666 | sort
tcp 0 0 188.166.61.194:13666 171.5.241.66:41626 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:41640 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51419 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51420 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51452 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51453 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51461 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51486 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51487 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51488 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51489 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:51492 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:54915 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.241.66:54916 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.245.99:53898 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.245.99:53900 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.245.99:53901 ESTABLISHED
tcp 0 0 188.166.61.194:13666 171.5.245.99:53902 ESTABLISHED
...
blablabla
Why all this connection not closed after 2-3 days of inactive? With other service (apache) i don't see any problem with not closed connections.
The text was updated successfully, but these errors were encountered: