Many connections in state CLOSE_WAIT and FIN_WAIT2 without release #1380

Closed
kejyun opened this issue Dec 25, 2013 · 19 comments

@kejyun

kejyun commented Dec 25, 2013

Hi, I use Ubuntu 12.04 + Node.js v0.10.22 + socket.io v0.9.16 to transmit messages.

There are about 300 simultaneous connections. After some hours (at least 1 or 2 hours; it doesn't show up immediately), some connections stay stuck in the CLOSE_WAIT or FIN_WAIT2 state.

These undead connections grow linearly over time. Once the number of connections reaches the limit (1024 by default; see also Linux TCP/IP tuning for scalability), users have trouble connecting to the socket server unless some connections are released normally.

The following is the connection state summary for the socket service after running for about 3 hours:

netstat -anl | grep <PORT_OF_NODE_PROCESS> | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'

FIN_WAIT2 35
LISTEN 1
SYN_RECV 1
CLOSE_WAIT 44
TIME_WAIT 8
ESTABLISHED 288
FIN_WAIT1 8
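As a rough sketch, the same per-state count can be taken from Node.js by parsing the netstat output. The helper name countTcpStates is hypothetical, and it assumes the Linux `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). Unlike the grep above, it only matches the port in the local address column.

```javascript
// Hypothetical helper: count TCP connection states for one local port,
// similar to the `netstat -anl | grep PORT | awk ...` one-liner above.
// Assumes Linux `netstat -an` columns:
// Proto Recv-Q Send-Q Local-Address Foreign-Address State
function countTcpStates(netstatOutput, port) {
  const counts = {};
  for (const line of netstatOutput.split('\n')) {
    const fields = line.trim().split(/\s+/);
    // Keep only tcp/tcp6 rows that actually have a State column.
    if (!/^tcp/.test(fields[0]) || fields.length < 6) continue;
    // Match the port in the local address only (the grep also matches
    // connections whose *remote* port happens to be PORT).
    if (!fields[3].endsWith(`:${port}`)) continue;
    const state = fields[fields.length - 1];
    counts[state] = (counts[state] || 0) + 1;
  }
  return counts;
}
```

For example, `countTcpStates(require('child_process').execSync('netstat -an').toString(), 8443)` returns an object mapping each state name to its count, which is convenient for alerting when FIN_WAIT2 or CLOSE_WAIT crosses a threshold.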

Possible solutions:

  • Touch the JS file periodically

Run the JS file with the nodemon package. When the file's last-modified time changes, nodemon restarts the service, which releases all previous undead connections (CLOSE_WAIT or FIN_WAIT2).

  • Increase the connection limit
sudo vim /etc/security/limits.conf

*       soft    nofile  1024
*       hard    nofile  2048
root    soft    nofile  4096
root    hard    nofile  8192
user1   soft    nofile  2048
user1   hard    nofile  2048

This makes the limit harder to reach.

  • Decrease keep-alive time

Let the operating system close dead connections automatically after a short time. I haven't tried this yet.

See also: Using TCP keepalive under Linux
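The first workaround (touching the script so that nodemon restarts it) can be automated. A minimal sketch, assuming the app runs under nodemon and that this code runs in a separate helper script; `touch` and `touchPeriodically` are hypothetical names, while fs.utimesSync is the standard Node.js call for setting a file's access and modification times. Keep in mind that each restart drops every connection, healthy ones included.

```javascript
const fs = require('fs');

// Hypothetical helper: set a file's access and modification times to now,
// like the shell `touch` command, so that nodemon sees a change.
function touch(file) {
  const now = new Date();
  fs.utimesSync(file, now, now); // (path, atime, mtime)
}

// Touch `file` every `intervalMs` milliseconds. Returns the interval handle
// so the caller can stop it with clearInterval().
function touchPeriodically(file, intervalMs) {
  return setInterval(() => touch(file), intervalMs);
}

// Usage sketch (hypothetical path and period): restart every 6 hours.
// touchPeriodically('/path/to/app.js', 6 * 60 * 60 * 1000);
```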

Question:

I found some possible workarounds, but none of them really solves the problem of connections persisting in CLOSE_WAIT or FIN_WAIT2. As far as I can tell, this happens when the server (CLOSE_WAIT) or the clients (FIN_WAIT2) do not close connections correctly. I expected socket.io to force-close these broken connections after some timeout, but that doesn't seem to work.

I tried to reproduce the CLOSE_WAIT / FIN_WAIT2 problem in my test environment, but these states never showed up in either of these scenarios:

  1. Connecting to the socket server, then disconnecting the network
  2. Staying connected to the socket server for a long time

I found that @njam asked a related question before (Many stale connections in state CLOSE_WAIT and FIN_WAIT2), but I still can't find a solution. Does anyone know how to solve this problem?

Thanks.

References:

  1. Many stale connections in state CLOSE_WAIT and FIN_WAIT2
  2. TIME_WAIT and its design implications for protocols and scalable client server systems
  3. Using TCP keepalive under Linux
  4. Linux TCP/IP tuning for scalability
@njam

njam commented Dec 26, 2013

FYI what worked for us: Switching from Socket.io to SockJS and terminating HTTPS outside of node.js.

@leop-tcto

I've got the same problem using socket.io 0.9.16.
netstat -anl | grep 8443 | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'
FIN_WAIT2 976
LISTEN 1
ESTABLISHED 3

I don't want to switch away from socket.io. I am hoping someone can help. Not sure what to do from here.

I can easily replicate this problem by having a few clients connect and disconnect in a loop as a load test. Even after the load test stops, the FIN_WAIT2 connections seem to stay there forever. The only way to clear them is to restart the node.js application.
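One plausible explanation for FIN_WAIT2 connections that never go away (an assumption, not something confirmed in this thread): on Linux, net.ipv4.tcp_fin_timeout only applies to orphaned sockets, i.e. ones the application has already closed. If the server half-closes a connection with socket.end() and the client disappears without sending its own FIN, the socket stays open inside the Node.js process and remains in FIN_WAIT2 until the process closes it or exits, which would match the observation that only a restart clears them. A common mitigation is to destroy the socket if the peer has not finished closing within a timeout. A minimal sketch with a hypothetical helper, endWithTimeout:

```javascript
// Hypothetical helper: half-close a net.Socket with end(), then destroy it
// if the peer has not completed the close within `timeoutMs`.
function endWithTimeout(socket, timeoutMs) {
  socket.end(); // send our FIN (half-close); the fd stays open for now

  const timer = setTimeout(() => {
    // The peer never finished closing: release the fd ourselves so the
    // connection does not sit in FIN_WAIT2 for the life of the process.
    if (!socket.destroyed) socket.destroy();
  }, timeoutMs);
  timer.unref(); // don't keep the process alive just for this timer

  // A normal close (peer FIN received, or destroyed elsewhere) cancels it.
  socket.once('close', () => clearTimeout(timer));
  return timer;
}
```

Where exactly to call this inside socket.io 0.9's transports is not shown here; the helper only illustrates the end-then-destroy pattern for a plain net.Socket.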

@samsonradu

@leopapadopoulos: I had this issue previously, but it went away after I stopped using RedisStore and upgraded to Socket.io v0.9.16 and Node > v0.10.12.

@kejyun
Author

kejyun commented Dec 30, 2013

I tried opening multiple connections to the socket server at the same time and found that some client sockets reuse the same SOCKET ID (obtained via XHR; it looks like _nmXTMmCGNQp4EncrfHqj_) to establish their connection. If I close the browser once all connections are established, many connections stay in CLOSE_WAIT without being released. Only a few connections close (matching the number of unique SOCKET IDs that were generated). My explanation: the server tracks connections by SOCKET ID, but if a connection for that SOCKET ID already exists in the connection pool, the new connection is not stored in the pool. So when the client sends a FIN packet to close that connection, the server has no record of it and never closes its own end, and the connection stays in CLOSE_WAIT forever.

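// Open 200 independent connections to the same server at once
// ("force new connection" is the socket.io 0.9 client option that
// prevents reusing an existing connection).
var host = 'http://socket.server/';
var sockets = [];
for (var i = 0; i < 200; i++) {
  var socket = io.connect(host, {"force new connection": true});
  sockets.push(socket);

  socket.on("message", function (message) {
    console.log(message);
  });
  socket.on("disconnect", function () {
    console.log("disconnect");
  });
}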

Fix in lib/manager.js, around line 670:

Don't set up a new connection for a SOCKET ID when a connection for that SOCKET ID already exists in the connection pool.

See also: kejyun@8d6c02a

// Only run the open/connect logic when this SOCKET ID is not already connected.
if (!this.connected[data.id]) {
  if (transport.open) {
    if (this.closed[data.id] && this.closed[data.id].length) {
      transport.payload(this.closed[data.id]);
      this.closed[data.id] = [];
    }

    this.onOpen(data.id);
    this.store.publish('open', data.id);
    this.transports[data.id] = transport;
  }

  this.onConnect(data.id);
  this.store.publish('connect', data.id);
  //....etc
}
} // closes an enclosing block that is not shown in this excerpt
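The principle behind this patch, that every accepted connection must stay reachable from some registry so it can be closed later, can be shown independently of socket.io's internals. A minimal sketch with a hypothetical ConnectionPool class: when a second connection arrives for an ID that is already registered, it is ended immediately instead of being silently dropped. Closing the newcomer rather than replacing the original is a design choice made here for illustration.

```javascript
// Hypothetical registry that tracks every live connection by client ID.
// A connection only needs an end() method and to emit 'close'
// (as net.Socket does).
class ConnectionPool {
  constructor() {
    this.byId = new Map();
  }

  // Returns true if `conn` was registered, false if it was a duplicate.
  add(id, conn) {
    if (this.byId.has(id)) {
      // Never leave a duplicate untracked: end it right away so its
      // socket is closed instead of being leaked.
      conn.end();
      return false;
    }
    this.byId.set(id, conn);
    // Drop the entry when the connection closes, but only if it is still
    // the registered one.
    conn.once('close', () => {
      if (this.byId.get(id) === conn) this.byId.delete(id);
    });
    return true;
  }

  get size() {
    return this.byId.size;
  }
}
```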

The following is the connection state summary for the socket service after running for about 6 hours:

netstat -anl | grep <PORT_OF_NODE_PROCESS> | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'

FIN_WAIT2 37
LISTEN 1
TIME_WAIT 13
ESTABLISHED 295
FIN_WAIT1 20

@leop-tcto

@samsonradu Thank you for the prompt response. I am already running socket.io v0.9.16 and node v0.10.16. I am not knowingly using RedisStore. Is it possible that socket.io or node is using RedisStore internally? My understanding is that socket.io needs to be configured to use RedisStore, and I have not done that.

@leop-tcto

@kejyun Thank you for your answer. I will try what you suggest as soon as I can. Your directions are really clear, thank you.

I have not modified someone else's node.js code before. I have socket.io installed in the node_modules directory. When I modify manager.js, do I need to recompile? If so, how? Or is the file loaded as-is, in which case I can simply modify it and run my app again?

@leop-tcto

@kejyun Unfortunately your suggestion did not fix my problem. With the test you describe, it does stop the CLOSE_WAIT connections. However, my problem is not with CLOSE_WAIT, it is with FIN_WAIT2.
As I run my connect/disconnect test, I see the FIN_WAIT2 connections growing. I let them grow all the way to 631 before stopping the test, then closed my clients. The FIN_WAIT2 count is still stuck at 631. It seems the only way to clear them is to restart the node.js application.

netstat -anl | grep 8443 | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'

FIN_WAIT2 631
LISTEN 1
ESTABLISHED 2

Node Version = v0.10.16
socket.io Version = v0.9.16

@leop-tcto

@njam Thanks for the suggestion. As for switching from socket.io to SockJS: I need the room features and the room emit features of socket.io, which is why it is difficult to switch away from it.

@kejyun
Author

kejyun commented Dec 31, 2013

@leopapadopoulos Can you describe your application's scenario? For example: how many users, whether they use mobile devices, whether the network is stable, etc.

My FIN_WAIT2 problem still exists too. I suspect it happens on mobile devices with unstable networks, but I haven't figured it out yet.

I'd like to know how the FIN_WAIT2 state arises in your scenario.

Thanks.

@leop-tcto

@kejyun I use node.js and socket.io as a kind of pub/sub server. I use socket.io rooms to represent topics. When data changes, it is broadcast to all clients that have joined (subscribed to) the room.

In my test I have a C++ client (socket.io-poco) connect to the server via WebSocket. The client then subscribes to one or more rooms and makes data changes. Then the test client exits. The client does NOT exit cleanly. However, that is no reason for the FIN_WAIT2 connections to get stuck forever. For example, if a client crashes or loses network connectivity, I would expect socket.io to clean up after itself.

The client does the above over and over again in 10 second intervals as a kind of a load test to see how this would act in production. As it does this over and over again the FIN_WAIT2's grow and grow, so it is clearly not ready for production in this application.

@icykey

icykey commented Jan 1, 2014

I am having the same problem. In my case I am running iOS clients with node 0.10.16 and socket.io 0.9.16. Whenever a client connects and disconnects, a connection stays in the FIN_WAIT2 state and is never released.
After some research, I found that issues have been raised against node for this as well, but there is no fix so far: nodejs/node-v0.x-archive#3613
Really hoping for an answer.
Thanks!

@freeman983

I have the same problem.

netstat -anl | grep 8888 | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'
TIME_WAIT 153
CLOSE_WAIT 256
FIN_WAIT2 1
ESTABLISHED 1566

I tried @kejyun's fix and now get many connections in the TIME_WAIT state:

netstat -anl | grep 8888 | awk '/^tcp/ {t[$NF]++}END{for(state in t){print state, t[state]} }'
TIME_WAIT 1321
CLOSE_WAIT 11
FIN_WAIT2 12
ESTABLISHED 2913
LAST_ACK 1
LISTEN 1

@kejyun
Author

kejyun commented Mar 14, 2014

I recently tried to trace the socket.io source code and found that the socket structure is very complex. It looks roughly like this:

Manager = {
    SocketNamespace :{
        sockets : {},
        transport : {
            'websocket' : {},
            'htmlfile' : {},
            'xhr-polling' : {},
            'jsonp-polling' : {}
        }
    }
}

Each level references the others, presumably so that each level can conveniently call the others' methods (for example, "transport" calling "Manager").

Many levels also hold references to "Socket" connections. So when a client disconnects, some structure may still hold a reference to the disconnected connection, which could cause the CLOSE_WAIT and FIN_WAIT2 problem.

So I tried SockJS as @njam recommended, and it does not cause the CLOSE_WAIT and FIN_WAIT2 problem. It's running really well.

However, SockJS lacks socket.io features such as reconnecting to the server on disconnect, event messages, and chat rooms, which my application needs. So I implemented these features in a library called SockJSUtility (https://github.com/kejyun/SockJSUtility), and it works well now.
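The general pattern for avoiding the kind of leak suspected above, a disconnected socket that is still referenced from some room or lookup table, is to keep a reverse index from each socket to the rooms it joined, so that a single call in the disconnect handler removes every reference. A minimal sketch; RoomRegistry and its methods are hypothetical and do not reflect socket.io's actual data structures.

```javascript
// Hypothetical room registry with a reverse index, so that a
// disconnecting socket can be removed from every room in one call.
class RoomRegistry {
  constructor() {
    this.rooms = new Map();         // room name -> Set of socket IDs
    this.roomsBySocket = new Map(); // socket ID -> Set of room names
  }

  join(socketId, room) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(socketId);
    if (!this.roomsBySocket.has(socketId)) this.roomsBySocket.set(socketId, new Set());
    this.roomsBySocket.get(socketId).add(room);
  }

  // Call from the disconnect handler: removes every reference to socketId.
  leaveAll(socketId) {
    for (const room of this.roomsBySocket.get(socketId) || []) {
      const members = this.rooms.get(room);
      members.delete(socketId);
      if (members.size === 0) this.rooms.delete(room); // drop empty rooms too
    }
    this.roomsBySocket.delete(socketId);
  }

  members(room) {
    return [...(this.rooms.get(room) || [])];
  }
}
```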

@3rd-Eden
Contributor

@kejyun you should check out Primus, which wraps SockJS, Socket.IO, engine.io and plain WebSockets with a common interface and exposes a plugin interface. There are already plugins that implement an event emitter and rooms. In addition, reconnect is built into Primus using randomized exponential backoff.

https://github.com/primus/primus
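Randomized exponential backoff means the wait before reconnect attempt n grows geometrically (minDelay * factor^n), is capped at maxDelay, and is randomized so that many clients that lost the server at the same moment do not all reconnect in lockstep. A minimal sketch of the idea, not Primus's actual implementation; the function names and defaults (500 ms minimum, 30 s maximum, factor 2, "equal jitter" randomization) are assumptions.

```javascript
// Delay (ms) before reconnect attempt `attempt` (0-based): exponential
// growth capped at maxDelay, with "equal jitter", i.e. a value uniformly
// distributed in [capped / 2, capped).
function backoffDelay(attempt, { minDelay = 500, maxDelay = 30000, factor = 2, random = Math.random } = {}) {
  const capped = Math.min(maxDelay, minDelay * Math.pow(factor, attempt));
  return capped / 2 + random() * (capped / 2);
}

// Hypothetical reconnect loop: `connect` is any function returning a
// Promise that resolves on success and rejects on failure.
async function reconnectWithBackoff(connect, { maxAttempts = 10, ...backoffOptions } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt + 1 < maxAttempts) {
        const ms = backoffDelay(attempt, backoffOptions);
        await new Promise((resolve) => setTimeout(resolve, ms));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

Passing `random: () => 0.5` makes the delay deterministic, which is handy for testing: with the defaults, attempt 0 then waits 375 ms.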

@leop-tcto

@3rd-Eden Thank you for the suggestion to use https://github.com/primus/primus. I will check it out. There are many, many posts about this problem, and @kejyun seems to be the only one who has come up with his own solution. I had given up on it and was just raising an alarm when FIN_WAIT2 reached a certain level. With node.js being so popular, it is a mystery why this is not getting fixed. I wonder if we are posting this problem in the wrong place?

@panuhorsmalahti

I'm noticing this same behaviour with 1.0. Disconnected connections are still available after 60 seconds (the default TTL).

@rauchg
Contributor

rauchg commented Jun 26, 2014

@panuhorsmalahti how are you listing disconnected connections?

@panuhorsmalahti

I'm listing them with io['sockets']['adapter']['sids'] and io['sockets']['adapter']['rooms'], so the problem is at least with socket.io-adapter.

@mraiter

mraiter commented Jul 17, 2014

We've been seeing a similar problem on our production servers, but we've found a fix that seems to work for us, and we wanted to post it to see if it might help others.

We had the same problem as everyone else - a slowly increasing number of zombie connections stuck in FIN_WAIT2 or ESTABLISHED. After trying a bunch of different fixes, we had some luck with the node-ka-patch that @kejyun posted in a different thread.

We found that the patch only eliminated the connections stuck in FIN_WAIT2, not our zombie ESTABLISHED connections. We took this as a sign that we were on the right track and that enabling keep-alive on the connections could address the problem.

The node.js net.Socket API exposes a function to set keep-alive (setKeepAlive()), but we couldn't find a way to call it on a socket received from socket.io. So we decided to see if we could force it, using the node-ka-patch as inspiration.

Here's the code snippet:

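// Monkey-patch the TCP handle (node 0.10 internals) so that every write
// also (re-)enables TCP keep-alive on that connection.
(function () {
  var TCP = process.binding('tcp_wrap').TCP
    , _writeBuffer = TCP.prototype.writeBuffer;

  TCP.prototype.writeBuffer = function () {
    var r = _writeBuffer.apply(this, arguments);
    // At the handle level the delay is in seconds: probe after 30 s idle.
    this.setKeepAlive(true, 30);
    return r;
  };
})();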

It's a bit hacky, since it calls setKeepAlive() every time the socket writes out a buffer, but we haven't noticed any performance impact in our production environment. As I said before, we'd prefer to just set this on the socket once the connection is established, but couldn't find a way to do that through socket.io.

Hopefully, this helps out other folks having this problem. We'd love to hear if anyone has a way to improve on this, or find a way to just call setKeepAlive() directly on the socket.

Oh, and in case it's useful, we are using node.js 0.10.25 and socket.io 0.9.16.
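Regarding calling setKeepAlive() directly on the socket: an alternative to patching tcp_wrap (not tried in this thread) is to listen for the 'connection' event on the HTTP server that socket.io is attached to. http.Server emits it with the underlying net.Socket for every accepted TCP connection, so keep-alive can be enabled once per connection. A minimal sketch, assuming the app creates its own HTTP server and passes it to socket.io; enableKeepAlive is a hypothetical helper. Note that socket.setKeepAlive() takes its initial delay in milliseconds, whereas the raw handle call in the patch above takes seconds.

```javascript
const http = require('http');

// Hypothetical helper: enable TCP keep-alive on every connection accepted
// by `server`. http.Server emits 'connection' with the raw net.Socket
// before any HTTP or WebSocket traffic is processed.
function enableKeepAlive(server, initialDelayMs = 30000) {
  server.on('connection', (socket) => {
    // After initialDelayMs of idleness the kernel starts sending probes;
    // if a vanished peer never answers, the connection eventually fails
    // with an error and the socket is closed.
    socket.setKeepAlive(true, initialDelayMs);
  });
  return server;
}

// Usage sketch (hypothetical app and handler):
// const server = enableKeepAlive(http.createServer(handler));
// const io = require('socket.io').listen(server); // socket.io 0.9 style
// server.listen(8443);
```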

This issue was closed.