Handshakes leaking when using RedisStore #1064

Closed

Conversation

@natefaubion

We've been trying to deploy Socket.IO in production, but have been having a hard time keeping it up for more than a couple of hours due to ballooning memory usage. A heap dump showed that manager.handshaken was not getting cleared out. I put in some logging and found that other nodes were never being notified of disconnects, and thus never clearing out their handshake data. I added another couple of publish calls in onDisconnect, within the if (local) block, to let the other nodes know about disconnects.
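
Roughly, the change looks like this (a sketch only; the channel name and the surrounding cleanup are illustrative, not the exact manager.js source):

Manager.prototype.onDisconnect = function (id, local) {
  delete this.handshaken[id];
  // ... existing cleanup ...
  if (local) {
    // tell the other nodes so they can drop their handshake data too
    this.store.publish('disconnect:' + id);
  }
};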

I also found that although onDisconnect in manager.js takes a local argument, it is actually onClientDisconnect that gets called with the local flag from transport.js. Consequently, handlers were never getting removed from the store either. I added the flag to onClientDisconnect and passed it on to onDisconnect.
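
In code terms, the flag propagation amounts to something like this (signatures paraphrased from the description above, not copied from the source):

// transport.js previously dropped the flag; manager.js now forwards it
Manager.prototype.onClientDisconnect = function (id, reason, local) {
  // ... existing bookkeeping ...
  this.onDisconnect(id, local);
};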

With some more logging, I found that some keys still weren't getting cleared out of transport, closed, and open (usually left with an empty array). So I added an additional routine to garbageCollection to loop through those and clear out any keys that weren't in handshaken.
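
The sweep is along these lines (a sketch; field names taken from the description above):

// inside Manager.prototype.garbageCollection
var self = this;
['transport', 'closed', 'open'].forEach(function (key) {
  for (var id in self[key]) {
    // any per-client entry with no surviving handshake is stale
    if (!(id in self.handshaken)) delete self[key][id];
  }
});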

These changes have kept memory usage from climbing so quickly, though it still climbs slowly (not sure where else to look at this point). CPU usage with RedisStore is the larger problem for us right now.

Fix various leaks pertaining to sessions. More thorough garbage collection.
@natefaubion
Author

Regarding CPU usage with RedisStore, it appears that sub handlers leak. When someone disconnects, a variety of handlers can be left behind listening for Redis pub messages. Since each handler appears to unpack every single message anew, CPU usage gradually increases as more and more handlers leak. I verified this by setting up a loop that logged the number of handlers in manager.store.sub._events.message. The one that sticks around the most is the dispatch:id channel. It gets subscribed in onClose and then unsubscribed in onDisconnect, but because of the async nature of Redis subscriptions, it never actually gets removed.
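
For reference, the check was nothing fancier than periodically counting listeners (sketch; manager assumed in scope):

setInterval(function () {
  // listeners('message') returns the same handlers _events.message holds
  console.log('redis message handlers:', manager.store.sub.listeners('message').length);
}, 5000);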

My current fix is quite kludgy, but works well. I'm tagging every handler that goes into manager.store.sub._events.message with a channel attribute that represents the channel it listens for. Then in the garbage collection routine, I've added some code that loops through these handlers and unsubscribes from channels for sessions no longer in handshaken.

Manager.prototype.garbageCollection = function () {
  var self = this;
  // ... existing garbage collection ...
  this.store.sub._events.message.forEach(function (handler) {
    if (!handler.channel) return;
    var ch = handler.channel.split(':');
    if (ch.length === 2 && !(ch[1] in self.handshaken)) {
      self.store.unsubscribe(handler.channel);
    }
  });
};
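
For completeness, the tagging half described above is a wrapper around subscribe (an equally kludgy sketch; the real RedisStore subscribe signature may differ):

var subscribe = manager.store.subscribe;
manager.store.subscribe = function (name) {
  var result = subscribe.apply(this, arguments);
  // tag the listener that subscribe just added with its channel name
  var handlers = this.sub.listeners('message');
  var last = handlers[handlers.length - 1];
  if (last) last.channel = name;
  return result;
};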

The real solution would be to go through and trace why some of these handlers don't get cleaned up, but this works for now and seems to have solved our memory and CPU problems.

@shripadk
Contributor

This has already been fixed: #1061

@natefaubion
Author

It seems like there's still a lot of cleanup to be done, though. What's the point of the local flag if it's never passed? Aren't there potentially more handlers that could leak? That's part of what I fixed: properly propagating that flag and reusing the disconnect channel. That seems to have been the intention behind the flag anyway.

Should I open another issue for the other memory problems? RedisStore is still unusable right now with the dispatch handlers leaking.

@shripadk
Contributor

shripadk commented Nov 1, 2012

@natefaubion what are the handlers that are leaking? Most of the handlers are set during connection and removed on client disconnection (only for that particular node). The other child processes don't inherit those handlers AFAIK.

@shripadk
Contributor

shripadk commented Nov 1, 2012

There is no point to the local flag. That's precisely what I was saying on the other issue page. Even those extra handlers like disconnect:this.id aren't required; the existing disconnect handler can be reused. Global handlers are enough, and there is no need for per-client handlers. Memory usage will increase unnecessarily for every new topic subscribed. psubscribe/punsubscribe can be used instead.
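
Pattern subscriptions would look something like this with node_redis (psubscribe/pmessage are real node_redis APIs; the channel scheme and handler body are assumed):

var redis = require('redis');
var sub = redis.createClient();

// one global pattern handler instead of one listener per client
sub.psubscribe('disconnect:*');
sub.on('pmessage', function (pattern, channel, message) {
  var id = channel.split(':')[1];
  manager.onDisconnect(id); // manager assumed in scope
});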

This pull request was closed.