socket.io redis store leaks channels #663
Comments
ping @dshaw ^
On it.
The third leaked subscription is this one: https://github.com/LearnBoost/socket.io/blob/master/lib/manager.js#L433 It looks like this is because the other processes never get a disconnect message, and it looks to me as though that's because the connected process's manager never publishes a message on the 'disconnect' channel -- so that probably means we're leaking memory in the client processes as well as channels in the Redis instance. I'll issue a pull; if you want it against some repo other than this one, let me know which.
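A standalone sketch of the mechanism being described (this is not the socket.io source or the patch that landed; the channel names and client id are assumptions for illustration): the process that owns the client publishes on a per-client disconnect channel, and every other process drops its per-client subscriptions when it hears that message.

```js
// Demonstration of the idea with node_redis directly, not socket.io internals.
var redis = require('redis');

// --- in each non-owning process ---
var sub = redis.createClient();
var id = 'some-client-id';                    // hypothetical client id

sub.subscribe('dispatch:' + id);              // per-client channels this
sub.subscribe('disconnect:' + id);            // process is listening on

sub.on('message', function (channel, message) {
  if (channel === 'disconnect:' + id) {
    // Tear everything down so the channels don't linger in Redis.
    sub.unsubscribe('dispatch:' + id);
    sub.unsubscribe('disconnect:' + id);
  }
});

// --- in the process that owns the client, on disconnect ---
var pub = redis.createClient();
pub.publish('disconnect:' + id, 'client gone');
```

Without that publish, the subscribing processes never get a signal to unsubscribe, which is how both the Redis channels and the per-client state in each process can leak.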
@brettkiefer Pull Request #665 is not going to handle it correctly. Sorry, I've been tied up with Thanksgiving holidays. Landing patch(es) now.
…s not getting passed through onClientDisconnect().
…s not getting passed through onClientDisconnect()." This reverts commit d5ab46d.
@dshaw Cool, and I see that you picked up the disconnect publish from c742735 in c110036 to get rid of the other leaked channel, so that looks like it ought to fix our issue; I'll give your changes a try. Thanks! I did it the way I did because it looked to me as though onClientDisconnect would only ever be called when 'local' should be true, so there was no need for the param -- can you explain if that reasoning was wrong? @guille Uh oh, looks like this broke something for you, hence the reverts?
It might or might not have broken our tests :D
Tests run clean for me.
@brettkiefer onClientDisconnect is called in 3 places, only one of which is called with the local flag.
Issue is with Travis, not dshaw. ;-P
@dshaw Okay, got it - so a Socket.prototype.disconnect and a Manager.prototype.handleClient can happen in processes where the client isn't 'local'?
Running with dshaw's patch applied to 0.8.7, it's certainly not leaking 3 channels per closed connection (thanks!). It's still leaking channels, but very slowly, and I haven't yet figured out when or why. Also, just in case this rings any bells, it LOOKS like we're seeing an issue in production where one of our processes will start creating socket.io pubsub channels so quickly that it will flood all of the space available for Redis in under a minute, so I've added logging to try to figure that out, too. It looks like it happens about once per 10000 open websockets per hour.
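A rough sketch of one way to log the channel count over time with node_redis (this is not the actual logging referenced above; the sampling interval and output format are arbitrary choices):

```js
// Sample Redis's pubsub_channels counter once a minute and log it, so a slow
// leak or a sudden channel flood shows up as a trend in the log.
var redis  = require('redis')
  , client = redis.createClient();

setInterval(function () {
  client.info(function (err, info) {
    if (err) return console.error('INFO failed:', err);
    var m = /pubsub_channels:(\d+)/.exec(info.toString());
    console.log(new Date().toISOString(),
                'pubsub_channels =', m ? m[1] : 'unknown');
  });
}, 60 * 1000);
```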
If you have reproducible steps for the additional leaks, please post them. I'd like to squash that too. I've encountered an issue like that intermittently myself. Seems to be related to XHR transports getting in a weird state. Are you only using pure websockets?
Thanks, and of course I'll post more info on the other issues as I get it. Yes, pure websockets only -- we use a homegrown and fairly efficient short xhr poll as the fallback for Trello because that approach ALWAYS works, and we had to turn Socket.io websockets off right after launch due to some difficult-to-diagnose, sudden, and catastrophic problems we were having with it under load. So we're trying to get all of our websocket-capable browsers back on websockets now.
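For reference, restricting socket.io to pure websockets is just the transports setting; a minimal sketch, assuming a 0.8.x-style setup (the port is an assumption):

```js
// Serve only the websocket transport, so no XHR-polling fallback
// ever kicks in on the socket.io side.
var io = require('socket.io').listen(8080);

io.configure(function () {
  io.set('transports', ['websocket']);
});
```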
Okay, it looks like the issue with Redis consuming all of its available memory in a very short time (issuecomment 3010815, three comments up) is NOT due to a bunch of pubsub channels being opened -- measurement error on my part, my apologies. |
The issue with Redis going down the tubes and evicting all of its keys turned out to be redis/redis#91, see that case for more info. It LOOKED like it was a problem with socket.io because without socket.io running the offending process would die quickly (and was almost always the first to err on a SET), whereas with socket.io we try to close all of our websocket connections on process exit to work around issue #495, and the extra time doing that kept the Redis connection (and thus its output_list) around long enough to do more damage. |
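A sketch of the kind of "close all websocket connections on process exit" handler described above (not the exact code in question; the signal choice, the timeout, and the 0.8.x io.sockets.clients() call are assumptions):

```js
var io = require('socket.io').listen(8080);

process.on('SIGTERM', function () {
  // Disconnect every client this process owns so the transports (and the
  // store's per-client channels) close cleanly -- the #495 workaround.
  io.sockets.clients().forEach(function (socket) {
    socket.disconnect();
  });
  // Give the disconnect packets a moment to flush before exiting.
  setTimeout(function () {
    process.exit(0);
  }, 1000);
});
```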
Repro:
Socket.io 0.8.7
Run socket.io with the redis backing store (a minimal setup sketch follows these steps).
Run 'info' on your redis instance, noting the number of pubsub_channels.
Create a websocket connection to socket.io.
Run 'info' again: you should now have 5 more pubsub_channels than before.
Close the websocket connection.
Run 'info' again: you now have 3 more pubsub_channels than when you started.
Repeat -- it leaks 3 channels each time.
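A minimal setup for the first step, assuming the socket.io 0.8.x RedisStore API (the require path, option names, and port follow the commonly documented usage and should be treated as assumptions for your exact version):

```js
// Sketch: socket.io 0.8.x listening with the Redis backing store.
var sio        = require('socket.io')
  , RedisStore = require('socket.io/lib/stores/redis')
  , redis      = require('redis')
  , io         = sio.listen(8080);

io.configure(function () {
  io.set('store', new RedisStore({
      redisPub    : redis.createClient()   // publishes events to other processes
    , redisSub    : redis.createClient()   // receives events from other processes
    , redisClient : redis.createClient()   // plain key/value use by the store
  }));
});
```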
Is this intended?
This line would cause it to leak 2 channels, because we're not unsubscribing in the non-local case: https://github.com/LearnBoost/socket.io/blob/master/lib/manager.js#L516
But even if I remove the if (local) logic, I still leak one channel per opened and closed connection.
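An approximation of the pattern at the linked line (the function shape and channel names here are guesses, not the actual lib/manager.js source): the per-client unsubscribes only run when local is set, so a process that learns about the disconnect remotely never releases its two per-client channels.

```js
// Hypothetical paraphrase of the guarded cleanup, for illustration only.
function onClientDisconnect(store, id, local) {
  if (local) {
    store.unsubscribe('dispatch:' + id);
    store.unsubscribe('message:' + id);
  }
  // ...rest of the disconnect handling...
}
```

Removing the guard would let remote processes release those two channels, which matches the observation above that one leaked channel per connection still remains (the disconnect channel discussed later in the thread).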