Websocket server stops responding after some hours #328
When I kill the chat server process manually using the "kill" command, the supervisor restarts the chat server immediately and my chat works fine.
Put the following code in your script and restart it:

error_reporting(E_ALL);
ini_set('display_errors', 1);

Please paste the output of the following command here:

Once the error happens again, don't kill the script. Check and paste the latest output of the supervisor stdout and error logs in this issue. Once the error has happened, SSH directly into the server and run the
I'm going through the same problem, and here is what was shown by the "composer show -i" command: cboden/ratchet v0.3.3 PHP WebSocket library. I placed these statements at the start of my server file, but didn't find any errors: error_reporting(E_ALL); Please help me.
@rohitkhatri What happens when you do the telnet thing after the server has stopped responding?
Closing due to extended period of inactivity. I'll re-open if reporting continues.
Slightly late to the party, but I've been looking into Ratchet again and I wanted to check whether some of the issues I encountered the last time I tried to use it (two years ago) have been resolved. This particular issue was a showstopper when it came to using Ratchet for a service that ran continuously, and I'm seeing a fair number of reports here with similar symptoms. Ratchet would work fine for a while before becoming unresponsive to new connections, maxing out the CPU and needing a restart. Initially, it'd run happily for several weeks, but that ended up dropping to needing manual intervention every couple of days. The problem turned out to be file descriptor leakage, and as soon as it hit the maximum (1024 by default, I believe), restarting the process would be required. As the site's traffic grew, it'd end up hitting the limit sooner. I didn't have the time or interest to fix the root problem for a hobby project, so I can't offer up a patch, but perhaps being aware that there was/is a leak can help get the ball rolling. Edit: I can see you've suggested raising the limit in another issue (#349 (comment)), so I guess you're already aware of the problem. I don't consider increasing the limit to be a robust solution, so is there something that's preventing this from being fixed, or a way to ensure the user doesn't run into this problem?
@pradeepta20, did you find any solution for that?
Initially I had done the database coding using PDO. Later I changed the code to mysqli, and it's working fine for me.
Use the technique below to reconnect automatically when the WebSocket server is restarted: var wsUri;
I am having a similar issue. The web browser doesn't connect (but it doesn't fail either), and when I check the websocket logs everything looks fine and the new connection isn't started. My sockets are basic, without a database.
@Chaosvex The file descriptor limit issue is an operating-system-level feature: you will run into this problem with any language/environment (not just Ratchet/PHP). If you're hosting a WebSocket server, 1024 is WAY too low if you expect any traffic at all. @MarcGodard Please follow the directions from this comment above and report the output so we can help troubleshoot.
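As a hedged sketch of the point above, a server script can check its own open-file limit at startup and warn before it ever gets close; the function name and the 10000 threshold below are illustrative choices, not Ratchet API, and posix_getrlimit() requires the POSIX extension:

```php
<?php
// Sketch: warn at startup if the process's open-file limit looks too low
// for a busy WebSocket server. openFileLimit() and the 10000 threshold
// are illustrative, not part of Ratchet.
function openFileLimit(): int
{
    if (function_exists('posix_getrlimit')) {
        $limits = posix_getrlimit();
        return (int) $limits['soft openfiles'];
    }
    // Fallback: ask the shell (works on most Linux/macOS systems).
    return (int) trim((string) shell_exec('ulimit -n'));
}

$limit = openFileLimit();
if ($limit < 10000) {
    fwrite(STDERR, "Warning: open-file limit is only $limit; raise it (e.g. `ulimit -n 65536`) before launching the server.\n");
}
```

Raising the limit with `ulimit -n` only affects the shell that launches the process, so under supervisor you would set it in the supervisor config instead.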
@cboden Yes, I know, but that has nothing to do with the leaks within Ratchet.
@cboden I figured out the issue minutes after posting that. It was my fault. Thanks for the quick response.
@MarcGodard Could you share your finding if it's of general interest?
I had an error in my code, that's all.
The same problem as @pradeepta20: after several hours the sockets stop responding. I haven't found a solution in this discussion. Is there a solution at all, other than constantly restarting the sockets?
I'm using an AWS EC2 server for my application. My chat has stopped twice so far. When I go to the terminal and use the pgrep command to find the websocket process, it shows me the processes, which means the socket is connected to the server, but the chat still wasn't working. Then I killed both processes manually, and my cron job, which is set to start the socket every minute, brought it back up and my chat went live again. I don't know what's going on in the background. I didn't find anything useful in the logs: I log everything (user attached, user sent a message to this user, and so on), but there is no error in my logs.

Furthermore, if a mobile user just turns off their wifi or mobile data, my chat does not fire the onClose method. So how do I mark a user offline in this scenario? What is the way to mark a user offline if onClose is not fired? Thanks!
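On the onClose question: TCP gives no prompt close event when a phone simply drops off the network, so a common workaround is an application-level heartbeat — clients ping periodically and the server marks anyone silent for too long as offline. A minimal sketch of the server-side bookkeeping (names like staleUsers and the 60-second timeout are illustrative assumptions, not Ratchet API):

```php
<?php
// Sketch: mark users offline when their heartbeat goes quiet, since
// onClose may never fire for abruptly dropped mobile connections.
// staleUsers() and HEARTBEAT_TIMEOUT are illustrative names.
const HEARTBEAT_TIMEOUT = 60; // seconds without a ping before "offline"

function staleUsers(array $lastSeen, int $now, int $timeout = HEARTBEAT_TIMEOUT): array
{
    // $lastSeen maps user id => unix timestamp of the last ping received.
    // Returns the ids whose last ping is older than the timeout.
    return array_keys(array_filter(
        $lastSeen,
        fn (int $ts): bool => ($now - $ts) > $timeout
    ));
}

// Example: user 7 last pinged 120 s ago, user 9 pinged 10 s ago.
$now = time();
$lastSeen = [7 => $now - 120, 9 => $now - 10];
// staleUsers($lastSeen, $now) → [7]
```

You would update $lastSeen in onMessage whenever a ping arrives, and run the sweep on a periodic timer, marking the returned users offline in your database.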
Considering that neither developer has responded to a showstopper of a bug in several years, you'd be best dropping this project and looking for an alternative. |
We fixed several memory leaks in ReactPHP over the last few years since your last comment. We implemented new event loops and did a ton of performance upgrades. The thing is, you need to account for the following when deploying:
We simply cannot debug very deeply into your problem without information like that. But what also matters is what you do when a connection has been established, what kind of resources you use, and how you connect to databases/storage, etc. And lastly, I really want to stress that we cannot change the 1024 thing for you on your deployment. You have to do that yourself. It's baked in, hardcoded into
This was never about memory leaks or even a concurrent handle/descriptor limit; it was about the descriptors leaking, so a popular Ratchet-based site would have to be frequently killed off and restarted to prevent it from live-locking and potentially bringing an entire system to a crawl. Concurrent handles were irrelevant.
File descriptors don't leak, and according to your own description, hitting the 1024 limit was/is the root cause of the issue you were having. They can leak over from a parent process into a child process, but that's a totally different issue. (And we added tooling against that in

The thing is, we have several users doing more than 50K+ connections in a single process with Ratchet without a hitch. And I don't want to suggest the problem is within your code, but implementing a long-running service in PHP is a delicate dance between us, the maintainers of Ratchet/ReactPHP, and the implementers of our libraries. This is also lower-level PHP than your normal request-response cycle. I'll gladly sit down for a Hangouts call with anyone to resolve their problems, but it's going to get technical, diving into tracing, logging, and monitoring really fast. And to be really honest, I can't do much with "It's locking up, please fix". I want to dive into that and preferably find a way to reproduce it so we can fix it upstream. Personally, I monitor memory usage (amongst other things) fanatically to stay ahead of issues like this.
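As a rough illustration of the monitoring mentioned above, a long-running server can log its own memory use periodically; in a real deployment you would schedule this on the event loop (e.g. with a periodic timer) and ship it to a monitoring system. A minimal, hedged sketch — memoryReport() is an illustrative name, only the memory_get_* calls are real PHP:

```php
<?php
// Sketch: measure and format the process's own memory use so a
// long-running socket server can log it at regular intervals.
function memoryReport(): string
{
    return sprintf(
        'mem: %.1f MiB (peak %.1f MiB)',
        memory_get_usage(true) / 1048576,      // bytes allocated from the OS
        memory_get_peak_usage(true) / 1048576  // high-water mark
    );
}

echo memoryReport() . PHP_EOL;
```

A steadily climbing number across hours of identical traffic is the usual early warning of a reference being held somewhere.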
Yes, hitting 1024 total over the lifetime of the process, not 1024 concurrent. When those connections are no longer active and there's no longer a handle available, that's when it becomes fair to refer to it as a leak.
Just tested this with StreamSelectLoop, no problems. I suspect you're holding a reference somewhere, holding open 1024 connections, causing your problems.

<?php
require __DIR__ . '/vendor/autoload.php';

$app = new Ratchet\App('localhost', 8080);
$app->route('/echo', new Ratchet\Server\EchoServer, array('*'));
$app->run();

In a browser:

setInterval(() => {
    let c = new WebSocket('ws://localhost:8080/echo');
    setTimeout(() => c.close(), 250);
}, 500);

^ Server still responding with a 101 HTTP status code after 1024 connections.
The given example implements the variant with closed connections.
@inri13666 What's the error you're getting? The reason I closed connections was because the browser I was testing with didn't allow that many concurrent connections to a single server as a security measure.
I do not have any errors in the log; the socket stops responding, and only a restart of the socket helps :(
I found the cause of my issue.
Finally, I found the solution. To solve the issue described here, install one of the following PHP extensions:
P.S.
Hi, I am sorry for this late comment. I was having a similar issue where the server seemingly stopped responding after some time. I tried many solutions, but in the end it turned out that MySQL was timing out after 8 hours of inactivity and disconnecting. So I put in a root cron job
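For anyone hitting the same MySQL wait_timeout (8 hours by default), an alternative to a cron restart is to ping the connection before each use and reconnect once it has gone away. A hedged sketch, where healthyConnection() and the $connect callable are illustrative names wrapping your own PDO setup:

```php
<?php
// Sketch: reuse a PDO handle while it is alive, and reconnect once
// MySQL's wait_timeout has killed it ("MySQL server has gone away").
// healthyConnection() and $connect are illustrative, not Ratchet API.
function healthyConnection(?PDO $pdo, callable $connect): PDO
{
    if ($pdo !== null) {
        try {
            if ($pdo->query('SELECT 1') !== false) {
                return $pdo; // connection is still alive
            }
        } catch (PDOException $e) {
            // fall through: the server closed the idle connection
        }
    }
    return $connect(); // establish a fresh connection
}

// Hypothetical usage inside a long-running socket server:
// $this->db = healthyConnection($this->db, fn () => new PDO($dsn, $user, $pass));
```

Calling this at the top of each message handler keeps the server alive across idle nights without touching the MySQL configuration.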
I know this issue is quite old and has been closed for a while, but it best covers what I have been running into. Every once in a while, my websocket message interface just stops working/responding. I restart the server and things immediately work again, with clients immediately reconnecting. Sometimes it breaks again within 10 or so minutes, and sometimes it works for hours or days without any issue. I have a cronjob that restarts this server every 3 hours, so my question is: is the choice of PECL extension (ev, event) deterministic whenever the websocket loop is created? Is there a chance that sometimes I'm getting a loop that uses one interface and sometimes the other, if I have both PECL extensions installed? I do not see any memory or CPU issues on the server, but occasionally I see a series of broken-pipe issues:
I restart and either don't see these errors again, or I do, in which case I restart again until they subside. My theory is that sometimes the code that creates the loop uses the ev library and other times it uses libevent, and one of the two is not working well while the other is fine. I am not quite sure how to test this, since the only indicator I have is the message interface firing a series of onError callbacks all at once, with the Broken pipe error I pasted above.
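As far as I can tell, the loop selection is deterministic for a given set of loaded extensions: react/event-loop's factory tries the extension-backed loops in a fixed preference order and only falls back to stream_select when none is available. The exact order below is my assumption and may differ between react/event-loop versions, but a quick way to see which backend your PHP build would likely get:

```php
<?php
// Sketch: report which event-loop backend this PHP build would likely
// get. The preference order here approximates react/event-loop's
// factory and may not match your installed version exactly.
function likelyLoopBackend(): string
{
    if (extension_loaded('uv')) {
        return 'ExtUvLoop (ext-uv)';
    }
    if (extension_loaded('ev')) {
        return 'ExtEvLoop (ext-ev)';
    }
    if (extension_loaded('event')) {
        return 'ExtEventLoop (ext-event)';
    }
    return 'StreamSelectLoop (stream_select fallback, FD_SETSIZE-limited)';
}

echo likelyLoopBackend() . PHP_EOL;
```

With ReactPHP installed you can also inspect the actual loop at runtime (e.g. logging get_class() of the loop instance), so having both PECL extensions installed should still give you the same loop on every start.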
I had an experience which may help somebody.
Well, folks, I don't quite know whether my issue has something to do with this one specifically. However, this is what I've experienced for practically a month or so: I have a websocket service running via supervisor on port 8080, and it works perfectly with my code implementation (PHP/JavaScript/CodeIgniter/Ratchet lib from socket.io). Yes, it works well, but only until PAST 10 OR 12 HOURS (GENERALLY THE FOLLOWING DAY). When I go check the websocket service itself, okay, it's there, running on port 8080. The issue seems to be on the client side (at least apparently, because I also suspect there may be some hindrance on the server side). Checking my console (JavaScript logs), I see that:
In 2023 I am also facing the same issue. Nothing in the logs, no errors, but after a few hours messages are not sent.
Hello. I have used the supervisor concept to check and restart my chat server if it is stopped.
After some time interval my chat stops working. When I check the console, I don't find any websocket connection failure errors.
Then I use the command below to check whether my server is running, and I see the server is still running:
ps aux | grep php
So I am confused about where the issue or bug is that makes my chat application stop working.