statusengine service stuck after encountering "Mysql server has gone away" #18
Comments
The basic concept is:
I think the worker should also retry to submit the data on
What data was affected on your system?
The data was not affected. I can see the entries for the service checks and everything else from the period when the SQL connection was gone. My guess is that gearman held onto the information and dumped it once the DB became available. Also, the downtimes that were triggered while the DB connection was unavailable got scheduled as soon as the service restarted.
I'm not much of a DBA, so I'm going to limit my comments to what I see on my setup for Azure MySQL. Here is a blog that talks about the errors. Maybe it would be worth checking the errors in the intermittent disconnection section of the link and applying logic for those?
As for the last part, the service restarts on failure, but in my case it didn't fail. It noticed an error and kept running. Maybe a trigger could be initiated to restart the service on a "MySQL has gone away" error. What harm could it bring?
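To make the "apply logic for those errors" idea concrete, here is a minimal sketch of a reconnect-and-retry wrapper. It is not the Statusengine Worker's code (the worker is not written in Python), and `DbError`, `run_with_reconnect`, `operation` and `reconnect` are hypothetical names used only for illustration. The only facts assumed are the MySQL client error codes 2006 (server has gone away) and 2013 (lost connection during query).

```python
import time

# MySQL client error codes: 2006 = server has gone away, 2013 = lost connection during query.
RECONNECT_ERRNOS = {2006, 2013}


class DbError(Exception):
    """Hypothetical stand-in for a driver exception that carries an errno."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_reconnect(operation, reconnect, retries=3, delay=0.0):
    """Run operation(); on a connection-loss error, reconnect and try again.

    operation and reconnect are caller-supplied callables (assumptions of this
    sketch). Any other error, or exhausting the retries, is re-raised so a
    supervisor such as systemd can restart the process.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except DbError as exc:
            if exc.errno not in RECONNECT_ERRNOS or attempt >= retries:
                raise
            attempt += 1
            time.sleep(delay * attempt)  # simple linear backoff
            reconnect()
```

Re-raising after the last retry is deliberate: the process then actually fails, so the "restart on failure" setting mentioned above can take effect instead of the service noticing the error and running on.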
For me this sounds like a child process (Statusengine Worker use
Have you checked journalctl for any uncaught exceptions?
I read the blog. From my experience
I pushed a few changes to a new branch mysql-connection to catch Timeout and Deadlock errors. Hopefully this helps you to resolve the issue. Could you please test this in your environment?
I forced some
I also killed my MySQL server, but the Worker recovered from this as well without any issues.
I would recommend monitoring the number of processes to see if any child processes die and don't get restarted, like so:
In addition, you should monitor the gearman queue to see if a queue is without a worker. You can modify this script https://gist.github.com/nook24/71e5752130d179231fb506af5eacd19b for example.
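As a rough illustration of the queue check, the sketch below parses the text returned by the gearmand admin `status` command (what `gearadmin --status` prints): one tab-separated line per queue with the function name, total jobs, running jobs and available workers, ended by a line containing a single dot. This is not the linked gist; `queues_without_workers` and the queue names in the sample are hypothetical.

```python
def queues_without_workers(status_text):
    """Return (queue_name, total_jobs) for every queue that has no worker.

    status_text is the raw output of the gearmand admin "status" command:
    NAME<TAB>TOTAL<TAB>RUNNING<TAB>AVAILABLE_WORKERS per line, ended by ".".
    """
    stalled = []
    for line in status_text.splitlines():
        line = line.strip()
        if not line or line == ".":
            continue  # skip blank lines and the terminator
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # ignore anything that is not a status row
        name, total, _running, workers = parts
        if int(workers) == 0:
            stalled.append((name, int(total)))
    return stalled


# Sample output with made-up queue names.
sample = "queue_a\t12\t0\t0\nqueue_b\t3\t1\t2\nqueue_c\t0\t0\t0\n.\n"
print(queues_without_workers(sample))
```

A monitoring check could raise an alert whenever the returned list is non-empty, and treat entries with a growing job count as the more urgent ones.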
Hi @nook24
This morning I ran into an issue when I was trying to put some servers into maintenance. The job was submitted but the maintenance never started. Upon further investigation, I found that the statusengine service had reported the error a couple of hours earlier.
Only when I restarted the statusengine service did it regain the connection. Below are the logs from the restart (just FYI):
Now I do see a similar entry which seems to have resolved on its own yesterday. Logs are below:
My questions:
My MySQL is actually an Azure DB within the same resource group and network. I'll try to find out why these connections are failing (feel free to share any ideas you have on this, although it is of course out of your scope to fix). Ref art: https://blogs.msdn.microsoft.com/azuresqldbsupport/2018/11/20/azure-database-for-mysql-server-has-gone-away/