IRR Explorer front end occasionally hangs #193
Comments
My first thoughts here are:
Logs may be a bit misleading: I don't know whether we log a request when we receive it or when we answer it (a point a hanging request may never reach). We can definitely add logging at request receive time, though.
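To illustrate the difference between logging on receipt and logging on answer, here is a minimal sketch of a pure ASGI middleware that logs both. It assumes the front end is served through an ASGI app; the class name `RequestLogMiddleware` and the logger name `request_log` are made up for this example and are not existing IRR Explorer code.

```python
import logging
import time

logger = logging.getLogger("request_log")  # hypothetical logger name


class RequestLogMiddleware:
    """Hypothetical pure ASGI middleware: logs on receipt and on completion."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan/websocket traffic through untouched.
            await self.app(scope, receive, send)
            return
        path = scope.get("path", "?")
        start = time.monotonic()
        # Logged before any work is done, so a request that later hangs
        # still leaves a trace in the log.
        logger.info("request received: %s", path)
        try:
            await self.app(scope, receive, send)
        finally:
            # Only reached when the wrapped app returns or raises.
            logger.info(
                "request finished: %s (%.3fs)", path, time.monotonic() - start
            )
```

A "received" line without a matching "finished" line would then point at the request that was in flight when the process hung.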
I'll also add irrdnet/irrd#721 to this project so we can get Python tracebacks while the process is running.
We're still experiencing this at least once or twice a month. The last occurrence was today. CPU load graphs show 100% CPU utilisation for a long time before the process was killed. More statistics can be found here: https://irrexplorer.nlnog.net/munin/localhost/localhost/index.html One suspicion we have is that, although we see 'breaking' in the output, the hanging process isn't aborted. @mxsasha: do you have any additional steps for investigating this better the next time it occurs?
Last entries before the latest crash:
No clear answer yet, but it does seem to be somewhere in the set resolving. #220 adds a few extra debug lines, and also responds to SIGUSR1 by printing a traceback. So if you send that signal to a hanging process, it should tell us where the process is hanging. The newly logged lines may also have answers about why.
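For reference, one way to get a traceback on SIGUSR1 in Python is the standard library's `faulthandler` module. This is only a sketch of the general technique and not necessarily how #220 implements it; the function name `install_traceback_handler` is made up for this example.

```python
import faulthandler
import signal
import sys


def install_traceback_handler(file=sys.stderr):
    """Dump the tracebacks of all threads to `file` when SIGUSR1 arrives.

    Hypothetical helper; Unix only. `file` must stay open (it needs a valid
    file descriptor) for as long as the handler is registered.
    """
    # faulthandler writes the dump from a C-level signal handler, so it does
    # not depend on the interpreter getting a chance to run Python code.
    faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
```

With this installed at startup, `kill -USR1 <pid>` on the hanging process writes the current stack of every thread to stderr without stopping the process.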
It seems irrd had crashed:
Discussion: this is likely because IRRexplorer does not handle IRRD query timeouts well, and/or does not handle IRRD closing the socket. IRRexplorer then ends up looping forever at 100% CPU.
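As a generic illustration of this failure mode (not the actual IRRexplorer code, whose client layer may look quite different): a read loop that never checks for end-of-stream will spin once the peer closes the socket, because `recv()` then returns an empty result immediately, over and over. A minimal sketch of a loop that fails loudly instead, with the hypothetical helper name `read_until`:

```python
import socket


def read_until(sock, terminator=b"\n", timeout=5.0):
    """Read from `sock` until `terminator` is seen; raise instead of spinning.

    Hypothetical helper for illustration only.
    """
    # Per-recv timeout: recv() raises socket.timeout if the peer goes silent
    # for longer than `timeout` seconds.
    sock.settimeout(timeout)
    buf = b""
    while terminator not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            # recv() returns b"" once the peer has closed the connection.
            # Without this check the loop never ends and burns CPU.
            raise ConnectionError(
                "peer closed the connection before the response was complete"
            )
        buf += chunk
    return buf
```

Both the timeout and the closed-connection case then surface as exceptions that the caller can log and turn into an error response.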
Once in a while the IRR Explorer front end hangs; the only way to fix this is to kill the process and restart it. We need more information on what causes this (logs, stack traces, memory/CPU usage, etc.). We can collect the data in this issue so we can find a cause (and a solution).