osrm-routed connection accept loop can exit and not recover #6040
Comments
As I complained in the other ticket about rare crashes, this problem description could match at least some of them. I could sometimes attach a debugger to get a stack trace. In one type of crash, all threads had definitely exited except one, which would fit this issue, but I didn't save those traces. Others still had threads running, though the stack traces were not very enlightening. I'm not sure whether those crashes are related to this problem, but I thought I'd post the stack traces here anyway:
I've observed this on both Mac and Linux builds now.
As discussed in #6033, osrm-routed does not immediately clean up a keep-alive connection when the client closes it. Instead, it waits for five seconds of inactivity before removing it.
If you have a client that opens and closes a lot of keep-alive connections, it's possible for osrm-routed to run out of file descriptors while it waits for the clean-up to trigger.
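For a rough sense of when this bites, here is a back-of-the-envelope sketch using Little's law: the number of closed-but-not-yet-reaped connections is roughly the connection rate multiplied by how long each one lingers. The five-second figure comes from the description above. The function names and the example numbers are made up for illustration and are not part of OSRM.

```cpp
#include <cassert>

// Approximate steady-state count of connections that the client has already
// closed but the server has not yet cleaned up (Little's law: L = rate * time).
inline double lingering_connections(double connects_per_second, double idle_timeout_seconds)
{
    assert(connects_per_second >= 0.0 && idle_timeout_seconds >= 0.0);
    return connects_per_second * idle_timeout_seconds;
}

// Highest sustained connection rate that keeps the lingering connections
// within a given file descriptor budget.
inline double max_safe_rate(double fd_budget, double idle_timeout_seconds)
{
    assert(idle_timeout_seconds > 0.0);
    return fd_budget / idle_timeout_seconds;
}
```

Assuming a descriptor budget of 1024 (a common default soft limit on Linux, but check `ulimit -n` on your system), `max_safe_rate(1024, 5)` gives about 205 connections per second. A client that opens and closes 300 connections per second would leave around 1500 lingering at any moment, which is over that budget.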
The key point is that if this happens, the connection acceptor loop exits. Even after the old connections are cleaned up, new ones will not be accepted. Any new requests will block until the server is restarted. See: https://github.com/Project-OSRM/osrm-backend/blob/master/include/server/server.hpp#L96
It's possible that there are other errors which will also generate this behaviour.
Ideas for improving the error handling:
osrm-routed exits cleanly.
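One possible direction, sketched below, is to keep the accept loop running when the failure looks temporary (such as running out of file descriptors) and to stop only on errors that cannot be retried. This is not OSRM's code: osrm-routed is built on Boost.Asio, whereas this sketch replaces the accept call with a hypothetical `AcceptAttempt` callable so that it stays dependency-free. The names `AcceptAttempt`, `LoopStats`, `is_transient` and `run_accept_loop`, and the choice of which errors count as transient, are all assumptions.

```cpp
#include <cassert>
#include <functional>
#include <system_error>

// Hypothetical stand-in for a single accept attempt. It returns an empty
// error_code on success, or the reason the attempt failed.
using AcceptAttempt = std::function<std::error_code()>;

struct LoopStats
{
    int accepted = 0;
    int transient_failures = 0;
    bool stopped_on_fatal = false;
};

// Errors after which a later accept is expected to succeed once resources
// are released. The exact set is an assumption made for this sketch.
inline bool is_transient(const std::error_code &ec)
{
    return ec == std::errc::too_many_files_open ||           // EMFILE
           ec == std::errc::too_many_files_open_in_system || // ENFILE
           ec == std::errc::connection_aborted ||            // ECONNABORTED
           ec == std::errc::no_buffer_space ||               // ENOBUFS
           ec == std::errc::not_enough_memory;               // ENOMEM
}

// Runs up to max_attempts accept attempts. Transient errors are counted and
// the loop carries on; any other error ends the loop and is flagged so the
// caller can shut the server down instead of leaving it hanging.
inline LoopStats run_accept_loop(const AcceptAttempt &try_accept, int max_attempts)
{
    assert(max_attempts >= 0);
    LoopStats stats;
    for (int i = 0; i < max_attempts; ++i)
    {
        const std::error_code ec = try_accept();
        if (!ec)
        {
            ++stats.accepted;
            continue;
        }
        if (is_transient(ec))
        {
            // A real server would log and back off briefly here before retrying.
            ++stats.transient_failures;
            continue;
        }
        stats.stopped_on_fatal = true;
        break;
    }
    return stats;
}
```

In an asynchronous server the same idea would mean re-arming the accept handler even when it is called with a transient error, rather than only on success. The `stopped_on_fatal` flag is there so that an unrecoverable error leads to an explicit shutdown instead of a process that stays up but no longer accepts connections.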