Investigate 0.8.2 node hitting max db connections #945
I'm hoping this is anomalous, since it was only one node and a single incident. Still, there was some off-channel discussion that a 10m timeout might be too generous, and that we might want to find a shorter Spanner connection timeout that still doesn't trigger the stream of 502s we had prior.
Documenting more detail about these incidents here.
*Ops:* Adds `SYNC_DATABASE_POOL_CONNECTION_DEADMAN_SWITCH`, the number of milliseconds the pool may report zero available connections before the application triggers a `panic!`. The current default is `0`, meaning the deadman switch is inactive. Issue #945
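For illustration only, here is a minimal Rust sketch of how such a deadman switch could behave, assuming a hypothetical `PoolState` and a periodic check loop; none of these names are taken from the actual codebase:

```rust
use std::time::{Duration, Instant};

/// Hypothetical pool state; the real pool type and field names differ.
struct PoolState {
    idle_connections: u32,
}

/// Sketch of a deadman switch driven by the env var's millisecond value.
struct DeadmanSwitch {
    max_ms: u64,
    exhausted_since: Option<Instant>,
}

impl DeadmanSwitch {
    fn new(max_ms: u64) -> Self {
        Self { max_ms, exhausted_since: None }
    }

    /// Called on each periodic check with the current pool state.
    fn check(&mut self, state: &PoolState) {
        if self.max_ms == 0 {
            return; // 0 means the switch is inactive (the documented default)
        }
        if state.idle_connections > 0 {
            self.exhausted_since = None; // pool recovered; reset the timer
            return;
        }
        let since = *self.exhausted_since.get_or_insert_with(Instant::now);
        if since.elapsed() >= Duration::from_millis(self.max_ms) {
            // Crash the process so the node gets replaced rather than left
            // serving requests with no available connections.
            panic!("no available db connections for over {}ms", self.max_ms);
        }
    }
}

fn main() {
    // The value would normally come from SYNC_DATABASE_POOL_CONNECTION_DEADMAN_SWITCH.
    let mut switch = DeadmanSwitch::new(10_000);
    switch.check(&PoolState { idle_connections: 0 }); // starts the timer
    switch.check(&PoolState { idle_connections: 5 }); // resets it
}
```

Panicking deliberately lets the supervisor restart the node, rather than leaving it stuck at max connections serving double-digit-second responses as in the original incident.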
This will add several fields to `__lbheartbeat__` if `database_pool_max_size` is specified. These include:

```json
{
  "active_connections": ...,  /* number of active connections */
  "idle_connections": ...,    /* number of idle connections */
  "duration": ...             /* how long no idle connections have been available */
}
```

Note that `duration` will only be present if `idle_connections` has been zero since the last time a check was performed.

* This also adds `database_pool_max_size` as a config option. Issue: #945
* feat: Add pool connection info to `__lbheartbeat__` for ops (same change as described above). Issue: #945 Co-authored-by: Philip Jenvey <[email protected]>
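As a rough illustration of the check described above, here is a minimal Rust sketch, assuming `serde_json`; `PoolSnapshot`, `lbheartbeat_body`, and the bookkeeping across checks are hypothetical rather than the server's actual code:

```rust
use serde_json::{json, Value};
use std::time::Instant;

/// Hypothetical pool snapshot; the real type in the server differs.
struct PoolSnapshot {
    active_connections: u32,
    idle_connections: u32,
}

/// Builds the extra __lbheartbeat__ fields. `zero_since` persists across
/// checks and records when idle connections were first observed at zero.
fn lbheartbeat_body(snap: &PoolSnapshot, zero_since: &mut Option<Instant>) -> Value {
    let mut body = json!({
        "active_connections": snap.active_connections,
        "idle_connections": snap.idle_connections,
    });
    if snap.idle_connections == 0 {
        // "duration" only appears while the pool has stayed exhausted
        // since the previous check.
        let since = zero_since.get_or_insert_with(Instant::now);
        body["duration"] = json!(since.elapsed().as_secs());
    } else {
        *zero_since = None;
    }
    body
}

fn main() {
    let mut zero_since = None;
    let snap = PoolSnapshot { active_connections: 30, idle_connections: 0 };
    println!("{}", lbheartbeat_body(&snap, &mut zero_since));
}
```

Exposing these fields on `__lbheartbeat__` lets ops spot a pool stuck at zero idle connections before the node degrades the way this one did.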
We haven't hit this in prod since #985 rolled out - closing.
Following the 0.8.2 deploy yesterday, a node became "stuck" at the maximum (30) number of db connections.
The node continued serving requests until ops manually killed it, but its response times were incredibly slow (mostly double-digit seconds). Its effect on the overall 95th percentile was visible on the dashboard (and it triggered the nginx GET dashboard alarm).