Synapse does not recover correctly after a database server outage #11167
Comments
I think that if the event-fetch job fails with an exception (as it did), |
also: we should export |
Event fetches explain most of the stuck requests. There's one class of stuck request that I can't explain yet: |
In testing, |
yeah, very odd. I'm also at a loss to explain it. |
Fixed by #11240, except for the |
Let's close this for now, then. |
I think the problem here is that we increment Apart from the fact that integer increment/decrement operations aren't atomic in Python (so decrementing it without holding the lock is racy), we also have the problem that if we're unable to get a connection to the database (eg, because it is shutting down...), |
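To illustrate the two concerns raised in this comment, here is a minimal sketch of one way to keep such a counter consistent. It is not Synapse's actual code: `WorkerCounter`, `run_worker`, and `failing_job` are hypothetical names invented for the example. The counter is only ever modified while holding a lock, and the decrement sits in a `finally` block so that a failure (for example, being unable to get a database connection) cannot leave the count stuck above zero.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class WorkerCounter:
    """Hypothetical count of running workers; every access holds the lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def increment(self) -> None:
        with self._lock:
            self._count += 1

    def decrement(self) -> None:
        with self._lock:
            self._count -= 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._count


def run_worker(counter: WorkerCounter, job: Callable[[], T]) -> T:
    """Run `job`, keeping `counter` accurate even if `job` raises."""
    counter.increment()
    try:
        return job()
    finally:
        # Runs on success and on failure, so the count cannot leak.
        counter.decrement()


def failing_job() -> None:
    # Simulated outage: stands in for a failed attempt to get a connection.
    raise ConnectionError("database is shutting down")
```

With this shape, calling `run_worker(counter, failing_job)` still raises `ConnectionError` to the caller, but `counter.value` returns to its previous value afterwards, so later work is not blocked by a stale count.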
Earlier today our database server restarted. The database recovered itself; Synapse did not. In particular we saw "in flight requests" stacking up:
... and the reverse-proxy returned 429s for many requests.