Reproducible UISIs during federation failures. #5441
Comments
This might be element-hq/element-web#3754?
I've seen this every time matrix.org has gone down recently (first for the upcloud outage on hestia, and then for today's cloudflare outage)
Did you see this during Friday's outage as well?
erm, can't remember. i wasn't looking for it, given i was mid-pitch, and the outage was 'only' an hour.
This seems to be a device list cache problem. It stems from the fact that explicit queries for a remote's device lists do not update Synapse's device_list cache.
The failure case is as follows:
tldr / Steps to reproduce reliably:
Sidenote: I've had another small, weird behavioral issue here: when you start the shut-down homeserver again, no new messages come in on the chat until someone else sends a message. Is this expected behavior? Has anyone else noticed it? Where should I mention this?

Solutions: We probably just want to start caching results from the federation
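To make the proposed fix concrete, here is a minimal toy sketch in Python. It is not Synapse's code: `ToyDeviceListCache`, `explicit_query`, `on_device_list_update`, `simulate`, and the `write_through` flag are all hypothetical names invented for illustration. The concrete failure steps are not preserved in this thread, so the scenario (a pushed device list update lost during an outage, followed by an explicit query after recovery) is an assumption too. It only shows the general idea: if the result of an explicit query is not written back to the cache, the cache can stay stale even though fresh data was fetched.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a federation device query: user_id -> device IDs.
FederationQuery = Callable[[str], List[str]]


class ToyDeviceListCache:
    """Toy model of a homeserver's cache of remote users' device lists."""

    def __init__(self, query_remote: FederationQuery, write_through: bool) -> None:
        self._query_remote = query_remote
        self._write_through = write_through
        self._cache: Dict[str, List[str]] = {}

    def on_device_list_update(self, user_id: str, device_ids: List[str]) -> None:
        """A pushed update; in this model it is simply lost during an outage."""
        self._cache[user_id] = list(device_ids)

    def explicit_query(self, user_id: str) -> List[str]:
        """Ask the remote server directly for a user's devices."""
        devices = self._query_remote(user_id)
        if self._write_through:
            # The suggested fix: cache what the explicit query returned.
            self._cache[user_id] = list(devices)
        return devices

    def cached_devices(self, user_id: str) -> Optional[List[str]]:
        return self._cache.get(user_id)


def simulate(write_through: bool) -> Optional[List[str]]:
    """Return the cached device list after an outage plus one explicit query."""
    user = "@alice:remote.example"  # made-up user ID
    remote = {user: ["DEVICE_A"]}
    cache = ToyDeviceListCache(lambda u: list(remote[u]), write_through)

    cache.on_device_list_update(user, ["DEVICE_A"])
    # Outage: a new device appears remotely, but the pushed update never arrives.
    remote[user].append("DEVICE_B")
    # After recovery, an explicit query fetches the fresh list.
    cache.explicit_query(user)
    return cache.cached_devices(user)
```

In this model, `simulate(write_through=False)` leaves the cache without `DEVICE_B`, while `simulate(write_through=True)` brings it up to date. Whether a write-through like this is the right shape for the real fix depends on details of Synapse's caching that this sketch does not model.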
[edited #5441 (comment) lightly for formatting and typos. Hope I haven't broken it!]
@JorikSchellekens just wanted to give a big 👍 for tracking that issue down! A couple of quick thoughts:
Thanks @richvdh
are we assuming this is fixed by #5693?
let's assume it was fixed by #5693.
I've been memory profiling arasphere.net today, and so have been taking the HS down for a few hours at a time. Afterwards, I reliably have UISIs for E2E messages transmitted in rooms whilst the server was offline. In one instance (matrix.org->arasphere.net) they recovered 5-10 mins after the server recovered. In the others (msgs from vector.modular.im and t2l.io) the UISIs never recovered. This feels very reproducible.
The devices on arasphere.net have not changed, so this is not the same as #5095.
I'm giving it its own bug rather than losing it in element-hq/element-web/issues/2996