-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Retry to sync out of sync device lists #7453
Conversation
Otherwise we're going to be logging `Failed to handle device list update for @user:example.com` every 30s for every remote we're not retrying because of backoff.
Just tried this PR on my HS, and at least half of it seems to work well, in that inserting rows manually with
seems to result in the users' device lists being resynced automatically, and having their shield turned back to green in Riot. Another point is that I'll need to see if this PR doesn't clash with #6786, it shouldn't be the case but maybe there's be something that needs to be done to smooth them together and make the whole thing more maintainable. |
Alright, the other half of this (the bit that does the insertion if resync fails) seems to also be working, given I'm now seeing |
So I think this PR should be ready for review. I don't believe there's anything that needs to be done wrt #6786. Also, I'm not sure if it's good enough that, if a remote is unreachable, retry it every 30s and fail silently on a Another question/issue I have is that it would be neat to cache |
Known shortcomings of this PR:
|
@babolivier please could you fix #7498 while you're here |
Sure |
So I'm not so sure about
being a shortcoming of this PR anymore. This is only needed if the remote server is completely unreachable, since Synapse will always 200 device list updates, even if it failed to process them. AFAICT, when responding to a request, we can't really be 100% sure response will be reached and fully processed by the remote server (especially if that server or its connection is wobbly). In that case, I'd rather have the sending server wrongly think it has failed to send the update and resend it with the next transaction (so the remote server can simply ignore it if it's already received it) than have it wrongly think it's succeeded and never send it again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good if CI passes!
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it. matrix-org#6776 introduced some code infrastructure to mark device lists as stale/out of sync. This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now. It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`. Fixes matrix-org#7418
When a call to
user_device_resync
fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.#6776 introduced some code infrastructure to mark device lists as stale/out of sync.
This PR uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.
It also adds a looping call to retry all failed resync every 30s. I'm not entirely sure about that retry logic right now so would like some opinions on it. It shouldn't cause too much spam in the logs as I removed the "Failed to handle device list update for..." warning logs when catching
NotRetryingDestination
.Fixes #7418
TODO: Write some tests