-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix resync remote devices on receive PDU in worker mode. #7815
Fix resync remote devices on receive PDU in worker mode. #7815
Conversation
The replication client requires that arguments are given as keyword arguments, which was not done in this case. We also pull out the logic so that we can catch and handle any exceptions raised, rather than leaving them unhandled.
""" | ||
|
||
try: | ||
await self.store.mark_remote_user_device_cache_as_stale(sender) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this function will cause the next iteration of the resync retry loop (introduced in #7453) to retry this device list, which means that we can have 2 resync happen at the same time for a given user. This may be fine, but at least we should be aware of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting. I think its still correct to go and fire off a resync, as a) we don't want to wait for the loop to come round necessarily, b) its safe and c) I don't really want to assume that the loop will come round quickly or even retry this particular user (it might have backoff logic etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it makes sense, again I think it's fine as long as we're aware of it :)
. o O ( we could avoid this issue by storing and checking an in-memory list of the users we're currently resyncing (i.e. update it in ignore me, that still wouldn't work, and it's not that big a deal anyway.user_device_resync
) but that's probably out of scope here )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you give us some clues about the symptoms of this bug? was it throwing exceptions?
synapse/handlers/federation.py
Outdated
@@ -784,15 +784,23 @@ async def _process_received_pdu( | |||
resync = True | |||
|
|||
if resync: | |||
await self.store.mark_remote_user_device_cache_as_stale(event.sender) | |||
run_in_background(self._resync_device, event.sender) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while you're here, can you fix this to be run_as_background_process
please.
synapse/handlers/federation.py
Outdated
return run_in_background( | ||
self._device_list_updater.user_device_resync, event.sender | ||
) | ||
async def _resync_device(self, sender: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def _resync_device(self, sender: str): | |
async def _resync_device(self, sender: str) -> None: |
This produces the glorious stack trace of the following, I don't think it had any other observable behaviour.
|
…x_worker_fderation_device_resync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* commit 'f1245dc3c': Fix resync remote devices on receive PDU in worker mode. (#7815)
The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.