Fix resync remote devices on receive PDU in worker mode. #7815

erikjohnston · 2020-07-10T09:26:49Z

The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
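
For illustration, here is a minimal sketch of the failure mode and the shape of the fix. It assumes a client whose first positional parameter is the target instance name, which matches the error in the stack trace in this PR. `INSTANCE_MAP`, `send_request`, `resync_device` and `demo` are hypothetical stand-ins written for this example, not Synapse's actual code.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the worker instance map; not Synapse's real config.
INSTANCE_MAP = {"master": "http://localhost:8008"}


async def send_request(instance_name="master", **kwargs):
    """Toy replication client: payload fields must be passed as keywords."""
    if instance_name not in INSTANCE_MAP:
        raise Exception(
            "Instance %r not in 'instance_map' config" % (instance_name,)
        )
    return {"instance": instance_name, "payload": kwargs}


async def resync_device(sender: str) -> bool:
    """Run the resync and log, rather than propagate, any failure."""
    try:
        # Keyword argument: the user ID ends up in the payload.
        await send_request(user_id=sender)
        return True
    except Exception:
        logger.exception("Failed to resync device for %s", sender)
        return False


async def demo():
    try:
        # Positional argument: the user ID is taken as the instance name.
        await send_request("@alice:example.org")
    except Exception as e:
        positional_error = str(e)
    else:
        positional_error = None
    ok = await resync_device("@alice:example.org")
    return positional_error, ok


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the user ID positionally makes the toy client treat it as an instance name, while the keyword form routes it into the request payload. Wrapping the call in its own coroutine gives a single place to catch and log failures.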


babolivier · 2020-07-10T09:36:37Z

synapse/handlers/federation.py

+        """
+
+        try:
+            await self.store.mark_remote_user_device_cache_as_stale(sender)


FWIW this function will cause the next iteration of the resync retry loop (introduced in #7453) to retry this device list, which means that we can have two resyncs happening at the same time for a given user. This may be fine, but at least we should be aware of it.

Oh, interesting. I think it's still correct to go and fire off a resync, as a) we don't necessarily want to wait for the loop to come round, b) it's safe and c) I don't really want to assume that the loop will come round quickly or even retry this particular user (it might have backoff logic etc.)

Yeah it makes sense, again I think it's fine as long as we're aware of it :)
~~. o O ( we could avoid this issue by storing and checking an in-memory list of the users we're currently resyncing (i.e. update it in user_device_resync) but that's probably out of scope here )~~ ignore me, that still wouldn't work, and it's not that big a deal anyway.

richvdh

could you give us some clues about the symptoms of this bug? was it throwing exceptions?

richvdh · 2020-07-10T15:01:25Z

synapse/handlers/federation.py

@@ -784,15 +784,23 @@ async def _process_received_pdu(
                    resync = True

            if resync:
-                await self.store.mark_remote_user_device_cache_as_stale(event.sender)
+                run_in_background(self._resync_device, event.sender)


while you're here, can you fix this to be run_as_background_process please.
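
As a rough illustration of the pattern being asked for, here is an asyncio analogue of a named fire-and-forget background task that logs its own failures. This is only a sketch under stated assumptions: `run_as_named_background_task`, `failing_resync` and `demo` are hypothetical names for this example, and the code does not reproduce Synapse's Twisted-based `run_as_background_process` or its logcontext and metrics handling.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def run_as_named_background_task(desc, func, *args, **kwargs):
    """Hypothetical asyncio analogue: start func in the background under a
    descriptive name and log any exception instead of leaving it unhandled."""

    async def _runner():
        try:
            return await func(*args, **kwargs)
        except Exception:
            logger.exception("Background process %r threw an exception", desc)
            return None

    # Must be called from within a running event loop.
    return asyncio.get_running_loop().create_task(_runner(), name=desc)


async def failing_resync(sender):
    """Hypothetical resync that always fails, to exercise the error path."""
    raise RuntimeError("resync failed for %s" % (sender,))


async def demo():
    task = run_as_named_background_task(
        "resync_device", failing_resync, "@alice:example.org"
    )
    # A real caller would carry on without waiting; we await here only to
    # observe the outcome.
    result = await task
    return task.get_name(), result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The descriptive name makes the background work identifiable in logs, and the wrapper ensures a failure is reported rather than surfacing as an unhandled error.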

richvdh · 2020-07-10T15:02:07Z

synapse/handlers/federation.py

-                    return run_in_background(
-                        self._device_list_updater.user_device_resync, event.sender
-                    )
+    async def _resync_device(self, sender: str):


Suggested change

async def _resync_device(self, sender: str):

async def _resync_device(self, sender: str) -> None:

erikjohnston · 2020-07-10T15:39:31Z

This produces the following glorious stack trace; I don't think it had any other observable behaviour.

2020-07-10 13:31:17,205 - twisted - 192 - CRITICAL -  - Unhandled error in Deferred:
2020-07-10 13:31:17,215 - twisted - 192 - CRITICAL -  - 
Capture point (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/synapse/src/synapse/app/federation_reader.py", line 24, in <module>
    start(sys.argv[1:])
  File "/home/synapse/src/synapse/app/generic_worker.py", line 1046, in start
    _base.start_worker_reactor("synapse-generic-worker", config)
  File "/home/synapse/src/synapse/app/_base.py", line 80, in start_worker_reactor
    run_command=run_command,
  File "/home/synapse/src/synapse/app/_base.py", line 140, in start_reactor
    daemon.start()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/daemonize.py", line 248, in start
    self.action(*privileged_action_result)
  File "/home/synapse/src/synapse/app/_base.py", line 117, in run
    run_command()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 1283, in run
    self.mainLoop()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 1292, in mainLoop
    self.runUntilCurrent()
  File "/home/synapse/src/synapse/metrics/__init__.py", line 517, in f
    ret = func(*args, **kwargs)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 886, in runUntilCurrent
    f(*a, **kw)
...
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/synapse/src/synapse/util/async_helpers.py", line 157, in _concurrently_execute_inner
    await maybe_awaitable(func(next(it)))
  File "/home/synapse/src/synapse/federation/federation_server.py", line 281, in process_pdus_for_room
    await self._handle_received_pdu(origin, pdu)
  File "/home/synapse/src/synapse/federation/federation_server.py", line 660, in _handle_received_pdu
    await self.handler.on_receive_pdu(origin, pdu, sent_to_us_directly=True)
  File "/home/synapse/src/synapse/handlers/federation.py", line 416, in on_receive_pdu
    await self._process_received_pdu(origin, pdu, state=state)
  File "/home/synapse/src/synapse/handlers/federation.py", line 791, in _process_received_pdu
    return run_in_background(self._user_device_resync, event.sender)
  File "/home/synapse/src/synapse/logging/context.py", line 695, in run_in_background
    res = f(*args, **kwargs)
  File "/home/synapse/src/synapse/logging/opentracing.py", line 745, in _trace_inner
    result = func(*args, **kwargs)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/synapse/src/synapse/replication/http/_base.py", line 159, in send_request
    "Instance %r not in 'instance_map' config" % (instance_name,)
Exception: Instance '@witchent:zapdos.my-router.de' not in 'instance_map' config


richvdh

lgtm


erikjohnston added 2 commits July 10, 2020 10:25

Fix resync remote devices on receive PDU in worker mode.

1e474fb

The replication client requires that arguments are given as keyword arguments, which was not done in this case. We also pull out the logic so that we can catch and handle any exceptions raised, rather than leaving them unhandled.

Newsfile

b6cb00f

erikjohnston requested a review from a team July 10, 2020 09:28

babolivier reviewed Jul 10, 2020


erikjohnston requested a review from a team July 10, 2020 10:04

richvdh reviewed Jul 10, 2020


erikjohnston added 3 commits July 10, 2020 16:41

Merge branch 'develop' of github.com:matrix-org/synapse into erikj/fi…

71a2509

…x_worker_fderation_device_resync

Review comments

67e0f74

isor

67b06ea

erikjohnston requested a review from richvdh July 10, 2020 16:32

richvdh approved these changes Jul 10, 2020


erikjohnston merged commit f1245dc into develop Jul 10, 2020

erikjohnston deleted the erikj/fix_worker_fderation_device_resync branch July 10, 2020 17:23

babolivier pushed a commit that referenced this pull request Sep 1, 2021

Merge commit 'f1245dc3c' into anoa/dinsic_release_1_18_x

169bbda

* commit 'f1245dc3c': Fix resync remote devices on receive PDU in worker mode. (#7815)
