Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

We may fail to send some m.device_list_update messages when there are multiple updates in a batch #5153

Closed
richvdh opened this issue May 8, 2019 · 3 comments
Assignees
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@richvdh
Copy link
Member

richvdh commented May 8, 2019

https://github.com/matrix-org/synapse/blob/de655e66/synapse/storage/devices.py#L100 is the code which looks for device list updates to send out.

In brief, the logic is:

  • get up to 20 rows from the update table for the current destination
  • if we hit the limit, note the maximum stream_id so that we can start from there next time
  • for each device, get the e2e keys, and add it to the list.

On the next pass, we will return only updates which have a stream_id greater than the maximum of the previous pass.

There are two problems with this, both of which mean that we will drop update messages:

  • The first is trivially solved: the query lacks an ORDER BY clause, so there is no guarantee that we will get the oldest updates. Suppose there are 50 updates waiting to go to a particular destination, with stream IDs N to N+49. We will get an arbitrary 20 rows - for example N+25 to N+44. That would mean that the updates with stream IDs N to N+24 will be lost.

  • The second is maybe a bit harder to solve, but relates to the fact that there may be several updates (for different devices) with the same stream_id. Suppose we've solved the previous problem, and we now have, in the table:

    • 15 updates, for one device each, with stream ids N to N+14
    • At stream id N+15, a single update for 6 devices.
      So then our query will return the first 15 updates, and 5 of the 6 updates at N+15. Again, the final update gets lost.
@richvdh
Copy link
Member Author

richvdh commented May 8, 2019

One solution to the latter might be to increase the limit to 21 rows[*], with the intention of discarding the 21st row. We then check that the 20th row has a different stream_id to the 21st row, to confirm that we have seen all the updates at that stream_id. If not, we discard the 20th row, and repeat. If you end up with an empty list, it means that there are more than 20 updates for a given stream_id, and you have lost.

*: of course it doesn't need to be 21: it works with any N+1 where N is the number of rows we actually want to send. I think there is a strong argument for increasing N here.

@anoadragon453
Copy link
Member

and you have lost.

In this case we just skip past this stream_id?

@ara4n ara4n changed the title We may fail to send some m.device_list_update messages when there are multiple updates in a a batch We may fail to send some m.device_list_update messages when there are multiple updates in a batch May 9, 2019
@neilisfragile neilisfragile added z-p2 (Deprecated Label) z-bug (Deprecated Label) labels May 13, 2019
@richvdh
Copy link
Member Author

richvdh commented Jun 6, 2019

hopefully fixed by #5156

@richvdh richvdh closed this as completed Jun 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

3 participants