This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Device lists can get badly out of sync during federation outages, breaking E2EE #5095

Closed
ara4n opened this issue Apr 25, 2019 · 6 comments

Comments

@ara4n
Member

ara4n commented Apr 25, 2019

After the matrix.org breach, there were a lot of UISIs for matrix.org<->elsewhere E2E rooms. Circumstantially, we're assuming that this is because servers ended up with an inconsistent view of the devices present in a room.

Possible causes are:

  • If an m.device_list_update EDU gets lost, servers will be stuck with the stale list until the next m.device_list_update from that server, which happens infrequently.
  • However, EDUs should not get lost; we should retry m.device_list_updates until we get a 200, so it seems there is a bug here.
  • If we see an unrecognised device from a server, should we not re-sync our view of that server's device list anyway?
  • In theory we re-sync our view of the device list when we see an m.device_list_update which refers to an unknown previous m.device_list_update. But this might not be working?
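
The prev_id check in the last bullet can be sketched roughly as follows. This is a minimal illustration, not Synapse's actual code: `RemoteDeviceListTracker`, `handle_update` and `mark_resynced` are hypothetical names, and only the `user_id`, `device_id`, `stream_id` and `prev_id` fields of the m.device_list_update EDU are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceListUpdate:
    """Subset of the m.device_list_update EDU content used in this sketch."""
    user_id: str
    device_id: str
    stream_id: int
    prev_id: list = field(default_factory=list)


class RemoteDeviceListTracker:
    """Hypothetical receiver-side tracker (an assumption, not Synapse's class)."""

    def __init__(self):
        # user_id -> set of stream_ids we have already processed
        self.seen = {}
        # users whose cached device list must be re-fetched from their server
        self.needs_resync = set()

    def handle_update(self, update):
        """Process one update; return True if the cache is still trustworthy."""
        seen = self.seen.setdefault(update.user_id, set())
        # Every prev_id must refer to an update we have processed. If one does
        # not, at least one EDU was lost and the cached list may be stale.
        if any(prev not in seen for prev in update.prev_id):
            self.needs_resync.add(update.user_id)
        seen.add(update.stream_id)
        return update.user_id not in self.needs_resync

    def mark_resynced(self, user_id, stream_id):
        """Call after a full device list fetch for user_id has succeeded."""
        self.seen[user_id] = {stream_id}
        self.needs_resync.discard(user_id)


tracker = RemoteDeviceListTracker()
in_sync = tracker.handle_update(DeviceListUpdate("@alice:example.org", "A", 1))
# stream_id 2 never arrived, so prev_id=[2] is unknown and a resync is flagged.
after_gap = tracker.handle_update(DeviceListUpdate("@alice:example.org", "B", 3, [2]))
```

In this sketch a user stays flagged until `mark_resynced` is called, so later updates cannot silently clear the stale state.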

(See element-hq/element-web#2996 for the main UISI bug)

@ara4n ara4n changed the title from "Device lists can get badly out of sync during federation outages" to "Device lists can get badly out of sync during federation outages, breaking E2EE" on Apr 25, 2019
@neilisfragile neilisfragile added the z-bug (Deprecated Label) and p1 labels on Apr 26, 2019
@richvdh
Member

richvdh commented May 1, 2019

For the record: #4877 was a significant cause of this (people were missing the unique index on device_lists_remote_cache which meant that they ended up ignoring device list updates), but it sounds like other things may have been going on too.
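
The comment above does not spell out how a missing index leads to ignored updates. One plausible mechanism, shown here purely as an assumption, is that an upsert keyed on (user_id, device_id) is rejected when no unique index matches its conflict target. The sketch uses SQLite (3.24 or later) with a simplified three-column table; the column set and the index name are illustrative, not the real schema.

```python
import sqlite3


def make_db(with_unique_index):
    """Create an in-memory table loosely modelled on device_lists_remote_cache."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_lists_remote_cache "
        "(user_id TEXT, device_id TEXT, content TEXT)"
    )
    if with_unique_index:
        # Illustrative index name; an assumption, not taken from the thread.
        conn.execute(
            "CREATE UNIQUE INDEX device_lists_remote_cache_unique_id "
            "ON device_lists_remote_cache (user_id, device_id)"
        )
    return conn


def upsert_device(conn, user_id, device_id, content):
    """Insert or update one cached device; return True if the write succeeded."""
    try:
        conn.execute(
            "INSERT INTO device_lists_remote_cache (user_id, device_id, content) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (user_id, device_id) DO UPDATE SET content = excluded.content",
            (user_id, device_id, content),
        )
        return True
    except sqlite3.OperationalError:
        # No unique index matches the conflict target, so the upsert is rejected.
        # Swallowing the error like this is how an update could end up ignored.
        return False


good = make_db(with_unique_index=True)
upsert_device(good, "@bob:example.org", "DEV1", "v1")
upsert_device(good, "@bob:example.org", "DEV1", "v2")  # updates the same row

bad = make_db(with_unique_index=False)
accepted = upsert_device(bad, "@bob:example.org", "DEV1", "v1")  # rejected
```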

@richvdh
Member

richvdh commented May 8, 2019

#5153 looks like another culprit here.

@cyphar

This comment has been minimized.

@ara4n
Member Author

ara4n commented Jun 12, 2019

(have filed #5441 to track an issue i mistook for this one)

@cyphar

cyphar commented Jun 13, 2019

I finally figured out the cause of the above comment and have filed a separate issue for it in #5433.

> If we see an unrecognised device from a server, should we not re-sync our view of that server's device list anyway?

Seems to be the most obvious solution, and would fix #5433 too.
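
A rough sketch of that idea, assuming a hypothetical `DeviceCache` with an injected `fetch_devices` callable standing in for a federation query to the user's homeserver (neither name comes from Synapse):

```python
class DeviceCache:
    """Hypothetical per-server cache of remote users' device IDs."""

    def __init__(self, fetch_devices):
        # fetch_devices(user_id) -> iterable of device IDs. This is a stand-in
        # for asking the user's homeserver for the full device list.
        self._fetch_devices = fetch_devices
        self._devices = {}

    def observe_device(self, user_id, device_id):
        """Note that device_id was seen in use; resync if we do not know it.

        Returns True if a resync was triggered.
        """
        known = self._devices.get(user_id)
        if known is not None and device_id in known:
            return False
        # Unknown device (or no cached list at all): replace the cached list
        # with a fresh copy instead of waiting for the next update EDU.
        self._devices[user_id] = set(self._fetch_devices(user_id))
        return True

    def devices_for(self, user_id):
        return set(self._devices.get(user_id, ()))


calls = []


def fake_fetch(user_id):
    # Test double: records each call and returns a fixed device list.
    calls.append(user_id)
    return ["DEV1", "DEV2"]


cache = DeviceCache(fake_fetch)
first = cache.observe_device("@carol:example.org", "DEV2")   # unknown, resyncs
second = cache.observe_device("@carol:example.org", "DEV1")  # now known, no resync
```

A device that is still unknown after a resync would trigger a fetch on every sighting in this sketch, so a real implementation would presumably need some rate limiting.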

@richvdh
Member

richvdh commented Jun 20, 2019

with #5156 done, and #5320 fixing the device_lists_remote_cache index, I think we've fixed most of the known causes of this issue, so I am closing it for now.

@richvdh richvdh closed this as completed Jun 20, 2019