Potential client/server state mismatch bugs #7843

Closed
lampholder opened this issue Dec 12, 2018 · 9 comments
Labels
P1
S-Major: Severely degrades major functionality or product features, with no satisfactory workaround
T-Defect
Z-Cache-Confusion: Related to internal cache (clearing helps / causes the issue)

Comments

@lampholder
Member

My spider sense is tingling about these bugs:

#7526 (comment) - The GitHub issue is resolved, but I don't think we've addressed this comment in particular
#7745 - Some matrix.org users can't join #matrix:matrix.org - CORS request rejected
#7775 - Desktop app is hiding one of my rooms from me!
#7800 - Sending a message into a room fails with CORS rejected while doing a /members query
#7790 - Sometimes legit invitees cannot write in a room

Maybe:
#7352 - Joining a room you've previously left (in the same session) shows an infinite spinner

They all smell like a client/server state mismatch not being recovered from gracefully.

@lampholder lampholder added T-Defect P1 S-Major Severely degrades major functionality or product features, with no satisfactory workaround labels Dec 12, 2018
@lampholder
Member Author

#7838 too

We need to spend some time on this.

@lampholder
Member Author

Okay... this seems sort of related, but it undermines the theory that this is a client/server state mismatch thing:
#7853

@richvdh
Member

richvdh commented Jan 8, 2019

What do you mean by "client/server state mismatch", and what makes you think that these issues are related to it?

@jryans
Collaborator

jryans commented Feb 21, 2019

"Missing room name, fixed by clear cache" seems like another example of this, I think.

@turt2live
Member

I dug into a bunch of logs to try and figure out where #8136 might be happening and have had very little success.

The logs are inconclusive: there's no definitive answer here. Given the data set, however, the issue does appear to be more likely if you encounter database problems (corruption, full, etc.) or if you get gappy syncs. This may just be confirmation bias, in that the logs show these problems consistently, but they cannot yet be proven to be the cause.

I'd generally encourage people to submit more rageshakes for more data points.


To expand on other data points for the issues related to this one: room complexity, in terms of state and auth chain events, does not appear to affect the probability of clashes happening. Given that clients are apparently running into database problems and possible gappy syncs, I'm inclined to believe that these issues happen more often than the reports suggest but are only noticed in high-profile rooms. This is based on some of the reports coming from relatively tiny rooms (20-30 people, nothing particularly interesting in the room state) as well as massive rooms (HQ, #synapse, etc.). We are probably still suffering state resets that cause client state to get purged, and perhaps that explains some of the "no issues found" reports above, but I do believe that ~50% of the problem is our fault as a client.
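
For readers unfamiliar with the term, a "gappy sync" is one where the server truncates a room's timeline: in the Matrix client-server `/sync` response, the room's `timeline.limited` flag is `true` and `prev_batch` gives a token for paginating back over the gap. The following is only a minimal sketch of spotting such rooms, not the client's actual code. The interfaces are a simplified slice of the response, and `findGappyRooms` and `Gap` are hypothetical names introduced for illustration. What a client should then do with each gap (backfill, or discard cached state for that room) is left open here.

```typescript
// Simplified slice of a Matrix /sync response; only the fields used here.
interface RoomTimeline {
  limited?: boolean;   // true when the server omitted events (a "gappy" sync)
  prev_batch?: string; // token for paginating backwards over the gap
}

interface SyncResponse {
  rooms?: { join?: Record<string, { timeline?: RoomTimeline }> };
}

// Hypothetical result type: one entry per room with a truncated timeline.
interface Gap {
  roomId: string;
  prevBatch: string | undefined;
}

// Hypothetical helper: list joined rooms whose timeline was truncated.
function findGappyRooms(sync: SyncResponse): Gap[] {
  const joined = sync.rooms?.join ?? {};
  const gaps: Gap[] = [];
  for (const roomId of Object.keys(joined)) {
    const timeline = joined[roomId].timeline;
    if (timeline?.limited === true) {
      gaps.push({ roomId, prevBatch: timeline.prev_batch });
    }
  }
  return gaps;
}
```

Logging the returned room IDs alongside any local database errors would be one way to gather the kind of data points discussed above.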

@turt2live
Member

ftr I spent a couple hours going through other rageshakes to hunt down ones that might not be associated with the set of issues here. Found nothing of real interest, but did find a bunch of trends.

@turt2live
Member

https://github.com/matrix-org/riot-web-rageshakes/issues/1328 has sync timeouts and other sync related errors on matrix.org - this might be more evidence that gappy syncs are indeed the problem.

@jryans
Collaborator

jryans commented May 20, 2019

#9756 seems like another example.

@jryans
Collaborator

jryans commented Mar 9, 2021

I am not convinced this sprawling meta issue has value at the moment... It's unlikely we would tackle these all together or that they would have a single solution. I have tagged the related open issues with a new Z-Cache-Confusion label, so it's easy to find them.

For this meta issue, I think I'll go ahead and close it for now.

@jryans jryans closed this as completed Mar 9, 2021