Sync query taking upwards of 30 minutes #7772
For some reason I didn't get an email about your response... Before submitting this issue I had a look at both, and they didn't really help. I tried setting higher statistics targets for pretty much everything mentioned in the query, and I looked for anything extraordinary in EXPLAIN, but nothing stands out:

matrix=# EXPLAIN WITH RECURSIVE state(state_group) AS (
    VALUES(33031::bigint)
    UNION ALL
    SELECT prev_state_group FROM state_group_edges e, state s
    WHERE s.state_group = e.state_group
)
SELECT DISTINCT ON (type, state_key) type, state_key, event_id
FROM state_groups_state
WHERE state_group IN (SELECT state_group FROM state)
ORDER BY type, state_key, state_group DESC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Unique (cost=74388.84..78867.42 rows=119429 width=88)
CTE state
-> Recursive Union (cost=0.00..523.68 rows=371 width=8)
-> Result (cost=0.00..0.01 rows=1 width=8)
-> Nested Loop (cost=0.42..51.63 rows=37 width=8)
-> WorkTable Scan on state s (cost=0.00..0.20 rows=10 width=8)
-> Index Scan using state_group_edges_idx on state_group_edges e (cost=0.42..5.10 rows=4 width=16)
Index Cond: (state_group = s.state_group)
-> Sort (cost=73865.15..75358.01 rows=597144 width=88)
Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
-> Nested Loop (cost=8.90..16575.99 rows=597144 width=88)
-> HashAggregate (cost=8.35..10.35 rows=200 width=8)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..7.42 rows=371 width=8)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.55..78.79 rows=404 width=88)
Index Cond: (state_group = state.state_group)
(16 rows)

I had a look at statistics again, and it seems some of those queries NEVER finish; some of them have been hanging around in postgres for days at this point (basically since the last restart of postgres).
That issue causes most, if not all, of the sync requests from the client API to fail, so the homeserver is very much unusable in its current form.
my first question is how exactly you have established that that particular query is slow. if it's from postgres queries, please paste the results you see.
I run:
which results in:
I restarted pg fairly recently, so they aren't as old as they get
ok; i'm not sure the lock info is helping us much here, and it omits some useful information from the pg_stat_activity table. could you just share the output of:
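(The exact statement the maintainer asked for was not captured in this transcript. As a sketch only, a pg_stat_activity query that surfaces long-running backends typically looks something like the following; the column choice and the 80-character truncation are illustrative, not what was actually requested:)

-- Sketch only: list non-idle backends, longest-running first.
-- pg_stat_activity carries each backend's current statement and its start time.
SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;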
Also, do you think it is limited to a small range of state groups (from the above, it looks like they are all in the range 33015 to 33040)?
I have seen state groups like
right, I think you might have a loop in the state_group_edges table, which would make the recursive walk never terminate
this should return relatively quickly: if it hangs that confirms the theory, in which case can you also do:
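(The diagnostic queries themselves were lost from this transcript. As an illustration only, and not the maintainer's actual query: a recursive walk over state_group_edges can be made safe to run by using UNION instead of UNION ALL, since UNION discards rows already seen and therefore terminates even if the edge table contains a cycle or duplicated rows. The starting state group 33031 is taken from the EXPLAIN above.)

-- Sketch: count the distinct ancestor state groups reachable from 33031.
-- Unlike the UNION ALL form in the sync query, plain UNION cannot loop forever.
WITH RECURSIVE walk(state_group) AS (
    VALUES (33031::bigint)
    UNION
    SELECT e.prev_state_group
    FROM state_group_edges e
    JOIN walk w ON w.state_group = e.state_group
)
SELECT count(*) FROM walk;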
.... I don't suppose you have a backup of your database from before you, uh, broke it? Restoring that would be pretty handy right now....
It does hang
0 rows apparently
I do at least have a backup of the database from when I first had this issue, which might come in handy
After backing up the current broken database and restoring the older snapshot, it doesn't make much difference: it issues the exact same queries as before, so it seems they are equally broken. At the very least it's not getting more broken
ok, can you do the same query with
right, but we believe it got broken when you transferred it to a different server and dropped the
That's instant
Nope
yes, but what does it return?
Abbreviated it a bit, since copying output from a terminal where a lot of the lines are exactly the same is hard ;)
ohhhh you've got duplicate rows.
?
in fact: did you import the backup twice, or something?
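(Not necessarily the query used in the thread, but a straightforward way to confirm and measure this kind of duplication is to group the edge table by its two columns and report anything that appears more than once:)

-- Sketch: show duplicated edges and how many copies of each exist.
SELECT state_group, prev_state_group, count(*) AS copies
FROM state_group_edges
GROUP BY state_group, prev_state_group
HAVING count(*) > 1
ORDER BY copies DESC;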
I kinda give up pasting this, because it has 32k lines
I hope not; I would probably remember. But I can't tell you whether I did it weeks ago, because that I wouldn't remember
Oh yeah, repeating the query for
well, sad times. You need to remove the duplicates. Perhaps you can come up with the sql for that yourself, or find someone who can help. I'm going to go ahead and close this, since it's very much a case of "I corrupted my database and now synapse runs slowly", which is somewhat out of scope for something we can fix...
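(The thread never shows the actual fix. For anyone in the same situation, a minimal sketch of one version-proof de-duplication pattern, assuming the extra rows are exact copies across both columns; take a backup before running anything like this:)

-- Sketch: rebuild the table with duplicates collapsed.
BEGIN;
CREATE TEMP TABLE edges_dedup AS
    SELECT DISTINCT state_group, prev_state_group
    FROM state_group_edges;
TRUNCATE state_group_edges;
INSERT INTO state_group_edges (state_group, prev_state_group)
    SELECT state_group, prev_state_group FROM edges_dedup;
COMMIT;

The same pattern applies to any other table that picked up duplicate rows during the import, with the column list adjusted to match.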
(honestly: if you've messed it up that badly, and don't have an uncorrupted backup... you may be better off throwing it away and starting again. who knows what other tables are going to be even more corrupted.)
This was really helpful anyway, I got synapse going again, and I will take your advice if anything goes south :P
Queries similar to the following (sent to postgres as a result of /_matrix/client/r0/sync) take a very long time to complete. The server is tuned and works fine otherwise. I tried using the state compression tool, but it did not improve the situation. The database isn't that big either, taking up around 800MB according to \l+.