Forward extremity event missing from database, leading to No state group for unknown or outlier event
#12507
Comments
Possibly, but there's not really enough info in those logs to help. @esackbauer: is that the complete log? there is nothing logged at INFO? |
[edited by @richvdh to fix formatting] There is actually nothing more. The room which is not working is
|
[please wrap your logs in triple-backticks (```) for legibility] well, that's frustrating. Could you maybe enable DEBUG logging (change the |
Ok, sorry. The debug output is rather large; I did a clean start of Synapse and logged everything after I tried to post something in that room. Because it's so large, I attached the homeserver.log |
Right, well, that sheds a bit of light on the situation, but not much. I'd like to ask you to run some queries on your database - could you DM me at |
It seems that a particular event (

Attempts to delete the room are failing, because the delete operation tries to generate leave events for each of the members of the room - which requires the forward extremities to exist.

The event id suggests that the event was created in November 2021, so this problem has probably been waiting undetected since then, and it only became a real problem when Synapse 1.57 added the extra validation.

I'm at a bit of a loss to explain how this could have happened. There's essentially no way for an entry to be added to |
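As a concrete way to see this kind of corruption for yourself, here is a minimal SQL sketch (the table names are the standard Synapse ones; the event id is a hypothetical placeholder for the one reported in your logs) that checks where the problematic event still exists:

```sql
-- Sketch: check whether the event reported in the error survives anywhere obvious.
-- '$missing_event_id' is a hypothetical placeholder for the real event id.
SELECT 'events' AS source, count(*) AS rows_found
  FROM events WHERE event_id = '$missing_event_id'
UNION ALL
SELECT 'event_json', count(*)
  FROM event_json WHERE event_id = '$missing_event_id'
UNION ALL
SELECT 'event_forward_extremities', count(*)
  FROM event_forward_extremities WHERE event_id = '$missing_event_id';
```

On an affected server, the expectation from the description above is a row in event_forward_extremities but nothing in events.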
Hi, I just want to report that I have the same issue; if you need some more sample data, I'm happy to help by providing information. |
@Kidswiss thanks! What is the event id of the missing forward extremity? Do you know when it got added, and do you have logs going back that far? |
@richvdh how can I determine the date it was added? I found it in the |
well, that's good in that it confirms it is the same problem as @esackbauer. But no, it should not be the case.
Sadly that is hard to do. If it is a federated room, we might be able to check when other servers saw the event. If it is a V1 room, the event id will give us a clue. Otherwise... :( Other questions which might help eliminate potential causes:
|
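One of the questions raised above is whether the affected room is a V1 room. Assuming a reasonably recent schema in which the rooms table carries a room_version column (and using a hypothetical placeholder room id), a quick way to check is:

```sql
-- Sketch: look up the room version of the affected room.
-- '!your_room_id:example.org' is a hypothetical placeholder.
SELECT room_id, room_version
  FROM rooms
 WHERE room_id = '!your_room_id:example.org';
```

Note that room_version may still be NULL for very old rooms where the column was never populated.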
For reference, here is a query which will confirm whether your system has been affected by this bug:

select * from event_forward_extremities efe left join events e using (event_id) where e.event_id is null;

... if that returns any rows, you have this issue. (If it doesn't, you likely have a different problem and should open a separate issue.) |
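Building on that query, here is a sketch of a variant (it assumes only the standard room_id column on event_forward_extremities) that groups the dangling rows by room, so you can see which rooms are affected:

```sql
-- Sketch: count dangling forward extremities per room.
SELECT efe.room_id, count(*) AS dangling_extremities
  FROM event_forward_extremities AS efe
  LEFT JOIN events AS e USING (event_id)
 WHERE e.event_id IS NULL
 GROUP BY efe.room_id;
```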
I've since downgraded to 1.56 again, as that's a fairly important room for me. EDIT: |
attachments as in uploaded media? shouldn't be a factor.
It's conceivable this could have introduced a problem by not restoring all rows to the |
To work around the problem, it should be safe to remove the bad rows from event_forward_extremities:

DELETE FROM event_forward_extremities WHERE event_id='$....';

... and then restart synapse. (Note: don't forget the

It doesn't get us any closer to figuring out the cause, though. |
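If there are several dangling rows, the same clean-up can be expressed as a single statement; this is essentially what a later comment in this thread reports running. Treat the following as a sketch: take a database backup first, and the transaction wrapper is only there so the result can be inspected before committing. Restart Synapse afterwards, since (as noted further down) it may have cached the deleted rows.

```sql
-- Sketch: delete every forward-extremity row with no matching event,
-- inside a transaction so it can be reviewed (or rolled back) first.
BEGIN;
DELETE FROM event_forward_extremities
 WHERE event_id IN (
     SELECT event_id
       FROM event_forward_extremities efe
       LEFT JOIN events e USING (event_id)
      WHERE e.event_id IS NULL
 );
-- Re-run the diagnostic query here; then COMMIT (or ROLLBACK to undo).
COMMIT;
```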
I took a dump, dropped the broken entries and updated to 1.57.1, and it looks good so far. Thanks! I agree it's a bit weird that this happened in the first place, but I won't rule out any screw-ups from my side. |
I seemingly have the same issue. I had 2 entries in the

My instance of matrix/synapse has been floating around for 4 or 5 years probably and updated regularly along the way (avhost container). If any other info is helpful just let me know what I can provide.

EDIT: after some time the issue cleared itself up (after deleting the db entries). Not sure if some other gc process eventually happened that fully cleared it up or what. |
Just wanted to note that a user just reported this for a matrix.org room on my homeserver... |
Likely Synapse had cached the deleted rows. I've updated the instructions to suggest restarting it.
Please don't report different issues here. |
People encountering this issue, as confirmed by the query at #12507 (comment):

First: note that this error is reporting corruption in your database which may have happened a long time ago. The only recent change is that Synapse now checks for the corruption.

Second, please confirm:
|
My install is fully isolated currently.
I've never enabled that, so I'm guessing the default is no. Glad to now know it exists however! That's partly why I wasn't federating anything as I knew the data would grow massively on public rooms etc :)
No.
No. |
@MparkG: as the earlier comments make very clear: "If it doesn't [return any rows] you likely have a different problem and should open a separate issue". |
@travisghansen thanks for the answers, though honestly they don't help me get any closer to understanding what can have caused this. Be wary of retention: it's not a well-used feature and the reason I asked about it is that it is likely to cause corruption bugs like this. |
We're seeing exceptions like these as well, and those rooms can't be written to. This is a room on the hackint IRC bridge, so lots of federated users, no usage of the delete room or purge history API endpoints, and we are using

We're on Synapse

Feel free to contact me at

Traceback

synapse.http.server: [PUT-672070] Failed handle request via 'RoomSendEventRestServlet': <XForwardedForRequest at 0x7fb87ab32a30 method='PUT' uri='/_matrix/client/r0/rooms/!lxUgQdCOpWnOrYNVxb%3Ahax404.de/send/m.room.message/1653997592776__inc2?user_id=%40hexa-%3Ahackint.org' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
File "/nix/store/ks8r355nmkcx9q23s4m3nm1y767rfrln-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration: [{'event_id': '$Ps2Ot0YLz20kbjS8lRFmK_Yo4t2fXPAvq3IU0CQTMoU', 'state_group': 820535}]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nix/store/ks8r355nmkcx9q23s4m3nm1y767rfrln-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
result = current_context.run(gen.send, result)
File "/nix/store/lan1pgij0701dzi9nsvvy6nn62qk93b5-matrix-synapse-1.58.0/lib/python3.9/site-packages/synapse/storage/databases/main/state.py", line 332, in _get_state_group_for_events
raise RuntimeError("No state group for unknown or outlier event %s" % e)
RuntimeError: No state group for unknown or outlier event $FXKIdG6LkVKVsOEROx4IBer-eqPsQ-HCWdWpRuqyRFk

Retention

We are indeed using a pretty aggressive retention rule, since we only want to relay the bridged content, but not store it persistently.

retention:
  allowed_lifetime_max: 1m
  allowed_lifetime_min: 1m
  default_policy:
    max_lifetime: 1m
    min_lifetime: 1m
  enabled: true
  purge_jobs:
    - interval: 30m

Query result
Not the event id mentioned in the traceback above, but I guess that's still worth debugging/fixing.

No corruption log lines

We don't see any corruption log events from #12620.

Removed events according to #12507 (comment)

matrix-synapse=# SELECT count(event_id) FROM event_forward_extremities;
count
-------
2660
(1 row)
matrix-synapse=# BEGIN;
BEGIN
matrix-synapse=# DELETE FROM event_forward_extremities WHERE event_id IN (select event_id from event_forward_extremities efe left join events e using (event_id) where e.event_id is null);
DELETE 7
matrix-synapse=# SELECT count(event_id) FROM event_forward_extremities;
count
-------
2655
(1 row)
matrix-synapse=# END;
COMMIT

2660 - 7 = 2655 … okay. 🤔 |
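(A hedged aside on the arithmetic above: 2660 − 7 = 2653, not 2655, so the before/after counts differ by only 5 rather than 7. One possible explanation, an assumption rather than anything confirmed in this thread, is that Synapse committed a few new forward extremities between the two SELECTs; under PostgreSQL's default READ COMMITTED isolation, each statement inside the transaction sees rows committed in the meantime.)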
I noticed it recently when restarting a bot.
Yes. The event is my own homeserver's though.
No.
Yes, only purge history; it has happened that Synapse was terminated during mass purging, and it may have been while purging this room.
No. |
@14mRh4X0r Could you post the logs for your issue? |
Sure (Synapse 1.66.0 from Debian backports):
|
I'm going to assume @14mRh4X0r's issue is caused by their use of the history purge function and is hence a duplicate of #13476. As far as I can tell from the history in this issue, everybody who has seen this (apart from @travisghansen) has, at some point in the past, used one of the history deletion functions. @travisghansen's symptoms remain unexplained, but in the absence of further reports, it's unlikely to be worth further investigation. Accordingly, I'm closing this as a duplicate of #13476. |
Description
After upgrading to Synapse v1.57.0, one direct room is not working anymore. Cannot post new messages from any client, for any member of the room.
Cannot scroll back through the history more than one screen; after that it's stuck.
Cannot leave the room ("Internal Server Error 500").
Steps to reproduce
Tried rebooting the server, even tried to remove the room with Synapse-Admin - still get an error, room cannot be deleted.
Version information
If not matrix.org:
matrix.flyar.net
Version:
{"server_version":"1.57.0","python_version":"3.9.2"}
Install method:
Debian repo
Debian 11.3 VM x64
This is what the logfile looks like:
The part after "During handling" is repeated. It seems one event is blocking everything? How can I get rid of this?