Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Can't join room on a remote server: violating constraint event_reference_hashes_event_id_algorithm_key #8768

Closed
JOJ0 opened this issue Nov 16, 2020 · 15 comments

Comments

@JOJ0
Copy link

JOJ0 commented Nov 16, 2020

Hi @MatMaul and @richvdh,
had a very similar issue to this one #8323 with joining a room on a friend's HS.

Version information

{'python_version': '3.6.9', 'server_version': '1.22.1'}
running on an Ubuntu 18.04 machine,
installed by deb package from https://packages.matrix.org/debian/ bionic main
Room version: 1

Description

This is the error I am getting when I click the "join" button

twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "event_reference_hashes_event_id_algorithm_key"
DETAIL:  Key (event_id, algorithm)=($1487157046293MNsht:matrix.xxx.yy, sha256) already exists.

I followed your suggestion and deleted all rows in event_reference_hashes table that did not have a row with that event_id in the events table. There were actually a whole lot of them! about a hundered-something!

Each time I deleted the row with the key mentioned in the error message and pressed the "Join" button again I got a new key hash mentioned which I then deleted. I ended up comparing events and events_reference_hashes and deleted all from hashes table that did not have a corresponding row in events or events_json

I also tried declining the invitation and let someone reinvite me (or reinvite myself with my user I have on my friends's server in this room). I now ended up with the same error message only that it does not show a DETAIL line in the log anymore:

twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "event_reference_hashes_event_id_algorithm_key"

I even checked event_json if there is anything related. Not sure if that's important but checked anyway:

select * from event_json where event_id like '$148%:matrix.xxx.yy';
(0 rows)

(also checked for other unrelated keys like '$1%:matrix.xxx.yy' - there were found 2 entries in this table that seem to represent recent events and also have corresponding lines in events table and event_reference_hashes. So all in all if I understood correctly this is properly cleaned up now.

Steps to reproduce

The error is logged each time I press the Join button in the invitation. The invitation stays. If I decline it I can reproduce by sending a new invitation and press Join button.

I have a user in the room on my friend's server as well, so easily can reinvite or get some room information from there.

Any suggestions on how to procede? Thanks in advance! :-)

@JOJ0
Copy link
Author

JOJ0 commented Nov 17, 2020

https://jojo.peek-a-boo.at/E950513F/synapse_invite_to_err.log

invite:
2020-11-05 17:57:28

end of join trying and db cleanup:
2020-11-05 19:04:13

@richvdh
Copy link
Member

richvdh commented Nov 17, 2020

I'm struggling to understand how those logs correspond to what happened.

As I understand it, you

  • tried to join a room, which failed with a duplicate key violation
  • deleted entries from event_reference_hashes which did not have corresponding entries in events
  • Tried once more to join but still got a duplicate key violation.

please can you give timestamps to help correlate to the logs? Possibly, try again so that we have a clear record of what happened when.

After the above three steps: are there now new entries in event_reference_hashes without corresponding entries in events?

I now ended up with the same error message only that it does not show a DETAIL line in the log anymore

I don't see this anywhere in your logs, so think it's a red herring.

@JOJ0
Copy link
Author

JOJ0 commented Nov 19, 2020

Hi, thanks for getting back to me so quickly!

I reproduced: Sent invitation from my user on my friends HS then clicked on "join" with my user on my own HS.

See the traceback when I press the join button at: 2020-11-19 09:15:09,388

This is the log: https://jojo.peek-a-boo.at/E950513F/synapse_invite_to_err_repr1.log

After the above three steps: are there now new entries in event_reference_hashes without corresponding entries in events?

no there are not:

synapse=# select count(*) from events where event_id like '%:matrix.forster.cc';
count | 3

synapse=# select count(*) from event_reference_hashes where event_id like '%:matrix.forster.cc';
count | 6
synapse=# select * from events where event_id like '%:matrix.forster.cc';
stream_ordering      | 202860
topological_ordering | 13094
event_id             | $16057736912644eaQQR:matrix.forster.cc
type                 | m.room.member
room_id              | !FeDIMBbiDuWmFfGIWM:matrix.forster.cc
content              |
unrecognized_keys    |
processed            | t
outlier              | t
depth                | 13094
origin_server_ts     | 1605773691480
received_ts          | 1605773691996
sender               | @jojo:matrix.forster.cc
contains_url         | f
instance_name        | master
---------------------+---------------------------------------
stream_ordering      | 199981
topological_ordering | 12877
event_id             | $16054608252328quShj:matrix.forster.cc
type                 | m.room.member
room_id              | !FeDIMBbiDuWmFfGIWM:matrix.forster.cc
content              |
unrecognized_keys    |
processed            | t
outlier              | t
depth                | 12877
origin_server_ts     | 1605460825630
received_ts          | 1605460825684
sender               | @jojo:matrix.forster.cc
contains_url         | f
instance_name        | master
---------------------+---------------------------------------
stream_ordering      | 199262
topological_ordering | 12868
event_id             | $16054594482319EaGxM:matrix.forster.cc
type                 | m.room.member
room_id              | !FeDIMBbiDuWmFfGIWM:matrix.forster.cc
content              |
unrecognized_keys    |
processed            | t
outlier              | t
depth                | 12868
origin_server_ts     | 1605459448435
received_ts          | 1605459448503
sender               | @jojo:matrix.forster.cc
contains_url         | f
instance_name        | master

by the look on these: I think I just forgot to to delete the $149 ones! The $160 one's are current and have entries in both tables. Guess I can savely delete the $149 rows, right?

synapse=# select * from event_reference_hashes where event_id like '%:matrix.forster.cc';
event_id  | $14980453181141SLaxo:matrix.forster.cc
algorithm | sha256
hash      | \xxx
----------+-------------------------------------------------------------------
event_id  | $1496381210149YCwyf:matrix.forster.cc
algorithm | sha256
hash      | \xxx
----------+-------------------------------------------------------------------
event_id  | $1494260361698TzqAB:matrix.forster.cc
algorithm | sha256
hash      | \xxx
----------+-------------------------------------------------------------------
event_id  | $16054608252328quShj:matrix.forster.cc
algorithm | sha256
hash      | \xxx
----------+-------------------------------------------------------------------
event_id  | $16054594482319EaGxM:matrix.forster.cc
algorithm | sha256
hash      | \xxx
----------+-------------------------------------------------------------------
event_id  | $16057736912644eaQQR:matrix.forster.cc
algorithm | sha256
hash      | \xxx

@JOJ0
Copy link
Author

JOJ0 commented Nov 19, 2020

Aha something else: I deleted the 3 $149 rows, then pressed "join" again and now see something with matrix.org in these tables. No time to check db now but will soon.....

2020-11-19 09:55:13,537 - synapse.http.server - 83 - ERROR - POST-2602300 - Failed handle request via 'JoinRoomAliasServlet': <XForwardedForRequest at 0x7f3174d04828 method='POST
' uri='/_matrix/client/r0/join/!FeDIMBbiDuWmFfGIWM%3Amatrix.forster.cc' clientproto='HTTP/1.1' site=8008>
Traceback (most recent call last):
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/http/server.py", line 228, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/http/server.py", line 405, in _async_render
    callback_return = await raw_callback_return
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/rest/client/v1/room.py", line 327, in on_POST
    third_party_signed=content.get("third_party_signed", None),
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/handlers/room_member.py", line 320, in update_membership
    require_consent=require_consent,
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/handlers/room_member.py", line 506, in _update_membership
    requester, remote_room_hosts, room_id, target, content
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/handlers/room_member.py", line 1040, in _remote_join
    remote_room_hosts, room_id, user.to_string(), content
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/handlers/federation.py", line 1398, in do_invite_join
    origin, room_id, auth_chain, state, event, room_version_obj
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/handlers/federation.py", line 2045, in _persist_auth_tree
    for e in itertools.chain(auth_events, state)
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/handlers/federation.py", line 2913, in persist_events_and_notify
    event_and_contexts, backfilled=backfilled
  File "/opt/venvs/matrix-synapse/lib/python3.6/site-packages/synapse/storage/persist_events.py", line 241, in persist_events
    defer.gatherResults(deferreds, consumeErrors=True)
twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "event_r
eference_hashes_event_id_algorithm_key"
DETAIL:  Key (event_id, algorithm)=($1502951597947965QPmpS:matrix.org, sha256) already exists.

@clokep
Copy link
Member

clokep commented Nov 20, 2020

No time to check db now but will soon.....

I think it will be interesting to see what's in it now. Thanks!

@JOJ0
Copy link
Author

JOJ0 commented Nov 21, 2020

little difference

synapse=# select count(*) from event_reference_hashes where event_id like '%:matrix.org';
 count
--------
 110928
(1 row)

synapse=# select count(*) from events where event_id like '%:matrix.org';
 count
--------
 110926
(1 row)

let's have a look at the one in the error message:

synapse=# select * from event_reference_hashes where event_id like '$1502951597947965QPmpS:matrix.org';
-[ RECORD 1 ]-----------------------------------------------------------------
event_id  | $1502951597947965QPmpS:matrix.org
algorithm | sha256
hash      | \x62588d8f30cc22a5052ea8723db479cc76ccb58dd25ae5237ce15800b9d7ec0e

synapse=# select * from events where event_id like '$1502951597947965QPmpS:matrix.org';
(0 rows)

@JOJ0
Copy link
Author

JOJ0 commented Nov 24, 2020

I guess next step is again deleting all rows from event_reference_hashes that don't have corresponding key in events table?

@clokep
Copy link
Member

clokep commented Nov 24, 2020

@JOJ0 Do you have message retention policies enabled (look under the retention block in your config)? If so, this could be related to #8707 (comment).

@JOJ0
Copy link
Author

JOJ0 commented Nov 24, 2020

No, doesn't look like it, everything containing "retention" is commented out in homeserver.yaml

#stats:
#   enabled: true
#   bucket_size: 1d
#   retention: 1y
...
#redaction_retention_period: 28d

@clokep
Copy link
Member

clokep commented Nov 24, 2020

@JOJ0 What about the one that says just retention:? See

retention:
# The message retention policies feature is disabled by default. Uncomment the
# following line to enable it.
#
#enabled: true

@JOJ0
Copy link
Author

JOJ0 commented Nov 24, 2020

Not set in my config, so I guess it's falling back to its default value which your posted sample conf says it's disabled:

jojo@hostx:~$ grep -in retention /etc/matrix-synapse/homeserver.yaml
313:#redaction_retention_period: 28d
1471:# 'retention' controls how long historical statistics will be kept for.
1478:#   retention: 1y

@JOJ0
Copy link
Author

JOJ0 commented Nov 25, 2020

I guess next step is again deleting all rows from event_reference_hashes that don't have corresponding key in events table?

Is there any problem if I move on like this ^ and see what happens next or you 'd like to get some more information of the current state?

@anoadragon453
Copy link
Member

Is there any problem if I move on like this

I don't think we can get much more information from the database at the moment. Let's try removing the excess rows again and see what happens.

@JOJ0
Copy link
Author

JOJ0 commented Nov 29, 2020

Hi @anoadragon453
I deleted 4 or 5 more rows from event_reference_hashes table from matrix.org, my friends HS and another HS that was in the room and finally I could accept the invitation in Element-Web and am in the room :-)
In case you need anything else to be checked that could help finding the root-cause of this please tell me. Happy to help, otherwise you can certainly close this :-)

@anoadragon453
Copy link
Member

@JOJ0 Thanks for letting us know!

In case you need anything else to be checked that could help finding the root-cause of this please tell me. Happy to help, otherwise you can certainly close this :-)

Nothing for right now, but may come knocking again at some point if another lead sprouts up 🙂

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants