Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

/sync fails with SynapseError: 404 - Could not find event if the last event is a redaction of an event that we don't have #12864

Closed
n3rdybyn47ur3 opened this issue May 25, 2022 · 23 comments · Fixed by #12905
Assignees
Labels
A-Sync defects related to /sync S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@n3rdybyn47ur3
Copy link

Hello,

we noticed a strange login behaviour that only happens to some of our users:

When trying to log in (SAML) they got an 404 error after the SAML Authentication is allready completed (and the verification should start)
loginerror

There are warnings to cachebuster GET requests in my synapse log:
2022-05-24 12:35:33,568 - synapse.http.server - 723 - WARNING - GET-125345 - Not sending response to request <XForwardedForRequest at 0x7fe7c3c40d00 method='GET' uri='/_matrix/client/r0/sync?filter=0&timeout=0&_cacheBuster=1653395733090' clientproto='HTTP/1.0' site='8008'>, already disconnected.

Sadly I could not find the coresponding one.

We have updated synapse more than once during the last weeks but since existing sessions reconnect without login its impossible to say, since when this happens. Most of the users can login fine. The problem occurs with Element Web, Desktop and Android App (all newest version.

@squahtx squahtx added the X-Needs-Info This issue is blocked awaiting information from the reporter label May 25, 2022
@squahtx
Copy link
Contributor

squahtx commented May 25, 2022

It's probably unrelated to SAML. Sounds similar to #12820, but it's hard to confirm because no one's posting their errors or logs.

Could you dig out some INFO-level Synapse logs that include the 404 response to /sync?

@n3rdybyn47ur3
Copy link
Author

Yes, that should be possible. Tomorrow is a national holyday, so I will change the log level Friday morning and hopefully deliver logs Friday or Monday.

@rettichschnidi
Copy link
Contributor

rettichschnidi commented May 26, 2022

Same error on my homeserver. Multiple users affected, the two I investigated a bit more have the following logs (renamed to "user1" and "user2" here):

May 24 09:35:09 matrix-1 synapse[345874]: synapse.http.server: [GET-7246170] <XForwardedForRequest at 0x7fc781185bb0 method='GET' uri='/_matrix/client/r0/sync?filter=3&timeout=0&_cacheBuster=1653384908382' clientproto='HTTP/1.1' site='8008'> SynapseError: 404 - Could not find event $vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g
May 24 09:35:09 matrix-1 synapse[345874]: synapse.access.http.8008: [GET-7246170] 127.0.0.1 - 8008 - {@user1:digitale-gesellschaft.ch} Processed request: 0.540sec/0.021sec (0.340sec, 0.035sec) (0.956sec/1.158sec/172) 101B 404 "GET /_matrix/client/r0/sync?filter=3&timeout=0&_cacheBuster=1653384908382 HTTP/1.1" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0" [2 dbevts]
May 24 09:35:12 matrix-1 synapse[345874]: synapse.http.server: [GET-7246191] <XForwardedForRequest at 0x7fc73e1b4100 method='GET' uri='/_matrix/client/r0/sync?filter=1&timeout=0&_cacheBuster=1653384912246' clientproto='HTTP/1.1' site='8008'> SynapseError: 404 - Could not find event $2XH5WO75JB-dfBoAd7qjczsWwPVZUgcTYgNBa2p5_6s
May 24 09:35:12 matrix-1 synapse[345874]: synapse.access.http.8008: [GET-7246191] 127.0.0.1 - 8008 - {@user2:digitale-gesellschaft.ch} Processed request: 0.621sec/0.006sec (0.390sec, 0.017sec) (1.077sec/1.584sec/202) 101B 404 "GET /_matrix/client/r0/sync?filter=1&timeout=0&_cacheBuster=1653384912246 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4832.0 Safari/537.36" [2 dbevts]

Issue exists on Synapse 1.58.1 and 1.59.1 and on Element-Web 1.10.11, 1.10.12 and 1.10.13. Unsure if it already affected older versions.

@richvdh
Copy link
Member

richvdh commented May 27, 2022

This sounds rather like #12571, but that should have been fixed in Synapse 1.59.0 (and later) :/

@rettichschnidi: can you run a query on your database and share the results?

SELECT * FROM events e LEFT JOIN rejections r USING (event_id)
WHERE event_id='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';

@richvdh richvdh changed the title Strange SAML / Login behaviour /sync fails with SynapseError: 404 - Could not find event May 27, 2022
@rettichschnidi
Copy link
Contributor

rettichschnidi commented May 27, 2022

@richvdh:

synapse=# SELECT * FROM events e LEFT JOIN rejections r USING (event_id)
WHERE event_id='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';
                   event_id                   | topological_ordering |       type       |                   room_id                    | content | unrecognized_keys | processed | outlier | depth | origin_server_ts |  received_ts  |                  sender                   | contains_url | instance_name | stream_ordering | state_key | rejection_reason | reason | last_check
----------------------------------------------+----------------------+------------------+----------------------------------------------+---------+-------------------+-----------+---------+-------+------------------+---------------+-------------------------------------------+--------------+---------------+-----------------+-----------+------------------+--------+------------
 $vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g |                   15 | m.room.redaction | !emUtfyPuSaXxMGNSQW:digitale-gesellschaft.ch |         |                   | t         | f       |    15 |    1633904626616 | 1633904626641 | @user3:digitale-gesellschaft.ch | f            | master        |         3665886 |           |                  |        |
(1 row)

synapse=#

(user3 is another alias for a 3rd user)

@richvdh
Copy link
Member

richvdh commented May 27, 2022

That's... odd. Can you check event_json too?

SELECT event_id, internal_metadata FROM event_json WHERE event_id='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g'; 

@rettichschnidi
Copy link
Contributor

synapse=# SELECT event_id, internal_metadata FROM event_json WHERE event_id='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';
                   event_id                   |                        internal_metadata
----------------------------------------------+------------------------------------------------------------------
 $vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g | {"token_id":2639,"txn_id":"m1633904625401.6","historical":false}
(1 row)

synapse=#

@richvdh
Copy link
Member

richvdh commented May 27, 2022

hrm, another idea. Can you try these:

SELECT * FROM redactions WHERE event_id='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';
SELECT * FROM redactions WHERE redacts='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';

@rettichschnidi
Copy link
Contributor

synapse=# SELECT * FROM redactions WHERE event_id='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';
                   event_id                   |                   redacts                    | have_censored |  received_ts
----------------------------------------------+----------------------------------------------+---------------+---------------
 $vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g | $ikN-_JY29tlFZpZ4oRzFPp7mRFR_UYWzDFLduvPSkO8 | t             | 1633904626660
(1 row)

synapse=# SELECT * FROM redactions WHERE redacts='$vrNBlJ52sq9ojxXRFl7JLoBU1A5zyEorDuoMYMxkz8g';
 event_id | redacts | have_censored | received_ts
----------+---------+---------------+-------------
(0 rows)

synapse=#

@richvdh
Copy link
Member

richvdh commented May 27, 2022

right, and can you do the same queries for $ikN-_JY29tlFZpZ4oRzFPp7mRFR_UYWzDFLduvPSkO8 ?

SELECT * FROM events e LEFT JOIN rejections r USING (event_id)
WHERE event_id='$ikN-_JY29tlFZpZ4oRzFPp7mRFR_UYWzDFLduvPSkO8';

SELECT event_id, internal_metadata FROM event_json
WHERE event_id='$ikN-_JY29tlFZpZ4oRzFPp7mRFR_UYWzDFLduvPSkO8';

(sorry, slightly grasping at straws here - I'm really struggling to understand what could cause that exception when the event is just there in the database)

@rettichschnidi
Copy link
Contributor

rettichschnidi commented May 28, 2022

synapse=# SELECT * FROM events e LEFT JOIN rejections r USING (event_id) WHERE event_id='$ikN-_JY29tlFZpZ4oRzFPp7mRFR_UYWzDFLduvPSkO8';
 event_id | topological_ordering | type | room_id | content | unrecognized_keys | processed | outlier | depth | origin_server_ts | received_ts | sender | contains_url | instance_name | stream_ordering | state_key | rejection_reason | reason | last_check
----------+----------------------+------+---------+---------+-------------------+-----------+---------+-------+------------------+-------------+--------+--------------+---------------+-----------------+-----------+------------------+--------+------------
(0 rows)

synapse=# SELECT event_id, internal_metadata FROM event_json WHERE event_id='$ikN-_JY29tlFZpZ4oRzFPp7mRFR_UYWzDFLduvPSkO8';
 event_id | internal_metadata
----------+-------------------
(0 rows)

synapse=#

Not great for the documentation here, but in case it helps I am available on Matrix in either #synapse:matrix.org or via DM (@rettichschnidi:digitale-gesellschaft.ch).

@rettichschnidi
Copy link
Contributor

rettichschnidi commented May 28, 2022

Maybe relevant details:

  • two out of 379 users are known to be affected (maybe more are, but I have only logs back to May 24)
  • Retention policy looks like this:
retention:
  enabled: true
  default_policy:
    min_lifetime: 1d
    max_lifetime: 90d
  allowed_lifetime_min: 1d
  allowed_lifetime_max: 90d

@Kokokokoka

This comment was marked as off-topic.

@rettichschnidi

This comment was marked as off-topic.

@rettichschnidi
Copy link
Contributor

Two more cleanup tasks I run once in a while on my home-server:

#!/bin/bash

set -eu -o pipefail

# Remove all forgotten rooms
for room_id in $(curl -sS --header "Authorization: Bearer ZENSORED" --request GET 'http://localhost:8008/_synapse/admin/v1/rooms?limit=5000' | jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id'); do
  echo "Deleting room ${room_id}"
  curl -sS --request POST --header "Authorization: Bearer ZENSORED" --header "Content-Type: application/json" --data "{ \"room_id\": \"$room_id\" }" 'http://localhost:8008/_synapse/admin/v1/purge_room' > /dev/null
done
#!/bin/bash

set -eu -o pipefail

readonly auth_header="Authorization: Bearer censored"
declare -A purge_id_to_room_id
purge_ids=""

# Purge old history (90+ days) of all (well, up to 10'000) rooms
for room_id in $(curl -sS --header "${auth_header}" --request GET 'http://localhost:8008/_synapse/admin/v1/rooms?limit=10000' | jq -r '.rooms[] | .room_id'); do
  # Delete history after 90 days
  purge_id="$(curl -sS --request POST --header "${auth_header}" --header "Content-Type: application/json" --data "{ \"delete_local_events\": true, \"purge_up_to_ts\": $(( $(date +%s) - 86400 * 90))000 }" "http://localhost:8008/_synapse/admin/v1/purge_history/${room_id}" | jq -r .purge_id)"
  if [ "${purge_id}" == null ]; then
    echo "Failed to purge history of room '${room_id}'"
    continue
  fi
  echo "Handle for purge_history of ${room_id}: ${purge_id}"
  purge_ids="${purge_ids} ${purge_id}"
  purge_id_to_room_id["${purge_id}"]="${room_id}"
done

while [ -n "${purge_ids// }" ]; do
  echo "Left work: '${purge_ids/ /}'"
  for purge_id in ${purge_ids}; do
    progress="$(curl -sS --request GET --header "${auth_header}" --header 'Content-Type: application/json' "http://localhost:8008/_synapse/admin/v1/purge_history_status/${purge_id}")"

    echo "State of ${purge_id_to_room_id[${purge_id}]}: ${progress}"

    # Keep working on active purges
    if echo "${progress}" | jq -e 'select(.status == "active")' > /dev/null; then
      continue
    fi

    # Remove purges which are completed, failed no longer in database
    purge_ids="${purge_ids//${purge_id}/}"
  done
  sleep 1
done

However, both did not run around the time when the error started to show. Did run them today however.

@richvdh
Copy link
Member

richvdh commented May 28, 2022

Right, I think I understand what is causing this now. It is related to #12571 in that it is because the sync token points to an event that cannot be retrieved from the database; however in this case the reason it cannot be retrieved is that it is a redaction which attempts to redact an event which we no longer have.

@richvdh richvdh changed the title /sync fails with SynapseError: 404 - Could not find event /sync fails with SynapseError: 404 - Could not find event if the last event is a redaction of an event that we don't have May 28, 2022
@richvdh
Copy link
Member

richvdh commented May 28, 2022

(I hope this is also @n3rdybyn47ur3's problem - we have rather derailed this issue if not...)

@rettichschnidi
Copy link
Contributor

rettichschnidi commented May 28, 2022

Right, I think I understand what is causing this now. It is related to #12571 in that it is because the sync token points to an event that cannot be retrieved from the database; however in this case the reason it cannot be retrieved is that it is a redaction which attempts to redact an event which we no longer have.

Seems plausible to me. I am also affected by this, which also point towards errors when events get deleted.

Thanks a lot for your help!

@richvdh
Copy link
Member

richvdh commented May 28, 2022

If you're feeling adventurous, the branch at #12905 should fix the problem. It's based on current bleeding-edge though.

@richvdh richvdh added A-Sync defects related to /sync S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels May 28, 2022
@richvdh richvdh self-assigned this May 28, 2022
@rettichschnidi
Copy link
Contributor

rettichschnidi commented May 29, 2022

@richvdh I tried to apply the patches on top of 1.59.1 but (as you said) too much has changed since, so I gave up. In case having this tested would be very valuable I'll give it another try.

@n3rdybyn47ur3
Copy link
Author

I finally got hold of one of the users. Here are the corresponding log entries (IPs and usernames changed):

2022-05-30 06:51:09,472 - synapse.http.server - 104 - INFO - GET-496839 - <XForwardedForRequest at 0x7f83d3f70850 method='GET' uri='/_matrix/client/r0/sync?filter=0&timeout=0&_cacheBuster=1653893469201' clientproto='HTTP/1.0' site='8008'> SynapseError: 404 - Could not find event $Me3qK39xeOTDTWgDyPr6w-GVULndG9Rtkw9He9DcmeQ

2022-05-30 06:51:09,475 - synapse.access.http.8008 - 427 - INFO - GET-496839 - 192.168.0.20 - 8008 - {@username:matrix.something.de} Processed request: 0.265sec/0.003sec (0.058sec, 0.018sec) (0.219sec/0.541sec/38) 101B 404 "GET /_matrix/client/r0/sync?filter=0&timeout=0&_cacheBuster=1653893469201 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36" [102 dbevts]

It seems to be the same problem as @rettichschnidi mentioned earlier.
No manual cleanup tasks until now, but the configured retention is 30 days.
A fix would be nice. Our three to four candidates still have open sessions, but there will be more candidates soon.

@zzottel
Copy link

zzottel commented Jun 6, 2022

Just to add this:

It seems I'm seeing the same problem. I'll try installing the RC that anoa said should be released tomorrow.

Note that this will show up whenever an update of an Element app is installed, as the initial sync seems to trigger the problem. This is how it showed up on my end with the new version from TestFlight. So you might want to stall Element app releases until a fix is available in a regular release.

@zzottel
Copy link

zzottel commented Jun 7, 2022

Installing 1.61.0rc1 helped.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Sync defects related to /sync S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants