This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
SerializationFailure in notify_interested_services_ephemeral under heavy load #11195
Labels
A-Application-Service
Related to AS support
S-Minor
Blocks non-critical functionality, workarounds exist.
T-Defect
Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Description
We are seeing periodic SerializationFailure errors in Sentry in the notify_interested_services_ephemeral background process. This is running in a dedicated appservice-pusher worker instance. At peak we see around 30 ephemeral events per second being sent to the appservice (presence is disabled, so these should all be read receipts), and the error appears much more frequently during these peaks.
Some initial investigation:
The query in question is invoked in set_type_stream_id_for_appservice_txn, updating the stream position for the appservice:
synapse/synapse/storage/databases/main/appservice.py
Lines 423 to 429 in c7a5e49
This itself is called in the appservice handler:
synapse/synapse/handlers/appservice.py
Lines 252 to 260 in c7a5e49
I'm new to most of the synapse codebase so may have this wrong, but it appears this is a race condition between two parallel executions of the notify_interested_services_ephemeral process for the same appservice/stream. Am I correct in thinking this means some events are getting sent twice to the appservice, as the position is not always updated? (Could events also be missed somehow?)
We'd like to figure out a way to fix this issue. Currently only two possible solutions have come to mind:
(appservice, stream_id), would probably need to be on the entire handler though, which may be a significant performance impact. Not a fan of this!
Keen to hear thoughts, and can most likely find time to work on any potential solution.
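To make the suspected race concrete, here is a minimal, self-contained sketch in plain asyncio. It is not Synapse code: FakeStore, PerKeyLock, notify and simulate are hypothetical names invented for illustration, and the in-memory store does not reproduce the Postgres SerializationFailure itself. It only shows how two overlapping runs for the same (appservice, stream) key can both read the old position and send the same events, and how serializing the whole read/send/update sequence per key avoids that, assuming that is roughly what a lock on (appservice, stream_id) would do.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (appservice id, stream name); hypothetical key shape


class FakeStore:
    """Hypothetical in-memory stand-in for the per-appservice stream position."""

    def __init__(self) -> None:
        self.position: Dict[Key, int] = {}

    async def get_position(self, key: Key) -> int:
        await asyncio.sleep(0)  # yield, as a real database round trip would
        return self.position.get(key, 0)

    async def set_position(self, key: Key, value: int) -> None:
        await asyncio.sleep(0)
        self.position[key] = value


class PerKeyLock:
    """One asyncio.Lock per key, created lazily on first use."""

    def __init__(self) -> None:
        self._locks: Dict[Key, asyncio.Lock] = defaultdict(asyncio.Lock)

    def lock(self, key: Key) -> asyncio.Lock:
        return self._locks[key]


async def notify(
    store: FakeStore,
    sent: List[int],
    key: Key,
    new_token: int,
    locks: Optional[PerKeyLock] = None,
) -> None:
    """Read the position, 'send' everything after it, then store the new position."""

    async def run() -> None:
        from_token = await store.get_position(key)
        if new_token <= from_token:
            return  # nothing new for this appservice/stream
        events = list(range(from_token + 1, new_token + 1))
        await asyncio.sleep(0)  # simulate the network call to the appservice
        sent.extend(events)
        await store.set_position(key, new_token)

    if locks is None:
        await run()  # unprotected: overlapping runs can read the same position
    else:
        async with locks.lock(key):  # serialize the whole sequence per key
            await run()


async def simulate(use_lock: bool) -> Tuple[List[int], int]:
    store = FakeStore()
    sent: List[int] = []
    locks = PerKeyLock() if use_lock else None
    key: Key = ("appservice_1", "read_receipt")
    # Two notifications for the same key arrive close together.
    await asyncio.gather(
        notify(store, sent, key, 3, locks),
        notify(store, sent, key, 5, locks),
    )
    return sent, store.position[key]


if __name__ == "__main__":
    print("without lock:", asyncio.run(simulate(use_lock=False)))
    print("with lock:   ", asyncio.run(simulate(use_lock=True)))
```

In this toy model the unlocked run sends some events more than once, while the locked run sends each event once. The lock covers the send as well as the position update, which is where the performance concern about locking the entire handler comes from.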