This repository has been archived by the owner on Jun 1, 2021. It is now read-only.
We've now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers' objects. However, we didn't have the same protection in place to detect whether this particular internal state information had been corrupted. As a result, when the corruption occurred, we didn't detect it and it spread throughout the system causing the symptoms described above. We hadn't encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it.
eventuate could check if

- stored / replayed events got corrupted, by adding a checksum to the stored event or event payload
- replicated events got corrupted, by adding a checksum to the serialized / transmitted data
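The two checks proposed above could be sketched roughly as follows. This is only an illustration in Python, not eventuate's actual event format or API: the `StoredEvent` layout, the helper names, the JSON encoding, and the choice of SHA-256 are all assumptions made for the example. The quoted report mentions MD5, and a CRC would probably also be enough to catch accidental bit flips.

```python
import hashlib
import json
from dataclasses import dataclass

DIGEST_SIZE = 32  # length in bytes of a SHA-256 digest


class CorruptEventError(Exception):
    """Raised when a stored or received event fails its checksum."""


@dataclass(frozen=True)
class StoredEvent:
    # Hypothetical record layout, not eventuate's real one.
    payload: bytes   # serialized event payload
    checksum: bytes  # SHA-256 digest of payload, computed at write time


def write_event(event: dict) -> StoredEvent:
    """Serialize an event and attach a checksum of the serialized bytes."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return StoredEvent(payload, hashlib.sha256(payload).digest())


def read_event(stored: StoredEvent) -> dict:
    """Verify the checksum before deserializing (e.g. during replay)."""
    if hashlib.sha256(stored.payload).digest() != stored.checksum:
        raise CorruptEventError("checksum mismatch")
    return json.loads(stored.payload)


def to_wire(stored: StoredEvent) -> bytes:
    """Frame for replication: checksum bytes followed by the payload."""
    return stored.checksum + stored.payload


def from_wire(frame: bytes) -> StoredEvent:
    """Split a received frame; read_event() then verifies it."""
    return StoredEvent(frame[DIGEST_SIZE:], frame[:DIGEST_SIZE])


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate single-bit corruption, for testing only."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)
```

As in the incident described in the report, a single flipped bit can leave the payload perfectly parseable (here `{"counter": 4}` can become `{"counter": 5}`), so deserialization alone would not notice it. With the checksum in place, `read_event` raises `CorruptEventError` instead of returning the altered event, both for stored records and for frames received from another location.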
After reading this report about S3's outage in 2008, I think it makes sense to add data corruption checks to eventuate, along the lines proposed above.