This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

Add checksums to detect data corruption #296

Open
magro opened this issue Jul 29, 2016 · 0 comments

Comments

@magro
Contributor

magro commented Jul 29, 2016

After reading this report about Amazon S3's outage in 2008, I think it makes sense to add data corruption checks to eventuate. The report says:

We've now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers' objects. However, we didn't have the same protection in place to detect whether this particular internal state information had been corrupted. As a result, when the corruption occurred, we didn't detect it and it spread throughout the system causing the symptoms described above. We hadn't encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it.

eventuate could check whether

  • stored / replayed events were corrupted, by adding a checksum to the stored event or event payload
  • replicated events were corrupted, by adding a checksum to the serialized / transmitted data
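A minimal sketch of the idea, written in Python for brevity rather than in eventuate's own Scala: each serialized event (or replication batch) is wrapped in a frame carrying a checksum of its bytes, and the reader recomputes the checksum before deserializing. The same framing could be applied when writing to and replaying from the event log, and when sending and receiving replicated data. The frame layout and the names `encode`, `decode`, and `ChecksumMismatch` are hypothetical and not part of eventuate's API. CRC32 is swapped in here for the MD5 mentioned in the S3 report because it is cheap and detects every single-bit error; a cryptographic hash could be used instead.

```python
import zlib

HEADER_SIZE = 4  # hypothetical frame layout: 4-byte big-endian CRC32, then the payload


class ChecksumMismatch(Exception):
    """Raised when a frame's stored checksum does not match its payload."""


def encode(payload: bytes) -> bytes:
    """Prepend a CRC32 of the serialized event/batch before storing or sending it."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return crc.to_bytes(HEADER_SIZE, "big") + payload


def decode(frame: bytes) -> bytes:
    """Verify the checksum on replay/receipt and return the payload, or raise."""
    if len(frame) < HEADER_SIZE:
        raise ChecksumMismatch("frame too short to contain a checksum")
    expected = int.from_bytes(frame[:HEADER_SIZE], "big")
    payload = frame[HEADER_SIZE:]
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if actual != expected:
        raise ChecksumMismatch(f"expected {expected:08x}, computed {actual:08x}")
    return payload


if __name__ == "__main__":
    frame = encode(b'{"type": "OrderCreated", "id": 42}')
    assert decode(frame) == b'{"type": "OrderCreated", "id": 42}'

    # Flip a single bit, as in the S3 incident: the message stays "intelligible"
    # but the checksum no longer matches, so decode raises ChecksumMismatch.
    corrupted = bytearray(frame)
    corrupted[10] ^= 0x01
    try:
        decode(bytes(corrupted))
    except ChecksumMismatch as e:
        print("corruption detected:", e)
```

Rejecting the frame before deserialization matters here: a corrupted event that still deserializes cleanly would otherwise be applied to state and replicated further, which is exactly how the corruption spread in the S3 report.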
@krasserm changed the title from "Add checksums to detect / prevent data corruption" to "Add checksums to detect data corruption" on Oct 5, 2016
Projects
None yet
Development

No branches or pull requests

2 participants