Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage, kv: replica corruption reporting and recovery #67568

Open
5 tasks
itsbilal opened this issue Jul 13, 2021 · 1 comment
Open
5 tasks

storage, kv: replica corruption reporting and recovery #67568

itsbilal opened this issue Jul 13, 2021 · 1 comment
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-sre For issues SRE opened or otherwise cares about tracking.

Comments

@itsbilal
Copy link
Member

itsbilal commented Jul 13, 2021

This is a meta issue to track the issue of recovering from corrupt sstables.
Currently, cockroach just crashes with an error like the one(s) in
#57934 . From sentry reports,
it's clear that this is a fairly frequent occurrence in the wild.

But we can do better than crashing the node. Cockroach could just mark the corresponding
replica(s) as corrupt, and up-replicate from other nodes, which should hopefully have
clean, non-corrupt copies of those keys. This reduces the impact on the cluster
and reduces the chances that a bunch of sporadic instances of hardware-level
sstable corruption would significantly impact uptime of the entire cluster.

Parts of this project (details TBD):

  • Update Pebble to report narrow bounds of corruption (sstable: isolate and report corruption in sstables pebble#1192)
  • Deduplicate instances of read-time corruption encountered, either in Pebble or Cockroach
  • Mark replica as corrupt and stop all in-flight writes into it from succeeding
  • Delete replica, lay down range tombstone in Pebble.
  • Update Pebble to be smarter with compacting a range tombstone that deletes the entirety of a corrupt block or sstable

Jira issue: CRDB-8611

@itsbilal itsbilal added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. labels Jul 13, 2021
@itsbilal itsbilal self-assigned this Jul 13, 2021
@petermattis
Copy link
Collaborator

If there is a Pebble checksum failure we may not want to delete the sstable on disk. That is, we might want to logically delete the sstable from the LSM, but move the sstable to some holding pen. I'm not up to date on what SSDs do nowadays with regard to hardware failures, but it seems safest to not allow the storage corresponding to a checksum failure to be reused.

itsbilal added a commit to itsbilal/cockroach that referenced this issue Aug 3, 2021
This PR proposes a design for responding to sstable
corruption by marking matching replicas as corrupt
and letting Cockroach replication delete / replace
them with data from other nodes.

Informs cockroachdb#67568.

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Aug 24, 2021
This PR proposes a design for responding to sstable
corruption by marking matching replicas as corrupt
and letting Cockroach replication delete / replace
them with data from other nodes.

Informs cockroachdb#67568.

Release note: None.

Release justification:
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 27, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 27, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 30, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 30, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Sep 2, 2021
This PR proposes a design for responding to sstable
corruption by marking matching replicas as corrupt
and letting Cockroach replication delete / replace
them with data from other nodes.

Informs cockroachdb#67568.

Release note: None.

Release justification:
itsbilal added a commit to itsbilal/cockroach that referenced this issue Sep 28, 2021
Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls `dd` to write random bytes into the middle
of them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.

Informs cockroachdb#67568.

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Sep 30, 2021
This PR proposes a design for responding to sstable
corruption by marking matching replicas as corrupt
and letting Cockroach replication delete / replace
them with data from other nodes.

Informs cockroachdb#67568.

Release note: None.

Release justification:
itsbilal added a commit to itsbilal/cockroach that referenced this issue Oct 1, 2021
This PR proposes a design for responding to sstable
corruption by marking matching replicas as corrupt
and letting Cockroach replication delete / replace
them with data from other nodes.

Informs cockroachdb#67568.

Release note: None.

Release justification:
itsbilal added a commit to itsbilal/cockroach that referenced this issue Oct 14, 2021
Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls `dd` to write random bytes into the middle
of them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.

Informs cockroachdb#67568.

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Oct 14, 2021
Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls `dd` to write random bytes into the middle
of them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.

Informs cockroachdb#67568.

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Oct 15, 2021
Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls `dd` to write random bytes into the middle
of them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.

Informs cockroachdb#67568.

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this issue Oct 18, 2021
Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls `dd` to write random bytes into the middle
of them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.

Informs cockroachdb#67568.

Release note: None.
craig bot pushed a commit that referenced this issue Oct 18, 2021
70845: roachtest: Add sstable-corruption roachtest r=tbg a=itsbilal

Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls `dd` to write random bytes into the middle
of them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.

Informs #67568.

Release note: None.

Co-authored-by: Bilal Akhtar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-sre For issues SRE opened or otherwise cares about tracking.
Projects
None yet
Development

No branches or pull requests

3 participants