Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: block checksum validation for files #1203

Open
sumeerbhola opened this issue Jul 16, 2021 · 3 comments
Open

db: block checksum validation for files #1203

sumeerbhola opened this issue Jul 16, 2021 · 3 comments

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Jul 16, 2021

(creating issue from internal storage stability doc)

When Pebble ingests an sstable, it only validates the block checksums necessary for it to retrieve the smallest and largest keys in the ingested table. Reading the entire sstable and validating every block’s checksum is expensive, but we could defer it to an asynchronous background job. The goal would be to detect corrupted ingested tables soon after they’re written, instead of waiting for the long-running CockroachDB consistency checker, or a compaction or a user query to read the table.

Additionally, we could perform this asynchronous validation for all sstables written by Pebble, not just ingested sstables. It would protect against bugs on the write path, but not delayed disk bit rot. Another mode would be to periodically validate every file, and set the period such that typically only L5 or L6 files are long-lived enough to have such validation.

All of this would be optional, and configured via the pebble.Options (possibly with the ability to change the setting on an open DB).

Jira issue: PEBBLE-209

@nicktrav nicktrav self-assigned this Aug 25, 2021
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 27, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 27, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 30, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 30, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable upon ingestion,
with any corruption that may be detected detected passed to the
corruption reporting and recovery mechanism outlined in
cockroachdb/cockroach#67568.

See also: cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Aug 30, 2021
To provide better detection of corruption upon ingestion of an SSTable,
add the ability for an `sstable.Reader` to validate the checksums of all
blocks in the underlying file for the SSTable.

The new `(*sstable.Reader).ValidateBlockChecksums` will be used by a
subsequent change to perform validation of an SSTable, either at
ingestion time, or as deferred work performed at some later point in
time.

See also: cockroachdb#1203.
@nicktrav
Copy link
Contributor

nicktrav commented Sep 1, 2021

Some notes for my own benefit on internal discussions on the work to be done:

Future work could also hook into the corruption reporting pipeline being built out. Neither this, nor that work block each other.

nicktrav added a commit to nicktrav/pebble that referenced this issue Oct 6, 2021
Currently, when Pebble ingests an sstable, it validates the block
checksums necessary for it to retrieve the smallest and largest keys in
the table.

Wire up the block checksum validation codepath from cockroachdb#1240, scheduling
the validation of the ingested sstable on a background goroutine.

This ingestion validation is gated on a new experimental DB option,
`ValidateOnIngest`, which is initially off by default.

See cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Oct 6, 2021
Currently, when Pebble ingests an sstable, it validates the block
checksums necessary for it to retrieve the smallest and largest keys in
the table.

Wire up the block checksum validation codepath from cockroachdb#1240, scheduling
the validation of the ingested sstable on a background goroutine.

This ingestion validation is gated on a new experimental DB option,
`ValidateOnIngest`, which is initially off by default.

See cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Oct 6, 2021
Currently, when Pebble ingests an sstable, it validates the block
checksums necessary for it to retrieve the smallest and largest keys in
the table.

Wire up the block checksum validation codepath from cockroachdb#1240, scheduling
the validation of the ingested sstable on a background goroutine.

This ingestion validation is gated on a new experimental DB option,
`ValidateOnIngest`, which is initially off by default.

See cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Oct 11, 2021
Currently, when Pebble ingests an sstable, it validates the block
checksums necessary for it to retrieve the smallest and largest keys in
the table.

Wire up the block checksum validation codepath from cockroachdb#1240, scheduling
the validation of the ingested sstable on a background goroutine.

This ingestion validation is gated on a new experimental DB option,
`ValidateOnIngest`, which is initially off by default.

See cockroachdb#1203.
nicktrav added a commit to nicktrav/pebble that referenced this issue Oct 12, 2021
Currently, when Pebble ingests an sstable, it validates the block
checksums necessary for it to retrieve the smallest and largest keys in
the table.

Wire up the block checksum validation codepath from cockroachdb#1240, scheduling
the validation of the ingested sstable on a background goroutine.

This ingestion validation is gated on a new experimental DB option,
`ValidateOnIngest`, which is initially off by default.

See cockroachdb#1203.
nicktrav added a commit that referenced this issue Oct 12, 2021
Currently, when Pebble ingests an sstable, it validates the block
checksums necessary for it to retrieve the smallest and largest keys in
the table.

Wire up the block checksum validation codepath from #1240, scheduling
the validation of the ingested sstable on a background goroutine.

This ingestion validation is gated on a new experimental DB option,
`ValidateOnIngest`, which is initially off by default.

See #1203.
nicktrav added a commit to nicktrav/cockroach that referenced this issue Oct 12, 2021
Bump Pebble to 0ee4290cea22. Includes the following commits.

```
0ee4290c db: schedule sstable validation on ingestion
543bc6fa db: use locked suffix for function names; simplify boolean expression
5393ca16 internal/rate: skip two flaky tests on darwin
7cd88488 sstable: print additional fields in verbose block layout
4f6402b1 *: enable staticcheck
```

Enable sstable validation on ingestion via the experimental Pebble
option, `ValidateOnIngest`.

See cockroachdb/pebble#1203.

Release note: None.
nicktrav added a commit to nicktrav/cockroach that referenced this issue Oct 27, 2021
Enable sstable validation on ingestion via the experimental Pebble
option, `ValidateOnIngest`.

See cockroachdb/pebble#1203.

Release note: None.
@nicktrav nicktrav removed their assignment Nov 24, 2021
@nicktrav
Copy link
Contributor

nicktrav commented Oct 3, 2022

The question we had internally was whether it was still worth keeping the code around even though this is implemented above Pebble, to some extent.

Chatted with @sumeerbhola about this and the opinion was that we probably do want to keep it around for now, as it may still prove useful.

There's some more context in this thread (internal).

Also from Sumeer:

In addition to the validation mentioned in that thread, EvalAddSSTable has various conditional code paths that involve reading parts or all of the sstable, or rewriting it. Which further makes the case that Pebble is doing something that doesn't seem useful in the end-to-end context of CockroachDB.

Going to backlog for now.

@nicktrav nicktrav removed their assignment Oct 3, 2022
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it
in 10 days to keep the issue queue tidy. Thank you for your
contribution to Pebble!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Backlog
Development

No branches or pull requests

3 participants