ccl/storageccl/engineccl: crash testing #96670

Open
jbowens opened this issue Feb 6, 2023 · 0 comments
Labels
A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-cleanup: Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
quality-friday: A good issue to work on on Quality Friday
T-storage: Storage Team

Comments


jbowens commented Feb 6, 2023

Pebble performs crash testing using vfs.NewStrictMem, a vfs.FS implementation that intentionally loses all data that has not been synced. This is invaluable within Pebble for finding durability bugs. Today we don't perform the same type of testing up in CockroachDB to exercise encryption-at-rest. We should improve our test coverage here.
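To make the idea concrete, here is a minimal sketch of a filesystem that drops unsynced data on a simulated crash. It is not Pebble's vfs.NewStrictMem or its API; the type strictMemFS and its methods Write, Sync, Crash, and Read are hypothetical names chosen for illustration, and directory syncs, renames, and partial writes are ignored.

```go
package main

import "fmt"

// strictMemFS is a hypothetical, heavily simplified model of a "strict"
// in-memory filesystem. It is NOT Pebble's implementation.
type strictMemFS struct {
	current map[string][]byte // what readers see right now
	durable map[string][]byte // what survives a simulated crash
}

func newStrictMemFS() *strictMemFS {
	return &strictMemFS{
		current: make(map[string][]byte),
		durable: make(map[string][]byte),
	}
}

// Write appends data to the file's current (not yet durable) contents.
func (fs *strictMemFS) Write(name string, data []byte) {
	fs.current[name] = append(fs.current[name], data...)
}

// Sync copies the file's current contents into the durable state.
func (fs *strictMemFS) Sync(name string) {
	fs.durable[name] = append([]byte(nil), fs.current[name]...)
}

// Crash discards everything that was written but never synced.
func (fs *strictMemFS) Crash() {
	fs.current = make(map[string][]byte, len(fs.durable))
	for name, data := range fs.durable {
		fs.current[name] = append([]byte(nil), data...)
	}
}

// Read returns the file's current contents as a string.
func (fs *strictMemFS) Read(name string) string {
	return string(fs.current[name])
}

func main() {
	fs := newStrictMemFS()
	fs.Write("wal", []byte("a"))
	fs.Sync("wal")
	fs.Write("wal", []byte("b")) // never synced, so lost on crash
	fs.Crash()
	fmt.Printf("%q\n", fs.Read("wal")) // prints "a"
}
```

A test built on this pattern writes through the layer under test, crashes at an arbitrary point, reopens, and checks that everything the layer reported as durable is still readable.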

We could consider running the Pebble metamorphic tests with encryption-at-rest if we converted them into an externally runnable library.

Adjacent to cockroachdb/pebble#2086.

Jira issue: CRDB-24270

jbowens added the C-cleanup, A-storage, and T-storage labels Feb 6, 2023
jbowens added the quality-friday label Jul 20, 2023
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jul 26, 2023
The encryptedFS can return an error after doing part of the work, as
modifying the encryption metadata and the underlying FS is not atomic.
This makes some operations (rename, link, remove) non-idempotent,
which is harmless for the CockroachDB use cases (since they don't
retry on the same files). The test works around these by retrying in
a way that makes them idempotent. Additionally, the test catches
panics caused by FS errors, in order to test a node that crashes
because of a panic caused by a transient error, and is subsequently
restarted.

Epic: none

Informs: cockroachdb#96670

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jul 27, 2023
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jul 28, 2023
craig bot pushed a commit that referenced this issue Aug 1, 2023
107618: engineccl: add randomized error injector test for encryptedFS r=jbowens a=sumeerbhola

The encryptedFS can return an error after doing part of the work, as modifying the encryption metadata and the underlying FS is not atomic. This makes some operations (rename, link, remove) non-idempotent, which is harmless for the CockroachDB use cases (since they don't retry on the same files). The test works around these by retrying in a way that makes them idempotent. Additionally, the test catches panics caused by FS errors, in order to test a node that crashes because of a panic caused by a transient error, and is subsequently restarted.

Epic: none

Informs: #96670

Release note: None

107927: roachtest: ignore some ORM tests r=rafiss a=rafiss

fixes #107698
fixes #107849
fixes #107861
fixes #107869

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
jbowens moved this to Backlog in [Deprecated] Storage Jun 4, 2024