Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logstore: sideloaded storage dirs not synced #114411

Closed
pav-kv opened this issue Nov 14, 2023 · 6 comments
Closed

logstore: sideloaded storage dirs not synced #114411

pav-kv opened this issue Nov 14, 2023 · 6 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs

Comments

@pav-kv
Copy link
Collaborator

pav-kv commented Nov 14, 2023

The filesystem metadata for the sideloaded storage (sub-)directories is not synced when files or directories are added or removed. The sideloaded files are then used as raft log entries, and are required to be accessible. Safety properties of raft depend on the durability of these files.

However, in presence of power outages, or OS crashes, not syncing the filesystem can lead to losing these files.

Example "data loss" I observed in a minimal unit-test using the strict vfs.MemFS in-memory testing file system.

Before crash
    5801    000002.log
      37    000005.log
     115    000006.log
    2946    000008.sst
      16    CURRENT
       0    LOCK
     350    MANIFEST-000001
    2554    OPTIONS-000003
      10    STORAGE_MIN_VERSION
            auxiliary/
              sideloading/       * this entire substructure
                r0XXXX/          * is not synced up to auxiliary directory
                  r68/           * so it can be lost on crash
    1012            i23.t6       * and it does so below
              sstsnapshot/
       0    marker.format-version.000015.016
       0    marker.manifest.000001.MANIFEST-000001
After crash
    5730    000002.log
      16    CURRENT
       0    LOCK
      46    MANIFEST-000001
    2554    OPTIONS-000003
      10    STORAGE_MIN_VERSION
            auxiliary/
       0    marker.format-version.000015.016
       0    marker.manifest.000001.MANIFEST-000001

Jira issue: CRDB-33498

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Nov 14, 2023
@pav-kv pav-kv self-assigned this Nov 14, 2023
Copy link

blathers-crl bot commented Nov 14, 2023

cc @cockroachdb/replication

Copy link

blathers-crl bot commented Nov 14, 2023

Hi @shralex, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@shralex shralex added branch-master Failures and bugs on the master branch. branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs labels Nov 14, 2023
@erikgrinaker
Copy link
Contributor

Should this really be a blocker? This is a pre-existing bug that's been there forever afaik.

craig bot pushed a commit that referenced this issue Dec 5, 2023
115447: roachtest: indicate when version is too old in `UploadWorkload` r=DarrylWong,srosenberg a=renatolabs

In #115061, a `UploadWorkload` API was added to allow tests to upload "arbitrary" versions of the `workload` binary, especially useful in upgrade testing.

However, we only have `workload` binaries for v22.2+. In this commit, we add a `bool` return value to `UplodWorkload` indicating whether a binary was actually uploaded. Callers are expected to check this value before attempting to use a workload binary.

The `acceptance/version-upgrade` test is updated in this commit in the way described.

Epic: none

Release note: None

115462: storage: add cluster setting for ingest splits r=RaduBerinde a=itsbilal

Previously, the ingest-time splitting feature was only tunable through a code change. This change adds a cluster setting to allow this feature to be turned on dynamically.

Fixes #115430.

Epic: none

Release note: None

115593: ui: change some references to tenant to virtual cluster r=dhartunian a=stevendanna

This changes the "Tenant" selector on the metrics pages to use the new "Virtual Cluster" terminology.

Fixes #114170

Release note: None

<img width="1037" alt="Screenshot 2023-12-05 at 11 05 10" src="https://github.com/cockroachdb/cockroach/assets/852371/54f9ce9c-fc0b-4457-8214-de22c45b32bc">
<img width="1086" alt="Screenshot 2023-12-05 at 11 05 05" src="https://github.com/cockroachdb/cockroach/assets/852371/d0cb2f32-20a6-4522-8727-3c19ce0a91ab">


115595: server: support strict MemFS in StickyVFSRegistry r=jbowens a=pavelkalinnikov

The strict `MemFS` can be used for emulating crashes, and testing for data durability loss that they may cause if fsync is not used correctly.

Touches #113135, #114411
Epic: none
Release note: none

Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot pushed a commit that referenced this issue Dec 6, 2023
114616: logstore: sync sideloaded storage directories r=erikgrinaker a=pavelkalinnikov

This PR ensures that the hierarchy of directories/files created by the sideloaded log storage is properly synced.

Previously, only the "leaf" files in this hierarchy were fsync-ed. Even though this guarantees the files content and metadata is synced, this still does not guarantee that the references to these files are durable. For example, Linux man page for `fsync` [^1] says:

```
Calling fsync() does not necessarily ensure that the entry in the
directory containing the file has also reached disk.  For that an
explicit fsync() on a file descriptor for the directory is also
needed.
```

It means that these files can be lost after a system crash of power off. This leads to issues because:

1. Pebble WAL syncs are not atomic with the sideloaded files syncs. It is thus possible that raft log metadata "references" a sideloaded file and gets synced, but the file is not yet. A power off/on at this point leads to an internal inconsistency, and can result in a crash loop when raft will try to load these entries to apply and/or send to other replicas.

2. The durability of entry files is used as a pre-condition to sending raft messages that trigger committing these entries. A coordinated power off/on on a majority of replicas can thus lead to losing committed entries and unrecoverable loss-of-quorum.

This PR fixes the above issues, by syncing parents of all the directories and files that the sideloaded storage creates.

[^1]: https://man7.org/linux/man-pages/man2/fsync.2.html

Part of #114411

Epic: none

Release note (bug fix): this commit fixes a durability bug in raft log storage, caused by incorrect syncing of filesystem metadata. It was possible to lose writes of a particular kind (`AddSSTable`) that is e.g. used by `RESTORE`. This loss was possible only under power-off or OS crash conditions. As a result, CRDB could enter a crash loop on restart. In the worst case of a correlated power-off/crash across multiple nodes this could lead to loss of quorum or data loss.

115689: ui: add warning to network page when unavailable r=maryliag a=stevendanna

The network page doesn't work inside a virtual cluster yet.  Rather than just presenting a spinner, here we add a warning to the page. Additionally, it simplifies the text of the warning
presented on the Advanced Debug page.

Informs #115022

<img width="1239" alt="Screenshot 2023-12-06 at 16 09 16" src="https://github.com/cockroachdb/cockroach/assets/852371/43778020-c892-4e96-b1c4-ec58b20309ae">

<img width="1236" alt="Screenshot 2023-12-06 at 16 09 33" src="https://github.com/cockroachdb/cockroach/assets/852371/30643fbb-ec68-4973-b35f-60a9a874e6a5">




Release note: None

115705: kv,admission: only log empty admission warning for non-release builds r=aadityasondhi a=aadityasondhi

This error message, while useful for debugging, spams the logs with a stack trace which can be distracting when reading the logs.

Since AC defaults to skip when there is an empty header, this is not a concern, unless we see real-world performance impact (which we have not).

This patch removes it from release builds while we figure out all the sources for missing headers.

Informs #112680

Release note: None

Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>
@pav-kv
Copy link
Collaborator Author

pav-kv commented Dec 8, 2023

This is done and backported to 23.2. Still need to backport to 23.1 and 22.2, but leaving to bake in master and 23.2 first. So leaving this open for now.

@erikgrinaker
Copy link
Contributor

We usually don't leave the issue open once it's fixed on master and the current release branch, but I'll leave that to you.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Dec 8, 2023

Ah ok, thanks for the clarification. Closing this, and will backport asynchronously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
Projects
None yet
Development

No branches or pull requests

3 participants