-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logstore: sideloaded storage dirs not synced #114411
Comments
cc @cockroachdb/replication |
Hi @shralex, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Should this really be a blocker? This is a pre-existing bug that's been there forever afaik. |
115447: roachtest: indicate when version is too old in `UploadWorkload` r=DarrylWong,srosenberg a=renatolabs In #115061, a `UploadWorkload` API was added to allow tests to upload "arbitrary" versions of the `workload` binary, especially useful in upgrade testing. However, we only have `workload` binaries for v22.2+. In this commit, we add a `bool` return value to `UplodWorkload` indicating whether a binary was actually uploaded. Callers are expected to check this value before attempting to use a workload binary. The `acceptance/version-upgrade` test is updated in this commit in the way described. Epic: none Release note: None 115462: storage: add cluster setting for ingest splits r=RaduBerinde a=itsbilal Previously, the ingest-time splitting feature was only tunable through a code change. This change adds a cluster setting to allow this feature to be turned on dynamically. Fixes #115430. Epic: none Release note: None 115593: ui: change some references to tenant to virtual cluster r=dhartunian a=stevendanna This changes the "Tenant" selector on the metrics pages to use the new "Virtual Cluster" terminology. Fixes #114170 Release note: None <img width="1037" alt="Screenshot 2023-12-05 at 11 05 10" src="https://github.com/cockroachdb/cockroach/assets/852371/54f9ce9c-fc0b-4457-8214-de22c45b32bc"> <img width="1086" alt="Screenshot 2023-12-05 at 11 05 05" src="https://github.com/cockroachdb/cockroach/assets/852371/d0cb2f32-20a6-4522-8727-3c19ce0a91ab"> 115595: server: support strict MemFS in StickyVFSRegistry r=jbowens a=pavelkalinnikov The strict `MemFS` can be used for emulating crashes, and testing for data durability loss that they may cause if fsync is not used correctly. Touches #113135, #114411 Epic: none Release note: none Co-authored-by: Renato Costa <[email protected]> Co-authored-by: Bilal Akhtar <[email protected]> Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
114616: logstore: sync sideloaded storage directories r=erikgrinaker a=pavelkalinnikov This PR ensures that the hierarchy of directories/files created by the sideloaded log storage is properly synced. Previously, only the "leaf" files in this hierarchy were fsync-ed. Even though this guarantees the files content and metadata is synced, this still does not guarantee that the references to these files are durable. For example, Linux man page for `fsync` [^1] says: ``` Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. ``` It means that these files can be lost after a system crash of power off. This leads to issues because: 1. Pebble WAL syncs are not atomic with the sideloaded files syncs. It is thus possible that raft log metadata "references" a sideloaded file and gets synced, but the file is not yet. A power off/on at this point leads to an internal inconsistency, and can result in a crash loop when raft will try to load these entries to apply and/or send to other replicas. 2. The durability of entry files is used as a pre-condition to sending raft messages that trigger committing these entries. A coordinated power off/on on a majority of replicas can thus lead to losing committed entries and unrecoverable loss-of-quorum. This PR fixes the above issues, by syncing parents of all the directories and files that the sideloaded storage creates. [^1]: https://man7.org/linux/man-pages/man2/fsync.2.html Part of #114411 Epic: none Release note (bug fix): this commit fixes a durability bug in raft log storage, caused by incorrect syncing of filesystem metadata. It was possible to lose writes of a particular kind (`AddSSTable`) that is e.g. used by `RESTORE`. This loss was possible only under power-off or OS crash conditions. As a result, CRDB could enter a crash loop on restart. In the worst case of a correlated power-off/crash across multiple nodes this could lead to loss of quorum or data loss. 115689: ui: add warning to network page when unavailable r=maryliag a=stevendanna The network page doesn't work inside a virtual cluster yet. Rather than just presenting a spinner, here we add a warning to the page. Additionally, it simplifies the text of the warning presented on the Advanced Debug page. Informs #115022 <img width="1239" alt="Screenshot 2023-12-06 at 16 09 16" src="https://github.com/cockroachdb/cockroach/assets/852371/43778020-c892-4e96-b1c4-ec58b20309ae"> <img width="1236" alt="Screenshot 2023-12-06 at 16 09 33" src="https://github.com/cockroachdb/cockroach/assets/852371/30643fbb-ec68-4973-b35f-60a9a874e6a5"> Release note: None 115705: kv,admission: only log empty admission warning for non-release builds r=aadityasondhi a=aadityasondhi This error message, while useful for debugging, spams the logs with a stack trace which can be distracting when reading the logs. Since AC defaults to skip when there is an empty header, this is not a concern, unless we see real-world performance impact (which we have not). This patch removes it from release builds while we figure out all the sources for missing headers. Informs #112680 Release note: None Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Aaditya Sondhi <[email protected]>
This is done and backported to |
We usually don't leave the issue open once it's fixed on |
Ah ok, thanks for the clarification. Closing this, and will backport asynchronously. |
The filesystem metadata for the sideloaded storage (sub-)directories is not synced when files or directories are added or removed. The sideloaded files are then used as raft log entries, and are required to be accessible. Safety properties of raft depend on the durability of these files.
However, in presence of power outages, or OS crashes, not syncing the filesystem can lead to losing these files.
Example "data loss" I observed in a minimal unit-test using the strict
vfs.MemFS
in-memory testing file system.Before crash
After crash
Jira issue: CRDB-33498
The text was updated successfully, but these errors were encountered: