acceptance: TestDockerCLI_test_demo_multitenant shutdown synchronization #110645
Is it true that this test always creates in-memory Engines? If so, it looks like there's a race condition in file reference counting within Pebble, allowing the file 000410.sst to be deleted while the table stats collector still has a reference on a readState that contains the file. This is still a little difficult to believe; we'd expect much more widespread failures. Maybe the race is somehow specific to the initial version loaded from the manifest? In this instance, the table stats collector is still performing its initial scans of the database tables.
Double check the file reference counts before attempting to find or create a table cache node for a file. Once a file's reference count falls to zero, the file becomes obsolete and may be deleted at any moment. Today, if a race breaks this invariant and we attempt to load a file with a nonpositive reference count, it's relatively unlikely to manifest as an error. Typically tables remain open in the table cache, allowing the cache to serve the request even if the file is no longer linked into the data directory. Additionally, even if the table is not presently in the table cache, deletion of obsolete files may be delayed by deletion pacing, hiding the race. This commit preemptively asserts on the file reference counts. I opted not to restrict this invariant check to invariants builds because it's cheap relative to a table cache lookup, and it's a particularly tricky form of corruption to debug otherwise. Informs cockroachdb/cockroach#110645.
I missed this initially: I'm pretty sure this test is opening the same store multiple times. In these logs we see a high-numbered compaction job 362 running at the same time as low-numbered compaction job 3, and most of the constituent files are the same.
Previously, MemFS's Lock method did not implement mutual exclusion, under the assumption that databases run in separate processes that do not share memory. This commit adapts Lock to enforce mutual exclusion, preventing accidental concurrent use of the same FS by more than one Pebble instance. It's believed that cockroachdb/cockroach#110645 is the result of Cockroach acceptance tests opening a MemFS store twice, with one Pebble instance deleting files from beneath another.
```
6d6570bf vfs: enforce mutual exclusion in MemFS.Lock
336c9979 metamorphic: vary sstable compression algorithm
c91e8796 db: double check file reference counts when loading file
22fbb69a scripts: generate code coverage with invariants build tag
529d256a db: use invalidating iterators with low probability under invariants
bb9d6ab6 internal/invalidating: propagate lazy value fetching
9dbff72c internal/invalidating: trash trailers with 0xff
2689f0d2 internal/invalidating: move invalidating iter
96978427 db: add explicit levelIter pointer
20e07e1f db: avoid type conversion during iterator read sampling
```

Informs cockroachdb#110645.

Epic: none
Release note: none
110651: kvnemesis: add support for shared lock r=miraradeva a=arulajmani

This patch teaches KVNemesis to generate batches that acquire shared locks, using Get, Scan, and ReverseScan requests.

Closes #100173

Release note: None

110696: release: do not bump version if already bumped r=celiala a=rail

Previously, if the version was bumped manually, the automation would try to bump it anyway, write the same content to the file, and fail when committing the changes. This PR skips the version bump if it has already been done. Additionally, some of the commit messages were adjusted.

Fixes: RE-468
Release note: None

110723: go.mod: bump Pebble to 6d6570bf1e25 r=RaduBerinde a=jbowens

```
6d6570bf vfs: enforce mutual exclusion in MemFS.Lock
336c9979 metamorphic: vary sstable compression algorithm
c91e8796 db: double check file reference counts when loading file
22fbb69a scripts: generate code coverage with invariants build tag
529d256a db: use invalidating iterators with low probability under invariants
bb9d6ab6 internal/invalidating: propagate lazy value fetching
9dbff72c internal/invalidating: trash trailers with 0xff
2689f0d2 internal/invalidating: move invalidating iter
96978427 db: add explicit levelIter pointer
20e07e1f db: avoid type conversion during iterator read sampling
```

Informs #110645.

Epic: none
Release note: none

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Closing as a duplicate of #110748
Failed here: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Ci_Tests_Acceptance/11765319?buildTab=artifacts#
The PR (#110595) is changing the text of an error message in SQL, so it is not related.
Jira issue: CRDB-31541