-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: inconsistency failed #97337
Comments
@pavelkalinnikov Can you have a look?
|
roachtest.inconsistency failed with artifacts on master @ e9c96e7179e19aae2f8d386f67eb950db8c3354b:
Parameters: |
Interesting
indicates that |
Reproduces on GCE worker. I can see the checkpoint folder, but |
In my experiment: $ ls /mnt/data1/cockroach/auxiliary/checkpoints/r1_at_589
000022.log 000025.sst 000026.log 000027.log 000028.log 000029.log MANIFEST-000023 OPTIONS-000024 auxiliary checkpoint.txt marker.format-version.000001.012 marker.manifest.000001.MANIFEST-000023
$ ./cockroach debug range-descriptors /mnt/data1/cockroach/auxiliary/checkpoints/r1_at_589
ERROR: pebble: database "/mnt/data1/cockroach/auxiliary/checkpoints/r1_at_589" does not exist
Failed running "debug range-descriptors"
$ ./cockroach debug pebble db check /mnt/data1/cockroach/auxiliary/checkpoints/r1_at_589
checked 39111 points and 0 tombstone The Pebble tool ( |
roachtest.inconsistency failed with artifacts on master @ dd2749ae4ab61eed2f99238acb74e8d3c6b4cb1d:
Parameters: |
I think we are hitting one of the new error branches from #97054: cockroach/pkg/storage/pebble.go Line 1071 in d1a1924
Upd: confirmed that this PR introduced the problem, by running the roachtest before and after the commit. |
@RaduBerinde Is the version file required in checkpoints as well? Also, the "database does not exist" error sounds confusing. The directory does exist, but it apparently lacks the required file. Should the error be more specific? If it's a version problem, it should mention the version problem. At first I thought it was this error. It would be nice if the errors weren't exactly the same. |
Thanks for looking into it Pavel! Btw, do you know what's up with this weird output here?
Is this perhaps redaction markers or some such? |
Where did you copy this from? I'm seeing this:
Upd: Ah right, I can see some redacted version of it too in
Maybe you're seeing this |
Yeah, thought so. You can unredact it by wrapping the format args in |
97379: kvserver: unredact checkpoint paths in inconsistency message r=erikgrinaker a=pavelkalinnikov Touches #97337 Release note: none Epic: none Co-authored-by: Pavel Kalinnikov <[email protected]>
Yes, require the min version file to Open a store. If the min version file isn't there but the store exists when we try to open it, we assume it is from a very old CRDB version which didn't write the min version file. I agree the error message should be improved, I'll do that. What do you think is the right thing here wrt checkpoints? Should we include this file in all checkpoints? Or should we treat |
Btw how was the checkpoint here created? From CRDB code ( |
We require the min version filename to open a store. If it doesn't exist, we assume the store doesn't exist which can lead to a confusing message if just the min version file is missing. This commit makes this message more useful. Informs cockroachdb#97337 Release note: none Epic: none
@pavelkalinnikov I posted #97410. As a workaround for now you can copy the |
Yes, this path.
I think your PR #97410 will fix this since you're writing the min version file in the checkpoint path that we use. I you don't mind, you could run this to make sure it fixes the issue (on a GCE worker):
|
Any existing checkpoints though will be "broken" if created before this fix, but the existence of these checkpoints is a bigger problem (inconsistency) in the first place, so we would be aware of their existence in prod. We don't know of any for now. |
@RaduBerinde Is there indication in manifest/etc whether the store is a checkpoint? Maybe we could not require max version file when reading checkpoints. Alternatively, we could add a "skip version check" kind of option into |
@RaduBerinde Is this a problem only when storage is opened in read/write mode, or read-only would be problematic too? Would it be (somewhat) ok to not require this file in the read-only mode? |
Interesting idea.. but I'm not sure. The WAL still needs to be replayed (even if the results aren't flushed), and the WAL replay can call back into Cockroach with certain expectations. I'd guess we would at least be safe from corrupting the store though. |
We now write out the min version file with checkpoints generated through the storage layer. This allows checkpoints to be opened with the cockroach storage layer (not just with the low level pebble tool). Informs cockroachdb#97337 Release note: None Epic: none
Done, thank you! |
roachtest.inconsistency failed with artifacts on master @ 286b3e235171a39b8f9910555affcc7ce310741a:
Parameters: |
roachtest.inconsistency failed with artifacts on master @ 3d054f37c7c87f53cb56fac4e5500f0d1130d09a:
Parameters:
ROACHTEST_cloud=gce
,ROACHTEST_cpu=4
,ROACHTEST_encrypted=false
,ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-24642
The text was updated successfully, but these errors were encountered: