kvnemesis: committed reverse scan non-atomic timestamps [range key omission] #113973
Comments
In the failure, we see a reverse scan return two keys which should have been masked by two range tombstones. The range tombstone at the lower timestamp (
Given the involvement of range tombstones, it's possible that this is related to #112221.
Given that we've so far only seen this failure and the MVCC range key stats failure in #112221 (comment) on There are some recent Pebble changes that could conceivably be related:
The 8c0136f 2023-11-06: go.mod: bump Pebble to a0b01b62e8f9 Here's the list of changes in Pebble Building
As @nvanbenschoten points out, the scan returns two versions that should have been masked by range keys. The scan returns the following versions, with their visible timestamp ranges in brackets:
This result is incorrect, because the scan must have happened at a timestamp above 1699383543.103706837,0 to see the latest version, and at that timestamp the versions
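The atomicity argument above can be sketched as an interval intersection: every returned version is visible over some range of read timestamps, and a scan is only consistent if a single read timestamp falls inside all of those ranges. The following is a minimal illustration, not kvnemesis's actual validator; the `interval` type, the `commonReadTimestamps` helper, and the integer timestamps are all hypothetical simplifications.

```go
package main

import "fmt"

// interval is a half-open range of read timestamps [start, end) at which a
// returned version is visible. Real MVCC timestamps are hybrid logical
// clocks; plain int64 values are an assumption made for brevity.
type interval struct {
	start, end int64
}

// commonReadTimestamps intersects the visibility intervals of every version
// a scan returned. The boolean is false when the intersection is empty,
// meaning no single read timestamp could have observed all versions at once.
func commonReadTimestamps(versions []interval) (interval, bool) {
	if len(versions) == 0 {
		return interval{}, false
	}
	out := versions[0]
	for _, v := range versions[1:] {
		if v.start > out.start {
			out.start = v.start
		}
		if v.end < out.end {
			out.end = v.end
		}
	}
	return out, out.start < out.end
}

func main() {
	// Hypothetical values: the first two versions stop being visible at 30
	// (where a range tombstone covers them), while the third version was
	// only written at 40.
	versions := []interval{{10, 30}, {20, 30}, {40, 100}}
	_, ok := commonReadTimestamps(versions)
	fmt.Println("atomic:", ok)
}
```

With these made-up values the intersection is empty, which is the shape of the failure reported here: the latest version forces a read timestamp above the point at which the older versions should already have been masked.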
The reverse scan code path unconditionally enables Pebble range key masking at the read timestamp, because we never emit tombstones via cockroach/pkg/kv/kvserver/batcheval/cmd_reverse_scan.go Lines 42 to 58 in 2487571
Lines 4575 to 4581 in 3b6e770
Lines 1282 to 1286 in 3b6e770
This leads me to believe this is a bug in Pebble, since these points should have been masked by the iterator. If the points were emitted, they should have been hidden by the This is consistent with the MVCC stats failures, which always fail due to omitted range keys. I'll add that we never see replica inconsistencies, only MVCC stats discrepancies, so the bug must be deterministic across replicas (i.e. not likely a compaction problem). I'm currently attempting to bisect across Pebble bumps on
Results:
Pebble changes in this interval:
I guess cockroachdb/pebble@babd592d seems most likely, I'll try to cherry-pick it onto
@jbowens We'll need to backport this. We'd also like to understand why this began failing, and why this change fixed it. Is it relevant for previous versions?
Thanks for investigating and bisecting @erikgrinaker. Looking now. I want to understand what's happening before we backport cockroachdb/pebble@babd592d, because that change should only change iterator behavior in the presence of I/O errors. It's possible cockroachdb/pebble@babd592d only began masking the issue. It is relevant for previous versions as well.
I suppose it's possible that there are metamorphic test parameters that provoke it too.
We're thinking this may have something to do with
The new excise operation and eventually file-only snapshot (EFOS) features in Pebble are experimental. Within the 23.2 release, these features must only be enabled within the context of the disaggregated storage technical preview. These features are still experimental and unstable. Remove the metamorphism of these settings in tests in order to stabilize the 23.2 release branch. As we approach release, we want our test failures to be high signal, and failures dependent on these settings should not block the release. On the master branch these settings will be enabled unconditionally soon, and we'll get plenty of test coverage before their general availability in 24.1. Epic: none Informs cockroachdb#112221. Informs cockroachdb#113973. Informs cockroachdb#114056. Release note: none
Unsurprisingly, this fails on every Pebble bump after 65bb6cc but not before it. This commit added We're disabling this metamorphism on
Nice, thanks @erikgrinaker. I'm going to remove the release-blocker label since it's only reproducible under these experimental cluster settings. We'll continue to investigate.
I dug into this with the assistance of cockroachdb/pebble#3044 . The underlying issue is that Consider an sstable (
Let's say 00003 and 00004 are a level above 00002. A levelIter produced out of those will see the rangedels a-b and d-f, so c should not be deleted. However, if you did a The only fuzzy part of this explanation is that I'm not sure whether all callers already do additional sanity-checking on the return values of Seek calls to ensure that they actually obeyed the contract of a Seek, but in places where they don't (e.g. getIter for sure, probably others), this can be a real bug.
Previously if filteringIter's FilterFunc mutated the passed-in span to no longer be a valid return value for a SeekLT or SeekGE call, we would still return that span even though it could be >= the seek key (for SeekLT), or less than it (for SeekGE). This change updates filteringIter to guard for this case before returning from a seek call. Found with cockroachdb#3044. Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
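The guard described in this commit message can be sketched roughly as follows. This is a simplified illustration and not Pebble's `internal/keyspan` code: the `span`, `filterFunc`, `filteringIter`, and `clamp` names and signatures below are hypothetical, keys are plain strings, and spans are half-open `[start, end)` ranges. The idea is that a filter may shrink a span, so the seek contract has to be re-checked on the filtered span rather than on the underlying one.

```go
package main

import "fmt"

// span is a simplified key span [start, end).
type span struct {
	start, end string
}

// filterFunc may drop a span (keep == false) or return a modified copy.
type filterFunc func(in span) (out span, keep bool)

// filteringIter wraps a sorted, non-overlapping list of spans.
type filteringIter struct {
	spans  []span
	filter filterFunc
}

// seekGE returns the first filtered span whose end key is > key.
func (it *filteringIter) seekGE(key string) (span, bool) {
	for _, s := range it.spans {
		if s.end <= key {
			continue
		}
		out, keep := it.filter(s)
		// Guard: the filter may have truncated the span so that it now ends
		// at or before the seek key; such a span must not be returned.
		if !keep || out.end <= key {
			continue
		}
		return out, true
	}
	return span{}, false
}

// seekLT returns the last filtered span whose start key is < key.
func (it *filteringIter) seekLT(key string) (span, bool) {
	for i := len(it.spans) - 1; i >= 0; i-- {
		s := it.spans[i]
		if s.start >= key {
			continue
		}
		out, keep := it.filter(s)
		// Guard: after filtering, the span may start at or after the seek key.
		if !keep || out.start >= key {
			continue
		}
		return out, true
	}
	return span{}, false
}

// clamp is a hypothetical filter that truncates spans to [lo, hi).
func clamp(lo, hi string) filterFunc {
	return func(in span) (span, bool) {
		if in.start < lo {
			in.start = lo
		}
		if in.end > hi {
			in.end = hi
		}
		return in, in.start < in.end
	}
}

func main() {
	it := &filteringIter{
		spans:  []span{{"a", "f"}, {"h", "k"}},
		filter: clamp("d", "z"),
	}
	// The underlying span [a, f) starts before "c", but its filtered form
	// [d, f) does not, so the guard rejects it.
	_, ok := it.seekLT("c")
	fmt.Println("seekLT(c) found a span:", ok)
}
```

Without the two guard checks, `seekLT("c")` in this sketch would hand back `[d, f)`, a span at or beyond the seek key, which is the kind of contract violation the commit message describes.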
Fix for the above-mentioned bug is in cockroachdb/pebble#3046.
```
c4f530dd internal/keyspan: obey Seek invariants in filteringIter
```
Fixes cockroachdb#113973, fixes cockroachdb#114056.
Release note: None.
Release justification: Low-risk bugfix for an experimental feature in this release.
Epic: none
```
158dfe17 internal/keyspan: fix defragmenting iterator error handling
ee417657 internal/keyspan: consolidate datadriven iterator ops
869004e3 internal/dsl: new package
c4f530dd internal/keyspan: obey Seek invariants in filteringIter
```
Fixes cockroachdb#113973, fixes cockroachdb#114056.
Release note: None.
Release justification: Low-risk bugfix for an experimental feature in this release, as well as general error-handling bugfix.
Epic: none
Seen while running kvnemesis on release-23.2 at c9ed4dc, with a kvnemesis patch that disables GuaranteedDurability, ForShare, Snapshot, and ReadCommitted (which have known problems):
While running:
Jira issue: CRDB-33279