-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: normalize MVCC version timestamps during key comparison #77520
storage: normalize MVCC version timestamps during key comparison #77520
Conversation
f1c9e17
to
fef5999
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need to fix this bug for v22.1?
Code looks fine, but I haven't looked at the tests yet (may not get to them before vacation).
Reviewed 1 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
No, I'm not planning to get this in for v22.1. I'm waiting for v22.2 with the rest of #77342. |
fef5999
to
55ef09a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r1, 2 of 7 files at r4, 5 of 5 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):
// upper bound. Iterators constrained to the global keyspace may have found // a key on the boundary and may now be scanning before the key, using the // boundary as an exclusive upper bound.
I don't quite understand this last sentence.
I suppose this comment is saying that an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U], which is why both constrainedTo{Local,Global} need to be handled.
But we weren't allowing constrainedToGlobal before and did not have test failures -- what changed?
pkg/storage/pebble.go, line 198 at r5 (raw file):
const withWall = mvccEncodedTimeSentinelLen + mvccEncodedTimeWallLen const withLogical = withWall + mvccEncodedTimeLogicalLen const withSynthetic = withLogical + mvccEncodedTimeSyntheticLen
How about a comment like
// In general, the version could also be a non-timestamp version, but we know that engineKeyVersionLockTableLen+mvccEncodedTimeSentinelLen is a different constant than the above, so there is no danger here of stripping parts from a non-timestamp version.
We should do something about unifying the two const pools that are mvccEncoded*
and engineKey*
. A TODO is fine.
pkg/storage/pebble_mvcc_scanner.go, line 981 at r6 (raw file):
// If the seek timestamp is empty, we've already seen all versions of this // key, so seek to the next key. Seeking to version zero of the current key // would be incorrect, as version zero is stored before all other versions.
Was this an existing bug caused when we do the following
prevTS := p.ts
if metaTS.LessEq(p.ts) {
prevTS = metaTS.Prev()
}
and then use prevTS for seeking?
55ef09a
to
651c533
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):
Previously, sumeerbhola wrote…
I don't quite understand this last sentence.
I suppose this comment is saying that an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U], which is why both constrainedTo{Local,Global} need to be handled.
But we weren't allowing constrainedToGlobal before and did not have test failures -- what changed?
This is needed when maxItersBeforeSeek = 0
. Imagine an iterator with bounds [L ,U) that is scanning backwards and finds a key at L. Before this change, it would Prev()
once before reaching its boundary and becoming invalid. After this change, when maxItersBeforeSeek = 0
, it will SeekLT(L)
and become invalid. This wasn't an issue before because intentInterleavingIter
does not check constraints on Prev
, but does on SeekLT
.
pkg/storage/pebble.go, line 198 at r5 (raw file):
Previously, sumeerbhola wrote…
How about a comment like
// In general, the version could also be a non-timestamp version, but we know that engineKeyVersionLockTableLen+mvccEncodedTimeSentinelLen is a different constant than the above, so there is no danger here of stripping parts from a non-timestamp version.We should do something about unifying the two const pools that are
mvccEncoded*
andengineKey*
. A TODO is fine.
Done.
pkg/storage/pebble_mvcc_scanner.go, line 981 at r6 (raw file):
Previously, sumeerbhola wrote…
Was this an existing bug caused when we do the following
prevTS := p.ts if metaTS.LessEq(p.ts) { prevTS = metaTS.Prev() }
and then use prevTS for seeking?
Yes, exactly. It was being hidden in tests that write values at Timestamp{WallTime: 0, Logical: 1}
(something we never see in practice) because of the itersBeforeSeek
. But when maxItersBeforeSeek = 0
, we hit the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r8, 1 of 1 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
-- commits, line 4 at r9:
I assume this commit will be squashed, yes?
pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is needed when
maxItersBeforeSeek = 0
. Imagine an iterator with bounds [L ,U) that is scanning backwards and finds a key at L. Before this change, it wouldPrev()
once before reaching its boundary and becoming invalid. After this change, whenmaxItersBeforeSeek = 0
, it willSeekLT(L)
and become invalid. This wasn't an issue before becauseintentInterleavingIter
does not check constraints onPrev
, but does onSeekLT
.
I see now -- thanks. Can you incorporate something like the following as a reminder to this code comment:
// NB: an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U]. For local keyspace iters, U can be LocalMax and for global keyspace iters L can be LocalMax.
pkg/storage/pebble_mvcc_scanner.go, line 981 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yes, exactly. It was being hidden in tests that write values at
Timestamp{WallTime: 0, Logical: 1}
(something we never see in practice) because of theitersBeforeSeek
. But whenmaxItersBeforeSeek = 0
, we hit the bug.
nice catch!
This commit temporarily disables maxItersBeforeSeek by setting its value to 0. This reveals a bug that was hidden by `maxItersBeforeSeek` and is demonstrated by the change to `TestMVCCHistories/uncertainty_interval_with_local_uncertainty_limit`. The bug is due to incorrect key ordering of MVCC keys with synthetic timestamps. With the current encoding of MVCC key versions and the custom key comparator `EngineKeyCompare`, the inclusion of a synthetic bit in a key's verion timestamp causes the key's encoding to sort _before_ the same key/timestamp without a synthetic bit. This is because `EngineKeyCompare` considers the timestamp to be larger and it sorts versions in decreasing timestamp order. As a result, a seek to a version timestamp without a synthetic bit will seek past and fail to observe a value with that same timestamp but with a synthetic bit. In other words, a seek to timestamp `10,20` will fail to see a version stored at `10,20?`. This is unintended behavior, as the synthetic bit should not affect key ordering or key equality (see `MVCCKey.Equal`). This change will be mostly reverted in a later commit when the bug is fixed.
This commit fixes the bug revealed in the previous commit and sets the stage for generalized MVCC version normalization during key comparison, which will be needed for cockroachdb#77342. To do so, the commit adds a normalization step to EngineKeyCompare, the custom `Compare` function we provide to Pebble. This normalization pass currently strips the synthetic bit from version timestamps, which fixes the bug revealed in the previous commit. The normalization pass also strips zero-valued logical components, which are typically not encoded into MVCCKeys today, but may be in the future (for instance, see cockroachdb/pebble#1314). In cockroachdb#77342, we can then extend this to strip the encoded logical timestamp, if present. In addition to updating the existing custom key comparator function passed to Pebble, the commit also introduces a new custom key equality function. This new function, called EngineKeyEqual, is provided to Pebble as its `Equal` function, replacing the default key equality function of `bytes.Equal`. EngineKeyEqual uses the same version normalization rules to strip portions of the key's version that should not affect ordering. The relationship between the different comparators is explored in a new property based unit test called `TestMVCCKeyCompareRandom`. The change allows us to say that for any two `MVCCKeys` `a` and `b`, the following identities hold: ``` a.Compare(b) = EngineKeyCompare(encode(a), encode(b)) a.Equal(b) = EngineKeyEqual(encode(a), encode(b)) (a.Compare(b) == 0) = a.Equal(b) (a.Compare(b) < 0) = a.Less(b) (a.Compare(b) > 0) = b.Less(a) ``` Care was taken to minimize the cost of this version normalization. With EngineKeyCompare, the normalization amounts to four new branches that should all be easy for a branch predictor to learn. With EngineKeyEqual, there is more of a concern that this change will regress performance because we switch from a direct call to `bytes.Equal` to a custom comparator. To minimize this cost, the change adds a fast-path to quickly defer to `bytes.Equal` when version normalization is not needed. Benchmarks show that with this fast-path and with an expected distribution of keys, the custom key equality function is about 2.5ns more expensive per call. This seems reasonable. ``` name time/op MVCCKeyCompare-10 12.2ns ± 1% MVCCKeyEqual-10 7.10ns ± 6% BytesEqual-10 4.72ns ± 2% ``` Release note: None. Release justification: None. Not intended for v22.1.
To prevent maxItersBeforeSeek from hidding inconsistencies between seeking and iterating in the future, the commit makes the value metamorphic, with a range of [0,3). Going forward, unit tests will use different values for this heuristic. Release justification: None. Not intended for v22.1.
651c533
to
39fbf41
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
Previously, sumeerbhola wrote…
I assume this commit will be squashed, yes?
I wasn't planning on it. Do you think it needs to be squashed away? CI is green on this commit and the value of maxItersBeforeSeek
isn't reset to 10 until two commits later. I'd like to avoid squashing this PR all down to one commit, as there are too many semi-related moving parts for that.
pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):
Previously, sumeerbhola wrote…
I see now -- thanks. Can you incorporate something like the following as a reminder to this code comment:
// NB: an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U]. For local keyspace iters, U can be LocalMax and for global keyspace iters L can be LocalMax.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I wasn't planning on it. Do you think it needs to be squashed away? CI is green on this commit and the value of
maxItersBeforeSeek
isn't reset to 10 until two commits later. I'd like to avoid squashing this PR all down to one commit, as there are too many semi-related moving parts for that.
Multiple commits is fine. My understanding was that a cockroachdb binary built after this first commit would be deliberately buggy. Did I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
Previously, sumeerbhola wrote…
Multiple commits is fine. My understanding was that a cockroachdb binary built after this first commit would be deliberately buggy. Did I miss something?
The first commit doesn't introduce any new bugs, it just makes an existing bug slightly easier to hit. Either way, it's still possible to perform an AOST scan and fail to miss a key-value with a synthetic timestamp. Since the bug isn't new and it's then fixed immediately after, I think it's ok.
🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set. |
Build succeeded: |
Related to #77342.
This commit fixes the bug revealed in the previous commit and sets the stage for generalized MVCC version normalization during key comparison, which will be needed for #77342.
To do so, the commit adds a normalization step to EngineKeyCompare, the custom
Compare
function we provide to Pebble. This normalization pass currently strips the synthetic bit from version timestamps, which fixes the bug revealed in the previous commit. The normalization pass also strips zero-valued logical components, which are typically not encoded into MVCCKeys today, but may be in the future (for instance, see cockroachdb/pebble#1314). In #77342, we can then extend this to strip the encoded logical timestamp, if present.In addition to updating the existing custom key comparator function passed to Pebble, the commit also introduces a new custom key equality function. This new function, called EngineKeyEqual, is provided to Pebble as its
Equal
function, replacing the default key equality function ofbytes.Equal
. EngineKeyEqual uses the same version normalization rules to strip portions of the key's version that should not affect ordering.The relationship between the different comparators is explored in a new property based unit test called
TestMVCCKeyCompareRandom
. The change allows us to say that for any twoMVCCKeys
a
andb
, the following identities hold:Care was taken to minimize the cost of this version normalization. With EngineKeyCompare, the normalization amounts to four new branches that should all be easy for a branch predictor to learn.
With EngineKeyEqual, there is more of a concern that this change will regress performance because we switch from a direct call to
bytes.Equal
to a custom comparator. To minimize this cost, the change adds a fast-path to quickly defer tobytes.Equal
when version normalization is not needed. Benchmarks show that with this fast-path and with an expected distribution of keys, the custom key equality function is about 2.5ns more expensive per call. This seems reasonable.Release note: None.
Release justification: None. Not intended for v22.1.