Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: normalize MVCC version timestamps during key comparison #77520

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Mar 9, 2022

Related to #77342.

This commit fixes the bug revealed in the previous commit and sets the stage for generalized MVCC version normalization during key comparison, which will be needed for #77342.

To do so, the commit adds a normalization step to EngineKeyCompare, the custom Compare function we provide to Pebble. This normalization pass currently strips the synthetic bit from version timestamps, which fixes the bug revealed in the previous commit. The normalization pass also strips zero-valued logical components, which are typically not encoded into MVCCKeys today, but may be in the future (for instance, see cockroachdb/pebble#1314). In #77342, we can then extend this to strip the encoded logical timestamp, if present.

In addition to updating the existing custom key comparator function passed to Pebble, the commit also introduces a new custom key equality function. This new function, called EngineKeyEqual, is provided to Pebble as its Equal function, replacing the default key equality function of bytes.Equal. EngineKeyEqual uses the same version normalization rules to strip portions of the key's version that should not affect ordering.

The relationship between the different comparators is explored in a new property based unit test called TestMVCCKeyCompareRandom. The change allows us to say that for any two MVCCKeys a and b, the following identities hold:

a.Compare(b) = EngineKeyCompare(encode(a), encode(b))

a.Equal(b)   = EngineKeyEqual(encode(a), encode(b))

(a.Compare(b) == 0) = a.Equal(b)

(a.Compare(b) < 0) = a.Less(b)

(a.Compare(b) > 0) = b.Less(a)

Care was taken to minimize the cost of this version normalization. With EngineKeyCompare, the normalization amounts to four new branches that should all be easy for a branch predictor to learn.

With EngineKeyEqual, there is more of a concern that this change will regress performance because we switch from a direct call to bytes.Equal to a custom comparator. To minimize this cost, the change adds a fast-path to quickly defer to bytes.Equal when version normalization is not needed. Benchmarks show that with this fast-path and with an expected distribution of keys, the custom key equality function is about 2.5ns more expensive per call. This seems reasonable.

name               time/op
MVCCKeyCompare-10  12.2ns ± 1%
MVCCKeyEqual-10    7.10ns ± 6%
BytesEqual-10      4.72ns ± 2%

Release note: None.

Release justification: None. Not intended for v22.1.

@nvanbenschoten nvanbenschoten requested a review from a team as a code owner March 9, 2022 05:28
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to fix this bug for v22.1?

Code looks fine, but I haven't looked at the tests yet (may not get to them before vacation).

Reviewed 1 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

@nvanbenschoten
Copy link
Member Author

Do we not need to fix this bug for v22.1?

No, I'm not planning to get this in for v22.1. I'm waiting for v22.2 with the rest of #77342.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/metamorphicItersBeforeSeek branch from fef5999 to 55ef09a Compare March 23, 2022 03:57
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r1, 2 of 7 files at r4, 5 of 5 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):

		// upper bound. Iterators constrained to the global keyspace may have found
		// a key on the boundary and may now be scanning before the key, using the
		// boundary as an exclusive upper bound.

I don't quite understand this last sentence.
I suppose this comment is saying that an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U], which is why both constrainedTo{Local,Global} need to be handled.
But we weren't allowing constrainedToGlobal before and did not have test failures -- what changed?


pkg/storage/pebble.go, line 198 at r5 (raw file):

	const withWall = mvccEncodedTimeSentinelLen + mvccEncodedTimeWallLen
	const withLogical = withWall + mvccEncodedTimeLogicalLen
	const withSynthetic = withLogical + mvccEncodedTimeSyntheticLen

How about a comment like
// In general, the version could also be a non-timestamp version, but we know that engineKeyVersionLockTableLen+mvccEncodedTimeSentinelLen is a different constant than the above, so there is no danger here of stripping parts from a non-timestamp version.

We should do something about unifying the two const pools that are mvccEncoded* and engineKey*. A TODO is fine.


pkg/storage/pebble_mvcc_scanner.go, line 981 at r6 (raw file):

		// If the seek timestamp is empty, we've already seen all versions of this
		// key, so seek to the next key. Seeking to version zero of the current key
		// would be incorrect, as version zero is stored before all other versions.

Was this an existing bug caused when we do the following

	prevTS := p.ts
	if metaTS.LessEq(p.ts) {
		prevTS = metaTS.Prev()
	}

and then use prevTS for seeking?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/metamorphicItersBeforeSeek branch from 55ef09a to 651c533 Compare March 29, 2022 17:19
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):

Previously, sumeerbhola wrote…

I don't quite understand this last sentence.
I suppose this comment is saying that an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U], which is why both constrainedTo{Local,Global} need to be handled.
But we weren't allowing constrainedToGlobal before and did not have test failures -- what changed?

This is needed when maxItersBeforeSeek = 0. Imagine an iterator with bounds [L ,U) that is scanning backwards and finds a key at L. Before this change, it would Prev() once before reaching its boundary and becoming invalid. After this change, when maxItersBeforeSeek = 0, it will SeekLT(L) and become invalid. This wasn't an issue before because intentInterleavingIter does not check constraints on Prev, but does on SeekLT.


pkg/storage/pebble.go, line 198 at r5 (raw file):

Previously, sumeerbhola wrote…

How about a comment like
// In general, the version could also be a non-timestamp version, but we know that engineKeyVersionLockTableLen+mvccEncodedTimeSentinelLen is a different constant than the above, so there is no danger here of stripping parts from a non-timestamp version.

We should do something about unifying the two const pools that are mvccEncoded* and engineKey*. A TODO is fine.

Done.


pkg/storage/pebble_mvcc_scanner.go, line 981 at r6 (raw file):

Previously, sumeerbhola wrote…

Was this an existing bug caused when we do the following

	prevTS := p.ts
	if metaTS.LessEq(p.ts) {
		prevTS = metaTS.Prev()
	}

and then use prevTS for seeking?

Yes, exactly. It was being hidden in tests that write values at Timestamp{WallTime: 0, Logical: 1} (something we never see in practice) because of the itersBeforeSeek. But when maxItersBeforeSeek = 0, we hit the bug.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 6 files at r8, 1 of 1 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


-- commits, line 4 at r9:
I assume this commit will be squashed, yes?


pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is needed when maxItersBeforeSeek = 0. Imagine an iterator with bounds [L ,U) that is scanning backwards and finds a key at L. Before this change, it would Prev() once before reaching its boundary and becoming invalid. After this change, when maxItersBeforeSeek = 0, it will SeekLT(L) and become invalid. This wasn't an issue before because intentInterleavingIter does not check constraints on Prev, but does on SeekLT.

I see now -- thanks. Can you incorporate something like the following as a reminder to this code comment:
// NB: an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U]. For local keyspace iters, U can be LocalMax and for global keyspace iters L can be LocalMax.


pkg/storage/pebble_mvcc_scanner.go, line 981 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yes, exactly. It was being hidden in tests that write values at Timestamp{WallTime: 0, Logical: 1} (something we never see in practice) because of the itersBeforeSeek. But when maxItersBeforeSeek = 0, we hit the bug.

nice catch!

This commit temporarily disables maxItersBeforeSeek by setting its value to 0.

This reveals a bug that was hidden by `maxItersBeforeSeek` and is demonstrated
by the change to `TestMVCCHistories/uncertainty_interval_with_local_uncertainty_limit`.
The bug is due to incorrect key ordering of MVCC keys with synthetic timestamps.

With the current encoding of MVCC key versions and the custom key comparator
`EngineKeyCompare`, the inclusion of a synthetic bit in a key's verion timestamp
causes the key's encoding to sort _before_ the same key/timestamp without a
synthetic bit. This is because `EngineKeyCompare` considers the timestamp to be
larger and it sorts versions in decreasing timestamp order.

As a result, a seek to a version timestamp without a synthetic bit will seek
past and fail to observe a value with that same timestamp but with a synthetic
bit. In other words, a seek to timestamp `10,20` will fail to see a version
stored at `10,20?`. This is unintended behavior, as the synthetic bit should not
affect key ordering or key equality (see `MVCCKey.Equal`).

This change will be mostly reverted in a later commit when the bug is fixed.
This commit fixes the bug revealed in the previous commit and sets the
stage for generalized MVCC version normalization during key comparison,
which will be needed for cockroachdb#77342.

To do so, the commit adds a normalization step to EngineKeyCompare, the
custom `Compare` function we provide to Pebble. This normalization pass
currently strips the synthetic bit from version timestamps, which fixes
the bug revealed in the previous commit. The normalization pass also
strips zero-valued logical components, which are typically not encoded
into MVCCKeys today, but may be in the future (for instance, see
cockroachdb/pebble#1314). In cockroachdb#77342, we can then
extend this to strip the encoded logical timestamp, if present.

In addition to updating the existing custom key comparator function
passed to Pebble, the commit also introduces a new custom key equality
function. This new function, called EngineKeyEqual, is provided to
Pebble as its `Equal` function, replacing the default key equality
function of `bytes.Equal`. EngineKeyEqual uses the same version
normalization rules to strip portions of the key's version that should
not affect ordering.

The relationship between the different comparators is explored in a new
property based unit test called `TestMVCCKeyCompareRandom`. The change
allows us to say that for any two `MVCCKeys` `a` and `b`, the following
identities hold:
```
a.Compare(b) = EngineKeyCompare(encode(a), encode(b))

a.Equal(b)   = EngineKeyEqual(encode(a), encode(b))

(a.Compare(b) == 0) = a.Equal(b)

(a.Compare(b) < 0) = a.Less(b)

(a.Compare(b) > 0) = b.Less(a)
```

Care was taken to minimize the cost of this version normalization. With
EngineKeyCompare, the normalization amounts to four new branches that
should all be easy for a branch predictor to learn.

With EngineKeyEqual, there is more of a concern that this change will
regress performance because we switch from a direct call to `bytes.Equal`
to a custom comparator. To minimize this cost, the change adds a
fast-path to quickly defer to `bytes.Equal` when version normalization
is not needed. Benchmarks show that with this fast-path and with an
expected distribution of keys, the custom key equality function is about
2.5ns more expensive per call. This seems reasonable.

```
name               time/op
MVCCKeyCompare-10  12.2ns ± 1%
MVCCKeyEqual-10    7.10ns ± 6%
BytesEqual-10      4.72ns ± 2%
```

Release note: None.

Release justification: None. Not intended for v22.1.
To prevent maxItersBeforeSeek from hidding inconsistencies between seeking and
iterating in the future, the commit makes the value metamorphic, with a range of
[0,3). Going forward, unit tests will use different values for this heuristic.

Release justification: None. Not intended for v22.1.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/metamorphicItersBeforeSeek branch from 651c533 to 39fbf41 Compare March 30, 2022 15:12
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


-- commits, line 4 at r9:

Previously, sumeerbhola wrote…

I assume this commit will be squashed, yes?

I wasn't planning on it. Do you think it needs to be squashed away? CI is green on this commit and the value of maxItersBeforeSeek isn't reset to 10 until two commits later. I'd like to avoid squashing this PR all down to one commit, as there are too many semi-related moving parts for that.


pkg/storage/intent_interleaving_iter.go, line 745 at r4 (raw file):

Previously, sumeerbhola wrote…

I see now -- thanks. Can you incorporate something like the following as a reminder to this code comment:
// NB: an iterator with bounds [L, U) is allowed to SeekLT over any key in [L, U]. For local keyspace iters, U can be LocalMax and for global keyspace iters L can be LocalMax.

Done.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)


-- commits, line 4 at r9:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I wasn't planning on it. Do you think it needs to be squashed away? CI is green on this commit and the value of maxItersBeforeSeek isn't reset to 10 until two commits later. I'd like to avoid squashing this PR all down to one commit, as there are too many semi-related moving parts for that.

Multiple commits is fine. My understanding was that a cockroachdb binary built after this first commit would be deliberately buggy. Did I miss something?

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


-- commits, line 4 at r9:

Previously, sumeerbhola wrote…

Multiple commits is fine. My understanding was that a cockroachdb binary built after this first commit would be deliberately buggy. Did I miss something?

The first commit doesn't introduce any new bugs, it just makes an existing bug slightly easier to hit. Either way, it's still possible to perform an AOST scan and fail to miss a key-value with a synthetic timestamp. Since the bug isn't new and it's then fixed immediately after, I think it's ok.

@craig
Copy link
Contributor

craig bot commented Mar 30, 2022

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@craig
Copy link
Contributor

craig bot commented Mar 30, 2022

Build succeeded:

@craig craig bot merged commit e81ec34 into cockroachdb:master Mar 30, 2022
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/metamorphicItersBeforeSeek branch March 31, 2022 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants