internal/metamorphic: add block-property filter coverage #1737

jbowens · 2022-06-03T19:51:23Z

sstable: fix additional bounds check bugs

In 4406dab a bounds-checking bug was fixed in the two-level index
iterator's SeekLT method. The same bug existed within Last and
skipBackward and was missed.

db: refactor levelIter/mergingIter boundary interface

The levelIter and mergingIter are tightly coupled in a few places, with the
levelIter passing additional context to the mergingIter by writing into fields
owned by the mergingIter. This commit is a small refactor to organize these
fields together into a levelIterBoundaryContext struct. This is in
preparation for adding an additional field.

db: don't filter range deletions based on point-key block filters

Previously, if an iterator configured with a block-property filter encountered
a table that contained all excluded point keys, the iterator would skip the
table entirely, including its range deletions. An iterator configured with a
block-property filter could observe keys outside the filtered range that no
longer exist.

This behavior was fine from CockroachDB's perspective, which always pairs a
filtered iterator with an unfiltered iterator, seeking the latter in tandem.
The behavior is likely to be surprising in non-CockroachDB use cases.
Additionally, the behavior is an obstacle to range-key masking which must
respect range deletions for keys not covered by the masking range key.

This change requires additional coordination between the mergingIter and
levelIter to ensure that a levelIter doesn't close a file while other levels
still require its range deletions. This coordination takes the form a new
isIgnorableBoundaryKey field. The levelIter may return a file's smallest or
largest boundary key [without any value], setting isIgnorableBoundaryKey to
true to signal the key doesn't repesent a real KV pair. This scheme is also
extended to SeekPrefixGE, which previously synthesized a RANGEDEL key to serve
a similar purpose.

internal/metamorphic: add block-property filter coverage

Add test coverage for block-property filters to the metamorphic tests.
Division of keys into blocks are nondeterministic, so this introduces
nondeterminism into Iterator behavior. To account for this, the metamorphic
test's iterator wrapper is adjusted to skip any keys that are eligible for
filtering.

Fix #1725.

cockroach-teamcity · 2022-06-03T19:51:30Z

This change is

jbowens

Reviewable status: 0 of 15 files reviewed, 1 unresolved discussion

-- commits line 5 at r4:
I swear I checked for this; but somehow they slipped through.

sumeerbhola

haven't looked at the last commit yet. Generally looks good.

Reviewed 2 of 2 files at r1, 5 of 5 files at r2, 3 of 5 files at r3, 2 of 8 files at r5.
Reviewable status: 9 of 15 files reviewed, 10 unresolved discussions (waiting on @jbowens and @sumeerbhola)

level_iter.go line 79 at r5 (raw file):

	// iterators that perform filtering of keys. When a new file's iterator is
	// opened, it's tested to see if it implements filteredIter. If it does,
	// it's stored here to allow the level iterator recognize when keys were

to recognize ...

level_iter.go line 680 at r5 (raw file):

			// possible the file's range deletions are still relevant to other
			// levels. Return the largest boundary as a special ignorable marker
			// to avoid advancing to the next file.

I think its worth saying in a code comment here something like: it is safe to return l.iterFile.Largest since it must have been skipped, and therefore must be > what we have previously returned when iterating in this direction, and so the monotonicity required by the mergingIter heap is not violated.

level_iter.go line 750 at r5 (raw file):

			// possible the file's range deletions are still relevant to other
			// levels. Return the smallest boundary as a special ignorable key
			// to avoid advancing to the next file.

ditto

merging_iter.go line 48 at r2 (raw file):

	smallestUserKey, largestUserKey []byte
	// isLargestUserKeyRangeDelSentinel is set to true when a file's largest
	// boundary is an exclusive range deletion sentinel file. If true, the file

sentinel.

table_cache.go line 400 at r5 (raw file):

		// interface, so that the level iterator surfaces file boundaries when
		// range deletions are present.
		return filteredAll, rangeDelIter, err

would it be slightly more efficient if we returned filteredAll only if rangeDelIter was non-nil?

sstable/reader.go line 96 at r5 (raw file):

	// any keys due to block-property filters. This is used by the Pebble
	// levelIter to control when an iterator steps to the next sstable.
	FilteredAnyKeys() bool

this change could use some datadriven testing.

sstable/reader.go line 219 at r5 (raw file):

	// filteredKeys indicates whether the last iterator positioning operation
	// skipped any keys due to block-property filters. It is exposed through
	// FilteredAnyKeys and used by the level iterator.

for naming consistency, can you call this filteredAnyKeys.

sstable/reader.go line 527 at r5 (raw file):

	}
	i.exhaustedBounds = 0
	i.filteredKeys = false

This makes me nervous, in that I think this happens to be correct because of the specifics of the logic in seekGEHelper that tries to use the current iterator position and avoids doing a seek.
But in general I am worried that if we have previously stopped because the block was irrelevant, it could also include keys that are relevant to this seek position. And so if we now decide to not do a full seek (which is correct) we may erroneously return filteredKey=false, when it is in fact true.

Maybe one way to build more confidence is to establish an invariant on these next optimizations for the seeks e.g. do we ever try to use a next optimization if the last positioning call (step or seek) returned nothing because of loadBlockIrrelevant. I think not, because we seem to condition on !data.isDataInvalidated() and on !i.index.isDataInvalidated(), but I would appreciate you taking a closer look.

sstable/reader.go line 695 at r5 (raw file):

func (i *singleLevelIterator) SeekLT(key []byte) (*InternalKey, []byte) {
	i.exhaustedBounds = 0
	i.filteredKeys = false

same worry here.

sumeerbhola

btw, I'm assuming you don't want to narrow the block property collector interface to only allow keys. I'm a bit torn about this -- I can see some utility to a future where values also contribute.

Either way, I think we could use a better version of the "truthful-bpf" comment from #1725 (comment) in block_property.go, to lay out the meaning of correctness.

Reviewable status: 9 of 15 files reviewed, 10 unresolved discussions (waiting on @jbowens and @sumeerbhola)

jbowens

nice, yeah, I added a comment to block_property.go describing this 'deterministic' block-property filtering.

Reviewable status: 1 of 16 files reviewed, 10 unresolved discussions (waiting on @jbowens and @sumeerbhola)

level_iter.go line 79 at r5 (raw file):

Previously, sumeerbhola wrote…

to recognize ...

Done.

level_iter.go line 680 at r5 (raw file):

Previously, sumeerbhola wrote…

I think its worth saying in a code comment here something like: it is safe to return l.iterFile.Largest since it must have been skipped, and therefore must be > what we have previously returned when iterating in this direction, and so the monotonicity required by the mergingIter heap is not violated.

Done.

level_iter.go line 750 at r5 (raw file):

Previously, sumeerbhola wrote…

ditto

Done.

merging_iter.go line 48 at r2 (raw file):

Previously, sumeerbhola wrote…

sentinel.

Done.

table_cache.go line 400 at r5 (raw file):

Previously, sumeerbhola wrote…

would it be slightly more efficient if we returned filteredAll only if rangeDelIter was non-nil?

i'm not sure. we won't ever attempt to call FilteredAnyKeys() unless the level's rangedel iter is non-nil. i think the thing that might be slightly more efficient is to only populate levelIter.filteredIter if the rangedel iter is non-nil, so we can avoid the interface type assertion: i'll make that change now.

sstable/reader.go line 219 at r5 (raw file):

Previously, sumeerbhola wrote…

for naming consistency, can you call this filteredAnyKeys.

Done.

sstable/reader.go line 527 at r5 (raw file):

Previously, sumeerbhola wrote…

This makes me nervous, in that I think this happens to be correct because of the specifics of the logic in seekGEHelper that tries to use the current iterator position and avoids doing a seek.
But in general I am worried that if we have previously stopped because the block was irrelevant, it could also include keys that are relevant to this seek position. And so if we now decide to not do a full seek (which is correct) we may erroneously return filteredKey=false, when it is in fact true.

Maybe one way to build more confidence is to establish an invariant on these next optimizations for the seeks e.g. do we ever try to use a next optimization if the last positioning call (step or seek) returned nothing because of loadBlockIrrelevant. I think not, because we seem to condition on !data.isDataInvalidated() and on !i.index.isDataInvalidated(), but I would appreciate you taking a closer look.

I see what you mean. I scrutinized the code, and my understanding matches yours: The conditions on !i.data.isDataInvalidated() and !i.index.isDataInvalidated() ensure we'll never trySeekUsingNext if the last positioning method returned nothing. I'm going to spend a little more time with this tomorrow to find the right invariants to document and assert.

jbowens

Note to self: this needs to be updated to apply similar logic for range-key deletions.

Reviewable status: 1 of 16 files reviewed, 10 unresolved discussions (waiting on @jbowens and @sumeerbhola)

jbowens

Reviewable status: 0 of 18 files reviewed, 10 unresolved discussions (waiting on @jbowens and @sumeerbhola)

level_iter.go line 680 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Done.

Actually, it is possible l.iterFile.Largest was not skipped. A SeekGE that lands on a index separator k only knows that the block at the index entry contains keys ≤ k. We can't know whether there were actually keys between the seek key and the index key. If the block is then excluded due to block-property filters, we'll think keys were filtered when none were.

I think this ≤ monotonicity—as opposed to strictly increasing monotonicity—is still okay with respect to the merging iterator. I adjusted the commentary here and renamed the FilteredAnyKeys function to MaybeFilteredKeys().

sstable/reader.go line 96 at r5 (raw file):

Previously, sumeerbhola wrote…

this change could use some datadriven testing.

Done.

nicktrav

Reviewed 1 of 2 files at r1, 14 of 14 files at r7, 5 of 5 files at r8, 6 of 6 files at r9, 12 of 12 files at r11, 6 of 6 files at r12, 3 of 3 files at r13, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)

sstable/block_property.go line 53 at r9 (raw file):

// into blocks is nondeterministic. Block propery filtering may introduce two
// kinds of errors:
//   a) Point keys that match the filter may be surfaced. This may happen

In the context of block property filters, doesn't "filter" imply that it matches (intersects / selects) the block? Yet here we're saying that "point keys matching the filter" is a type of error?

sstable/block_property.go line 52 at r13 (raw file):

// Block property filtering is nondeterministic because the separation of keys
// into blocks is nondeterministic. Block propery filtering may introduce two
// kinds of errors:

Is "error" accurate in this context? Can this be thought of more like a "false positive"?

sstable/testdata/block_properties line 443 at r11 (raw file):

<a@1:1> MaybeFilteredKeys()=true

# Test FilteredAnyKeys().

nit: MaybeFilteredKeys? Same elsewhere.

jbowens

Reviewable status: 9 of 18 files reviewed, 13 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)

sstable/block_property.go line 53 at r9 (raw file):

Previously, nicktrav (Nick Travers) wrote…

In the context of block property filters, doesn't "filter" imply that it matches (intersects / selects) the block? Yet here we're saying that "point keys matching the filter" is a type of error?

Since block boundaries are completely variable, it is only meaningful to define block-property filters in order to apply a filter function that is a function of the key-value pairs. Block and table level filtering is just the mechanic, but the intended filter is always a function of the point keys themselves.

sstable/block_property.go line 52 at r13 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Is "error" accurate in this context? Can this be thought of more like a "false positive"?

I think "false positive" or "false negative" are errors, in the statistical sense. I used the "false negative" language down below to describe the first form of error. I'm not sure whether "false positive" (positive as in what survives the filter) or "false negative" (negative as in the filter failing to detect the key) is the appropriate term?

I don't think the second form of error can be described as a false negative/positive, because it's a violation of the semantics ordinarily guaranteed by Pebble: If you delete a key, it should appear deleted on all subsequent reads.

sstable/testdata/block_properties line 443 at r11 (raw file):

Previously, nicktrav (Nick Travers) wrote…

nit: MaybeFilteredKeys? Same elsewhere.

good catch, done.

nicktrav

Reviewed 9 of 9 files at r14, 9 of 9 files at r15, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)

sstable/block_property.go line 53 at r9 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Since block boundaries are completely variable, it is only meaningful to define block-property filters in order to apply a filter function that is a function of the key-value pairs. Block and table level filtering is just the mechanic, but the intended filter is always a function of the point keys themselves.

The mechanics of it I understand. Just a matter of interpretation ... does "match the filter" mean "filtering for" or "filtering out"? /shrug

sstable/block_property.go line 52 at r13 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think "false positive" or "false negative" are errors, in the statistical sense. I used the "false negative" language down below to describe the first form of error. I'm not sure whether "false positive" (positive as in what survives the filter) or "false negative" (negative as in the filter failing to detect the key) is the appropriate term?

I don't think the second form of error can be described as a false negative/positive, because it's a violation of the semantics ordinarily guaranteed by Pebble: If you delete a key, it should appear deleted on all subsequent reads.

👍

sumeerbhola

Reviewed 3 of 5 files at r8, 3 of 6 files at r9, 2 of 12 files at r11, 4 of 9 files at r15, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)

level_iter.go line 680 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Actually, it is possible l.iterFile.Largest was not skipped. A SeekGE that lands on a index separator k only knows that the block at the index entry contains keys ≤ k. We can't know whether there were actually keys between the seek key and the index key. If the block is then excluded due to block-property filters, we'll think keys were filtered when none were.

I think this ≤ monotonicity—as opposed to strictly increasing monotonicity—is still okay with respect to the merging iterator. I adjusted the commentary here and renamed the FilteredAnyKeys function to MaybeFilteredKeys().

I see. Can you also add this index separator example to the code comment to clarify to the reader why we need this maybe-semantics.

table_cache.go line 400 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

i'm not sure. we won't ever attempt to call FilteredAnyKeys() unless the level's rangedel iter is non-nil. i think the thing that might be slightly more efficient is to only populate levelIter.filteredIter if the rangedel iter is non-nil, so we can avoid the interface type assertion: i'll make that change now.

ack

sstable/block_property.go line 59 at r15 (raw file):

//      used for filtering is not a strict function of the user key, a block
//      containing k.DEL may be filtered, while a block containing the deleted
//      key k.SET may not be filtered.

The comment is missing a correctness definition. I think it is worth separating out the abstract notion of the kv-filter. Also, it may be easier to read if the solution to each problem goes together with the problem definition.

So I'd suggest organizing it in a manner like (of course with all the other details in your comment):

There is a filter that applies to key-value pairs. Correctness is defined as surfacing exactly the key-value pairs that correspond to applying the kv-filter after reading from the DB.
Block property filtering that uses that kv-filter produces additional values that don't satisfy the filter because of the separation of keys into blocks. These can be removed by reapplying the kv-filter above the filtering applied by bpf.
Block property filtering may surface point keys that have been deleted if the kv-filter looks at the value. Don't use such a filter (actually the more nuanced statement in your comment).

sstable/reader.go line 527 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I see what you mean. I scrutinized the code, and my understanding matches yours: The conditions on !i.data.isDataInvalidated() and !i.index.isDataInvalidated() ensure we'll never trySeekUsingNext if the last positioning method returned nothing. I'm going to spend a little more time with this tomorrow to find the right invariants to document and assert.

I think we also need a code comment here talking about this potential interaction between maybeFilteredKeys and trySeekUsingNext so that the next person changing this code is careful about it. And it isn't limited to trySeekUsingNext -- I believe it applies to all stepping instead of seeking optimizations here (there is the monotonic bounds one, that has nothing to do with trySeekUsingNext). So it also needs a code comment in SeekLT.

sstable/testdata/block_properties line 508 at r15 (raw file):

seek-lt a
seek-lt c
seek-lt ca

can you add seek-lt f and seek-lt e and seek-lt d -- I am assuming the first two will produce MaybeFilteredKeys=true.

And is it possible to exercise some of the stepping optimizations by doing SeekGE followed by Next and confirming the correctness of the MaybeFilteredKeys. Also with the one with monotonic bounds that apply to both SeekGE and SeekLT.

jbowens

@sumeerbhola —I further narrowed MaybeFilteredKeys to only be defined at exhausted iterator positions, because otherwise seeks that apply the try-seek-using-next optimization and don't move the iterator position at all will violate the contract. We only care about the exhausted iterator position for levelIter.

Reviewable status: 8 of 18 files reviewed, 11 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)

level_iter.go line 680 at r5 (raw file):

Previously, sumeerbhola wrote…

I see. Can you also add this index separator example to the code comment to clarify to the reader why we need this maybe-semantics.

Done.

sstable/block_property.go line 59 at r15 (raw file):

Previously, sumeerbhola wrote…

The comment is missing a correctness definition. I think it is worth separating out the abstract notion of the kv-filter. Also, it may be easier to read if the solution to each problem goes together with the problem definition.

So I'd suggest organizing it in a manner like (of course with all the other details in your comment):

There is a filter that applies to key-value pairs. Correctness is defined as surfacing exactly the key-value pairs that correspond to applying the kv-filter after reading from the DB.

Block property filtering that uses that kv-filter produces additional values that don't satisfy the filter because of the separation of keys into blocks. These can be removed by reapplying the kv-filter above the filtering applied by bpf.

Block property filtering may surface point keys that have been deleted if the kv-filter looks at the value. Don't use such a filter (actually the more nuanced statement in your comment).

Done.

sstable/reader.go line 527 at r5 (raw file):

Previously, sumeerbhola wrote…

I think we also need a code comment here talking about this potential interaction between maybeFilteredKeys and trySeekUsingNext so that the next person changing this code is careful about it. And it isn't limited to trySeekUsingNext -- I believe it applies to all stepping instead of seeking optimizations here (there is the monotonic bounds one, that has nothing to do with trySeekUsingNext). So it also needs a code comment in SeekLT.

Done.

sstable/reader.go line 695 at r5 (raw file):

Previously, sumeerbhola wrote…

same worry here.

Done.

sumeerbhola

because otherwise seeks that apply the try-seek-using-next optimization and don't move the iterator position at all will violate the contract

Because the only safe thing to do in that case is to return true?
Narrowing to exhausted sounds good.

Reviewed 1 of 12 files at r11, 2 of 10 files at r16, 8 of 8 files at r17.
Reviewable status: all files reviewed (commit messages unreviewed), 16 unresolved discussions (waiting on @jbowens and @sumeerbhola)

sstable/block_property.go line 54 at r17 (raw file):

// implement efficient application of a filter F that applies to key-value
// pairs. Consider correctness defined as surfacing exactly the same key-value
// pairs that would be surfaced if one applied the filter F above block-property

I think it should say "if one applied the filter F above regular iteration"

sstable/block_property.go line 55 at r17 (raw file):

// pairs. Consider correctness defined as surfacing exactly the same key-value
// pairs that would be surfaced if one applied the filter F above block-property
// filtering With this correctness definition, block property filtering may

nit: missing period

sstable/block_property.go line 58 at r17 (raw file):

// introduce two kinds of errors:
//
//   a) Block property filtering that uses a kv-filter may produce additional

we haven't defined "kv-filter". Perhaps the earlier comment that introduces F can say "... of a filter F that applies to key-value pairs (abbreviated as kv-filter)."

sstable/testdata/block_properties line 538 at r17 (raw file):

. MaybeFilteredKeys()=true
. MaybeFilteredKeys()=true
<b@10:2> MaybeFilteredKeys()=true

with bounds [b,e), isn't b@10 the first key that is encountered regardless of block property filtering? I'm confused by this true value.

sstable/testdata/block_properties line 540 at r17 (raw file):

<b@10:2> MaybeFilteredKeys()=true
<c@15:3> MaybeFilteredKeys()=false

The corner case I was worried about was something like

set-bounds lower=b upper=ee
seek-ge d
set-bounds lower=ee upper=g
seek-ge ee

where I think the layout of blocks and indices will cause the first seek to skip the lower level index block containing e and f. Then the next set-bounds qualifies for the monotonic bounds stepping optimization, and we want to ensure that seek-ge ee also returns true.

jbowens

Reviewable status: 15 of 18 files reviewed, 15 unresolved discussions (waiting on @sumeerbhola)

sstable/block_property.go line 54 at r17 (raw file):

Previously, sumeerbhola wrote…

I think it should say "if one applied the filter F above regular iteration"

Done.

sstable/block_property.go line 58 at r17 (raw file):

Previously, sumeerbhola wrote…

we haven't defined "kv-filter". Perhaps the earlier comment that introduces F can say "... of a filter F that applies to key-value pairs (abbreviated as kv-filter)."

Done.

sstable/testdata/block_properties line 508 at r15 (raw file):

Previously, sumeerbhola wrote…

can you add seek-lt f and seek-lt e and seek-lt d -- I am assuming the first two will produce MaybeFilteredKeys=true.

And is it possible to exercise some of the stepping optimizations by doing SeekGE followed by Next and confirming the correctness of the MaybeFilteredKeys. Also with the one with monotonic bounds that apply to both SeekGE and SeekLT.

Done.

sstable/testdata/block_properties line 538 at r17 (raw file):

Previously, sumeerbhola wrote…

with bounds [b,e), isn't b@10 the first key that is encountered regardless of block property filtering? I'm confused by this true value.

This is that case we discussed earlier that makes it hard to tell whether we skipped keys. The index separator between a@1 and b@10 is calculated to be b. The seek-ge b lands on the first block, sees that it might contain b, sees that it's filtered by bpfs and proceeds to the next block.

sstable/testdata/block_properties line 540 at r17 (raw file):

Previously, sumeerbhola wrote…

The corner case I was worried about was something like
set-bounds lower=b upper=ee
seek-ge d
set-bounds lower=ee upper=g
seek-ge ee
where I think the layout of blocks and indices will cause the first seek to skip the lower level index block containing e and f. Then the next set-bounds qualifies for the monotonic bounds stepping optimization, and we want to ensure that seek-ge ee also returns true.

Thanks for this test case—it revealed a bug. We handle this case appropriately within the single-level iterator, but we don't do the comparable thing within the two-level iterator.

The second seek finds the two-level iterator is already positioned and eligible for the bounds optimization. When deciding whether to enter the slow path, it only checks if !i.twoLevelIndex.isDataInvalidated(), so it falls into the optimization.

I made two code changes to fix this:

loadIndex now invalidates i.index too, not just i.data.
twoLevelIterator.Seek{GE,PrefixGE} do not fast path if i.index.isDataInvalidated() .

jbowens

Pondering a randomized test that randomizes the index block size and data block size. Maybe two variants, one that tests trySeekUsingNext and one that tests monotonically increasing/decreasing bounds. Might leave it as a TODO/issue to follow up with during stability period.

Reviewable status: 15 of 18 files reviewed, 15 unresolved discussions (waiting on @sumeerbhola)

sumeerbhola

Reviewed 1 of 3 files at r18, all commit messages.
Reviewable status: 16 of 18 files reviewed, 17 unresolved discussions (waiting on @jbowens and @sumeerbhola)

sstable/reader.go line 520 at r18 (raw file):

	// reached. The i.data.isDataInvalidated() indicates that the sstable was
	// exhausted.
	if trySeekUsingNext && (i.exhaustedBounds == +1 || i.data.isDataInvalidated()) {

If this case ok because if maybeFilteredKeys=true, it will stay true (since the reset to false is after this return)?

Are we losing this optimization for the twoLevelIterator case with your latest change because trySeekUsingNext will get set to false in the expanded slow-path for twoLevelIterator.SeekGE?

sstable/reader.go line 581 at r18 (raw file):

		// caller claimed externally known invariant represented by
		// trySeekUsingNext=true.
		if trySeekUsingNext {

also, I wonder whether I overlooked this case in my original "This makes me nervous" comment. It doesn't use anything about i.data.isDataInvalidated or i.index.isDataInvalidated.

isn't it possible for erroneously return maybeFilteredKeys=false
And this optimization seems to suffer from the same regression in the twoLevelIterator case.

If we ever use trySeekUsingNext should we just conservatively return maybeFilteredKeys=true? I'm generally more concerned about losing a seek optimization than returning an extra synthetic key -- and there are some microbenchmarks that suggested that optimizations for seeks on exhausted iterators were quite helpful since with 90% of the data in L6, it is not uncommon to have L1-L5 level iterators already exhausted.

In 4406dab a bounds-checking bug was fixed in the two-level index iterator's SeekLT method. The same bug existed within Last and skipBackward and was missed.

The levelIter and mergingIter are tightly coupled in a few places, with the levelIter passing additional context to the mergingIter by writing into fields owned by the mergingIter. This commit is a small refactor to organize these fields together into a `levelIterBoundaryContext` struct. This is in preparation for adding an additional field.

Previously, if an iterator configured with a block-property filter encountered a table that contained all excluded point keys, the iterator would skip the table entirely, including its range deletions. An iterator configured with a block-property filter could observe keys outside the filtered range that no longer exist. This behavior was fine from CockroachDB's perspective, which always pairs a filtered iterator with an unfiltered iterator, seeking the latter in tandem. The behavior is likely to be surprising in non-CockroachDB use cases. Additionally, the behavior is an obstacle to range-key masking which must respect range deletions for keys not covered by the masking range key. This change requires additional coordination between the mergingIter and levelIter to ensure that a levelIter doesn't close a file while other levels still require its range deletions. This coordination takes the form a new `isIgnorableBoundaryKey` field. The levelIter may return a file's smallest or largest boundary key [without any value], setting `isIgnorableBoundaryKey` to true to signal the key doesn't repesent a real KV pair. This scheme is also extended to SeekPrefixGE, which previously synthesized a RANGEDEL key to serve a similar purpose.

Add test coverage for block-property filters to the metamorphic tests. Division of keys into blocks are nondeterministic, so this introduces nondeterminism into Iterator behavior. To account for this, the metamorphic test's iterator wrapper is adjusted to skip any keys that are eligible for filtering. Fix cockroachdb#1725.

jbowens

Reviewable status: 0 of 18 files reviewed, 17 unresolved discussions (waiting on @nicktrav and @sumeerbhola)

sstable/reader.go line 520 at r18 (raw file):

Previously, sumeerbhola wrote…

If this case ok because if maybeFilteredKeys=true, it will stay true (since the reset to false is after this return)?

Are we losing this optimization for the twoLevelIterator case with your latest change because trySeekUsingNext will get set to false in the expanded slow-path for twoLevelIterator.SeekGE?

Yeah, both of those are right.

sstable/reader.go line 581 at r18 (raw file):

Previously, sumeerbhola wrote…

also, I wonder whether I overlooked this case in my original "This makes me nervous" comment. It doesn't use anything about i.data.isDataInvalidated or i.index.isDataInvalidated.

isn't it possible for erroneously return maybeFilteredKeys=false

And this optimization seems to suffer from the same regression in the twoLevelIterator case.

If we ever use trySeekUsingNext should we just conservatively return maybeFilteredKeys=true? I'm generally more concerned about losing a seek optimization than returning an extra synthetic key -- and there are some microbenchmarks that suggested that optimizations for seeks on exhausted iterators were quite helpful since with 90% of the data in L6, it is not uncommon to have L1-L5 level iterators already exhausted.

This case was relying on the invariant at the function start.

// Invariant: trySeekUsingNext => !i.data.isDataInvalidated() && i.exhaustedBounds != +1

The invariant is 'enforced' by the no-op case in SeekGE and SeekPrefixGE, since we never enter seekGEHelper with trySeekUsingNext=true if i.data.isDataInvalidated.

Agreed, the reasoning of correctness was getting too complicated. I've updated to preserve the existing maybeFilteredKeys value during fast paths.

sumeerbhola

Might leave it as a TODO/issue to follow up with during stability period.
+1

Reviewed 3 of 9 files at r21.
Reviewable status: 3 of 18 files reviewed, 17 unresolved discussions (waiting on @nicktrav and @sumeerbhola)

jbowens

TFTR!

filed #1817

Reviewable status: 3 of 18 files reviewed, 17 unresolved discussions (waiting on @nicktrav and @sumeerbhola)

In cockroachdb#1737, we added some intricate interactions around block property filters, and added the contractual requirement that all sstable iterators return MaybeFilteredKeys() == true after a positioning operation that led to filtered keys due to BPFs. We want to build more confidence around that function obeying that contract after seek optimizations, so this change adds a randomized test that does repeated seeks (incl. some after monotonic bound changes and with trySeekUsingNext), and ensures `MaybeFilteredKeys()` returns true after any skipped keys. I would have put this test in the sstable package, except it'd cause a cyclical dependency between it and `internal.testkeys.blockprops` so I moved it to the outer package. Fixes cockroachdb#1817.

In #1737, we added some intricate interactions around block property filters, and added the contractual requirement that all sstable iterators return MaybeFilteredKeys() == true after a positioning operation that led to filtered keys due to BPFs. We want to build more confidence around that function obeying that contract after seek optimizations, so this change adds a randomized test that does repeated seeks (incl. some after monotonic bound changes and with trySeekUsingNext), and ensures `MaybeFilteredKeys()` returns true after any skipped keys. I would have put this test in the sstable package, except it'd cause a cyclical dependency between it and `internal.testkeys.blockprops` so I moved it to the outer package. Fixes #1817.

jbowens commented Jun 3, 2022

View reviewed changes

jbowens force-pushed the bpfs-meta branch from 239672d to 5f4085a Compare June 3, 2022 20:01

jbowens requested a review from a team June 3, 2022 20:06

jbowens mentioned this pull request Jun 6, 2022

internal/metamorphic: test block property filters #1725

Closed

sumeerbhola requested changes Jun 6, 2022

View reviewed changes

sumeerbhola reviewed Jun 6, 2022

View reviewed changes

jbowens mentioned this pull request Jun 25, 2022

db: block property filtering does not respect range tombstones #1786

Closed

jbowens force-pushed the bpfs-meta branch from 5f4085a to e221799 Compare June 29, 2022 22:27

jbowens commented Jun 29, 2022

View reviewed changes

jbowens force-pushed the bpfs-meta branch 4 times, most recently from 200326b to ff822b8 Compare June 30, 2022 17:59

jbowens commented Jun 30, 2022

View reviewed changes

jbowens force-pushed the bpfs-meta branch from ff822b8 to 2ee9c7e Compare June 30, 2022 20:24

jbowens requested a review from a team June 30, 2022 20:24

nicktrav reviewed Jul 1, 2022

View reviewed changes

jbowens force-pushed the bpfs-meta branch from 2ee9c7e to 3781723 Compare July 1, 2022 18:54

jbowens commented Jul 1, 2022

View reviewed changes

nicktrav approved these changes Jul 1, 2022

View reviewed changes

sumeerbhola approved these changes Jul 5, 2022

View reviewed changes

jbowens force-pushed the bpfs-meta branch 2 times, most recently from 15939ba to 8635bb0 Compare July 6, 2022 23:01

jbowens commented Jul 6, 2022

View reviewed changes

sumeerbhola requested changes Jul 7, 2022

View reviewed changes

jbowens force-pushed the bpfs-meta branch from 8635bb0 to f72b941 Compare July 7, 2022 19:16

jbowens commented Jul 7, 2022

View reviewed changes

jbowens commented Jul 8, 2022

View reviewed changes

sumeerbhola requested changes Jul 10, 2022

View reviewed changes

jbowens added 2 commits July 14, 2022 16:44

sstable: fix additional bounds check bugs

0ced03c

In 4406dab a bounds-checking bug was fixed in the two-level index iterator's SeekLT method. The same bug existed within Last and skipBackward and was missed.

jbowens force-pushed the bpfs-meta branch from f72b941 to 3f0a513 Compare July 14, 2022 20:47

jbowens added 2 commits July 14, 2022 17:01

jbowens force-pushed the bpfs-meta branch from 3f0a513 to 551baed Compare July 14, 2022 21:01

jbowens commented Jul 14, 2022

View reviewed changes

sumeerbhola approved these changes Jul 18, 2022

View reviewed changes

jbowens mentioned this pull request Jul 18, 2022

sstable: write randomized test for MaybeFilteredKeys and step optimizations #1817

Closed

jbowens commented Jul 18, 2022

View reviewed changes

jbowens merged commit 6b9be03 into cockroachdb:master Jul 18, 2022

jbowens deleted the bpfs-meta branch July 18, 2022 17:10

itsbilal mentioned this pull request Aug 30, 2022

*: Add random test for iterator seek - BlockPropertyFilters interaction #1936

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal/metamorphic: add block-property filter coverage #1737

internal/metamorphic: add block-property filter coverage #1737

jbowens commented Jun 3, 2022

cockroach-teamcity commented Jun 3, 2022

jbowens left a comment

sumeerbhola left a comment

sumeerbhola left a comment

jbowens left a comment

jbowens left a comment

jbowens left a comment

nicktrav left a comment

jbowens left a comment

nicktrav left a comment

sumeerbhola left a comment

jbowens left a comment

sumeerbhola left a comment

jbowens left a comment

jbowens left a comment

sumeerbhola left a comment

jbowens left a comment

sumeerbhola left a comment

jbowens left a comment

internal/metamorphic: add block-property filter coverage #1737

internal/metamorphic: add block-property filter coverage #1737

Conversation

jbowens commented Jun 3, 2022

cockroach-teamcity commented Jun 3, 2022

jbowens left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

nicktrav left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

nicktrav left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment