-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sstable: fixes to prefix replacing iterator #3344
Conversation
bd7b217
to
e00f715
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit hard to know for sure this is correct without some of the semantics being clear (see discussion in #3329) but this passes more iterations of the metamorphic test (w prefix replacement added) than what we have. I haven't stressed it for a long time yet but I plan to do that before merging.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @dt, @itsbilal, and @jbowens)
sstable/prefix_replacing_iterator.go
line 157 at r1 (raw file):
// We still want to seek the underlying iterator to potentially enable // optimizations passed through flags. p.i.SeekGE(p.rewriteArg(p.lower), flags)
@jbowens do you think this is enough to retain any seek optimizations?
e00f715
to
74ccc4e
Compare
sstable/prefix_replacing_iterator.go
Outdated
const ( | ||
// inSync indicates that the prefix replacing iterator is "in sync" with the | ||
// underlying iterator; any Next/Prev calls can be passed through. | ||
inSync iteratorState = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be inRange
, or alternatively iterPosCur
if we want to really mimic wording in other iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return nil, base.LazyValue{} | ||
case beforeRange: | ||
p.state = inSync | ||
return p.rewriteResult(p.i.First()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a violation of the internalIterator contract, right? We don't allow First() to be called if there's a lower bound, instead we want SeekGE(lower) to be called. Same with Last() and the upper bound. You might need to save the bounds in SetBounds
so you can do the appropriate translation here instead of passing through a first/last to the child iter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done. TFTR!
74ccc4e
to
570d0eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM save for NextPrefix
fix
case beforeRange: | ||
p.state = inRange | ||
if p.lowerBound != nil { | ||
return p.rewriteResult(p.i.SeekGE(p.lowerBound, base.SeekGEFlagsNone)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might need to be SeekGE(succKey)
as the lower bound might not have the prefix we want. And same for the First() call below; we can actually unconditionally call SeekGE(succKey)
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think about this some more. What I did is incorrect, we can't just pass a lower bound below without rewriting its prefix.. We need to basically figure out the intersection of the given bounds and the vtable bounds (most of that is happening today below us).
570d0eb
to
d6568d2
Compare
We add more logic inside the prefix replacing iterators: - pass through `SeekGE` calls even when not strictly necessary, to enable optimizations indicated by the flags. - keep track of whether we are positioned before/after the synthetic prefix range, to enable expected behaviors like `SeekGE` after the range followed by `Prev` (or `SeekLT` to before the range followed by `Next`)
d6568d2
to
39c553a
Compare
Does it require that much more work? If it's just |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 14 unresolved discussions (waiting on @dt, @itsbilal, @jbowens, and @RaduBerinde)
sstable/prefix_replacing_iterator.go
line 157 at r1 (raw file):
Previously, RaduBerinde wrote…
@jbowens do you think this is enough to retain any seek optimizations?
I'm a little confused here. I think all we need is to track whether we did the absolute positioning or ignored the SeekPrefixGE (see other comment below).
So we would remember positioned/unpositioned. The ignore case here is the only one which would result in a state transition to unpositioned. Any relative positioning call when unpositioned will return nil. Any absolute positioning call where the state was unpositioned would turn off the TrySeekUsingNext. This is following the same pattern as the next lower layer where we do
if !i.lastBloomFilterMatched {
// Iterator is not positioned based on last seek.
flags = flags.DisableTrySeekUsingNext()
}
I could be wrong, since we need to flesh out the reasoning in https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1708716407912329?thread_ts=1708549343.720779&cid=CAC6K3SLU
sstable/prefix_replacing_iterator.go
line 26 at r2 (raw file):
syntheticPrefix []byte // lower is a valid key that has syntheticPreficx and is a lower bound for all
nit: syntheticPrefix
Aren't we requiring that i
only produce keys with contentPrefix
. So how can a key starting with syntheticPrefix
be a lower bound for i
?
sstable/prefix_replacing_iterator.go
line 63 at r2 (raw file):
var _ Iterator = (*prefixReplacingIterator)(nil) // newPrefixReplacingIterator wraps an iterator over keys that have
nit: over point keys ...
sstable/prefix_replacing_iterator.go
line 68 at r2 (raw file):
// `contentPrefix`. // // lower is a valid key that has syntheticPreficx and is a lower bound for all
syntheticPrefix
Same question about "lower bound for all keys produced by i`.
sstable/prefix_replacing_iterator.go
line 201 at r2 (raw file):
Previously, RaduBerinde wrote…
Good point, done. TFTR!
I suspect we are trying to be too clever with this code.
The underlying iterator i
already has the bounds, but they are in terms of ContentPrefix
. So our difficulty arises when someone tries SeekGE
above the upper bound or SeekLT
below the lower bound, since we can't translate them into seek keys in terms of ContentPrefix
. But levelIter
shouldn't be doing that, since it loads the relevant file with the first key >= seek key in the case of SeekGE
and < seek key in case of SeekLT
. So how about we treat those seek cases as simply calls to Last(); Next()
and First(); Prev()
, since they are not performance sensitive (unless I am mistaken).
I think this eliminates the need for beforeRange
and afterRange
and lowerBound, upperBound
.
sstable/prefix_replacing_iterator.go
line 49 at r3 (raw file):
// inRange indicates that the prefix replacing iterator is "in sync" with the // underlying iterator; any Next/Prev calls can be passed through. inRange prefixReplacingIteratorState = iota
is there a reason to default to inRange
instead of making the iota value panic-worthy?
sstable/prefix_replacing_iterator.go
line 56 at r3 (raw file):
// prefix range. A Next() call should return the first key/span in the range. beforeRange empty
comment for empty?
sstable/prefix_replacing_iterator.go
line 136 at r3 (raw file):
p.state = inRange if !bytes.HasPrefix(key, p.syntheticPrefix) { if p.cmp(key, p.lower) > 0 {
this is an odd use of p.lower
that is worth a comment. One would expect this check to happen against upper, but this works because lower and upper share SyntheticPrefix
.
sstable/prefix_replacing_iterator.go
line 142 at r3 (raw file):
// Key must be before the range; seek to the lower bound instead. // We don't use First because we may miss out on optimizations passed // through SeekEFlags.
SeekGEFlags
sstable/prefix_replacing_iterator.go
line 143 at r3 (raw file):
// We don't use First because we may miss out on optimizations passed // through SeekEFlags. key = p.lower
Presumably prefixReplacingIterator
is being used in the performance sensitive case after the sstable has been ingested, so this is in a level and being iterated over using levelIter
. Is that accurate?
sstable/prefix_replacing_iterator.go
line 226 at r3 (raw file):
return p.rewriteResult(p.i.First()) case inRange: return p.rewriteResult(p.i.NextPrefix(succKey))
what is the contract regarding succKey
. Are we guaranteed that succKey
has SyntheticPrefix
? I believe not since levelIter
will just pass through. So we have to be able to handle a succKey
that is greater than the upper, i.e.,
if !bytes.HasPrefix(succKey, p.syntheticPrefix) {
if p.cmp(key, p.lower) > 0 {
// Go past the end of this sstable
...
return
} else {
//panic. Can't be calling NextPrefix on this iterator if it is already positioned after `succKey`
}
}
// succKey has synthetic prefix. So rewrite succKey and call on i.
sstable/prefix_replacing_iterator.go
line 268 at r3 (raw file):
// is going to rewrite them itself or pass them to something e.g. vState that // will rewrite them. if x, ok := p.i.(interface{ SetBoundsWithSyntheticPrefix() bool }); ok && x.SetBoundsWithSyntheticPrefix() {
this seems error-prone. What if someone has a typo and causes us to fall into the path below. I guess tests will catch it quickly enough, but what iterator type do we wrap that does not implement SetBoundsWithSyntheticPrefix
?
sstable/reader_virtual.go
line 137 at r2 (raw file):
i, err := v.reader.newIterWithBlockPropertyFiltersAndContext( ctx, transforms, lower, upper, filterer, useFilterBlock, stats, categoryAndQoS, statsCollector, rp, &v.vState)
Seems like since virtualState.{lower,upper}
also have SyntheticPrefix
, the narrowed bound in virtualState.constrainBounds
also have SyntheticPrefix
, so the inverted bounds will have ContentPrefix
. Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 14 unresolved discussions (waiting on @dt, @itsbilal, @jbowens, and @sumeerbhola)
sstable/prefix_replacing_iterator.go
line 157 at r1 (raw file):
Previously, sumeerbhola wrote…
I'm a little confused here. I think all we need is to track whether we did the absolute positioning or ignored the SeekPrefixGE (see other comment below).
So we would remember positioned/unpositioned. The ignore case here is the only one which would result in a state transition to unpositioned. Any relative positioning call when unpositioned will return nil. Any absolute positioning call where the state was unpositioned would turn off the TrySeekUsingNext. This is following the same pattern as the next lower layer where we doif !i.lastBloomFilterMatched { // Iterator is not positioned based on last seek. flags = flags.DisableTrySeekUsingNext() }
I could be wrong, since we need to flesh out the reasoning in https://cockroachlabs.slack.com/archives/CAC6K3SLU/p1708716407912329?thread_ts=1708549343.720779&cid=CAC6K3SLU
I see, so we would not call SeekGE here at all and for any further Seek we will DisableTrySeekUsingNext
in the flags. That makes sense but I don't have a good understanding of all the details.
sstable/prefix_replacing_iterator.go
line 26 at r2 (raw file):
Previously, sumeerbhola wrote…
nit: syntheticPrefix
Aren't we requiring that
i
only produce keys withcontentPrefix
. So how can a key starting withsyntheticPrefix
be a lower bound fori
?
Yeah, the description is incorrect. It's a lower bound for all keys produced by i after prefix replacement.
sstable/prefix_replacing_iterator.go
line 201 at r2 (raw file):
Previously, sumeerbhola wrote…
I suspect we are trying to be too clever with this code.
The underlying iterator
i
already has the bounds, but they are in terms ofContentPrefix
. So our difficulty arises when someone triesSeekGE
above the upper bound orSeekLT
below the lower bound, since we can't translate them into seek keys in terms ofContentPrefix
. ButlevelIter
shouldn't be doing that, since it loads the relevant file with the first key >= seek key in the case ofSeekGE
and < seek key in case ofSeekLT
. So how about we treat those seek cases as simply calls toLast(); Next()
andFirst(); Prev()
, since they are not performance sensitive (unless I am mistaken).I think this eliminates the need for
beforeRange
andafterRange
andlowerBound, upperBound
.
The problem is that if SetBounds()
was passed down, then it's illegal to call First
and/or Last
.. We would need to Seek from a bound. But we need to "trim" those bounds to the synthetic prefix range and convert them to content-prefix.
Currently, this happens underneath us - we pass the unchanged bounds to SingleLeveliterator.SetBounds
which deals with the prefix replacement. It's a big mess.
https://github.com/cockroachdb/pebble/blob/master/sstable/reader_iter_single_lvl.go#L367
I think the correct thing to do is move all that logic in here instead. But I'm kind of losing steam on making this work given that we may be able to switch to just prepending a synthetic prefix and doing that in blockIter
(#3350).
sstable/prefix_replacing_iterator.go
line 56 at r3 (raw file):
Previously, sumeerbhola wrote…
comment for empty?
Sorry, left over
sstable/prefix_replacing_iterator.go
line 143 at r3 (raw file):
Previously, sumeerbhola wrote…
Presumably
prefixReplacingIterator
is being used in the performance sensitive case after the sstable has been ingested, so this is in a level and being iterated over usinglevelIter
. Is that accurate?
Correct.
sstable/prefix_replacing_iterator.go
line 226 at r3 (raw file):
Previously, sumeerbhola wrote…
what is the contract regarding
succKey
. Are we guaranteed thatsuccKey
hasSyntheticPrefix
? I believe not sincelevelIter
will just pass through. So we have to be able to handle asuccKey
that is greater than the upper, i.e.,if !bytes.HasPrefix(succKey, p.syntheticPrefix) { if p.cmp(key, p.lower) > 0 { // Go past the end of this sstable ... return } else { //panic. Can't be calling NextPrefix on this iterator if it is already positioned after `succKey` } } // succKey has synthetic prefix. So rewrite succKey and call on i.
Hm, good point. The contract as written is that succKey
is the immediate successor key to the iterator's current key prefix: https://github.com/cockroachdb/pebble/blob/master/internal/base/iterator.go#L168
In practice at least, the immediate successor key should have the same synthetic prefix. But I learned to take these docs with a grain of salt :)
sstable/prefix_replacing_iterator.go
line 268 at r3 (raw file):
Previously, sumeerbhola wrote…
this seems error-prone. What if someone has a typo and causes us to fall into the path below. I guess tests will catch it quickly enough, but what iterator type do we wrap that does not implement
SetBoundsWithSyntheticPrefix
?
I don't think we wrap anything else. I agree this kind of thing is very error prone. The correct thing would be to do the bounds conversion here instead of below us.
sstable/reader_virtual.go
line 137 at r2 (raw file):
Previously, sumeerbhola wrote…
Seems like since
virtualState.{lower,upper}
also haveSyntheticPrefix
, the narrowed bound invirtualState.constrainBounds
also haveSyntheticPrefix
, so the inverted bounds will haveContentPrefix
. Is my understanding correct?
Right, the constrainBounds
call in single/twoLevelIterator.init()
will intersect [lower, upper)
with [vState.lower, upper)
then convert from SyntheticPrefix to ContentPrefix.
Removing prefix replacing iterator in #3405 |
We add more logic inside the prefix replacing iterators:
SeekGE
calls even when not strictly necessary, toenable optimizations indicated by the flags.
prefix range, to enable expected behaviors like
SeekGE
after therange followed by
Prev
(orSeekLT
to before the range followedby
Next
)