-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: implement iterator to interleave separated intents #56535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test will fail until pebble is bumped in #56534
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just skimmed this as I'm expecting @nvanbenschoten to scrutinize the logic. Are you planning to add microbenchmarks to compare the performance of iterating over separated vs interleaved intents?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/intent_interleaving_iter.go, line 25 at r1 (raw file):
// intentInterleavingIter makes separated intents appear as interleaved. It // assumes that there can also be intents that are physically interleaved. // However, for a particular roachpb.Key there will be at most intent, either
Nit: Something is off with the grammar of at most intent
. Perhaps at most one intent
.
f56915c
to
6316b09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to add microbenchmarks to compare the performance of iterating over separated vs interleaved intents?
I've added a TODO for microbenchmarks. Both the randomized test and microbenchmarks will happen in follow-up PRs, but before I integrate this with existing code.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
pkg/storage/intent_interleaving_iter.go, line 25 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: Something is off with the grammar of
at most intent
. Perhapsat most one intent
.
Done
9748ba1
to
ee2332c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 8 files at r1, 3 of 3 files at r2, 2 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)
pkg/keys/keys.go, line 401 at r1 (raw file):
// For a scan [start, end) the corresponding lock table scan is // [LTSK(start), LTSK(end)). func LockTableSingleKey(key roachpb.Key, buf []byte) (roachpb.Key, []byte) {
We should note somewhere in the function's comment that buf
is replaced and not appended to.
pkg/storage/intent_interleaving_iter.go, line 30 at r2 (raw file):
// Additionally, the implementation assumes that the only single key locks // in the lock table key space are intents. type intentInterleavingIter struct {
What is the plan for this type once all intents have been moved into the lockTable? It seems like we will still need it for limited scans that do not scan the lockTable ahead of time (#49973), but will other operations that have already scanned the lockTable still use it? Also, do we plan to simplify the implementation once all intents are segregated?
pkg/storage/intent_interleaving_iter.go, line 44 at r2 (raw file):
// state of the intentKey overrides that state. intentKey roachpb.Key // - cmp output of (intentKey, current iter key) when both are non-nil.
Could you expand this out a bit more to discuss whether this is taking timestamps into account or not? For instance, what should this be if the intentIter is positioned on an intent and the iter is positioned on the same key but at a committed version instead of the provisional version? I think it will be 0
after reading some of this file. Or is that not possible because we'll move intentIter
forward first?
EDIT: I was able to resolve these questions after reading the rest of the code in depth, but it could be made easier to understand.
Maybe this brings up a more general point about being more clear about how the MVCCIterator
and the EngineIterator
interleave. It might be helpful to think of this as an interleaving between two MVCCIterator
s, where the first is iter
and the second is a transformation of intentIter
that a) maps the EngineKey to another EngineKey using DecodeLockTableSingleKey
and b) maps the EngineKey to an MVCCKey by giving it a timestamp of 0. If we didn't care about performance, I bet you could even define this type a little more abstractly like that:
intentInterleavingIter = iter_merge(iter, intentIter.map(|k| DecodeLockTableSingleKey(k)).map(|k| MVCCKey{Key: k, Timestamp: 0}))
pkg/storage/intent_interleaving_iter.go, line 71 at r2 (raw file):
var intentKeyBuf []byte if opts.LowerBound != nil { intentOpts.LowerBound, intentKeyBuf = keys.LockTableSingleKey(opts.LowerBound, nil)
Isn't this going to alias intentKeyBuf
and intentOpts.LowerBound
to the same memory? Is that not unsafe? Can the buffers passed in an IterOptions
be reused after a call to NewEngineIterator
?
pkg/storage/intent_interleaving_iter.go, line 105 at r2 (raw file):
// or using SetUpperBound. if !i.hasUpperBound && keys.IsLocal(key.Key) && !keys.IsLocalStoreKey(key.Key) { i.iter.SetUpperBound(keys.LocalRangeLockTablePrefix)
Should we set i.hasUpperBound = true
after this? Or should we be calling i.SetUpperBound
?
pkg/storage/intent_interleaving_iter.go, line 109 at r2 (raw file):
} var intentSeekKey roachpb.Key if key.Timestamp == (hlc.Timestamp{}) {
.IsEmpty()
. We just got rid of all of these in 4fa8a38.
pkg/storage/intent_interleaving_iter.go, line 118 at r2 (raw file):
// Else seeking to a particular version and using prefix iteration, // so don't expect to ever see the intent. i.intentKey = nil
nit: consider adding intentSeekKey = nil
. I know it's not needed, but it makes the intention and the interaction with the next block more clear.
pkg/storage/intent_interleaving_iter.go, line 155 at r2 (raw file):
func (i *intentInterleavingIter) tryDecodeLockKey(valid bool) error { if !valid { i.intentKey = nil
Consider mentioning why this doesn't set i.valid = false
in a comment. It took me a second to understand how this interacts with computePos
.
pkg/storage/intent_interleaving_iter.go, line 206 at r2 (raw file):
// preceding key or exhausted. So step it forward. It will now point to // a key that is the same as the intent key since an intent always has a // corresponding provisional value.
" and provisional values must have a higher timestamp than any committed value on a key". We've actually had bugs with this in the past, so it's worth considering where we're making assumptions about this. We should be able to assume it, but it's probably best to call the assumption out explicitly.
pkg/storage/intent_interleaving_iter.go, line 337 at r2 (raw file):
func (i *intentInterleavingIter) isCurAtIntentIter() bool { return (i.dir > 0 && i.intentCmp <= 0) || (i.dir < 0 && i.intentCmp > 0)
This might be a good place for a comment and maybe even a diagram demonstrating how the two iterators move and why this is the case. If you're feeling inspired.
pkg/storage/intent_interleaving_iter.go, line 360 at r2 (raw file):
func (i *intentInterleavingIter) Key() MVCCKey { unsafeKey := i.UnsafeKey() return MVCCKey{Key: append(roachpb.Key(nil), unsafeKey.Key...), Timestamp: unsafeKey.Timestamp}
nit: append(roachpb.Key(nil), unsafeKey.Key...
isn't quite the same as keyCopy := make([]byte, len(key.Key)); copy(keyCopy, key.Key)
. One reason is that the append
method leaves some capacity on the slice. In the past, I believe we measured it to be a bit slower. I'd lean towards doing exactly what we do in pebbleIterator.Key
.
Also, do we think there might be cases where an Iterator implementation provides a more efficient Key
impl? In other words, would it be worth deferring to the underlying MVCCIterator implementation when !i.isCurAtIntentIter()
.
pkg/storage/intent_interleaving_iter.go, line 392 at r2 (raw file):
return } else { intentSeekKey, i.intentKeyBuf = keys.LockTableSingleKey(key.Key.Next(), i.intentKeyBuf)
Could you give this a small comment that calls out why Next
is required. It took me a minute to figure that out.
pkg/storage/testdata/ii_iter, line 1 at r2 (raw file):
# Both separated and interleaved intents, and one inline meta.
Nice test!
pkg/storage/testdata/ii_iter, line 47 at r2 (raw file):
next ---- cmd: seek-ge meta k=a ts=20 txn=1
nit: this would be a little easier to follow if it more clearly separated the cmd from the output, and if it included the arguments to cmds that take them, like seek-ge
and seek-lt
.
pkg/storage/testdata/ii_iter, line 123 at r2 (raw file):
cmd: next . # Prefix iteration and NextKey. There is no guarantee on what we will
Does this combination make sense to support? Does the "no guarantee" part mean that its behavior is undefined?
pkg/storage/testdata/ii_iter, line 149 at r2 (raw file):
seek-ge k=a ts=25 next next
Should this also include a case like seek-ge k=a ts=5
where we immediately seek to the next key?
ee2332c
to
4c0758a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @sumeerbhola)
pkg/keys/keys.go, line 401 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We should note somewhere in the function's comment that
buf
is replaced and not appended to.
Done
pkg/storage/intent_interleaving_iter.go, line 30 at r2 (raw file):
What is the plan for this type once all intents have been moved into the lockTable? It seems like we will still need it for limited scans that do not scan the lockTable ahead of time (#49973), but will other operations that have already scanned the lockTable still use it?
Yes, that is correct -- it will continue to exist for use in limited scans only.
Also, do we plan to simplify the implementation once all intents are segregated?
Yes, I think there will be (minor) simplifications.
pkg/storage/intent_interleaving_iter.go, line 44 at r2 (raw file):
EDIT: I was able to resolve these questions after reading the rest of the code in depth, but it could be made easier to understand.
I've added a struct-level long comment and another comment here.
It might be helpful to think of this as an interleaving between two MVCCIterators, where the first is iter and the second is a transformation of intentIter that a) maps the EngineKey to another EngineKey using DecodeLockTableSingleKey and b) maps the EngineKey to an MVCCKey by giving it a timestamp of 0. If we didn't care about performance, ...
This is now included in the struct-level comment.
pkg/storage/intent_interleaving_iter.go, line 71 at r2 (raw file):
Isn't this going to alias intentKeyBuf and intentOpts.LowerBound to the same memory? Is that not unsafe? Can the buffers passed in an IterOptions be reused after a call to NewEngineIterator?
Yes to aliasing. No to it being unsafe. Yes, we don't document the copying behavior in pebbleIterator
-- I have now added comments in engine.go
to this effect.
I will probably need to further optimize this when integrating, since (a) iterator reuse should include reusing already allocated memory for intentInterleavingIter
, (b) this intentKeyBuf reuse is quite half-hearted -- I am inclined towards having 2 bufs.
pkg/storage/intent_interleaving_iter.go, line 105 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we set
i.hasUpperBound = true
after this? Or should we be callingi.SetUpperBound
?
I added a comment. This is not a user-specified upper-bound, so we want to keep changing it on every call to SeekGE
, which is why we don't set hasUpperBound=true
.
I've also added some test cases with local keys, to exercise this code.
pkg/storage/intent_interleaving_iter.go, line 109 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
.IsEmpty()
. We just got rid of all of these in 4fa8a38.
Done
pkg/storage/intent_interleaving_iter.go, line 118 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider adding
intentSeekKey = nil
. I know it's not needed, but it makes the intention and the interaction with the next block more clear.
I added a comment instead -- I prefer a code style where we don't set things to their existing value, and instead assert an invariant. Since this invariant is very local (intentSeekKey
was declared a few lines before) I figured it wasn't worth asserting. Let me know if you think otherwise.
pkg/storage/intent_interleaving_iter.go, line 155 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Consider mentioning why this doesn't set
i.valid = false
in a comment. It took me a second to understand how this interacts withcomputePos
.
Done. I didn't mention computePos
here since the caller sometimes uses computePos
and sometimes does something more specialized (by relying on an invariant that avoids a key comparison).
pkg/storage/intent_interleaving_iter.go, line 206 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
" and provisional values must have a higher timestamp than any committed value on a key". We've actually had bugs with this in the past, so it's worth considering where we're making assumptions about this. We should be able to assume it, but it's probably best to call the assumption out explicitly.
I've added a longer comment, which mentions that we don't actually rely on it pointing to the provisional value. But we do rely on it pointing to some version of the same key.
And I audited all the other places in the code that rely on some invariant outside the scope of this iterator and added comments for them.
pkg/storage/intent_interleaving_iter.go, line 337 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This might be a good place for a comment and maybe even a diagram demonstrating how the two iterators move and why this is the case. If you're feeling inspired.
I've added a long comment. It also resulted in a couple of code changes elsewhere.
pkg/storage/intent_interleaving_iter.go, line 360 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit:
append(roachpb.Key(nil), unsafeKey.Key...
isn't quite the same askeyCopy := make([]byte, len(key.Key)); copy(keyCopy, key.Key)
. One reason is that theappend
method leaves some capacity on the slice. In the past, I believe we measured it to be a bit slower. I'd lean towards doing exactly what we do inpebbleIterator.Key
.Also, do we think there might be cases where an Iterator implementation provides a more efficient
Key
impl? In other words, would it be worth deferring to the underlying MVCCIterator implementation when!i.isCurAtIntentIter()
.
Good point. I've duplicated the logic from pebbleIterator.Key
.
pkg/storage/intent_interleaving_iter.go, line 392 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you give this a small comment that calls out why
Next
is required. It took me a minute to figure that out.
Done
pkg/storage/testdata/ii_iter, line 47 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this would be a little easier to follow if it more clearly separated the cmd from the output, and if it included the arguments to cmds that take them, like
seek-ge
andseek-lt
.
Fixed.
pkg/storage/testdata/ii_iter, line 123 at r2 (raw file):
Does this combination make sense to support? Does the "no guarantee" part mean that its behavior is undefined?
Yes, it is undefined on what happens after we are done with the key that SeekGE
was called with. I've adjusted the comment. We don't "support" it, but we don't prevent the user from continuing to iterate either.
pkg/storage/testdata/ii_iter, line 149 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this also include a case like
seek-ge k=a ts=5
where we immediately seek to the next key?
Good point. Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal)
90009ab
to
d8243d6
Compare
TFTR! |
d8243d6
to
d1f80c1
Compare
intentInterleavingIter allows for both physically separate and physically interleaved intents, and makes both look like interleaved intents. It uses two underlying iterators, a MVCCIterator and an EngineIterator, that are kept synchronized. Informs cockroachdb#41720 Release note: None
d1f80c1
to
95b836d
Compare
bors r+ |
Build succeeded: |
intentInterleavingIter allows for both physically separate
and physically interleaved intents, and makes both look
like interleaved intents. It uses two underlying iterators,
a MVCCIterator and an EngineIterator, that are kept
synchronized.
Informs #41720
Release note: None