sstable: add more commentary on synthetic suffix replacement #3300
Conversation
Force-pushed from 7df0e24 to 6b03596.
sstable/block.go
Outdated
@@ -1008,7 +1008,27 @@ func (i *blockIter) SeekLT(key []byte, flags base.SeekLTFlags) (*InternalKey, ba
	i.offset = decodeRestart(i.data[i.restarts+4*(index-1):])
	if index < i.numRestarts {
		targetOffset = decodeRestart(i.data[i.restarts+4*(index):])
If you like this approach better, I'll add a big comment explaining what's going on here.
Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @msbutler and @sumeerbhola)
sstable/suffix_rewriter.go
line 204 at r3 (raw file):
return err
}
if err := iter.init(r.Compare, r.Split, inputBlock, r.Properties.GlobalSeqNum, false /* hideObsoletePoints */, nil /* syntheticSuffix */); err != nil {

Sorry to be a pain, but could you leave these call sites as they are? I am moving them to an IterTransforms struct which contains HideObsoletePoints and SyntheticSuffix (and the "global" sequence number), and this would just create merge conflicts. It will be addressed very soon.
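For illustration only, a minimal sketch of what such a grouping might look like, assuming nothing beyond the three fields named in the comment above (the actual IterTransforms struct in the follow-up PR may differ in names and types):

	// Hypothetical sketch: groups the per-iterator transforms mentioned above
	// so that call sites pass a single value instead of three parameters.
	type IterTransforms struct {
		// SyntheticSeqNum, if non-zero, is the "global" sequence number applied
		// to every key surfaced by the iterator. (Illustrative name and type.)
		SyntheticSeqNum uint64
		// HideObsoletePoints hides point keys shadowed by newer versions.
		HideObsoletePoints bool
		// SyntheticSuffix, if non-nil, replaces the suffix of every surfaced key.
		SyntheticSuffix []byte
	}

Call sites like the iter.init above would then take one IterTransforms argument in place of the separate sequence number, hideObsoletePoints, and syntheticSuffix parameters.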
Commented on the first two commits.
Sorry to be a pain, but can you separate the third commit into a separate PR, so we can get the block.go changes here merged first? I am finding it a bit hard to focus, given the additional need to jump back to the comments and responses in the original PR.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @msbutler)
sstable/block.go
line 305 at r1 (raw file):
// which will remain valid until the blockIter is closed. The key stability
// guarantee is used by the range tombstone and range key code, which knows that
// the respective blocks are always encoded with a restart interval of 1. This

- So have we broken this guarantee with suffix replacement, since the key will be backed by synthSuffixBuf (see the sketch after this list)?
- Are we ok because RANGEDELs are not present in sstables that are being suffix replaced?
- What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished? Is the plan to not use suffix replacement for those blocks and to eagerly rewrite them when loading them into the block cache, given their rarity? Is there a PR for that?
- Are we only relying on the key stability guarantee when processing rangedels and rangekeys? @jbowens
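To make the first question concrete, here is a hypothetical sketch (the type, field, and method names are illustrative, not the actual blockIter code) of why backing returned keys with a reused buffer would break the stability guarantee:

	// Hypothetical sketch: an iterator that materializes suffix-replaced keys
	// into a single reused buffer. Each call clobbers the bytes that any
	// previously returned key still points at.
	type suffixReplacingIter struct {
		syntheticSuffix []byte
		synthSuffixBuf  []byte // reused across calls
	}

	func (it *suffixReplacingIter) replacedKey(prefix []byte) []byte {
		it.synthSuffixBuf = append(it.synthSuffixBuf[:0], prefix...)
		it.synthSuffixBuf = append(it.synthSuffixBuf, it.syntheticSuffix...)
		return it.synthSuffixBuf // aliases the reused buffer
	}

Under this pattern a returned key is only valid until the next call, not "until the blockIter is closed", which is the property the range tombstone and range key code rely on.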
sstable/block.go
line 1011 at r2 (raw file):
Previously, msbutler (Michael Butler) wrote…
If you like this approach better, I'll add a big comment explaining what's going on here.
Yes, I have a preference for this since it sets up everything correctly for later iteration.
sstable/block.go
line 1016 at r2 (raw file):
// Shift up to the original binary search result and decode the key.
i.offset = targetOffset
i.nextOffset = i.offset

We don't need to set i.nextOffset; i.readEntry() will set that.
sstable/block.go
line 1025 at r2 (raw file):
if i.cmp(i.ikey.UserKey, key) < 0 {
	i.offset = targetOffset
	targetOffset = decodeRestart(i.data[i.restarts+4*(index+1):])
index+1 may be out of bounds.
sstable/block.go
line 1030 at r2 (raw file):
		}
	}
}
I think these blocks can be rearranged to be clearer (some of this predates your changes, but we need to simplify to compensate for the added complexity).
Let's lift the index==0 case before, so we can delay the definition of targetOffset and clarify what it really means. And then we can unindent the main case. Something like:

	if index == 0 {
		if i.syntheticSuffix != nil {
			// The binary search was conducted on keys without suffix replacement,
			// implying the first key in the block may be less than the search key. To
			// double check, get the first key in the block with suffix replacement
			// and compare to the search key. Consider the following example: suppose
			// the user searches with a@3, the first key in the block is a@2 and the
			// block contains a suffix replacement rule of 4. Since a@3 sorts before
			// a@2, the binary search would return index==0. Without conducting the
			// suffix replacement, the SeekLT would incorrectly return nil. With
			// suffix replacement though, a@4 should be returned as a@4 sorts before
			// a@3.
			ikey, lazyVal := i.First()
			if i.cmp(ikey.UserKey, key) < 0 {
				return ikey, lazyVal
			}
		}
		// If index == 0 then all keys in this block are larger than the key
		// sought.
		i.offset = -1
		i.nextOffset = 0
		return nil, base.LazyValue{}
	}

	// INVARIANT: index > 0

	// Ignoring suffix replacement, we will search between the restart at index-1
	// and the restart at index, for the first key >= key, and then on finding it,
	// return i.Prev(). We need to know when we have hit the offset for index, since
	// then we can stop searching. targetOffset encodes that offset for index.
	targetOffset := i.restarts
	i.offset = decodeRestart(i.data[i.restarts+4*(index-1):])
	if index < i.numRestarts {
		targetOffset = decodeRestart(i.data[i.restarts+4*(index):])
		if i.syntheticSuffix != nil {
			// It is possible that we should be searching between the restart at index
			// and that at index+1. TODO: add longer comment with example
			naiveOffset := i.offset
			// Shift up to the original binary search result and decode the key.
			i.offset = targetOffset
			i.readEntry()
			i.decodeInternalKey(i.key)
			i.maybeReplaceSuffix(false /* allowInPlace */)
			// If the binary search point is actually less than the search key, post
			// replacement, bump the target offset.
			if i.cmp(i.ikey.UserKey, key) < 0 {
				i.offset = targetOffset
				if index+1 < i.numRestarts {
					targetOffset = decodeRestart(i.data[i.restarts+4*(index+1):])
				} else {
					targetOffset = i.restarts
				}
			} else {
				i.offset = naiveOffset
			}
		}
	}
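To sanity-check the a@2 / a@3 / replacement-suffix-4 example in the comment above, here is a small standalone sketch. It uses a toy comparer in which a larger @-suffix sorts earlier (so a@4 < a@3 < a@2); this is only a stand-in for the real Compare/Split, not Pebble's comparer:

	package main

	import (
		"fmt"
		"strconv"
		"strings"
	)

	// cmpVersioned is a toy comparer: compare prefixes lexicographically, and
	// for equal prefixes a larger numeric @-suffix sorts earlier.
	func cmpVersioned(a, b string) int {
		ap, as, _ := strings.Cut(a, "@")
		bp, bs, _ := strings.Cut(b, "@")
		if c := strings.Compare(ap, bp); c != 0 {
			return c
		}
		ai, _ := strconv.Atoi(as)
		bi, _ := strconv.Atoi(bs)
		return bi - ai // larger suffix sorts earlier
	}

	func main() {
		stored := "a@2"   // first key physically stored in the block
		search := "a@3"   // SeekLT search key
		replaced := "a@4" // the stored key after applying synthetic suffix 4

		// Pre-replacement binary search: a@3 sorts before a@2, so the search
		// lands at index == 0 and a naive SeekLT would return nil.
		fmt.Println(cmpVersioned(search, stored) < 0) // true

		// Post-replacement, the same physical key reads as a@4, which sorts
		// before a@3, so SeekLT should in fact return it.
		fmt.Println(cmpVersioned(replaced, search) < 0) // true
	}

This is the situation the First()-and-compare fallback in the index == 0 branch of the sketch above is meant to handle.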
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @msbutler and @sumeerbhola)
sstable/block.go
line 305 at r1 (raw file):
Are we only relying on the key stability guarantee when processing rangedels and rangekeys?
Yeah, I believe that's true
What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished?
Also, what happens when a range key fragment has a suffix in either the start or end key? That can happen in ordinary non-ingested sstables due to fragmentation, but maybe it cannot happen when we produce these sstables?
Or maybe are there no range keys in these sstables at all because they're produced without version history?
Previously, jbowens (Jackson Owens) wrote…
Yeah we don't allow rangekeys in external ingest for now (we won't need to for the first OR version).
Previously, RaduBerinde wrote…
The PR above also adds a check that we never try to iterate through range dels or range keys when we have a synthetic suffix.
Force-pushed from 6b03596 to e78faa3.
Removed last commit.
Reviewable status: 0 of 7 files reviewed, 5 unresolved discussions (waiting on @msbutler, @RaduBerinde, and @sumeerbhola)
sstable/block.go
line 305 at r1 (raw file):
Previously, RaduBerinde wrote…
Yeah we don't allow rangekeys in external ingest for now (we won't need to for the first OR version).
Added more commentary on this.
sstable/block.go
line 1011 at r2 (raw file):
Previously, sumeerbhola wrote…
Yes, I have a preference for this since it sets up everything correctly for later iteration.
Done.
sstable/block.go
line 1016 at r2 (raw file):
Previously, sumeerbhola wrote…
we don't need to set i.nextOffset. i.readEntry() will set that.
Done.
sstable/block.go
line 1025 at r2 (raw file):
Previously, sumeerbhola wrote…
index+1 may be out of bounds.
Done.
sstable/block.go
line 1030 at r2 (raw file):
Previously, sumeerbhola wrote…
I think these blocks can be rearranged to be clearer (some of this predates your changes, but we need to simplify to compensate for the added complexity). Let's lift the index==0 case before, so we can delay the definition of targetOffset and clarify what it really means. And then we can unindent the main case. Something like: [code sketch quoted in full above]
Done, with added commentary for suffix replacement.
sstable/suffix_rewriter.go
line 204 at r3 (raw file):
Previously, RaduBerinde wrote…
The PR above also adds a check that we never try to iterate through range dels or range keys when we have a synthetic suffix.
Removed the third commit from this PR. Will deal with it later.
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: 1 of 7 files reviewed, 5 unresolved discussions (waiting on @msbutler and @RaduBerinde)
we also allow …
Previously, RaduBerinde wrote…
Ah, I should have clarified given our conversation. Yes, that is true. And to clarify, keys with an empty suffix sort before keys with a suffix. In other words, conducting suffix replacement on a key without a suffix would make it less than the original value, which contradicts this invariant. This makes me wonder if all my added off-by-one logic breaks down when we conduct suffix replacement on keys without a suffix.
Previously, msbutler (Michael Butler) wrote…
Note: the only keys without a suffix are index keys.
Previously, msbutler (Michael Butler) wrote…
In the meta test we do have keys without a prefix and it seems to work fine.
Previously, RaduBerinde wrote…
Alrighty, I feel a bit better about this after adding a few unit test cases here. I think this all works because we only need to trigger the off-by-one handling when (integer cmp) origSuffix <> searchKeySuffix <> replacementSuffix (i.e. the searchKeySuffix is in between the other two suffixes). This condition will never trigger because the originalSuffix is the max suffix, the empty suffix.
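One way to read that "in between" condition is as a predicate over plain integers (illustrative only; it does not model the empty suffix or the real suffix comparison):

	// betweenSuffixes reports whether searchSuffix lies strictly between
	// origSuffix and replacementSuffix under plain integer comparison. Per the
	// comment above, this is the only situation in which the extra off-by-one
	// handling in SeekLT is needed.
	func betweenSuffixes(origSuffix, searchSuffix, replacementSuffix int) bool {
		lo, hi := origSuffix, replacementSuffix
		if lo > hi {
			lo, hi = hi, lo
		}
		return lo < searchSuffix && searchSuffix < hi
	}

Whether a key with an empty (maximal) original suffix can ever satisfy this is exactly the question settled in the exchange above.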
See individual commits