sstable: add more commentary on synthetic suffix replacement #3300
Conversation
Force-pushed from 7df0e24 to 6b03596.
sstable/block.go
Outdated
@@ -1008,7 +1008,27 @@ func (i *blockIter) SeekLT(key []byte, flags base.SeekLTFlags) (*InternalKey, ba
	i.offset = decodeRestart(i.data[i.restarts+4*(index-1):])
	if index < i.numRestarts {
		targetOffset = decodeRestart(i.data[i.restarts+4*(index):])
If you like this approach better, I'll add a big comment explaining what's going on here.
Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @msbutler and @sumeerbhola)
sstable/suffix_rewriter.go
line 204 at r3 (raw file):
return err
}
if err := iter.init(r.Compare, r.Split, inputBlock, r.Properties.GlobalSeqNum, false /* hideObsoletePoints */, nil /* syntheticSuffix */); err != nil {

Sorry to be a pain, but could you leave these call sites as they are? I am moving them to an IterTransforms struct which contains HideObsoletePoints and SyntheticSuffix (and the "global" sequence number), and this would just create merge conflicts. It will be addressed very soon.
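For illustration only, a minimal sketch of what such a grouping might look like, assuming nothing beyond the three fields named in the comment above (the actual IterTransforms struct in the follow-up PR may differ in names and types):

	// Hypothetical sketch: groups the per-iterator transforms mentioned above
	// so that call sites pass a single value instead of three parameters.
	type IterTransforms struct {
		// SyntheticSeqNum, if non-zero, is the "global" sequence number applied
		// to every key surfaced by the iterator. (Illustrative name and type.)
		SyntheticSeqNum uint64
		// HideObsoletePoints hides point keys shadowed by newer versions.
		HideObsoletePoints bool
		// SyntheticSuffix, if non-nil, replaces the suffix of every surfaced key.
		SyntheticSuffix []byte
	}

Call sites like the iter.init above would then take one IterTransforms argument in place of the separate sequence number, hideObsoletePoints, and syntheticSuffix parameters.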
Commented on the first two commits.
Sorry to be a pain, but can you separate the third commit into a separate PR, so we can get the block.go changes here merged first? I am finding it a bit hard to focus, given the additional need to jump back to the comments and responses in the original PR.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @msbutler)
sstable/block.go
line 305 at r1 (raw file):
// which will remain valid until the blockIter is closed. The key stability
// guarantee is used by the range tombstone and range key code, which knows that
// the respective blocks are always encoded with a restart interval of 1. This

- So have we broken this guarantee with suffix replacement, since the key will be backed by synthSuffixBuf (see the sketch after this list)?
- Are we ok because RANGEDELs are not present in sstables that are being suffix replaced?
- What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished? Is the plan to not use suffix replacement for those blocks and to eagerly rewrite them when loading them into the block cache, given their rarity? Is there a PR for that?
- Are we only relying on the key stability guarantee when processing rangedels and rangekeys? @jbowens
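To make the first question concrete, here is a hypothetical sketch (the type, field, and method names are illustrative, not the actual blockIter code) of why backing returned keys with a reused buffer would break the stability guarantee:

	// Hypothetical sketch: an iterator that materializes suffix-replaced keys
	// into a single reused buffer. Each call clobbers the bytes that any
	// previously returned key still points at.
	type suffixReplacingIter struct {
		syntheticSuffix []byte
		synthSuffixBuf  []byte // reused across calls
	}

	func (it *suffixReplacingIter) replacedKey(prefix []byte) []byte {
		it.synthSuffixBuf = append(it.synthSuffixBuf[:0], prefix...)
		it.synthSuffixBuf = append(it.synthSuffixBuf, it.syntheticSuffix...)
		return it.synthSuffixBuf // aliases the reused buffer
	}

Under this pattern a returned key is only valid until the next call, not "until the blockIter is closed", which is the property the range tombstone and range key code rely on.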
sstable/block.go
line 1011 at r2 (raw file):
Previously, msbutler (Michael Butler) wrote…
If you like this approach better, I'll add a big comment explaining what's going on here.
Yes, I have a preference for this since it sets up everything correctly for later iteration.
sstable/block.go
line 1016 at r2 (raw file):
// Shift up to the original binary search result and decode the key.
i.offset = targetOffset
i.nextOffset = i.offset

We don't need to set i.nextOffset; i.readEntry() will set that.
sstable/block.go
line 1025 at r2 (raw file):
if i.cmp(i.ikey.UserKey, key) < 0 {
	i.offset = targetOffset
	targetOffset = decodeRestart(i.data[i.restarts+4*(index+1):])
index+1 may be out of bounds.
sstable/block.go
line 1030 at r2 (raw file):
		}
	}
}
I think these blocks can be rearranged to be clearer (some of this predates your changes, but we need to simplify to compensate for the added complexity).
Let's lift the index==0 case before, so we can delay the definition of targetOffset and clarify what it really means. And then we can unindent the main case. Something like:

	if index == 0 {
		if i.syntheticSuffix != nil {
			// The binary search was conducted on keys without suffix replacement,
			// implying the first key in the block may be less than the search key. To
			// double check, get the first key in the block with suffix replacement
			// and compare to the search key. Consider the following example: suppose
			// the user searches with a@3, the first key in the block is a@2 and the
			// block contains a suffix replacement rule of 4. Since a@3 sorts before
			// a@2, the binary search would return index==0. Without conducting the
			// suffix replacement, the SeekLT would incorrectly return nil. With
			// suffix replacement though, a@4 should be returned as a@4 sorts before
			// a@3.
			ikey, lazyVal := i.First()
			if i.cmp(ikey.UserKey, key) < 0 {
				return ikey, lazyVal
			}
		}
		// If index == 0 then all keys in this block are larger than the key
		// sought.
		i.offset = -1
		i.nextOffset = 0
		return nil, base.LazyValue{}
	}

	// INVARIANT: index > 0

	// Ignoring suffix replacement, we will search between the restart at index-1
	// and the restart at index, for the first key >= key, and then on finding it,
	// return i.Prev(). We need to know when we have hit the offset for index, since
	// then we can stop searching. targetOffset encodes that offset for index.
	targetOffset := i.restarts
	i.offset = decodeRestart(i.data[i.restarts+4*(index-1):])
	if index < i.numRestarts {
		targetOffset = decodeRestart(i.data[i.restarts+4*(index):])
		if i.syntheticSuffix != nil {
			// It is possible that we should be searching between the restart at index
			// and that at index+1. TODO: add longer comment with example
			naiveOffset := i.offset
			// Shift up to the original binary search result and decode the key.
			i.offset = targetOffset
			i.readEntry()
			i.decodeInternalKey(i.key)
			i.maybeReplaceSuffix(false /* allowInPlace */)
			// If the binary search point is actually less than the search key, post
			// replacement, bump the target offset.
			if i.cmp(i.ikey.UserKey, key) < 0 {
				i.offset = targetOffset
				if index+1 < i.numRestarts {
					targetOffset = decodeRestart(i.data[i.restarts+4*(index+1):])
				} else {
					targetOffset = i.restarts
				}
			} else {
				i.offset = naiveOffset
			}
		}
	}
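To sanity-check the a@2 / a@3 / replacement-suffix-4 example in the comment above, here is a small standalone sketch. It uses a toy comparer in which a larger @-suffix sorts earlier (so a@4 < a@3 < a@2); this is only a stand-in for the real Compare/Split, not Pebble's comparer:

	package main

	import (
		"fmt"
		"strconv"
		"strings"
	)

	// cmpVersioned is a toy comparer: compare prefixes lexicographically, and
	// for equal prefixes a larger numeric @-suffix sorts earlier.
	func cmpVersioned(a, b string) int {
		ap, as, _ := strings.Cut(a, "@")
		bp, bs, _ := strings.Cut(b, "@")
		if c := strings.Compare(ap, bp); c != 0 {
			return c
		}
		ai, _ := strconv.Atoi(as)
		bi, _ := strconv.Atoi(bs)
		return bi - ai // larger suffix sorts earlier
	}

	func main() {
		stored := "a@2"   // first key physically stored in the block
		search := "a@3"   // SeekLT search key
		replaced := "a@4" // the stored key after applying synthetic suffix 4

		// Pre-replacement binary search: a@3 sorts before a@2, so the search
		// lands at index == 0 and a naive SeekLT would return nil.
		fmt.Println(cmpVersioned(search, stored) < 0) // true

		// Post-replacement, the same physical key reads as a@4, which sorts
		// before a@3, so SeekLT should in fact return it.
		fmt.Println(cmpVersioned(replaced, search) < 0) // true
	}

This is the situation the First()-and-compare fallback in the index == 0 branch of the sketch above is meant to handle.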
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @msbutler and @sumeerbhola)
sstable/block.go
line 305 at r1 (raw file):
Are we only relying on the key stability guarantee when processing rangedels and rangekeys?
Yeah, I believe that's true
What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished?
Also, what happens when a range key fragment has a suffix in either the start or end key? That can happen in ordinary non-ingested sstables due to fragmentation, but maybe it cannot happen when we produce these sstables?
Or maybe are there no range keys in these sstables at all because they're produced without version history?
Previously, jbowens (Jackson Owens) wrote…
Yeah we don't allow rangekeys in external ingest for now (we won't need to for the first OR version).
Previously, RaduBerinde wrote…
The PR above also adds a check that we never try to iterate through range dels or range keys when we have a synthetic suffix.
Force-pushed from 6b03596 to e78faa3.
Removed last commit.
Reviewable status: 0 of 7 files reviewed, 5 unresolved discussions (waiting on @msbutler, @RaduBerinde, and @sumeerbhola)
sstable/block.go
line 305 at r1 (raw file):
Previously, RaduBerinde wrote…
Yeah we don't allow rangekeys in external ingest for now (we won't need to for the first OR version).
Added more commentary on this.
sstable/block.go
line 1011 at r2 (raw file):
Previously, sumeerbhola wrote…
Yes, I have a preference for this since it sets up everything correctly for later iteration.
Done.
sstable/block.go
line 1016 at r2 (raw file):
Previously, sumeerbhola wrote…
we don't need to set i.nextOffset. i.readEntry() will set that.
Done.
sstable/block.go
line 1025 at r2 (raw file):
Previously, sumeerbhola wrote…
index+1 may be out of bounds.
Done.
sstable/block.go
line 1030 at r2 (raw file):
Previously, sumeerbhola wrote…
I think these blocks can be rearranged to be clearer (some of this predates your changes, but we need to simplify to compensate for the added complexity). Let's lift the index==0 case before, so we can delay the definition of targetOffset and clarify what it really means. And then we can unindent the main case. Something like: [code sketch quoted in full above]
Done, with added commentary for suffix replacement.
sstable/suffix_rewriter.go
line 204 at r3 (raw file):
Previously, RaduBerinde wrote…
The PR above also adds a check that we never try to iterate through range dels or range keys when we have a synthetic suffix.
Removed the third commit from this PR. Will deal with it later.
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: 1 of 7 files reviewed, 5 unresolved discussions (waiting on @msbutler and @RaduBerinde)
we also allow …
Previously, RaduBerinde wrote…
Ah, I should have clarified given our conversation. Yes, that is true. And to clarify, keys with an empty suffix sort before keys with a suffix. In other words, conducting suffix replacement on a key without a suffix would make it less than the original value, which contradicts this invariant. This makes me wonder if all my added off-by-one logic breaks down when we conduct suffix replacement on keys without a suffix.
Previously, msbutler (Michael Butler) wrote…
Note: the only keys without a suffix are index keys.
Previously, msbutler (Michael Butler) wrote…
In the meta test we do have keys without a prefix and it seems to work fine.
Previously, RaduBerinde wrote…
Alrighty, I feel a bit better about this after adding a few unit test cases here. I think this all works because we only need to trigger the off-by-one handling when (integer cmp) origSuffix <> searchKeySuffix <> replacementSuffix (i.e. the searchKeySuffix is in between the other two suffixes). This condition will never trigger because the originalSuffix is the max suffix, the empty suffix.
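One way to read that "in between" condition is as a predicate over plain integers (illustrative only; it does not model the empty suffix or the real suffix comparison):

	// betweenSuffixes reports whether searchSuffix lies strictly between
	// origSuffix and replacementSuffix under plain integer comparison. Per the
	// comment above, this is the only situation in which the extra off-by-one
	// handling in SeekLT is needed.
	func betweenSuffixes(origSuffix, searchSuffix, replacementSuffix int) bool {
		lo, hi := origSuffix, replacementSuffix
		if lo > hi {
			lo, hi = hi, lo
		}
		return lo < searchSuffix && searchSuffix < hi
	}

Whether a key with an empty (maximal) original suffix can ever satisfy this is exactly the question settled in the exchange above.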
See individual commits