Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: add more commentary on synthetic suffix replacement #3300

Merged
merged 2 commits into from
Feb 18, 2024

Conversation

msbutler
Copy link
Contributor

@msbutler msbutler commented Feb 13, 2024

See individual commits

@msbutler msbutler self-assigned this Feb 13, 2024
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler force-pushed the butler-address-sumeer-feedback branch 5 times, most recently from 7df0e24 to 6b03596 Compare February 14, 2024 16:25
@msbutler msbutler changed the title Butler address sumeer feedback sstable: add more commentary on synthetic suffix replacement Feb 14, 2024
@msbutler msbutler marked this pull request as ready for review February 14, 2024 16:29
sstable/block.go Outdated
@@ -1008,7 +1008,27 @@ func (i *blockIter) SeekLT(key []byte, flags base.SeekLTFlags) (*InternalKey, ba
i.offset = decodeRestart(i.data[i.restarts+4*(index-1):])
if index < i.numRestarts {
targetOffset = decodeRestart(i.data[i.restarts+4*(index):])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like this approach better, I'll add a big comment explaining what's going on here.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @msbutler and @sumeerbhola)


sstable/suffix_rewriter.go line 204 at r3 (raw file):

			return err
		}
		if err := iter.init(r.Compare, r.Split, inputBlock, r.Properties.GlobalSeqNum, false /* hideObsoletePoints */, nil /* syntheticSuffix */); err != nil {

Sorry to be a pain but could you leave these call sites as they are? I am moving them to an IterTransforms struct which contains HideObsoletePoints and SyntheticSuffix (and the "global" sequence number), and this would just create merge conflicts. It will be addressed very soon.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented on first two commits.
Sorry to be a pain, but can you separate the third commit into a separate PR, so we can get the block.go changes here merged first. I am finding it a bit hard to focus, given the additional need to jump back to the comments and responses in the original PR.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2.
Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @msbutler)


sstable/block.go line 305 at r1 (raw file):

// which will remain valid until the blockIter is closed. The key stability
// guarantee is used by the range tombstone and range key code, which knows that
// the respective blocks are always encoded with a restart interval of 1. This
  • So have we broken this guarantee with suffix replacement, since the key will be backed by synthSuffixBuf?
  • Are we ok because RANGEDELs are not present in sstables that are being suffix replaced?
  • What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished? Is the plan to not use suffix replacement for those blocks and to eagerly rewrite them when loading them into the block cache, given their rarity? Is there a PR for that?
  • Are we only relying on the key stability guarantee when processing rangedels and rangekeys? @jbowens

sstable/block.go line 1011 at r2 (raw file):

Previously, msbutler (Michael Butler) wrote…

If you like this approach better, I'll add a big comment explaining what's going on here.

Yes, I have a preference for this since it sets up everything correctly for later iteration.


sstable/block.go line 1016 at r2 (raw file):

				// Shift up to the original binary search result and decode the key.
				i.offset = targetOffset
				i.nextOffset = i.offset

we don't need to set i.nextOffset. i.readEntry() will set that.


sstable/block.go line 1025 at r2 (raw file):

				if i.cmp(i.ikey.UserKey, key) < 0 {
					i.offset = targetOffset
					targetOffset = decodeRestart(i.data[i.restarts+4*(index+1):])

index+1 may be out of bounds.


sstable/block.go line 1030 at r2 (raw file):

				}
			}
		}

I think these blocks can be rearranged to be clearer (some of this predates your changes, but we need to simplify to compensate for the added complexity)

Let's lift the index==0 case before, so we can delay definition of the targetOffset and can clarify what it really means. And then we can unindent the main case. Something like:

if index == 0 {  
  if i.syntheticSuffix != nil {  
   // The binary search was conducted on keys without suffix replacement,  
   // implying the first key in the block may be less than the search key. To   // double check, get the first key in the block with suffix replacement   // and compare to the search key. Consider the following example: suppose   // the user searches with a@3, the first key in the block is a@2 and the   // block contains a suffix replacement rule of 4. Since a@3 sorts before   // a@2, the binary search would return index==0. Without conducting the   // suffix replacement, the SeekLT would incorrectly return nil. With   // suffix replacement though, a@4 should be returned as a@4 sorts before   // a@3.   ikey, lazyVal := i.First()  
   if i.cmp(ikey.UserKey, key) < 0 {  
    return ikey, lazyVal  
   }  
  }  
  // If index == 0 then all keys in this block are larger than the key  
  // sought.  i.offset = -1  
  i.nextOffset = 0  
  return nil, base.LazyValue{}  
}  
  
// INVARIANT: index > 0  
  
// Ignoring suffix replacement, we will search between the restart at index-1  
// and the restart at index, for the first key >= key, and then on finding it,  
// return i.Prev(). We need to know when we have hit the offset for index, since  
// then we can stop searching. targetOffset encodes that offset for index.  
targetOffset := i.restarts  
  
i.offset = decodeRestart(i.data\[i.restarts+4\*(index-1):\])  
if index < i.numRestarts {  
  targetOffset = decodeRestart(i.data\[i.restarts+4\*(index):\])  
  if i.syntheticSuffix != nil {  
   // It is possible that we should be searching between the restart at index  
   // and that at index+1. TODO: add longer comment with example  
   naiveOffset := i.offset  
   // Shift up to the original binary search result and decode the key.  
   i.offset = targetOffset  
   i.readEntry()  
   i.decodeInternalKey(i.key)  
   i.maybeReplaceSuffix(false /\* allowInPlace \*/)  
  
   // If the binary search point is actually less than the search key, post  
   // replacement, bump the target offset.   if i.cmp(i.ikey.UserKey, key) < 0 {  
    i.offset = targetOffset  
    if index+1 < i.numRestarts {  
     targetOffset = decodeRestart(i.data\[i.restarts+4\*(index+1):\])  
    } else {  
     targetOffset = i.restarts  
    }  
   } else {  
    i.offset = naiveOffset  
   }  
  }  
}

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 6 unresolved discussions (waiting on @10, @20, @msbutler, and @sumeerbhola)


sstable/block.go line 305 at r1 (raw file):

Are we only relying on the key stability guarantee when processing rangedels and rangekeys?

Yeah, I believe that's true

What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished?

Also, what happens when a range key fragment has a suffix in either the start or end key? That can happen in ordinary non-ingested sstables due to fragmentation, but maybe it cannot happen when we produce these sstables?

Or maybe are there no range keys in these sstables at all because they're produced without version history?

@RaduBerinde
Copy link
Member

sstable/block.go line 305 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Are we only relying on the key stability guarantee when processing rangedels and rangekeys?

Yeah, I believe that's true

What about rangekeys? If someone wrote a RANGEKEYSET [b, g)@10, and now we want to replace suffix @10 with suffix @20, how is that accomplished?

Also, what happens when a range key fragment has a suffix in either the start or end key? That can happen in ordinary non-ingested sstables due to fragmentation, but maybe it cannot happen when we produce these sstables?

Or maybe are there no range keys in these sstables at all because they're produced without version history?

Yeah we don't allow rangekeys in external ingest for now (we won't need to for the first OR version).

@RaduBerinde
Copy link
Member

sstable/suffix_rewriter.go line 204 at r3 (raw file):

Previously, RaduBerinde wrote…

Sorry to be a pain but could you leave these call sites as they are? I am moving them to an IterTransforms struct which contains HideObsoletePoints and SyntheticSuffix (and the "global" sequence number), and this would just create merge conflicts. It will be addressed very soon.

#3310

@RaduBerinde
Copy link
Member

sstable/suffix_rewriter.go line 204 at r3 (raw file):

Previously, RaduBerinde wrote…

#3310

The PR above also adds a check that we never try to iterate through range dels or range keys when we have a synthetic suffix.

@msbutler msbutler force-pushed the butler-address-sumeer-feedback branch from 6b03596 to e78faa3 Compare February 16, 2024 20:21
Copy link
Contributor Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed last commit.

Reviewable status: 0 of 7 files reviewed, 5 unresolved discussions (waiting on @10, @20, @msbutler, @RaduBerinde, and @sumeerbhola)


sstable/block.go line 305 at r1 (raw file):

Previously, RaduBerinde wrote…

Yeah we don't allow rangekeys in external ingest for now (we won't need to for the first OR version).

Added more commentary on this.


sstable/block.go line 1011 at r2 (raw file):

Previously, sumeerbhola wrote…

Yes, I have a preference for this since it sets up everything correctly for later iteration.

Done.


sstable/block.go line 1016 at r2 (raw file):

Previously, sumeerbhola wrote…

we don't need to set i.nextOffset. i.readEntry() will set that.

Done.


sstable/block.go line 1025 at r2 (raw file):

Previously, sumeerbhola wrote…

index+1 may be out of bounds.

Done.


sstable/block.go line 1030 at r2 (raw file):

Previously, sumeerbhola wrote…

I think these blocks can be rearranged to be clearer (some of this predates your changes, but we need to simplify to compensate for the added complexity)

Let's lift the index==0 case before, so we can delay definition of the targetOffset and can clarify what it really means. And then we can unindent the main case. Something like:

if index == 0 {  
  if i.syntheticSuffix != nil {  
   // The binary search was conducted on keys without suffix replacement,  
   // implying the first key in the block may be less than the search key. To   // double check, get the first key in the block with suffix replacement   // and compare to the search key. Consider the following example: suppose   // the user searches with a@3, the first key in the block is a@2 and the   // block contains a suffix replacement rule of 4. Since a@3 sorts before   // a@2, the binary search would return index==0. Without conducting the   // suffix replacement, the SeekLT would incorrectly return nil. With   // suffix replacement though, a@4 should be returned as a@4 sorts before   // a@3.   ikey, lazyVal := i.First()  
   if i.cmp(ikey.UserKey, key) < 0 {  
    return ikey, lazyVal  
   }  
  }  
  // If index == 0 then all keys in this block are larger than the key  
  // sought.  i.offset = -1  
  i.nextOffset = 0  
  return nil, base.LazyValue{}  
}  
  
// INVARIANT: index > 0  
  
// Ignoring suffix replacement, we will search between the restart at index-1  
// and the restart at index, for the first key >= key, and then on finding it,  
// return i.Prev(). We need to know when we have hit the offset for index, since  
// then we can stop searching. targetOffset encodes that offset for index.  
targetOffset := i.restarts  
  
i.offset = decodeRestart(i.data\[i.restarts+4\*(index-1):\])  
if index < i.numRestarts {  
  targetOffset = decodeRestart(i.data\[i.restarts+4\*(index):\])  
  if i.syntheticSuffix != nil {  
   // It is possible that we should be searching between the restart at index  
   // and that at index+1. TODO: add longer comment with example  
   naiveOffset := i.offset  
   // Shift up to the original binary search result and decode the key.  
   i.offset = targetOffset  
   i.readEntry()  
   i.decodeInternalKey(i.key)  
   i.maybeReplaceSuffix(false /\* allowInPlace \*/)  
  
   // If the binary search point is actually less than the search key, post  
   // replacement, bump the target offset.   if i.cmp(i.ikey.UserKey, key) < 0 {  
    i.offset = targetOffset  
    if index+1 < i.numRestarts {  
     targetOffset = decodeRestart(i.data\[i.restarts+4\*(index+1):\])  
    } else {  
     targetOffset = i.restarts  
    }  
   } else {  
    i.offset = naiveOffset  
   }  
  }  
}

Done, with added commentary for suffix replacement.


sstable/suffix_rewriter.go line 204 at r3 (raw file):

Previously, RaduBerinde wrote…

The PR above also adds a check that we never try to iterate through range dels or range keys when we have a synthetic suffix.

Removed the third commit from this pr. Will deal with later.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: 1 of 7 files reviewed, 5 unresolved discussions (waiting on @10, @20, @msbutler, and @RaduBerinde)

@msbutler msbutler merged commit e26f47f into cockroachdb:master Feb 18, 2024
11 checks passed
@RaduBerinde
Copy link
Member

sstable/block.go line 438 at r4 (raw file):

	//
	// (1) no two keys in the block share the same prefix.
	// (2) pebble.Compare(keyPrefix{replacementSuffix},keyPrefix{originalSuffix}) < 0

we also allow originalSuffix to be empty, correct?

@msbutler
Copy link
Contributor Author

sstable/block.go line 438 at r4 (raw file):

Previously, RaduBerinde wrote…

we also allow originalSuffix to be empty, correct?

Ah, i should have clarified given our conversation. Yes, that is true.

And to clarify, keys with an empty suffix sort before keys with a suffix
a@,a@5,a@4

In other words, conducting suffix replacement on a key without a suffix would make it less than the original value, which contradicts this invariant.

This makes me wonder if all my added off-by-one logic breaks down when we conduct suffix replacement on keys without a suffix.

@msbutler
Copy link
Contributor Author

sstable/block.go line 438 at r4 (raw file):

Previously, msbutler (Michael Butler) wrote…

Ah, i should have clarified given our conversation. Yes, that is true.

And to clarify, keys with an empty suffix sort before keys with a suffix
a@,a@5,a@4

In other words, conducting suffix replacement on a key without a suffix would make it less than the original value, which contradicts this invariant.

This makes me wonder if all my added off-by-one logic breaks down when we conduct suffix replacement on keys without a suffix.

note: the only keys without a suffix are index keys

@RaduBerinde
Copy link
Member

sstable/block.go line 438 at r4 (raw file):

Previously, msbutler (Michael Butler) wrote…

note: the only keys without a suffix are index keys

In the meta test we do have keys without prefix and it seems to work fine.

@msbutler
Copy link
Contributor Author

sstable/block.go line 438 at r4 (raw file):

Previously, RaduBerinde wrote…

In the meta test we do have keys without prefix and it seems to work fine.

Alrighty, i feel a bit better about this after adding a few unit test cases here
#3320

I think this all works is because we only need to trigger off by one handling when (integer cmp) origSuffix < > searchkeySuffix <> replacementSuffix (i.e. in the searchkeySuffix is in between the other two suffixes). This condition will never trigger because the originalSuffix is the max suffix, the empty suffix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants