Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/base: add doc comment discussing TrySeekUsingNext #3329

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Feb 21, 2024

No description provided.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jbowens jbowens force-pushed the tryseekusingnext-doc branch from 303d580 to 404f675 Compare February 21, 2024 20:49
@jbowens jbowens marked this pull request as ready for review February 21, 2024 22:03
@jbowens jbowens requested review from sumeerbhola and a team February 21, 2024 22:03
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a work-in-progress, but I'm having trouble structuring it in a coherent way

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @sumeerbhola)

@jbowens jbowens force-pushed the tryseekusingnext-doc branch from 404f675 to bdaab6d Compare February 22, 2024 17:29
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely much better than what we currently have, so I am good with merging this
:lgtm:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @jbowens)


internal/base/doc.go line 30 at r2 (raw file):

// beneath the range deletion. However in doing so, a TrySeekUsingNext flag
// passed by the merging iterator's client no longer transitively holds for
// subsequent seeks of child level iterators in all cases. The merging iterator

do we currently do this? I only see the usual test coverage case in merging_iter.go.

	if invariants.Enabled && flags.TrySeekUsingNext() && !m.forceEnableSeekOpt &&
		disableSeekOpt(key, uintptr(unsafe.Pointer(m))) {
		flags = flags.DisableTrySeekUsingNext()
	}

internal/base/doc.go line 72 at r2 (raw file):

// The pebble levelIter makes use of the TrySeekUsingNext flag to avoid a naive
// seek among a level's file metadatas. When TrySeekUsingNext is passed by the
// caller, the relevant key must fall within the current file or later.

if its a later file we disable DisableTrySeekUsingNext(), so isn't this just for the current file?


internal/base/doc.go line 79 at r2 (raw file):

//
// The sstable iterators use the TrySeekUsingNext flag to avoid naive seeks
// through a table's index structures:

perhaps point to the long comment in reader_iter.go


internal/base/iterator.go line 234 at r2 (raw file):

// instead focuses on the contract expected of the caller.

// TrySeekUsingNext is set when the caller has knowledge that no action has been

that it has performed no action ...


internal/base/iterator.go line 246 at r2 (raw file):

// not return a key less than the current iterator position even if a naive seek
// would land there.
//

We should probably also say that the above promise from the caller is the same for SeekPrefixGE. That is, the prefixes of k1 and k2 can be different. The callee must remember if it did not position itself for k1 (e.g. an sstable iterator that did not position itself for k1 due to the bloom filter not matching), and that it needs to do the full work for k2.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @jbowens)


internal/base/doc.go line 68 at r2 (raw file):
hmm, I assume this relates to the other statement

If true, the callee should not return a key less than the current iterator position even if a naive seek would land there.

doesn't that remove the optionality on the iterator to ignore TrySeekUsingNext? We do that when the bloom filter did not match.

@jbowens jbowens force-pushed the tryseekusingnext-doc branch from bdaab6d to a48efa2 Compare February 23, 2024 17:55
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


internal/base/doc.go line 30 at r2 (raw file):

Previously, sumeerbhola wrote…

do we currently do this? I only see the usual test coverage case in merging_iter.go.

	if invariants.Enabled && flags.TrySeekUsingNext() && !m.forceEnableSeekOpt &&
		disableSeekOpt(key, uintptr(unsafe.Pointer(m))) {
		flags = flags.DisableTrySeekUsingNext()
	}

we don't currently do this, but it's one of the options that was considered to address @RaduBerinde's problem in #3324.


internal/base/doc.go line 68 at r2 (raw file):

Previously, sumeerbhola wrote…

hmm, I assume this relates to the other statement

If true, the callee should not return a key less than the current iterator position even if a naive seek would land there.

doesn't that remove the optionality on the iterator to ignore TrySeekUsingNext? We do that when the bloom filter did not match.

it does remove the optionality, but in a way that the bloom filter use is still compliant.

The contract is that if the previous call to SeekPrefixGE(k1) returned some key k, then SeekPrefixGE(k2, TrySeekUsingNext()=true) must return some key ≥ k and ≥ k2. In the bloom filter case, the bloom filter exclusion causes to return no key at all. This ensures we won't position other levels' range deletions according to some key k that's > k2 during the previous seek.


internal/base/doc.go line 72 at r2 (raw file):

Previously, sumeerbhola wrote…

if its a later file we disable DisableTrySeekUsingNext(), so isn't this just for the current file?

yeah. this is trying to say that the seek among the file metadatas (not within the files themselves) is constrained to [current file, +∞) rather than (-∞,+∞). I expanded the comment to clarify


internal/base/doc.go line 79 at r2 (raw file):

Previously, sumeerbhola wrote…

perhaps point to the long comment in reader_iter.go

Done.


internal/base/iterator.go line 234 at r2 (raw file):

Previously, sumeerbhola wrote…

that it has performed no action ...

Done.


internal/base/iterator.go line 246 at r2 (raw file):

Previously, sumeerbhola wrote…

We should probably also say that the above promise from the caller is the same for SeekPrefixGE. That is, the prefixes of k1 and k2 can be different. The callee must remember if it did not position itself for k1 (e.g. an sstable iterator that did not position itself for k1 due to the bloom filter not matching), and that it needs to do the full work for k2.

Done.

@jbowens jbowens force-pushed the tryseekusingnext-doc branch from a48efa2 to 6191567 Compare February 23, 2024 17:56
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


internal/base/doc.go line 68 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

it does remove the optionality, but in a way that the bloom filter use is still compliant.

The contract is that if the previous call to SeekPrefixGE(k1) returned some key k, then SeekPrefixGE(k2, TrySeekUsingNext()=true) must return some key ≥ k and ≥ k2. In the bloom filter case, the bloom filter exclusion causes to return no key at all. This ensures we won't position other levels' range deletions according to some key k that's > k2 during the previous seek.

Hrm, maybe a better way of thinking about it is that the mergingIter is violating the contract. You might get a sequence like:

mergingIter.SeekPrefixGE(k1)
  # mergingIter observes a range deletion deleting the span [k1,k3)
  > levelIter.SeekPrefixGE(k3)
mergingIter.SeekPrefixGE(k2, TrySeekUsingNext()=true)
  # mergingIter does not observe the [k1,k3) range deletion because the
  # relevant iterator has already been positioned to a key ≥ k3
  > levelIter.SeekPrefixGE(k2, TrySeekUsingNext()=true)

The caller obeyed the contract, passing TrySeekUsingNext()=true because k2 ≥ k1. The merging iterator violated it, because in the second seek k2 < k3, and yet it passed TrySeekUsingNext()=true.

The semantics that the merging iterator is relying on if TrySeekUsingNext()=true, a iter.Seek[Prefix]GE(k) should be interpreted as iter.Seek[Prefix]GE(max(k, iter.Key()))

@RaduBerinde
Copy link
Member

The semantics that the merging iterator is relying on if TrySeekUsingNext()=true, a iter.Seek[Prefix]GE(k) should be interpreted as iter.Seek[Prefix]GE(max(k, iter.Key()))

The max is something the merging iterator could do instead of requiring all implementations to tolerate it, no?

@jbowens
Copy link
Collaborator Author

jbowens commented Feb 29, 2024

The max is something the merging iterator could do instead of requiring all implementations to tolerate it, no?

That's true, but the original design of TrySeekUsingNext strived to limit the number of key comparisons by performing the comparison at the top-level, and propagating the flag indicating the possibility of applying the optimization to its children. It's possible to avoid performing any kind of max key comparison within the leaf implementations. I think there's a tradeoff between a complicated, subtle interface and eking out performance. I'm not sure the performance warrants the complexity, but it still feels like there's some possible refactor or recharacterization that lets us get the best of both worlds...

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumping this

Reviewable status: 0 of 3 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @jbowens and @RaduBerinde)


internal/base/doc.go line 68 at r2 (raw file):
The subtleties in the second and third bullets are related to SeekPrefixGE. I am getting increasingly unsure about (a) what exactly is our current behavior and what makes it correct, (b) what we think is the minimal behavior needed to make things correct (which may suggest a direction for future optimizations, or serve as a way to guarantee correctness of in-progress changes/optimizations).

Can you write down both (a) and (b), and attempt a small proof sketch?

For instance, in the example you described above, I am trying to figure out what causes the mergingIter to have moved past the [k1,k3) span in the first seek. Say that levelIter ignored the seek to k3, due to the bloom filter, and returned nil. In that case it has not moved to k3. Now is the fact that the mergingIter never adds a key that is not matching the prefix to the heap mean that it cannot have moved past the span?
If we realized that the k3 is a different prefix and did not seek the others, the span would still be there on the next seek, if the next seek was to a key < k3. So we would be correct I think. I think this relates to your comment in the other PR

b) maintain an invariant that all the higher levels remain in the heap and their l.tombstones cascade down to l.tombstone.End such that a subsequent SeekGE(k2) will end up adjusting its seek key to the original l.tombstone.End by the time it reaches the unpositioned levels.

I don't know if they need to remain in the heap. Won't they be readded to the heap on the next SeekPrefixGE?

I am very wary of any further code changes in this area without fully reasoning about correctness.


internal/base/doc.go line 31 at r3 (raw file):

// passed by the merging iterator's client no longer transitively holds for
// subsequent seeks of child level iterators in all cases. The merging iterator
// assumes responsibility for ensuring that SeekPrefixGE is propagated to its

nit: for ensuring that TrySeekUsingNext ...?

@jbowens jbowens force-pushed the tryseekusingnext-doc branch from 6191567 to 50ad0ad Compare August 15, 2024 21:34
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 7 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


internal/base/doc.go line 68 at r2 (raw file):

Previously, sumeerbhola wrote…

The subtleties in the second and third bullets are related to SeekPrefixGE. I am getting increasingly unsure about (a) what exactly is our current behavior and what makes it correct, (b) what we think is the minimal behavior needed to make things correct (which may suggest a direction for future optimizations, or serve as a way to guarantee correctness of in-progress changes/optimizations).

Can you write down both (a) and (b), and attempt a small proof sketch?

For instance, in the example you described above, I am trying to figure out what causes the mergingIter to have moved past the [k1,k3) span in the first seek. Say that levelIter ignored the seek to k3, due to the bloom filter, and returned nil. In that case it has not moved to k3. Now is the fact that the mergingIter never adds a key that is not matching the prefix to the heap mean that it cannot have moved past the span?
If we realized that the k3 is a different prefix and did not seek the others, the span would still be there on the next seek, if the next seek was to a key < k3. So we would be correct I think. I think this relates to your comment in the other PR

b) maintain an invariant that all the higher levels remain in the heap and their l.tombstones cascade down to l.tombstone.End such that a subsequent SeekGE(k2) will end up adjusting its seek key to the original l.tombstone.End by the time it reaches the unpositioned levels.

I don't know if they need to remain in the heap. Won't they be readded to the heap on the next SeekPrefixGE?

I am very wary of any further code changes in this area without fully reasoning about correctness.

I've pushed an update to the comment that clarifies a few invariants although there's something around not iterating beyond the iteration prefix that I still need to codify.

@jbowens jbowens force-pushed the tryseekusingnext-doc branch from 50ad0ad to 098c6f3 Compare August 15, 2024 21:52
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 8 unresolved discussions (waiting on @jbowens and @RaduBerinde)


internal/base/doc.go line 59 at r5 (raw file):

// f_i among files f_0, f_1, ..., f_n and the key k_i among keys k_0, k_1, ...,
// k_n known to the internal iterator. We maintain the following merging
// iterator invariants:

I'm reading this for now by only thinking about SeekGE.

Terminology question:

  • are f_0, ..., f_n the files that each level is currently positioned at? M2 suggests these are files in a single level. The use of the same n is confusing if it is the latter. Same question for k_0, k_1, ...

The property I think that we want is that if a level_i has moved to key k_i for various reasons (before SeekGE(s2)), and there is any visible RANGEDEL [s0, s1) in level_i where s1 <= k_i, and s1 > s, then all lower levels must be moved to a key >= s1. This is because we may have moved past the file containing the RANGEDEL in level_i, and the effect of that has to have been enforced. M1 ensures that if this invariant was true before calling SeekGE(s2), then propagating TrySeekUsingNext to everyone will continue to enforce it.

This also means that

  • level iterator must remember its file position. With SeekGE a file must be open, or the level is exhausted.
  • When a file is open, that file must respect TrySeekUsingNext. I don't think this is captured by M2.

SeekPrefixGE is tricky in that we want to allow the file iter to choose to not to position itself if the bloom filter does not match. Also, a levelIter should be allowed to not load a file if that file's start key is beyond the prefix. Which would mean that for correctness we can't move level_i past a RANGEDEL [s0, s1) unless s1 matches the prefix.

Hmm, maybe this is also an insufficient statement. Say we have SeekPrefixGE(b@100) and there is a RANGEDEL[b@200, b@50) and SET b@40, in two consecutive files in level_1. And level_2 has a file with bounds [a, c] and is loaded, and does not match the bloom filter so leaves itself unpositioned. If we then do SeekPrefixGE(b@80), what happens in this level_2 file. Ah yes, it's bloom filter did not match, so can't have b@. So a table iterator can ignore that RANGEDEL precisely because it knows it has no key covered by it that may be uncovered later.

Apologies if I am rambling -- I am starting to swap this back. Also, if I am slowing you and @RaduBerinde back in making progress on this and the other PR, feel free to treat my comments as drive-by commentary and proceed at your own pace.

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 8 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


internal/base/doc.go line 31 at r3 (raw file):

Previously, sumeerbhola wrote…

nit: for ensuring that TrySeekUsingNext ...?

didn't follow this comment?


internal/base/doc.go line 59 at r5 (raw file):

M2 suggests these are files in a single level. The use of the same n is confusing if it is the latter. Same question for k_0, k_1, ...

Same level; not sure how to express it without defining an overwhelming number of variables

Which would mean that for correctness we can't move level_i past a RANGEDEL [s0, s1) unless s1 matches the prefix.

I think we can move a level past s1, we just can't propagate it out of the mergingIter (eg, we need to not rely on the outcome of isNextEntryDeleted on a subsequent iterator seek)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants