db: use InterleavingIter to interleave range deletions in compactions
Remove the keyspan.InternalIteratorShim that was intended to be a temporary
bridge to allow range deletions to be surfaced within the compaction iterator,
replacing it with the keyspan.InterleavingIter. This mirrors the mechanics of
range keys.

Additionally, this commit removes the compaction iterator's concept of a key
that lies within the same snapshot stripe but is non-skippable. These
"non-skippable" keys lead to bizarre behavior, such as a sequence like:

  a.MERGE.5 a.RANGEDEL.4 a.MERGE.4

yielding the separate, unmerged a.MERGE.5 and a.MERGE.4 internal keys. (Note
that a.RANGEDEL.4 does not delete a.MERGE.4, because they share the same
sequence number.)
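The visibility rule the example relies on can be sketched as follows, using illustrative types rather than Pebble's internal ones: a range tombstone deletes only keys with a strictly lower sequence number, so a.RANGEDEL.4 covers neither a.MERGE.5 nor a.MERGE.4.

```go
package main

import "fmt"

// internalKey is a hypothetical stand-in for an LSM internal key.
type internalKey struct {
	userKey string
	seqNum  uint64
}

// deletedBy reports whether a range tombstone covering k.userKey with
// sequence number tombSeq deletes k. The comparison is strict: a key at
// the same sequence number as the tombstone is NOT deleted.
func deletedBy(k internalKey, tombSeq uint64) bool {
	return k.seqNum < tombSeq
}

func main() {
	const rangeDelSeq = 4 // a.RANGEDEL.4
	fmt.Println(deletedBy(internalKey{"a", 5}, rangeDelSeq)) // false: newer key
	fmt.Println(deletedBy(internalKey{"a", 4}, rangeDelSeq)) // false: same seqnum
	fmt.Println(deletedBy(internalKey{"a", 3}, rangeDelSeq)) // true: older key
}
```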

The sameStripeNonSkippable fell out of the fact that range deletions were
previously interleaved by the input iterator at their start key with their
sequence number. These sameStripeNonSkippable keys could interrupt any logic
iterating across the keys of a stripe, complicating the compaction iterator
logic.

In some instances INVALID keys were also returned with sameStripeNonSkippable.
These keys are now treated similarly to other errors encountered during
iteration: i.err is set and newStripeNewKey is returned.
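The error convention described above can be sketched like so; the names here are illustrative, not the actual compaction iterator code:

```go
package main

import (
	"errors"
	"fmt"
)

// stripeChangeType is a hypothetical stand-in for the compaction
// iterator's stripe-change result.
type stripeChangeType int

const (
	sameStripe stripeChangeType = iota
	newStripeNewKey
)

// compactionIter sketches the convention: on an invalid key, record the
// error on the iterator and report newStripeNewKey so callers stop
// iterating the current stripe and surface i.err.
type compactionIter struct {
	err error
}

func (i *compactionIter) handleInvalid() stripeChangeType {
	i.err = errors.New("invalid internal key")
	return newStripeNewKey
}

func main() {
	var i compactionIter
	result := i.handleInvalid()
	fmt.Println(result == newStripeNewKey, i.err != nil) // true true
}
```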

Close #3082.
jbowens committed Jan 17, 2024
1 parent 3b7293f commit b162166
Showing 14 changed files with 403 additions and 585 deletions.
58 changes: 24 additions & 34 deletions compaction.go
@@ -247,13 +247,7 @@ func (f *fileSizeSplitter) shouldSplitBefore(key *InternalKey, tw *sstable.Write
// grandparent boundary.
f.atGrandparentBoundary = false

// If the key is a range tombstone, the EstimatedSize may not grow right
// away when a range tombstone is added to the fragmenter: It's dependent on
// whether or not this new range deletion will start a new fragment.
// Range deletions are rare, so we choose to simply not split yet.
// TODO(jackson): Reconsider this, and consider range keys too as a part of
// #2321.
if key.Kind() == InternalKeyKindRangeDelete || tw == nil {
if tw == nil {
return noSplit
}

@@ -625,12 +619,9 @@ type compaction struct {
// The range key fragmenter. Similar to rangeDelFrag in that it gets range
// keys from the compaction iter and fragments them for output to files.
rangeKeyFrag keyspan.Fragmenter
// The range deletion tombstone iterator, that merges and fragments
// tombstones across levels. This iterator is included within the compaction
// input iterator as a single level.
// TODO(jackson): Remove this when the refactor of FragmentIterator,
// InterleavingIterator, etc is complete.
rangeDelIter keyspan.InternalIteratorShim
// rangeDelInterleaving is an interleaving iterator for range deletions that
// interleaves range tombstones among the point keys.
rangeDelInterleaving keyspan.InterleavingIter
// rangeKeyInterleaving is the interleaving iter for range keys.
rangeKeyInterleaving keyspan.InterleavingIter

@@ -1377,36 +1368,35 @@ func (c *compaction) newInputIter(
}
}

// If there's only one constituent point iterator, we can avoid the overhead
// of a *mergingIter. This is possible, for example, when performing a flush
// of a single memtable. Otherwise, combine all the iterators into a merging
// iter.
iter := iters[0]
if len(iters) > 1 {
iter = newMergingIter(c.logger, &c.stats, c.cmp, nil, iters...)
}

// In normal operation, levelIter iterates over the point operations in a
// level, and initializes a rangeDelIter pointer for the range deletions in
// each table. During compaction, we want to iterate over the merged view of
// point operations and range deletions. In order to do this we create one
// levelIter per level to iterate over the point operations, and collect up
// all the range deletion files.
//
// The range deletion levels are first combined with a keyspan.MergingIter
// (currently wrapped by a keyspan.InternalIteratorShim to satisfy the
// internal iterator interface). The resulting merged rangedel iterator is
// then included with the point levels in a single mergingIter.
//
// Combine all the rangedel iterators using a keyspan.MergingIterator and a
// InternalIteratorShim so that the range deletions may be interleaved in
// the compaction input.
// TODO(jackson): Replace the InternalIteratorShim with an interleaving
// iterator.
// The range deletion levels are combined with a keyspan.MergingIter. The
// resulting merged rangedel iterator is then included using an
// InterleavingIter.
// TODO(jackson): Consider using a defragmenting iterator to stitch together
// logical range deletions that were fragmented due to previous file
// boundaries.
if len(rangeDelIters) > 0 {
c.rangeDelIter.Init(c.cmp, rangeDelIters...)
iters = append(iters, &c.rangeDelIter)
mi := &keyspan.MergingIter{}
mi.Init(c.cmp, keyspan.NoopTransform, new(keyspan.MergingBuffers), rangeDelIters...)
c.rangeDelInterleaving.Init(c.comparer, iter, mi, keyspan.InterleavingIterOpts{})
iter = &c.rangeDelInterleaving
}

// If there's only one constituent point iterator, we can avoid the overhead
// of a *mergingIter. This is possible, for example, when performing a flush
// of a single memtable. Otherwise, combine all the iterators into a merging
// iter.
iter := iters[0]
if len(iters) > 1 {
iter = newMergingIter(c.logger, &c.stats, c.cmp, nil, iters...)
}
// If there are range key iterators, we need to combine them using
// keyspan.MergingIter, and then interleave them among the points.
if len(rangeKeyIters) > 0 {
@@ -3402,7 +3392,7 @@ func (d *DB) runCompaction(
// The interleaved range deletion might only be one of many with
// these bounds. Some fragmenting is performed ahead of time by
// keyspan.MergingIter.
if s := c.rangeDelIter.Span(); !s.Empty() {
if s := c.rangeDelInterleaving.Span(); !s.Empty() {
// The memory management here is subtle. Range deletions
// blocks do NOT use prefix compression, which ensures that
// range deletion spans' memory is available as long we keep