From 6191567b33a8a27ab433ad1b6f8964e9ac35ad7f Mon Sep 17 00:00:00 2001 From: Jackson Owens Date: Wed, 21 Feb 2024 15:42:20 -0500 Subject: [PATCH] internal/base: add doc comment discussing TrySeekUsingNext --- internal/base/doc.go | 99 +++++++++++++++++++++++++++++++ internal/base/iterator.go | 33 +++++++---- sstable/reader_iter_single_lvl.go | 6 ++ 3 files changed, 127 insertions(+), 11 deletions(-) create mode 100644 internal/base/doc.go diff --git a/internal/base/doc.go b/internal/base/doc.go new file mode 100644 index 0000000000..1fc619e287 --- /dev/null +++ b/internal/base/doc.go @@ -0,0 +1,99 @@ +// Copyright 2024 The LevelDB-Go and Pebble Authors. All rights reserved. Use +// of this source code is governed by a BSD-style license that can be found in +// the LICENSE file. + +// Package base defines fundamental types used across Pebble, including keys, +// iterators, etc. +// +// # Iterators +// +// The [InternalIterator] interface defines the iterator interface implemented +// by all iterators over point keys. Internal iterators are composed to form an +// "iterator stack," resulting in a single internal iterator (see mergingIter in +// the pebble package) that yields a merged view of the LSM. +// +// The SeekGE and SeekPrefixGE positioning methods take a set of flags +// [SeekGEFlags] allowing the caller to provide additional context to iterator +// implementations. The TrySeekUsingNext flag is set when the caller has +// knowledge that no action has been performed to move this iterator beyond the +// first key that would be found if this iterator were to honestly do the +// intended seek. This allows a class of optimizations where an internal +// iterator may avoid a full naive repositioning if the iterator is already +// at a proximate position. This also means every caller (including intermediary +// internal iterators within the iterator stack) must preserve this +// relationship. +// +// For example, if a range deletion deletes the remainder of a prefix, the +// merging iterator may be able to elide a SeekPrefixGE on level iterators +// beneath the range deletion. However in doing so, a TrySeekUsingNext flag +// passed by the merging iterator's client no longer transitively holds for +// subsequent seeks of child level iterators in all cases. The merging iterator +// assumes responsibility for ensuring that SeekPrefixGE is propagated to its +// consitutent iterators only when valid. +// +// Description of TrySeekUsingNext mechanics across the iterator stack: +// +// As the top-level entry point of user seeks, the [pebble.Iterator] is +// responsible for detecting when consecutive seeks move monotonically forward. +// It saves seek keys and compares consecutive seek keys to decide whether to +// propagate the TrySeekUsingNext flag to its [InternalIterator]. +// +// The [pebble.Iterator] also has its own TrySeekUsingNext optimization in +// SeekGE: Above the [InternalIterator] interface, the [pebble.Iterator]'s +// SeekGE method detects consecutive seeks to monotonically increasing keys and +// examines the current key. If the iterator is already positioned appropriately +// (at a key ≥ the seek key), it elides the entire seek of the internal +// iterator. +// +// The pebble mergingIter does not perform any TrySeekUsingNext optimization +// itself, but it must preserve the TrySeekUsingNext contract in its calls to +// its child iterators because it passes the TrySeekUsingNext flag as-is to its +// child iterators. It can do this because it always translates calls to its +// SeekGE and SeekPrefixGE methods as equivalent calls to every child iterator. +// However there are a few subtleties: +// +// - In some cases the calls made to child iterators may only be equivalent +// within the context of the iterator's visible sequence number. For example, +// if a range deletion tombstone is present on a level, seek keys propagated +// to lower-levelled child iterators may be adjusted without violating the +// transitivity of the TrySeekUsingNext flag and its invariants so long as +// the mergingIter is always reading state at the same visible sequence +// number. +// - The mergingIter takes care to avoid ever advancing a child iterator that's +// already positioned beyond the current iteration prefix. +// - When propagating TrySeekUsingNext to its child iterators, the mergingIter +// must propagate it to all child iterators or none. This is required because +// of the mergingIter's handling of range deletions. Unequal application of +// TrySeekUsingNext may cause range deletions that have already been skipped +// over in a level to go unseen, despite being relevant to other levels that +// do not use TrySeekUsingNext. +// +// The pebble levelIter makes use of the TrySeekUsingNext flag to avoid a naive +// seek within the level's B-Tree of files. When TrySeekUsingNext is passed by +// the caller, the relevant key must fall within the current file or a later +// file. The search space is reduced from (-∞,+∞) to [current file, +∞). If the +// current file's bounds overlap the key, the levelIter propagates the +// TrySeekUsingNext to the current sstable iterator. If the levelIter must +// advance to a new file, it drops the flag because the new file's sstable +// iterator is still unpositioned. +// +// In-memory iterators arenaskl.Iterator and batchskl.Iterator make use of the +// TrySeekUsingNext flag, attempting a fixed number of Nexts before falling back +// to performing a seek using skiplist structures. +// +// The sstable iterators use the TrySeekUsingNext flag to avoid naive seeks +// through a table's index structures. See the long comment in +// sstable/reader_iter.go for more details: +// - If an iterator is already exhausted, either because there are no +// subsequent point keys or because the upper bound has been reached, the +// iterator uses TrySeekUsingNext to avoid any repositioning at all. +// - Otherwise, a TrySeekUsingNext flag causes the sstable Iterator to Next +// forward a capped number of times, stopping as soon as a key ≥ the seek key +// is discovered. +// - The sstable iterator does not always position itself in response to a +// SeekPrefixGE even when TrySeekUsingNext()=false, because bloom filters may +// indicate the prefix does not exist within the file. The sstable iterator +// takes care to remember when it didn't position itself, so that a +// subsequent seek using TrySeekUsingNext does NOT try to reuse the current +// iterator position. +package base diff --git a/internal/base/iterator.go b/internal/base/iterator.go index 39e05e1fc7..be334803e5 100644 --- a/internal/base/iterator.go +++ b/internal/base/iterator.go @@ -227,17 +227,28 @@ const ( // SeekGEFlagsNone is the default value of SeekGEFlags, with all flags disabled. const SeekGEFlagsNone = SeekGEFlags(0) -// TrySeekUsingNext indicates whether a performance optimization was enabled -// by a caller, indicating the caller has not done any action to move this -// iterator beyond the first key that would be found if this iterator were to -// honestly do the intended seek. For example, say the caller did a -// SeekGE(k1...), followed by SeekGE(k2...) where k1 <= k2, without any -// intermediate positioning calls. The caller can safely specify true for this -// parameter in the second call. As another example, say the caller did do one -// call to Next between the two Seek calls, and k1 < k2. Again, the caller can -// safely specify a true value for this parameter. Note that a false value is -// always safe. The callee is free to ignore the true value if its -// implementation does not permit this optimization. +// TODO(jackson): Rename TrySeekUsingNext to MonotonicallyForward or something +// similar that avoids prescribing the implementation of the optimization but +// instead focuses on the contract expected of the caller. + +// TrySeekUsingNext is set when the caller has knowledge that it has performed +// no action to move this iterator beyond the first key that would be found if +// this iterator were to honestly do the intended seek. This enables a class of +// performance optimizations within various internal iterator implementations. +// For example, say the caller did a SeekGE(k1...), followed by SeekGE(k2...) +// where k1 <= k2, without any intermediate positioning calls. The caller can +// safely specify true for this parameter in the second call. As another +// example, say the caller did do one call to Next between the two Seek calls, +// and k1 < k2. Again, the caller can safely specify a true value for this +// parameter. Note that a false value is always safe. If true, the callee should +// not return a key less than the current iterator position even if a naive seek +// would land there. +// +// The same promise applies to SeekPrefixGE: Prefixes of k1 and k2 may be +// different. If the callee does not position itself for k1 (for example, an +// sstable iterator that elides a seek due to bloom filter exclusion), the +// callee must remember it did not position itself for k1 and that it must +// perform the full seek. // // We make the caller do this determination since a string comparison of k1, k2 // is not necessarily cheap, and there may be many iterators in the iterator diff --git a/sstable/reader_iter_single_lvl.go b/sstable/reader_iter_single_lvl.go index 2cec8be64f..152930cfc9 100644 --- a/sstable/reader_iter_single_lvl.go +++ b/sstable/reader_iter_single_lvl.go @@ -835,6 +835,12 @@ func (i *singleLevelIterator) seekPrefixGE( if checkFilter && i.reader.tableFilter != nil { if !i.lastBloomFilterMatched { // Iterator is not positioned based on last seek. + // + // TODO(jackson): Would it be worth keeping the + // TrySeekUsingNext optimization if the previous SeekPrefixGE call + // that hit the bloom filter exclusion case also had + // TrySeekUsingNext()=true (in which case the position from two + // operations ago transitively still holds)? flags = flags.DisableTrySeekUsingNext() } i.lastBloomFilterMatched = false