storage: add experimental MVCC range key primitives
This patch adds initial experimental primitives for MVCC range keys,
which will be the foundation for MVCC range tombstones. They are based
on experimental Pebble range keys.

* Data structures:
  * `MVCCRangeKey`
  * internal `nil` value for range tombstones (as with point tombstones)

* `Engine` methods for mutating range keys:
  * `ExperimentalClearMVCCRangeKey()`
  * `ExperimentalPutMVCCRangeKey()`

* `SimpleMVCCIterator` methods for accessing range keys:
  * `HasPointAndRange()`
  * `RangeBounds()`
  * `RangeKeys()`

Range keys do not have a distinct identity, and should instead be
considered a key continuum: they will merge with abutting keys of the
same value, can be partially cleared, can split or merge along with
ranges, and so on. Bounded scans will truncate them to the scan bounds.
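
A minimal sketch of two of the continuum properties described above: merging with an abutting key of the same value, and truncation to scan bounds. The `rangeKey` type with string keys and integer timestamps, and the helpers `mergeAbutting` and `truncate`, are hypothetical stand-ins for illustration only. They are not part of this patch, which works on `MVCCRangeKey` and lets Pebble do the actual work.

```go
package main

import "fmt"

// rangeKey is a hypothetical stand-in for MVCCRangeKey: a span [start, end)
// at a single timestamp.
type rangeKey struct {
	start, end string
	ts         int
}

// mergeAbutting joins b onto a when both have the same timestamp and a ends
// exactly where b starts. It reports whether a merge happened.
func mergeAbutting(a, b rangeKey) (rangeKey, bool) {
	if a.ts == b.ts && a.end == b.start {
		return rangeKey{start: a.start, end: b.end, ts: a.ts}, true
	}
	return rangeKey{}, false
}

// truncate clips rk to the scan bounds [lower, upper) and reports whether
// any part of the range key remains visible.
func truncate(rk rangeKey, lower, upper string) (rangeKey, bool) {
	if rk.start < lower {
		rk.start = lower
	}
	if rk.end > upper {
		rk.end = upper
	}
	return rk, rk.start < rk.end
}

func main() {
	// [a-c)@1 and [c-f)@1 abut and share a timestamp, so they read as one key.
	merged, ok := mergeAbutting(rangeKey{"a", "c", 1}, rangeKey{"c", "f", 1})
	fmt.Println(merged, ok)

	// A scan bounded to [b, d) only sees the part of the key inside the bounds.
	clipped, ok := truncate(merged, "b", "d")
	fmt.Println(clipped, ok)
}
```

Because a key can be merged or clipped like this at any time, callers should not rely on the bounds they originally wrote being returned by a later read.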

Range key support is implemented in `pebbleIterator` and
`intentInterleavingIter`, but not in the rest of the MVCC or KV APIs.
They are not persisted to disk either. Subsequent pull requests will
extend their functionality and integrate them with other components.

Release note: None
erikgrinaker committed Mar 26, 2022
1 parent bb5aa54 commit b2f9c98
Showing 19 changed files with 1,958 additions and 53 deletions.
15 changes: 15 additions & 0 deletions pkg/kv/kvserver/rangefeed/task_test.go
@@ -190,6 +190,21 @@ func (s *testIterator) curKV() storage.MVCCKeyValue {
return s.kvs[s.cur]
}

// HasPointAndRange implements SimpleMVCCIterator.
func (s *testIterator) HasPointAndRange() (bool, bool) {
panic("not implemented")
}

// RangeBounds implements SimpleMVCCIterator.
func (s *testIterator) RangeBounds() (roachpb.Key, roachpb.Key) {
panic("not implemented")
}

// RangeKeys implements SimpleMVCCIterator.
func (s *testIterator) RangeKeys() []storage.MVCCRangeKey {
panic("not implemented")
}

func TestInitResolvedTSScan(t *testing.T) {
defer leaktest.AfterTest(t)()
startKey := roachpb.RKey("d")
23 changes: 23 additions & 0 deletions pkg/kv/kvserver/spanset/batch.go
@@ -176,6 +176,21 @@ func (i *MVCCIterator) UnsafeValue() []byte {
return i.i.UnsafeValue()
}

// HasPointAndRange implements SimpleMVCCIterator.
func (i *MVCCIterator) HasPointAndRange() (bool, bool) {
panic("not implemented")
}

// RangeBounds implements SimpleMVCCIterator.
func (i *MVCCIterator) RangeBounds() (roachpb.Key, roachpb.Key) {
panic("not implemented")
}

// RangeKeys implements SimpleMVCCIterator.
func (i *MVCCIterator) RangeKeys() []storage.MVCCRangeKey {
panic("not implemented")
}

// ComputeStats is part of the storage.MVCCIterator interface.
func (i *MVCCIterator) ComputeStats(
start, end roachpb.Key, nowNanos int64,
@@ -599,6 +614,14 @@ func (s spanSetWriter) ClearIterRange(iter storage.MVCCIterator, start, end roac
return s.w.ClearIterRange(iter, start, end)
}

func (s spanSetWriter) ExperimentalPutMVCCRangeKey(rangeKey storage.MVCCRangeKey) error {
panic("not implemented")
}

func (s spanSetWriter) ExperimentalClearMVCCRangeKey(rangeKey storage.MVCCRangeKey) error {
panic("not implemented")
}

func (s spanSetWriter) Merge(key storage.MVCCKey, value []byte) error {
if s.spansOnly {
if err := s.spans.CheckAllowed(SpanReadWrite, roachpb.Span{Key: key.Key}); err != nil {
133 changes: 124 additions & 9 deletions pkg/storage/engine.go
@@ -45,8 +45,10 @@ func init() {
type SimpleMVCCIterator interface {
// Close frees up resources held by the iterator.
Close()
// SeekGE advances the iterator to the first key in the engine which
// is >= the provided key.
// SeekGE advances the iterator to the first key in the engine which is >= the
// provided key. If range keys are enabled and a range key straddles the seek
// point, it will be surfaced before any point keys except when seeking
// directly to a specific version that exists.
SeekGE(key MVCCKey)
// Valid must be called after any call to Seek(), Next(), Prev(), or
// similar methods. It returns (true, nil) if the iterator points to
@@ -65,13 +67,53 @@ type SimpleMVCCIterator interface {
// or the next key if the iterator is currently located at the last version
// for a key. NextKey must not be used to switch iteration direction from
// reverse iteration to forward iteration.
//
// If range keys are enabled, range and point keys are treated separately. For
// example, with point keys a@1,b@1,c@1 and range key [b-d)@2, calling
// NextKey() repeatedly will yield a@1,[b-d)@2,b@1,c@1.
NextKey()
// UnsafeKey returns the same value as Key, but the memory is invalidated on
// the next call to {Next,NextKey,Prev,SeekGE,SeekLT,Close}.
UnsafeKey() MVCCKey
// UnsafeValue returns the same value as Value, but the memory is
// invalidated on the next call to {Next,NextKey,Prev,SeekGE,SeekLT,Close}.
UnsafeValue() []byte
// HasPointAndRange returns whether the current iterator position has a point
// key and/or a range key. If Valid() returns true, at least one of these will
// be true.
// Range keys are only emitted when requested via IterOptions.KeyTypes.
HasPointAndRange() (bool, bool)
// RangeBounds returns the range bounds for the current range key, or
// (nil, nil) if there are none. The returned keys are only valid until
// the next iterator call. See RangeKeys() for more info on range keys.
RangeBounds() (roachpb.Key, roachpb.Key)
// RangeKeys returns all range keys (for different timestamps) at the current
// key position, or an empty list if there are none. When at a point key, it
// will return all range keys overlapping that point key. The keys are only
// valid until the next iterator operation. Currently, all range keys are MVCC
// range tombstones with an implied value of nil, and the value is therefore
// not exposed.
//
// Range keys are fragmented by Pebble such that all overlapping range keys
// between two fragment bounds form a stack of range key fragments at
// different timestamps. For example, writing [a-e)@1 and [c-g)@2 will yield
// this fragment structure:
//
// 2:     |---|---|
// 1: |---|---|
//    a b c d e f g
//
// Fragmentation makes all range key properties local, which avoids incurring
// unnecessary access costs across SSTs and CRDB ranges. This fragmentation is
// deterministic on the current range key state, and does not depend on write
// history. Stacking allows easy access to all range keys that overlap a given
// point key.
//
// Range keys may merge or fragment due to other range keys, split and merge
// along with CRDB ranges, can be partially removed by GC, and may be
// truncated by iterator bounds.
//
// TODO(erikgrinaker): Write a tech note on range keys and link it here.
RangeKeys() []MVCCRangeKey
}

// IteratorStats is returned from {MVCCIterator,EngineIterator}.Stats.
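
The fragmentation described in the `RangeKeys()` comment above can be sketched as a toy model. This is only an illustration under stated assumptions: `rangeKey`, `fragment`, and `fragmentRangeKeys` are hypothetical names, keys are plain strings, timestamps are integers, and the input is assumed to contain no overlapping keys at the same timestamp. Pebble's real fragmenter is not shown here, and this sketch does not coalesce abutting fragments that carry identical stacks.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeKey is a hypothetical stand-in for MVCCRangeKey.
type rangeKey struct {
	start, end string
	ts         int
}

// fragment is the span between two adjacent bounds, together with the stack
// of timestamps that cover it, newest first.
type fragment struct {
	start, end string
	timestamps []int
}

// fragmentRangeKeys splits the keys at every start and end bound and stacks
// the timestamps of all keys covering each resulting span.
func fragmentRangeKeys(keys []rangeKey) []fragment {
	// Collect every distinct bound.
	seen := map[string]bool{}
	var bounds []string
	for _, k := range keys {
		for _, b := range []string{k.start, k.end} {
			if !seen[b] {
				seen[b] = true
				bounds = append(bounds, b)
			}
		}
	}
	sort.Strings(bounds)

	var frags []fragment
	for i := 0; i+1 < len(bounds); i++ {
		f := fragment{start: bounds[i], end: bounds[i+1]}
		for _, k := range keys {
			// A key covers the span if the span lies entirely within it.
			if k.start <= f.start && f.end <= k.end {
				f.timestamps = append(f.timestamps, k.ts)
			}
		}
		if len(f.timestamps) == 0 {
			continue // gap between range keys
		}
		sort.Sort(sort.Reverse(sort.IntSlice(f.timestamps)))
		frags = append(frags, f)
	}
	return frags
}

func main() {
	// The example from the comment: [a-e)@1 and [c-g)@2.
	for _, f := range fragmentRangeKeys([]rangeKey{{"a", "e", 1}, {"c", "g", 2}}) {
		fmt.Printf("[%s-%s) @%v\n", f.start, f.end, f.timestamps)
	}
}
```

For the two keys in the comment this produces three fragments, `[a-c)`, `[c-e)`, and `[e-g)`, with only the middle one carrying both timestamps, matching the diagram.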
@@ -122,13 +164,15 @@ type MVCCIterator interface {
Prev()

// SeekIntentGE is a specialized version of SeekGE(MVCCKey{Key: key}), when
// the caller expects to find an intent, and additionally has the txnUUID
// for the intent it is looking for. When running with separated intents,
// this can optimize the behavior of the underlying Engine for write heavy
// keys by avoiding the need to iterate over many deleted intents.
// the caller expects to find an intent as the first point key, and
// additionally has the txnUUID for the intent it is looking for. When running
// with separated intents, this can optimize the behavior of the underlying
// Engine for write heavy keys by avoiding the need to iterate over many
// deleted intents.
SeekIntentGE(key roachpb.Key, txnUUID uuid.UUID)

// Key returns the current key.
// Key returns the current key. If the iterator is on a range key only,
// this returns its start key.
Key() MVCCKey
// UnsafeRawKey returns the current raw key which could be an encoded
// MVCCKey, or the more general EngineKey (for a lock table key).
@@ -147,7 +191,8 @@ type MVCCIterator interface {
// currently used by callers who pass around key information as a []byte --
// this seems avoidable, and we should consider cleaning up the callers.
UnsafeRawMVCCKey() []byte
// Value returns the current value as a byte slice.
// Value returns the current value as a byte slice. For range keys, values
// are obtained via RangeKeys().
Value() []byte
// ValueProto unmarshals the value the iterator is currently
// pointing to using a protobuf decoder.
@@ -309,8 +354,38 @@ type IterOptions struct {
// use such an iterator is to use it in concert with an iterator without
// timestamp hints, as done by MVCCIncrementalIterator.
MinTimestampHint, MaxTimestampHint hlc.Timestamp
// KeyTypes specifies the types of keys to surface: point and/or range keys.
// Use HasPointAndRange() to determine which key type is present at a given
// iterator position, and RangeBounds() and RangeKeys() to access range keys.
// Defaults to IterKeyTypePointsOnly. For more info, see RangeKeys().
//
// NB: range keys are only supported for use with MVCCIterators, but it is
// legal to enable them for EngineIterators in order to derive cloned
// MVCCIterators from them. Range key behavior for EngineIterators is
// undefined.
KeyTypes IterKeyType
}

// IterKeyType configures which types of keys an iterator should surface.
//
// TODO(erikgrinaker): Combine this with MVCCIterKind somehow.
type IterKeyType uint8

const (
// IterKeyTypePointsOnly iterates over point keys only.
IterKeyTypePointsOnly IterKeyType = iota
// IterKeyTypePointsWithRanges iterates over point keys only, but surfaces
// range keys overlapping those points. Range keys that do not overlap with
// point keys are not surfaced.
//
// TODO(erikgrinaker): Consider moving this down into Pebble.
IterKeyTypePointsWithRanges
// IterKeyTypePointsAndRanges iterates over both point and range keys.
IterKeyTypePointsAndRanges
// IterKeyTypeRangesOnly iterates over only range keys.
IterKeyTypeRangesOnly
)

// MVCCIterKind is used to inform Reader about the kind of iteration desired
// by the caller.
type MVCCIterKind int
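
To make the four `IterKeyType` modes concrete, here is a toy model of which positions repeated `NextKey()` calls would visit, and what `HasPointAndRange()` would report at each. It is a sketch of the documented behavior, not of `pebbleIterator`: `keyType`, `position`, and `positions` are hypothetical names, each point key has one version, and there is exactly one range key. It assumes a bare range key position reports `(false, true)` and a covered point key reports `(true, true)`.

```go
package main

import "fmt"

// keyType mirrors the four IterKeyType values; the lowercase names are
// hypothetical stand-ins.
type keyType int

const (
	pointsOnly keyType = iota
	pointsWithRanges
	pointsAndRanges
	rangesOnly
)

// position is one iterator stop, with the two flags HasPointAndRange()
// would report there.
type position struct {
	key                string
	hasPoint, hasRange bool
}

// positions lists the stops for sorted point keys and a single range key
// [start, end) under the given key type.
func positions(points []string, start, end string, kt keyType) []position {
	var out []position
	rangeDone := false
	// emitRange surfaces the bare range key once, at its start key, but only
	// in the modes that iterate over range keys themselves.
	emitRange := func() {
		if !rangeDone && (kt == pointsAndRanges || kt == rangesOnly) {
			out = append(out, position{key: start, hasRange: true})
		}
		rangeDone = true
	}
	for _, p := range points {
		if p >= start {
			emitRange() // the range key sorts before point keys at or after its start
		}
		if kt == rangesOnly {
			continue
		}
		// A point key reports the range key only if range keys are enabled
		// and the point lies within [start, end).
		covered := kt != pointsOnly && start <= p && p < end
		out = append(out, position{key: p, hasPoint: true, hasRange: covered})
	}
	emitRange() // the range key starts after the last point key
	return out
}

func main() {
	// The example from the NextKey() comment: points a, b, c and range [b-d).
	points := []string{"a", "b", "c"}
	for kt := pointsOnly; kt <= rangesOnly; kt++ {
		fmt.Println(kt, positions(points, "b", "d", kt))
	}
}
```

With `pointsAndRanges` the stops are a, the bare range key at b, then b and c, which is the `a@1,[b-d)@2,b@1,c@1` order given for `NextKey()`. With `pointsWithRanges` the bare range key stop is dropped, but b and c still report the overlapping range key.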
@@ -464,7 +539,7 @@ type Reader interface {
// NewEngineIterator returns a new instance of an EngineIterator over this
// engine. The caller must invoke EngineIterator.Close() when finished
// with the iterator to free resources. The caller can change IterOptions
// after this function returns.
// after this function returns. EngineIterators do not support range keys.
NewEngineIterator(opts IterOptions) EngineIterator
// ConsistentIterators returns true if the Reader implementation guarantees
// that the different iterators constructed by this Reader will see the same
@@ -552,6 +627,8 @@ type Writer interface {
// this method actually removes entries from the storage engine.
//
// It is safe to modify the contents of the arguments after it returns.
//
// TODO(erikgrinaker): This should clear range keys too.
ClearMVCCRangeAndIntents(start, end roachpb.Key) error
// ClearMVCCRange removes MVCC keys from start (inclusive) to end
// (exclusive). It should not be expected to clear intents, though may clear
@@ -573,8 +650,46 @@ type Writer interface {
//
// It is safe to modify the contents of the arguments after ClearIterRange
// returns.
//
// TODO(erikgrinaker): This should clear range keys too.
ClearIterRange(iter MVCCIterator, start, end roachpb.Key) error

// ExperimentalClearMVCCRangeKey deletes an MVCC range key from start
// (inclusive) to end (exclusive) at the given timestamp. For any range key
// that straddles the start and end boundaries, only the segments within the
// boundaries will be cleared. Range keys at other timestamps are unaffected.
// Clears are idempotent.
//
// This method is primarily intended for MVCC garbage collection and similar
// internal use.
//
// This method is EXPERIMENTAL: range keys are under active development, and
// have severe limitations including being ignored by all KV and MVCC APIs and
// only being stored in memory.
ExperimentalClearMVCCRangeKey(rangeKey MVCCRangeKey) error

// ExperimentalPutMVCCRangeKey writes an MVCC range key. It will replace any
// existing keys, or any segments that it overlaps. This is currently only
// used for range tombstones, which have an implicit value of nil, and the
// Pebble value parameter is not exposed (adding this will need changes to
// MVCC stats, GC, scans/gets, and more).
//
// Range keys do not have a distinct identity, but should be considered a
// key continuum. They can be fragmented or merged by overlapping range keys,
// split/merged along with CRDB ranges, partially removed or replaced,
// and truncated during bounded iteration.
//
// Range keys exist separately from point keys in Pebble, and must be accessed
// via special iterator options and methods such as IterOptions.KeyTypes and
// SimpleMVCCIterator.RangeKeys().
//
// TODO(erikgrinaker): Write a tech note on range keys and link it here.
//
// This method is EXPERIMENTAL: range keys are under active development, and
// have severe limitations including being ignored by all KV and MVCC APIs and
// only being stored in memory.
ExperimentalPutMVCCRangeKey(MVCCRangeKey) error

// Merge is a high-performance write operation used for values which are
// accumulated over several writes. Multiple values can be merged
// sequentially into a single key; a subsequent read will return a "merged"
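
The clearing semantics documented for `ExperimentalClearMVCCRangeKey` (only the segments inside the bounds are removed, other timestamps are untouched, and clears are idempotent) can be modelled with a short sketch. `rangeKey` and `clearRangeKey` are hypothetical names operating on an in-memory slice with string keys and integer timestamps. The real method delegates to Pebble and is not reproduced here.

```go
package main

import "fmt"

// rangeKey is a hypothetical stand-in for MVCCRangeKey.
type rangeKey struct {
	start, end string
	ts         int
}

// clearRangeKey removes the span [c.start, c.end) at timestamp c.ts from
// keys. Segments of straddling keys that lie outside the span are kept, and
// keys at other timestamps are left as they are.
func clearRangeKey(keys []rangeKey, c rangeKey) []rangeKey {
	var out []rangeKey
	for _, k := range keys {
		// Keep keys at other timestamps or with no overlap unchanged.
		if k.ts != c.ts || k.end <= c.start || c.end <= k.start {
			out = append(out, k)
			continue
		}
		// Keep whatever sticks out to the left and right of the cleared span.
		if k.start < c.start {
			out = append(out, rangeKey{k.start, c.start, k.ts})
		}
		if c.end < k.end {
			out = append(out, rangeKey{c.end, k.end, k.ts})
		}
	}
	return out
}

func main() {
	keys := []rangeKey{{"a", "g", 1}, {"a", "g", 2}}

	// Clearing [c-e)@1 splits the key at timestamp 1 and leaves timestamp 2 alone.
	keys = clearRangeKey(keys, rangeKey{"c", "e", 1})
	fmt.Println(keys)

	// Clearing the same span again changes nothing.
	keys = clearRangeKey(keys, rangeKey{"c", "e", 1})
	fmt.Println(keys)
}
```

The second call leaves the state unchanged, which is the idempotence the comment promises and what makes the operation safe to retry, for example during garbage collection.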