storage: utilities to implement sliding window compaction #14368

Merged 12 commits on Oct 26, 2023

Commits on Oct 25, 2023

  1. storage: add for_each for compaction reader

    With coroutines, it can be more convenient to pass a callback to the
    iteration function than to implement a reducer.
    
    I intend to use this to build a map from multiple readers. Puts into
    this map will be async, given the possibility of stalls during probing.
    andrwng committed Oct 25, 2023
    a811691
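As a rough illustration of the callback style this commit describes, here is a minimal synchronous sketch in plain C++. The names `index_entry`, `toy_index_reader` and `build_latest_offsets` are hypothetical stand-ins, not the types in this PR, and the real puts are described as async, which this sketch does not model.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one compaction index entry.
struct index_entry {
    std::string key;
    int64_t offset;
};

// Hypothetical reader: for_each() calls a plain callable per entry, so the
// caller does not need to write a reducer class. Returning false from the
// callable stops the iteration early.
class toy_index_reader {
public:
    explicit toy_index_reader(std::vector<index_entry> entries)
      : _entries(std::move(entries)) {}

    void for_each(const std::function<bool(const index_entry&)>& fn) const {
        for (const auto& e : _entries) {
            if (!fn(e)) {
                return;
            }
        }
    }

private:
    std::vector<index_entry> _entries;
};

// Builds a single key -> highest offset map from several readers.
inline std::unordered_map<std::string, int64_t>
build_latest_offsets(const std::vector<toy_index_reader>& readers) {
    std::unordered_map<std::string, int64_t> latest;
    for (const auto& r : readers) {
        r.for_each([&latest](const index_entry& e) {
            auto [it, inserted] = latest.try_emplace(e.key, e.offset);
            if (!inserted && e.offset > it->second) {
                it->second = e.offset;
            }
            return true; // keep iterating
        });
    }
    return latest;
}
```

The lambda closes over the shared map directly, which is the convenience the commit message points at: no per-use reducer type is needed to thread state through the iteration.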
  2. 97a52d0
  3. storage: allow controlling the number of keys indexed in compaction

    This will be useful in testing the sliding window approach.
    andrwng committed Oct 25, 2023
    25a887c
  4. storage: pass lambda to compaction reducer

    This replaces the inlined should_keep() method in the
    copy_data_segment_reducer with a function, so that a subsequent commit
    can base deduplication on a key_offset_map.
    andrwng committed Oct 25, 2023
    b838f50
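The idea of handing the reducer a function instead of a built-in `should_keep()` might look roughly like the sketch below. `record`, `should_keep_fn`, `copy_if_kept` and `keep_latest` are illustrative names only, and treating a key that is missing from the map as "keep" is an assumption of this sketch, not something the commit states.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical simplified record: just a key and its log offset.
struct record {
    std::string key;
    int64_t offset;
};

using should_keep_fn = std::function<bool(const record&)>;

// The copy step no longer decides what to keep; it asks the predicate.
inline std::vector<record>
copy_if_kept(const std::vector<record>& in, const should_keep_fn& should_keep) {
    std::vector<record> out;
    for (const auto& r : in) {
        if (should_keep(r)) {
            out.push_back(r);
        }
    }
    return out;
}

// One possible predicate, backed by a key -> latest offset map: drop a
// record only when the map knows of a newer offset for the same key.
// Keys absent from the map are kept (an assumption of this sketch).
inline should_keep_fn
keep_latest(std::unordered_map<std::string, int64_t> latest) {
    return [latest = std::move(latest)](const record& r) {
        auto it = latest.find(r.key);
        return it == latest.end() || r.offset >= it->second;
    };
}
```

Swapping the deduplication policy then only means passing a different predicate, leaving the copy loop untouched.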
  5. storage: make compaction segment reducer filter async

    Upcoming changes to use a key_offset_map will require async calls to
    put/get. Our implementation won't do inherently async work, but linear
    probing of a map can run long enough that it should be broken up by
    yielding.
    
    So, this commit replaces the filter interface with an async version.
    andrwng committed Oct 25, 2023
    4f28cbf
  6. storage: add bitflag for windowed compaction

    Subsequent commits will introduce sliding window compaction, in which
    segments are deduplicated with keys from multiple segments. To
    distinguish such compacted segments (e.g. to tell if there are new
    uncompacted segments that need deduplicating), this adds a new segment
    bitflag.
    andrwng committed Oct 25, 2023
    feb9d05
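To make the bitflag idea concrete, here is a small sketch under assumed names. `segment_flags`, `self_compacted` and `windowed_compacted` are hypothetical; the actual flag name and bit position used in this PR are not shown on this page.

```cpp
#include <cstdint>

// Hypothetical per-segment flags; names and bit positions are assumptions.
enum class segment_flags : uint32_t {
    none = 0,
    self_compacted = 1u << 0,     // deduplicated against its own keys only
    windowed_compacted = 1u << 1, // deduplicated with keys from other segments
};

constexpr segment_flags operator|(segment_flags a, segment_flags b) {
    return static_cast<segment_flags>(
      static_cast<uint32_t>(a) | static_cast<uint32_t>(b));
}

constexpr bool has_flag(segment_flags all, segment_flags f) {
    return (static_cast<uint32_t>(all) & static_cast<uint32_t>(f))
           == static_cast<uint32_t>(f);
}

// A segment without the windowed flag still needs deduplicating against
// its neighbours, even if it was already compacted on its own.
constexpr bool needs_windowed_compaction(segment_flags f) {
    return !has_flag(f, segment_flags::windowed_compacted);
}
```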
  7. storage: allow keeping the last offset when compacting

    In sliding window compaction, it is possible for all of a segment's
    data to be removed. To ensure that each segment still contains data,
    this allows passing the segment's last offset to the segment reducer,
    so that it can tell whether it is empty by the end of its reduce and
    force keeping a record accordingly.
    andrwng committed Oct 25, 2023
    6b8b1d3
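One way to picture the "force keeping a record" behaviour is the single-pass sketch below. `record` and `reduce_segment` are hypothetical names, and the choice to keep exactly the record at the last offset is an assumption drawn from the commit message rather than from the code.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified record: just a key and its log offset.
struct record {
    std::string key;
    int64_t offset;
};

// Filters a segment's records in one pass. If last_offset is set and nothing
// has been kept by the time that offset is reached, the record at that
// offset is kept regardless of the predicate, so the output is never empty.
inline std::vector<record> reduce_segment(
  const std::vector<record>& in,
  const std::function<bool(const record&)>& should_keep,
  std::optional<int64_t> last_offset) {
    std::vector<record> out;
    for (const auto& r : in) {
        bool keep = should_keep(r);
        if (!keep && last_offset && r.offset == *last_offset && out.empty()) {
            keep = true; // forced keep: the segment would otherwise be empty
        }
        if (keep) {
            out.push_back(r);
        }
    }
    return out;
}
```

When no last offset is passed, the function behaves like a plain filter and may return an empty result.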
  8. storage: allow passing compacted index writer to segment reducer

    In sliding window compaction, rather than reusing a segment's existing
    compacted index, we will need to rewrite an index as the segment is
    rewritten with deduplication context from other segments. A compacted
    index writer is now passed to the segment reducer.
    andrwng committed Oct 25, 2023
    5940d6d
  9. storage: add utils for segment deduplication

    There's some functional overlap with compaction_reducers, but the
    utilities include methods to build a key_offset_map and to rewrite a
    segment and compacted index removing duplicates.
    andrwng committed Oct 25, 2023
    4f79b9c
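The map-building half of these utilities could be sketched as follows. This is an illustration only: `toy_key_offset_map` and `build_map` are hypothetical, the capacity limit is an assumption loosely based on the earlier commit about controlling the number of indexed keys, and a hash map stands in for whatever probing structure the PR actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical simplified record: just a key and its log offset.
struct record {
    std::string key;
    int64_t offset;
};

// Hypothetical bounded key -> latest offset map.
class toy_key_offset_map {
public:
    explicit toy_key_offset_map(std::size_t max_keys)
      : _max_keys(max_keys) {}

    // Remembers the highest offset seen for a key. Returns false only when
    // the key is new and the map is already at capacity.
    bool put(const std::string& key, int64_t offset) {
        auto it = _latest.find(key);
        if (it != _latest.end()) {
            if (offset > it->second) {
                it->second = offset;
            }
            return true;
        }
        if (_latest.size() >= _max_keys) {
            return false;
        }
        _latest.emplace(key, offset);
        return true;
    }

    std::optional<int64_t> get(const std::string& key) const {
        auto it = _latest.find(key);
        if (it == _latest.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    std::size_t size() const { return _latest.size(); }

private:
    std::size_t _max_keys;
    std::unordered_map<std::string, int64_t> _latest;
};

// Indexes records in order until the map refuses a new key; returns how
// many records were indexed before stopping.
inline std::size_t
build_map(const std::vector<record>& recs, toy_key_offset_map& map) {
    std::size_t indexed = 0;
    for (const auto& r : recs) {
        if (!map.put(r.key, r.offset)) {
            break;
        }
        ++indexed;
    }
    return indexed;
}
```

A rewrite pass would then consult `get()` for each record and drop those with an offset lower than the one recorded for their key.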
  10. storage: add method to collect sliding window compaction range

    Adds a method to collect the set of segments over which to slide during
    a sliding window compaction.
    andrwng committed Oct 25, 2023
    2f2229d
  11. 9059677
  12. e1d9ff3