
storage: utilities to implement sliding window compaction #14368

Merged
merged 12 commits into redpanda-data:dev from compaction-dedupe-utils on Oct 26, 2023

Conversation

andrwng
Contributor

@andrwng andrwng commented Oct 23, 2023

Adds utility methods that will be used in sliding window compaction. There are a few pieces to this:

  • Selecting the "sliding range": the range of segments that are eligible for compaction in a given round. Only "traditionally" compactible segments (i.e. stable, not the active segment) are viable, and segments whose keys were deduplicated in a previous round of sliding window compaction no longer need deduplication unless new non-compacted data shows up.
  • Building the offset map: a mapping of keys to the latest offset seen for each key, across multiple segments. Within the sliding range, we start with the newest segment, add all of its keys to the map, and iterate over progressively older segments. We keep track of the earliest segment that has been fully indexed; in the next round of compaction, that becomes the new upper bound of the sliding range.
  • Deduplicating the keys with the offset map. In this implementation, one segment is rewritten and replaced at a time, similar to self compaction.

This PR introduces the mechanisms to perform the above steps, but doesn't orchestrate them into a method to be used by compaction. This will come in a follow-up PR.
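The steps above can be sketched end to end with hypothetical stand-in types — a plain `record`/`segment` pair and a `std::map` in place of the PR's `key_offset_map` — purely for illustration of the flow, not the PR's actual API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the PR's segment and key types.
struct record {
    std::string key;
    int64_t offset;
};
using segment = std::vector<record>; // records in increasing offset order

// Build the offset map by visiting records in globally decreasing offset
// order (newest segment first, and within a segment, last record first).
// insert() keeps the first value seen per key, i.e. the latest offset.
std::map<std::string, int64_t>
build_offset_map(const std::vector<segment>& window) {
    std::map<std::string, int64_t> latest;
    for (auto seg = window.rbegin(); seg != window.rend(); ++seg) {
        for (auto rec = seg->rbegin(); rec != seg->rend(); ++rec) {
            latest.insert({rec->key, rec->offset});
        }
    }
    return latest;
}

// Rewrite one segment, keeping a record only if it is the latest occurrence
// of its key, i.e. its offset is equal to or higher than the indexed one.
segment deduplicate_segment(
  const segment& s, const std::map<std::string, int64_t>& latest) {
    segment out;
    for (const auto& rec : s) {
        auto it = latest.find(rec.key);
        if (it == latest.end() || it->second <= rec.offset) {
            out.push_back(rec);
        }
    }
    return out;
}
```

As in the PR, deduplication then proceeds one segment at a time, each rewritten segment replacing the original, similar to self compaction.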

Fixes #14363

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none


src/v/storage/compaction_reducers.cc (outdated; resolved)
Comment on lines +155 to 158
// Compacted index writer for the newly written segment. May not be
// supplied if the compacted index isn't expected to change, e.g. when
// rewriting a single segment filtering with its own compacted index.
compacted_index_writer* _compacted_idx;
index_state _idx;
Contributor

I'm a bit confused since we are writing to two indices. What will be the end state here?

Contributor

I see. We need to self compact first, so we still need to build the old compacted segment index. Still, I don't get the purpose of the new one.

Contributor Author

In general, having the new compacted index seems useful in cases where there are subsequent self compactions, to avoid having to rebuild it by reading the segment. That will happen after restarting, since the segment bitflags are in-memory only.

Contributor

I'm still confused. What does _compacted_idx point to?

Contributor

I've re-read deduplicate_segment and it makes more sense now. Is the idea to update both the compaction index (via compaction_index_writer) and the segment index (via the return value of deduplicate_segment)?

Contributor Author

Right. FWIW I have a follow-up change to pass the index_state in as an input too, so the compacted index writer, index, and appender would all be decoupled from a single segment reducer. The idea there is that rather than rewriting one segment at a time, we could merge two (or eventually more) segments at a time by having the segment reducers share the writers.

}
// We should only keep the record if its offset is equal to or
// higher than the indexed one.
co_return map_offset.value() <= o;
Contributor

Can the offset ever be equal to the indexed one?

Contributor Author

Yeah, the offset being equal means that this record was the highest indexed offset for the key.

The offset being higher means that the record has a higher offset than what we've indexed, e.g. because we only partially indexed a segment, so the key appears again later in that segment at an offset the map never saw.
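The comparison being discussed can be sketched as a small standalone predicate (a hypothetical simplification; the PR's actual check lives inside the segment reducer and uses its own types):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Keep a record iff its offset is at least the latest offset indexed for
// its key. Equal means this record is itself the indexed (latest)
// occurrence; higher means the map only partially indexed this key's
// segment and the record is newer than anything indexed. A key that is
// absent from the map is always kept.
bool should_keep(std::optional<int64_t> map_offset, int64_t record_offset) {
    if (!map_offset.has_value()) {
        return true; // key not in the offset map: nothing newer is known
    }
    return map_offset.value() <= record_offset;
}
```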

Contributor

Right. We iterate through the segments in reverse order, but read from the compaction index in the natural order.

Contributor Author

Right, exactly

src/v/storage/segment_deduplication_utils.h (outdated; resolved)
src/v/storage/segment_deduplication_utils.cc (outdated; resolved)
@@ -291,6 +298,7 @@ class disk_log_impl final : public log {
mutex _segments_rolling_lock;

std::optional<model::offset> _cloud_gc_offset;
std::optional<model::offset> _last_compaction_window_start_offset;
Contributor

Should this be persisted on disk?

Contributor Author

I think we should, but as a short term follow-up task. Currently compaction is already susceptible to duplicating work after restarts since we don't persist bitflags.

@andrwng andrwng force-pushed the compaction-dedupe-utils branch 2 times, most recently from a8a9daa to 8086174 on October 24, 2023 21:43
VladLazar
VladLazar previously approved these changes Oct 25, 2023
Contributor

@VladLazar VladLazar left a comment

LGTM

src/v/storage/compaction_reducers.cc (outdated; resolved)
src/v/storage/compaction_reducers.cc (outdated; resolved)
@@ -287,6 +287,19 @@ ss::future<ss::stop_iteration> copy_data_segment_reducer::do_compaction(
if (to_copy == std::nullopt) {
co_return stop_t::no;
}
if (_compacted_idx && is_compactible(to_copy.value())) {
Member

rather than reusing a segment's existing
compacted index, we will need to rewrite an index as the segment is
rewritten with deduplication context from other segments

I'm a bit unsure why anything related to the compacted index needs to be changed. It's a reflection of what is in a segment, i.e. (key, offset) pairs, and I wouldn't think that changes based on how the segment is created (i.e. single segment compaction vs. compaction with a larger sliding window).

Contributor Author

This method builds a new segment, and the keys in the compacted index should reflect that, no? It's possible that the new segment has virtually no keys because all of the new values were written in newer segments, in which case keeping the existing compacted index would be wasteful.

To your point, we could just delete the compacted index and instead rely on a subsequent round of self compaction to create the indexes. OTOH I'm thinking that windowed compaction would be a sort of superset of self compaction: if you've done windowed compaction, there's no need to self compact because the indexes are already there.

src/v/storage/segment_deduplication_utils.cc (outdated; resolved)
With coroutines, it can be more convenient to just pass in some
procedural method to the iteration function, rather than implementing a
reducer.

I intend to use this to build a map using multiple readers. Puts into
this map will be async, given the possibility of stalls during probing.
This will be useful in testing the sliding window approach.

This replaces the inlined should_keep() method in the
copy_data_segment_reducer with a function, so that a subsequent commit
can base deduplication on a key_offset_map.

Upcoming changes to use a key_offset_map will require async calls to
put/get. Our implementation won't inherently do async work, but there
will be potential for long-running tasks with some form of linear
probing of a map that should be broken up by yielding.

So, this commit changes the filter interface with an async version.

Subsequent commits will introduce sliding window compaction, by which
segments will be deduplicated with keys from multiple segments. To
distinguish such compacted segments (e.g. to tell if there are new
uncompacted segments that need deduplicating), this adds a new segment
bitflag.

In sliding window compaction, it is possible that all of a segment's
data will be removed. To guarantee that each segment still contains
data, this allows passing the last segment offset to the segment
reducer, so it can tell whether it is empty by the end of its reduce
and force-keep a record accordingly.
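A minimal sketch of that force-keep behavior, using a hypothetical simplified reducer (the PR's actual reducer consumes record batches and writes through an appender; `copy_reducer` and its members are illustrative names only):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Copy records that pass the deduplication filter, but if we reach the
// segment's last offset without having copied anything, force-keep that
// final record so the rewritten segment is never left empty.
struct copy_reducer {
    int64_t last_segment_offset;
    std::function<bool(int64_t)> keep; // deduplication filter
    std::vector<int64_t> copied;

    void consume(int64_t offset) {
        bool at_end = offset == last_segment_offset;
        if (keep(offset) || (at_end && copied.empty())) {
            copied.push_back(offset);
        }
    }
};
```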

In sliding window compaction, rather than reusing a segment's existing
compacted index, we will need to rewrite an index as the segment is
rewritten with deduplication context from other segments. A compacted
index writer is now passed to the segment reducer.

There's some functional overlap with compaction_reducers, but the
utilities include methods to build a key_offset_map and to rewrite a
segment and compacted index removing duplicates.

Adds a method to collect the set of segments over which to slide during
a sliding window compaction.
@andrwng andrwng self-assigned this Oct 25, 2023
@andrwng
Contributor Author

andrwng commented Oct 26, 2023

CI failure is #14451

@dotnwat dotnwat merged commit 3eaf1ba into redpanda-data:dev Oct 26, 2023
24 of 26 checks passed
Successfully merging this pull request may close these issues.

utilities to deduplicate across more than one segment and term