-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gc: use clear range operations to remove multiple garbage keys across keys and versions #90830
Conversation
cac19b4
to
42e5b13
Compare
549ddf1
to
98b1a41
Compare
92bc9b3
to
ed74dcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
-- commits
line 57 at r7:
can you include a release note for the new metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing some comments after an initial high-level pass. I need to do a more thorough pass later, in particular of the GC batching code.
ed74dcc
to
7df3f66
Compare
ffb73af
to
f80f540
Compare
935e3b6
to
217d819
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to do another pass over the batch logic, but I've re-reviewed the rest.
>> at end: | ||
data: "A"/10.000000000,0 -> /BYTES/B | ||
|
||
# Check that we can't invoke gc_clear_range if we have live data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a couple of variants that check a live value and a point tombstone that are exactly on the GC threshold too (or do we have them already?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much sense for the full range deletion. I add one more test but overall we bail if there's anything not masked by range tombstone even if technically it should be collectable. E.g. point tombstone followed by range tombstone is not enough for fast path to succeed.
But I'll check if we can add more for non optimized case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation is that I want to make sure we have test coverage for this branch, and the corresponding branch in the full-range deletion:
I don't believe we currently have any test cases that hit it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think this condition is not needed at all. If key is garbage then it must be covered by newer value (its timestamp must be below coveredBy). And coveredBy timestamp is guaranteed to be at or below gcThreshold so we must be good.
// safe to continue because we bumped the GC | ||
// thresholds. We may leave some inconsistent history | ||
// behind, but nobody can read it. | ||
log.Warningf(ctx, "failed to GC keys with clear range: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we discussed not GCing range keys if point key GC failed.
pkg/kv/kvserver/gc/gc.go
Outdated
// be greater or equal to the last key (lowest, collected last) of the first | ||
// (collected first, containing highest keys) points batch. It is guaranteed | ||
// that there are no garbage keys between this key and the start of first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be equal to the last key? Doesn't it have to be greater than it in order to cover it?
Also, this should be "no keys" rather than "no garbage keys", right? Because any live keys would be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is correct. It could be last when we start iterating, but we will always tighten it to the batch[0][0].Next() before sending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GC batcher LGTM, feel free to merge once comments have been addressed.
PS: this can close out #61455 too. |
f226f81
to
1da6d73
Compare
This commit adds timestamp field to clear_range_key parameters of GC request. New timestamp parameter is used by GC queue to remove chunks of consecutive keys where no new data was written above the GC threshold and storage can optimize deletions with range tombstones. Release note: None
Deleting multiple versions of same key or continuous ranges of point keys with pebble tombstones could be inefficient when range has lots of garbage. This pr adds a storage gc operation that uses pebble range tombstones to remove data. Release note: None
IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans Existing tests moved from ReplicaMVCCDataIterator to IterateMVCCReplicaKeySpans as former is being sunsetted and only a single call site exists. Release note: None
1da6d73
to
c3e941f
Compare
When deleting large chunks of keys usually produced by range tombstones GC will use point deletion for all versions found within replica key range. That kind of operations would generate unnecessary load on the storage because each individual version must be deleted independently. This patch adds an ability for GC to find ranges of consecutive keys which are not interleaved with any newer data and create clear_range_key GC requests to remove them. Release note: None
This commit adds metrics to GC ClearRange requests in GC. Metrics are shared with with existing GC ClearRangeKey full range deletions since they are peforming similar operation and exposed as queue.gc.info.clearrangesuccess/queue.gc.info.clearrangefailed. Release note (ops change): Metrics queue.gc.info.clearrangesuccess queue.gc.info.clearrangefailed updated to include statistics about GC operations that perform ClearRange on parts of range keyspace. Previously those metrics only included requests to remove range data completely when performing a schema change.
ClearRange fields of GCRequests are ignored by earlier versions, this feature needs to be version gated to let GC work when talking to a previous version if GC is run from a different node. Release note: None
Handle ClearRange GC request fields to use GCClearRange storage functions for ranges of keys that contain no visible data instead of removing each key individually. This PR only enables processing of request field, while actual GC and storage functionality is added in previous PRs. Release note: None
c3e941f
to
9efbf4f
Compare
bors r=erikgrinaker |
Build failed (retrying...): |
Build succeeded: |
Summary:
GC Will track consecutive keys and issue GCClearRange requests to remove multiple keys at once if it can find more than configured number of keys. This covers multiple versions of the same key as well as multiple consecutive keys.
On the GC side it will count consecutive keys while building "points batches" e.g. batches containing MVCC keys to delete. Those batches are capped to fit into resulting raft entries once deletion is executed. GC could accumulate more than a single batch in memory in that case. Once necessary number of consecutive keys is found, GCClearRange is issued and pending batches are discarded. If non gc-able data is found, pending batches are GC'd and clear range tracking is restarted.
To protect GC from consuming too much memory for example if it collects a range with very large keys, it uses a configrable memory limit on how many key bytes could be held in pending batches. If this limit is reached before GC reaches minimum desired number of keys, oldest batch is GC'd and starting point for clear range is adjusted.
On the evaluation side, GC requests for ranges will obtain write latches removed range to prevent other requests writing in the middle of the range. However, if only a single key is touched i.e. multiple versions are deleted, then no latches are obtained as it is done when deleting individual versions as all the removed data is below the GC threshold.
GC exposes a metric
queue.gc.info.numclearconsecutivekeys
that shows how many times GCClearRange was used.GC adds new system setting
kv.gc.clear_range_min_keys
that sets minimum number of keys eligible for GCClearRange.batcheval: handle ClearSubRange request fields in GC requests
Handle ClearSubRange GC request fields to use GCClearRange storage
functions for ranges of keys that contain no visible data instead
of removing each key individually.
This PR only enables processing of request field, while actual GC
and storage functionality is added in previous PRs.
Release note: None
gc: add version gate to GC clear range requests
ClearRange fields of GCRequests are ignored by earlier versions,
this feature needs to be version gated to let GC work when talking
to a previous version if GC is run from a different node.
Release note: None
gc: add metrics for clear range requests in GC
This commit adds two metrics for clear range functionality in GC:
queue.gc.info.numclearconsecutivekeys is incremented every time GC sends
a ClearRange request for a bunch of consecutive keys to a replica
Release note (ops change): Metric queue.gc.info.numclearconsecutivekeys
added. It exposes number of clear range requests to remove consecutive
keys issued by mvcc garbage collection.
gc: GC use delete_range_keys for consecutive ranges of keys
When deleting large chunks of keys usually produced by range tombstones
GC will use point deletion for all versions found within replica key
range.
That kind of operations would generate unnecessary load on the storage
because each individual version must be deleted independently.
This patch adds an ability for GC to find ranges of consecutive keys
which are not interleaved with any newer data and create
delete_range_key GC requests to remove them.
Release note: None
rditer: helper to visit multiple replica key spans
IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans
Existing tests moved from ReplicaMVCCDataIterator to
IterateMVCCReplicaKeySpans as former is being sunsetted and only a
single call site exists.
Release note: None
storage: add gc operation that uses clear range
Deleting multiple versions of same key or continuous ranges of
point keys with pebble tombstones could be inefficient when range
has lots of garbage.
This pr adds a storage gc operation that uses pebble range tombstones
to remove data.
Release note: None
gc: add delete range parameters to gc request
This commit adds clear_sub_range_key parameters to GC request.
clear_sub_range_keys is used by GC queue to remove chunks of
consecutive keys where no new data was written above the GC
threshold and storage can optimize deletions with range
tombstones.
To support new types of keys, GCer interface is also updated
to pass provided keys down to request.
Release note: None
Fixes: #84560, #61455