Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: add UseClearRange option to GCRequest_GCKey #61455

Closed
tbg opened this issue Mar 4, 2021 · 9 comments
Closed

kvserver: add UseClearRange option to GCRequest_GCKey #61455

tbg opened this issue Mar 4, 2021 · 9 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@tbg
Copy link
Member

tbg commented Mar 4, 2021

There is an open PR by @ajwerner that would need dedicated attention to land. It (dramatically) improves the ease with which we can GC large version histories of a key.

The reason the PR is held up is uncertainty about the possible negative impact of many range deletion tombstones. This is a concern that needs to be investigated.

This issue tracks the future of that PR, a job previously done by (now closed) #50194.

#51230

Jira issue: CRDB-3015

@tbg tbg added A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Mar 4, 2021
@jlinder jlinder added T-kv KV Team T-storage Storage Team labels Jun 16, 2021
@tbg
Copy link
Member Author

tbg commented Sep 27, 2021

Note that we also have a more far-reaching proposal on the table, which is moving GC to the storage layer: #57260

Using ClearRange in "today's GC" might be useful as an interim step. We discussed perf impact on GC recently within KV and were unsure how "bad" GC is at the moment. That is, we had no good intuition of whether it is currently responsible for performance blips, etc, in the wild. We do have the queue.gc.processingnanos metric, so we should in theory be able to make such links.

@ajwerner
Copy link
Contributor

I've never really understood how #57260 interacts with mvcc stats. Is the idea that we still read the data to evaluate the stats impact but then lay down some tombstone which will apply lazily to reclaim space?

@tbg
Copy link
Member Author

tbg commented Sep 27, 2021

The stats are indeed the "hard" part there. In a world without stats, I think the solution would simply be that we determine a GCThreshold, and bump that. And we say that versions covered by the GCThreshold cease to be part of the replicated state, i.e. they get omitted by snapshots and it's pebble jobs to remove them from any replica they linger on over time. But of course then a) we have to recompute stats when sending snapshots b) pebble has to somehow.. emit stats updates to us when it compacts.. and the stats are nonlocal (can't determine stats contributions of a single KV pair, you need to know whether it covers something, or is itself live, etc). #57260 essentially says "do something entirely new" or "recompute stats after GC". I'm not sure what that "entirely new" would be, exactly. Maybe it is what you're saying.

@mwang1026 mwang1026 changed the title kvserver: add UseClearRange option to GCReqeuest_GCKey kvserver: add UseClearRange option to GCRequest_GCKey Oct 28, 2021
@sumeerbhola
Copy link
Collaborator

The reason the PR is held up is uncertainty about the possible negative impact of many range deletion tombstones. This is a concern that needs to be investigated.

cockroachdb/pebble#1295 should probably be a prerequisite.

@erikgrinaker
Copy link
Contributor

#51230 uses a clearrange to remove many versions of a key. However, we also need a variant that uses clearrange to remove a long run of keys with no visible versions that all fell below the GC threshold at the same time. Typically because of a DeleteRange or something, especially with MVCC range tombstones (#70414).

@blathers-crl
Copy link

blathers-crl bot commented Jun 7, 2022

cc @cockroachdb/replication

@ajwerner
Copy link
Contributor

ajwerner commented Jun 7, 2022

However, we also need a variant that uses clearrange to remove a long run of keys with no visible versions that all fell below the GC threshold at the same time. Typically because of a DeleteRange or something, especially with MVCC range tombstones (#70414).

This seems hard in that it'll run afoul of all the efforts to make MVCC GC latchless. We may need a new primitive which provides a to ensure that new writes don't accidentally get caught under this ClearRange when considering removing more than one key's versions.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jun 8, 2022

However, we also need a variant that uses clearrange to remove a long run of keys with no visible versions that all fell below the GC threshold at the same time. Typically because of a DeleteRange or something, especially with MVCC range tombstones (#70414).

This seems hard in that it'll run afoul of all the efforts to make MVCC GC latchless. We may need a new primitive which provides a to ensure that new writes don't accidentally get caught under this ClearRange when considering removing more than one key's versions.

Yeah, that bit's annoying. However, we'll need this to keep e.g. schema GC reasonably performant once it's moved to MVCC. Let's keep this issue specifically about removing many versions of a single key (which we can do latchlessly), and discuss over on #70414.

craig bot pushed a commit that referenced this issue Dec 19, 2022
90830: gc: use clear range operations to remove multiple garbage keys across keys and versions r=erikgrinaker a=aliher1911

Summary:

GC Will track consecutive keys and issue GCClearRange requests to remove multiple keys at once if it can find more than configured number of keys. This covers multiple versions of the same key as well as multiple consecutive keys.

On the GC side it will count consecutive keys while building "points batches" e.g. batches containing MVCC keys to delete. Those batches are capped to fit into resulting raft entries once deletion is executed. GC could accumulate more than a single batch in memory in that case. Once necessary number of consecutive keys is found, GCClearRange is issued and pending batches are discarded. If non gc-able data is found, pending batches are GC'd and clear range tracking is restarted.
To protect GC from consuming too much memory for example if it collects a range with very large keys, it uses a configrable memory limit on how many key bytes could be held in pending batches. If this limit is reached before GC reaches minimum desired number of keys, oldest batch is GC'd and starting point for clear range is adjusted.

On the evaluation side, GC requests for ranges will obtain write latches removed range to prevent other requests writing in the middle of the range. However, if only a single key is touched i.e. multiple versions are deleted, then no latches are obtained as it is done when deleting individual versions as all the removed data is below the GC threshold.

GC exposes a metric `queue.gc.info.numclearconsecutivekeys` that shows how many times GCClearRange was used.
GC adds new system setting `kv.gc.clear_range_min_keys` that sets minimum number of keys eligible for GCClearRange.


----

batcheval: handle ClearSubRange request fields in GC requests

Handle ClearSubRange GC request fields to use GCClearRange storage
functions for ranges of keys that contain no visible data instead
of removing each key individually.
This PR only enables processing of request field, while actual GC
and storage functionality is added in previous PRs.

Release note: None

----

gc: add version gate to GC clear range requests

ClearRange fields of GCRequests are ignored by earlier versions,
this feature needs to be version gated to let GC work when talking
to a previous version if GC is run from a different node.

Release note: None

----

gc: add metrics for clear range requests in GC

This commit adds two metrics for clear range functionality in GC:

queue.gc.info.numclearconsecutivekeys is incremented every time GC sends
a ClearRange request for a bunch of consecutive keys to a replica

Release note (ops change): Metric queue.gc.info.numclearconsecutivekeys
added. It exposes number of clear range requests to remove consecutive
keys issued by mvcc garbage collection.

----

gc: GC use delete_range_keys for consecutive ranges of keys

When deleting large chunks of keys usually produced by range tombstones
GC will use point deletion for all versions found within replica key
range.
That kind of operations would generate unnecessary load on the storage
because each individual version must be deleted independently.
This patch adds an ability for GC to find ranges of consecutive keys
which are not interleaved with any newer data and create
delete_range_key GC requests to remove them.

Release note: None

----

rditer: helper to visit multiple replica key spans

IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans
Existing tests moved from ReplicaMVCCDataIterator to
IterateMVCCReplicaKeySpans as former is being sunsetted and only a
single call site exists.

Release note: None

----

storage: add gc operation that uses clear range

Deleting multiple versions of same key or continuous ranges of
point keys with pebble tombstones could be inefficient when range
has lots of garbage.
This pr adds a storage gc operation that uses pebble range tombstones
to remove data.

Release note: None

----

gc: add delete range parameters to gc request

This commit adds clear_sub_range_key parameters to GC request.
clear_sub_range_keys is used by GC queue to remove chunks of
consecutive keys where no new data was written above the GC
threshold and storage can optimize deletions with range
tombstones.
To support new types of keys, GCer interface is also updated
to pass provided keys down to request.

Release note: None

Fixes: #84560, #61455

93732: dev: refactor `doctor` r=healthy-pod a=rickystewart

Create a new interactive `dev doctor` experience, so you don't have to copy-paste commands from `dev doctor` and it can solve problems for you.

Now `dev doctor` will be interactive by default and will ask for user input to confirm the actions it's going to take. You can also set `--autofix` to automatically agree to some default actions without having to interactive. Interactive doctor can be disabled with `--interactive=false`.

Epic: none
Release note: None

93814: workload: fix TPC-H generator to generate correct values for p_name and ps_supplycost r=rytaft a=rytaft

**workload: fix TPC-H generator to generate correct values for p_name**

The TPC-H generator creates values for `p_name` in the `part` table by selecting 5 elements from a list of words called `randPartNames`. It is supposed to select 5 random unique elements from the full list, but prior to this commit, it only selected a permutation of the first 5 elements. This commit fixes the logic and adds a unit test.

Epic: None
Release note: None

**workload: fix TPC-H generator value of ps_supplycost**

Prior to this commit, the TPC-H generator was generating
a value between 0 and 10 for `ps_supplycost` in the `partsupp`
table. However, the spec calls for a random value in the range 
`[1.00 .. 1,000.00]`. This commit fixes the oversight.

Epic: None
Release note: None

93852: build-and-publish-patched-go: upgrade from 1.19.1 to 1.19.4 r=ajwerner a=ajwerner

This picks up a mix of security and compiler fixes. The compiler fixes I'm interested in relate to generic data structures, but the other fixes seem worthwhile too.

* [x] Adjust the Pebble tests to run in new version.
* [x] Adjust version in Docker image ([source](./builder/Dockerfile)).
* [x] Adjust version in the TeamCity agent image ([setup script](./packer/teamcity-agent.sh))
* [x] Rebuild and push the Docker image (following [Basic Process](#basic-process))
* [x] Update `build/teamcity/internal/release/build-and-publish-patched-go/impl.sh` with the new version and adjust SHA256 sums as necessary.
* [x] Run the `Internal / Release / Build and Publish Patched Go` build configuration in TeamCity with your latest version of the script above. This will print out the new URL's and SHA256 sums for the patched Go that you built above.
* [x] Bump the version in `WORKSPACE` under `go_download_sdk`. You may need to bump [rules_go](https://github.com/bazelbuild/rules_go/releases). Also edit the filenames listed in `sdks` and update all the hashes to match what you built in the step above.
* [x] Run `./dev generate bazel` to refresh `distdir_files.bzl`, then `bazel fetch `@distdir//:archives`` to ensure you've updated all hashes to the correct value.
* [x] Bump the version in `builder.sh` accordingly ([source](./builder.sh#L6)).
* [x] Bump the version in `go-version-check.sh` ([source](./go-version-check.sh)), unless bumping to a new patch release.
* [x] Bump the go version in `go.mod`.
* [x] Bump the default installed version of Go in `bootstrap-debian.sh` ([source](./bootstrap/bootstrap-debian.sh)).
* [x] Replace other mentions of the older version of go (grep for `golang:<old_version>` and `go<old_version>`).
* [ ] Update the `builder.dockerImage` parameter in the TeamCity [`Cockroach`](https://teamcity.cockroachdb.com/admin/editProject.html?projectId=Cockroach&tab=projectParams) and [`Internal`](https://teamcity.cockroachdb.com/admin/editProject.html?projectId=Internal&tab=projectParams) projects.
* [ ] Ask the Developer Infrastructure team to deploy new TeamCity agent images according to [packer/README.md](./packer/README.md)


Epic: None

Release note: None

93914: ui: refresh nodes on tenants r=matthewtodd a=matthewtodd

Part of #89949

Now that we can show meaningful region information for tenants, we need to actually trigger the fetching of that information.

Release note: None

Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
@erikgrinaker
Copy link
Contributor

Fixed by #90830.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

6 participants