Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: garbage collect MVCC range tombstones #70414

Closed
erikgrinaker opened this issue Sep 20, 2021 · 4 comments
Closed

kvserver: garbage collect MVCC range tombstones #70414

erikgrinaker opened this issue Sep 20, 2021 · 4 comments
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Sep 20, 2021

In #70412, we will implement MVCC range deletion tombstones. MVCC garbage collection must be extended to handle these, preferably by relying on Pebble range tombstones via Writer.ClearRange for <<O(n) removals when possible.

See also the (currently internal) design document.

Epic CRDB-2624

Jira issue: CRDB-10064

@erikgrinaker erikgrinaker added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team labels Sep 20, 2021
@erikgrinaker
Copy link
Contributor Author

There's a related GC range clear improvement in #61455 that we should do at the same time.

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Jun 8, 2022

As @ajwerner mentions over on #61455, using Pebble range tombstones to clear a wide swath of keys (e.g. under an MVCC range tombstone) will require a latch over the keyspan to prevent it from racing with a new write above it which will be incorrectly cleared. However, this is in conflict with efforts to make GC latchless in #55293.

@ajwerner says:

We may need a new primitive which provides a to ensure that new writes don't accidentally get caught under this ClearRange when considering removing more than one key's versions.

Not sure which primitive you have in mind, but I feel like if we have a heuristic to only use range clears for large amounts of data, then taking out a latch for that case is possibly fine -- especially since these operations are designed to be cheap compared to point clears. In the case where we're clearing an entire range (e.g. after dropping a table) we can likely avoid a stats computation too, which should make these operations constant-time, and there typically wouldn't be any other clients to contend with.

Alternatively, we could push this down to Pebble -- either by having a range clear operation with suffix+seqnum predicates, or by moving MVCC GC down into Pebble compactions entirely as discussed in #57260 (comment) (a much larger lift).

But in the short term, it seems to me like using a latch for ranged GC is the only realistic option, and we can be conservative in when we choose to use ranged GC (typically for schema GC).

CC @aliher1911 @aayushshah15 @nvanbenschoten

@aliher1911
Copy link
Contributor

I think we already discussed that briefly in the past and what you are suggesting was current consensus. If we detect that all data goes, gc request would obtain a write latch for the range (or part of it), otherwise we could proceed with latchless.

@erikgrinaker
Copy link
Contributor Author

Resolved by #83265.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants