
storage: optimize MVCCDeleteRangeUsingTombstone #83696

Closed
erikgrinaker opened this issue Jul 1, 2022 · 1 comment · Fixed by #85522
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-storage Relating to our storage engine (Pebble) on-disk storage. C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-storage Storage Team


erikgrinaker commented Jul 1, 2022

When writing an MVCC range tombstone, we currently scan across the entire span to check for conflicts (newer versions) and to adjust MVCC stats. We need a fast path for when we're deleting an entire range: ideally we can mark all live data as garbage in the stats without a scan, and use a time-bound iterator (TBI) to check for conflicts.
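To illustrate why the stats half of the fast path can skip the scan, here is a minimal sketch under heavily simplified assumptions. The `stats` and `version` types, their fields, and the `slowPath`/`fastPath` helpers are all hypothetical; real MVCC stats track many more fields, and the range tombstone's own bytes are ignored here. If the current stats are accurate and the tombstone covers the whole range, every live key becomes garbage, so the live totals can simply be zeroed.

```go
package main

import "fmt"

// version is a hypothetical point-key version. Versions of the same key are
// assumed to be ordered newest first, as in an MVCC iteration.
type version struct {
	key       string
	size      int64
	tombstone bool
}

// stats is a hypothetical, simplified stand-in for MVCC stats.
type stats struct {
	LiveCount  int64
	LiveBytes  int64
	TotalBytes int64
}

// computeStats derives stats from scratch: only the newest version of a key
// can be live, and only if it is not a point tombstone.
func computeStats(vs []version) stats {
	var s stats
	seen := map[string]bool{}
	for _, v := range vs {
		s.TotalBytes += v.size
		if seen[v.key] {
			continue
		}
		seen[v.key] = true
		if !v.tombstone {
			s.LiveCount++
			s.LiveBytes += v.size
		}
	}
	return s
}

// slowPath scans every version and moves each live key out of the live
// totals. Its cost is linear in the number of versions.
func slowPath(s stats, vs []version) stats {
	seen := map[string]bool{}
	for _, v := range vs {
		if seen[v.key] {
			continue // older versions are never live
		}
		seen[v.key] = true
		if !v.tombstone {
			s.LiveCount--
			s.LiveBytes -= v.size
		}
	}
	return s
}

// fastPath assumes the tombstone covers the entire range, so all live data
// becomes garbage without looking at any key. Total bytes are unchanged.
func fastPath(s stats) stats {
	s.LiveCount = 0
	s.LiveBytes = 0
	return s
}

func sampleVersions() []version {
	return []version{
		{"a", 10, false}, {"a", 8, false}, // a: live
		{"b", 5, true}, {"b", 7, false}, // b: already deleted
		{"c", 12, false}, // c: live
	}
}

func main() {
	vs := sampleVersions()
	s := computeStats(vs)
	fmt.Printf("before: %+v\n", s)
	fmt.Printf("slow:   %+v\n", slowPath(s, vs))
	fmt.Printf("fast:   %+v\n", fastPath(s))
}
```

Both paths leave the same stats; the fast path just relies on the existing totals instead of recomputing them key by key.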

Jira issue: CRDB-17214

Epic CRDB-2624

@erikgrinaker erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. T-kv-replication labels Jul 1, 2022

blathers-crl bot commented Jul 1, 2022

cc @cockroachdb/replication

@blathers-crl blathers-crl bot added the T-storage Storage Team label Jul 1, 2022
@blathers-crl blathers-crl bot added the A-kv-replication Relating to Raft, consensus, and coordination. label Jul 1, 2022
craig bot pushed a commit that referenced this issue Aug 8, 2022
85505: gossip: provide online method to clear leaked gossip infos r=knz a=nvanbenschoten

Fixes #85013.
Needed (in v21.2.X) for cockroachlabs/support#1709.

This commit introduces a new `crdb_internal.unsafe_clear_gossip_info` builtin
function which allows admin users to manually clear info objects from the
cluster's gossip network. The function does so by re-gossiping an identical
value for the specified key but with a TTL that is long enough to reasonably
ensure full propagation to all nodes in the cluster but short enough to expire
quickly once propagated.

The function is best-effort. It is possible for the info object with the low
TTL to fail to reach full propagation before reaching its TTL. For instance,
this is possible during a transient network partition. The effect of this is
that the existing gossip info object with a higher (or no) TTL would remain
in the gossip network on some nodes and may eventually propagate back out to
other nodes once the partition heals.
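The clearing trick and its failure mode can be sketched with a toy model. This assumes nodes keep whichever copy of an info has the newest origination timestamp and drop infos once their TTL expires; the `node` and `info` types, their fields, and `simulate` are hypothetical and are not the actual gossip package. A partition is modeled by never delivering the short-TTL copy to the partitioned node.

```go
package main

import "fmt"

// info is a toy gossip info: a value, when it was originated, and when it
// expires (0 means no TTL). Field names are hypothetical.
type info struct {
	value    string
	origTS   int64
	expireAt int64
}

type node struct{ infos map[string]info }

func newNode() *node { return &node{infos: map[string]info{}} }

// receive keeps the incoming info only if it was originated later than the
// copy already held (toy acceptance rule).
func (n *node) receive(key string, in info) {
	if cur, ok := n.infos[key]; ok && cur.origTS >= in.origTS {
		return
	}
	n.infos[key] = in
}

// expire drops every info whose TTL has elapsed at time now.
func (n *node) expire(now int64) {
	for k, in := range n.infos {
		if in.expireAt != 0 && now >= in.expireAt {
			delete(n.infos, k)
		}
	}
}

// simulate leaks an info with no TTL to three nodes, re-gossips the identical
// value with a newer timestamp and a short TTL, and reports which nodes still
// hold the key after the TTL. Nodes in partitioned never get the new copy.
func simulate(partitioned map[int]bool) []bool {
	nodes := []*node{newNode(), newNode(), newNode()}
	leaked := info{value: "v", origTS: 1}
	for _, n := range nodes {
		n.receive("k", leaked)
	}
	short := info{value: "v", origTS: 100, expireAt: 150}
	for i, n := range nodes {
		if !partitioned[i] {
			n.receive("k", short)
		}
	}
	present := make([]bool, len(nodes))
	for i, n := range nodes {
		n.expire(200)
		_, present[i] = n.infos["k"]
	}
	return present
}

func main() {
	fmt.Println("no partition:      ", simulate(nil))
	fmt.Println("node 2 partitioned:", simulate(map[int]bool{2: true}))
}
```

Under this toy rule, the partitioned node keeps the leaked copy, and any node that has already dropped the key would accept it again once the partition heals.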

`@knz:` I'm assigning this to you for a review both because you're as good a
person as any to look at gossip-related changes, and because limited SQL
access to the cluster's gossip network is a nuanced subject in the
context of multi-tenancy.

Release note: None

85522: storage: optimize `DeleteRange` when deleting entire Raft range r=aliher1911 a=erikgrinaker

This patch adds a fast path for `DeleteRange` when deleting an entire
Raft range, by simply marking all live data as deleted in MVCC stats
instead of scanning across all point keys. It will still perform a
time-bound scan to look for conflicts with newer writes, and a range key
scan to take range key fragmentation into account for stats.

There are no behavioral changes. The fast path is therefore tested
comprehensively by adding a metamorphic parameter for it in
`TestMVCCHistories`.

Benchmarks confirm that the fast path runs in roughly constant time, while the
slow path is asymptotically linear in the number of keys.

```
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=1000/valueSize=64/entireRange=false-24         	     499	   2319384 ns/op	  48.29 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=1000/valueSize=64/entireRange=true-24          	     577	   1965157 ns/op	  56.99 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=10000/valueSize=64/entireRange=false-24        	     216	   5531790 ns/op	 202.47 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=10000/valueSize=64/entireRange=true-24         	     576	   2014470 ns/op	 555.98 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=100000/valueSize=64/entireRange=false-24       	      32	  37814215 ns/op	 296.18 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=100000/valueSize=64/entireRange=true-24        	     589	   2022481 ns/op	5537.75 MB/s
```
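The ns/op figures above can be turned into speedups with simple arithmetic. This sketch only reuses the numbers from the benchmark output; the variable and function names are illustrative.

```go
package main

import "fmt"

// ns/op from the benchmark output above, keyed by numKeys (valueSize=64).
var slowNs = map[int]float64{1000: 2319384, 10000: 5531790, 100000: 37814215} // entireRange=false
var fastNs = map[int]float64{1000: 1965157, 10000: 2014470, 100000: 2022481}  // entireRange=true

// speedup is how many times faster the fast path is at a given size.
func speedup(numKeys int) float64 { return slowNs[numKeys] / fastNs[numKeys] }

func main() {
	for _, n := range []int{1000, 10000, 100000} {
		fmt.Printf("numKeys=%d: fast path %.2fx faster\n", n, speedup(n))
	}
	// Growth from 1,000 to 100,000 keys (100x more data).
	fmt.Printf("slow path grows %.1fx, fast path grows %.2fx\n",
		slowNs[100000]/slowNs[1000], fastNs[100000]/fastNs[1000])
}
```

At 100,000 keys the fast path is about 18.7x faster. Going from 1,000 to 100,000 keys, the slow path's time grows about 16x while the fast path's grows by about 3%.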

Resolves #83696.

Release note: None

Co-authored-by: Nathan VanBenschoten
Co-authored-by: Erik Grinaker
@craig craig bot closed this as completed in #85522 Aug 8, 2022