
kv: provide online method to clear leaked gossip infos #85013

Closed
nvanbenschoten opened this issue Jul 25, 2022 · 2 comments · Fixed by #85505
Labels
A-kv-gossip C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) good first issue O-postmortem Originated from a Postmortem action item. T-kv KV Team

Comments

@nvanbenschoten
Member

nvanbenschoten commented Jul 25, 2022

In a support escalation (https://github.com/cockroachlabs/support/issues/1709) for a v21.1 cluster, we saw that leaked gossip entries could impact foreground latency. The gossip leak itself has since been fixed, but there was no way to clear the leaked gossip values without a simultaneous, cluster-wide restart.

It would have been valuable to have a way to expire gossip information manually.

As a strawman, imagine a builtin function like crdb_internal.expire_gossip_info(key) that did the following:

info := gossip.get(key)
if info != nil {
    gossip.addInfo(key, info.value, time.Minute /* ttl */)
}

By injecting a 1m expiration, the gossip info would have enough time to propagate across the cluster but would then expire after a minute, clearing out the leaked information.

Jira issue: CRDB-18003

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-gossip O-postmortem Originated from a Postmortem action item. T-kv KV Team labels Jul 25, 2022
@surahman
Contributor

surahman commented Jul 26, 2022

Hi @nvanbenschoten, where are the internal builtins located in the source?

Edit: cockroach/pkg/sql/sem/builtins

@surahman
Contributor

surahman commented Jul 26, 2022

I have been looking through the data structures for the Gossip protocol, as well as the context available to the builtins, but cannot figure out how to access the map of gossip keys. Once accessed, an update is simple:

func (g *Gossip) AddInfo(key string, val []byte, ttl time.Duration) error {

Would you be able to provide any details on how to get the gossip data from ctx?

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 3, 2022
Fixes cockroachdb#85013.
Needed for cockroachlabs/support#1709.

This commit introduces a new `crdb_internal.unsafe_clear_gossip_info` builtin
function which allows admin users to manually clear info objects from the
cluster's gossip network. The function does so by re-gossiping an identical
value for the specified key but with a TTL that is long enough to reasonably
ensure full propagation to all nodes in the cluster but short enough to expire
quickly once propagated.

The function is best-effort. It is possible for the info object with the low
TTL to fail to reach full propagation before reaching its TTL. For instance,
this is possible during a transient network partition. The effect of this is
that the existing gossip info object with a higher (or no) TTL would remain
in the gossip network on some nodes and may eventually propagate back out to
other nodes once the partition heals.

Release note: None
craig bot pushed a commit that referenced this issue Aug 8, 2022
85505: gossip: provide online method to clear leaked gossip infos r=knz a=nvanbenschoten

Fixes #85013.
Needed (in v21.2.X) for cockroachlabs/support#1709.

This commit introduces a new `crdb_internal.unsafe_clear_gossip_info` builtin
function which allows admin users to manually clear info objects from the
cluster's gossip network. The function does so by re-gossiping an identical
value for the specified key but with a TTL that is long enough to reasonably
ensure full propagation to all nodes in the cluster but short enough to expire
quickly once propagated.

The function is best-effort. It is possible for the info object with the low
TTL to fail to reach full propagation before reaching its TTL. For instance,
this is possible during a transient network partition. The effect of this is
that the existing gossip info object with a higher (or no) TTL would remain
in the gossip network on some nodes and may eventually propagate back out to
other nodes once the partition heals.

`@knz:` I'm assigning this to you for a review both because you're as good a
person as any to look at gossip-related changes, and because limited SQL
access to the cluster's gossip network is a nuanced subject in the
context of multi-tenancy.

Release note: None

85522: storage: optimize `DeleteRange` when deleting entire Raft range r=aliher1911 a=erikgrinaker

This patch adds a fast path for `DeleteRange` when deleting an entire
Raft range, by simply marking all live data as deleted in MVCC stats
instead of scanning across all point keys. It will still perform a
time-bound scan to look for conflicts with newer writes, and a range key
scan to take range key fragmentation into account for stats.
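As a rough illustration of why this makes the fast path ~constant, here is a toy Go sketch (the `rangeStats` type and both helpers are hypothetical, not the actual MVCC stats code): the slow path visits every live key to tally the deletion delta, while the fast path covers the whole range and can derive the same delta from the existing live totals in O(1).

```go
package main

import "fmt"

// rangeStats is a toy stand-in for a Raft range's MVCC stats.
type rangeStats struct {
	liveCount int64 // number of live keys
	liveBytes int64 // total bytes of live keys
}

// slowPathDelta scans every live key to compute the stats delta: O(n).
func slowPathDelta(keys []string) rangeStats {
	var d rangeStats
	for _, k := range keys {
		d.liveCount--
		d.liveBytes -= int64(len(k))
	}
	return d
}

// fastPathDelta applies when the tombstone covers the entire range: the
// delta is simply the negation of the existing live totals, O(1).
func fastPathDelta(s rangeStats) rangeStats {
	return rangeStats{liveCount: -s.liveCount, liveBytes: -s.liveBytes}
}

func main() {
	keys := []string{"a", "bb", "ccc"}
	existing := rangeStats{liveCount: 3, liveBytes: 6}
	// Both paths must produce the same delta; only their cost differs.
	fmt.Println(slowPathDelta(keys) == fastPathDelta(existing))
}
```

This is only the stats bookkeeping; as the PR notes, the real fast path still performs a time-bound scan for conflicting newer writes and a range-key scan for fragmentation.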

There are no behavioral changes. The fast path is therefore tested
comprehensively by adding a metamorphic parameter for it in
`TestMVCCHistories`.

Benchmarks confirm that the fast path is ~constant, while the slow path
is asymptotically linear.

```
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=1000/valueSize=64/entireRange=false-24         	     499	   2319384 ns/op	  48.29 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=1000/valueSize=64/entireRange=true-24          	     577	   1965157 ns/op	  56.99 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=10000/valueSize=64/entireRange=false-24        	     216	   5531790 ns/op	 202.47 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=10000/valueSize=64/entireRange=true-24         	     576	   2014470 ns/op	 555.98 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=100000/valueSize=64/entireRange=false-24       	      32	  37814215 ns/op	 296.18 MB/s
BenchmarkMVCCDeleteRangeUsingTombstone_Pebble/numKeys=100000/valueSize=64/entireRange=true-24        	     589	   2022481 ns/op	5537.75 MB/s
```

Resolves #83696.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
@craig craig bot closed this as completed in dfcb6ef Aug 8, 2022
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 8, 2022
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 8, 2022