Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
gossip: adjust recovery timings to tolerate shorter lease expiration
Fixes cockroachdb#133159. This commit reduces the gossip sentinel TTL from 6s to 3s, so that it is no longer aligned with the node liveness expiration of 6s. The sentinel key informs gossip whether it is connected to the primary gossip network or a partition and thus needs a short TTL so that partitions are fixed quickly. In particular, partitions need to resolve faster than the timeout (6s) or node liveness will be adversely affected, which can trigger false-positives in the `ranges.unavailable` metric. This commit also reduces the gossip stall check interval from 2s to 1s. The stall check interval also affects how quickly gossip partitions are noticed and repaired, controlling how frequently gossip connection attempts are made. The stall check itself is very cheap, so this produces no load on the system. Release note (bug fix): Reduce the duration of partitions in the gossip network when a node crashes in order to eliminate false positives in the `ranges.unavailable` metric.
- Loading branch information