storage: transfer leases and leadership more thoroughly on graceful shutdown #44204
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
When we drain a server, we spend at most five seconds in total (per store) moving leases and Raft leaderships to other nodes before we continue shutting down:
cockroach/pkg/storage/store.go
Lines 969 to 991 in f702e9d
In large deployments, this may not be enough. We need to improve this logic: ideally it can determine with good enough accuracy when to give up (for example, the single surviving node in a three-node deployment has no chance of transferring its leases away), or we can justify trying "forever" (i.e. until the operator issues a hard shutdown). Ideally we don't need to introduce another knob.
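To make the proposal concrete, here is a rough sketch of what such a drain loop could look like. It is not the existing code in `store.go`; `leaseHolder`, `fakeStore`, `drain`, and `runDrain` are all hypothetical names made up for illustration. The loop has no fixed five-second budget: it stops when nothing is left to transfer, gives up early when there are no live transfer targets, and otherwise keeps retrying until the context is cancelled, which stands in for the operator's hard shutdown.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// leaseHolder is a hypothetical abstraction over a store being drained.
// None of these methods exist in CockroachDB under these names.
type leaseHolder interface {
	Remaining() int                    // leases and Raft leaderships still held
	Targets() int                      // live nodes that could take them over
	TransferOnce(ctx context.Context) // one best-effort round of transfers
}

type drainResult string

const (
	drained   drainResult = "drained"
	noTargets drainResult = "no-targets"
	cancelled drainResult = "cancelled"
)

// drain retries transfers without a fixed time budget. It returns early
// when there is nothing left to move or nowhere to move it, and otherwise
// runs until ctx is cancelled (modelling a hard shutdown).
func drain(ctx context.Context, s leaseHolder, backoff time.Duration) drainResult {
	for {
		if s.Remaining() == 0 {
			return drained
		}
		if s.Targets() == 0 {
			// E.g. the last surviving node of a three-node cluster.
			return noTargets
		}
		s.TransferOnce(ctx)
		select {
		case <-ctx.Done():
			return cancelled
		case <-time.After(backoff):
		}
	}
}

// fakeStore is a toy implementation used only to exercise drain.
type fakeStore struct {
	leases, targets, perRound int
}

func (f *fakeStore) Remaining() int { return f.leases }
func (f *fakeStore) Targets() int   { return f.targets }
func (f *fakeStore) TransferOnce(context.Context) {
	moved := f.perRound
	if moved > f.leases {
		moved = f.leases
	}
	f.leases -= moved
}

// runDrain drains a fakeStore; the 50ms timeout plays the role of the
// operator eventually issuing a hard shutdown.
func runDrain(leases, targets, perRound int) drainResult {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	s := &fakeStore{leases: leases, targets: targets, perRound: perRound}
	return drain(ctx, s, time.Millisecond)
}

func main() {
	fmt.Println(runDrain(5, 2, 2)) // transfers make progress: drained
	fmt.Println(runDrain(5, 0, 2)) // nowhere to transfer: no-targets
	fmt.Println(runDrain(5, 2, 0)) // no progress until shutdown: cancelled
}
```

The early `noTargets` exit is what avoids a new knob in this sketch: the loop decides for itself when trying is pointless and otherwise defers to the operator. How a real store would cheaply and accurately compute the equivalent of `Targets()` is the open question.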