
storage: transfer leases and leadership more thoroughly on graceful shutdown #44204

Closed
tbg opened this issue Jan 22, 2020 · 1 comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@tbg
Member

tbg commented Jan 22, 2020

When we drain a server, we spend a total of at most five seconds (per store) moving leases and Raft leaderships to other nodes before we continue shutting down:

```go
// The maximum amount of time waited for leadership shedding before commencing
// to drain a store.
const raftLeadershipTransferWait = 5 * time.Second

// SetDraining (when called with 'true') causes incoming lease transfers to be
// rejected, prevents all of the Store's Replicas from acquiring or extending
// range leases, and attempts to transfer away any leases owned.
// When called with 'false', returns to the normal mode of operation.
func (s *Store) SetDraining(drain bool) {
	s.draining.Store(drain)
	if !drain {
		newStoreReplicaVisitor(s).Visit(func(r *Replica) bool {
			r.mu.Lock()
			r.mu.draining = false
			r.mu.Unlock()
			return true
		})
		return
	}
	var wg sync.WaitGroup
	ctx := logtags.AddTag(context.Background(), "drain", nil)
	// ... (remainder of the drain logic elided)
```

In large deployments, this may not be enough. We need to improve this logic; ideally it would determine, with good enough accuracy, when to give up (for example, a single surviving node in a three-node deployment has no chance of transferring leases away), or we could justify retrying "forever" (i.e. until the operator issues a hard shutdown). Ideally we don't need to introduce another knob.

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. labels Jan 22, 2020
@tbg tbg assigned knz and tbg Jan 22, 2020
@knz
Contributor

knz commented Apr 20, 2020

Fixed by #45149.

@knz knz closed this as completed Apr 20, 2020