kvserver: don't allow raft forwarding of lease requests
This patch aims to improve the behavior in scenarios where a follower
replica is behind, unaware of the latest lease, and it tries to acquire
a lease in its ignorant state. That lease acquisition request is bound
to fail (because the lease that it's based on is stale), but while it
fails (or, rather, until the behind replica finds out that it failed)
local requests are blocked. This blocking can last for a very long time
in situations where a snapshot is needed to catch up the follower, and
the snapshot is queued up behind many other snapshots (e.g. after a node
has been down for a while and gets restarted).

This patch tries an opinionated solution: never allow followers to
acquire leases. If there is a leader, it's a better idea for the leader
to acquire the lease. The leader might have a lease anyway or, even if
it doesn't, having the leader acquire it saves a leadership transfer
(leadership follows the lease).
We change the proposal path to recognize lease requests and reject them
early if the current replica is a follower and the leader is known. The
rejection points to the leader, which causes the request that triggered
the lease acquisition to make its way to the leader and attempt to
acquire a lease over there.
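
The rejection logic described above can be sketched as follows. This is a simplified illustration, not CockroachDB's actual code: the types, field names, and the function `maybeRejectLeaseRequest` are all hypothetical stand-ins for the real proposal-path check.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a hypothetical, stripped-down stand-in for the state the
// proposal path consults; real CockroachDB types look very different.
type replica struct {
	replicaID uint64
	leaderID  uint64 // 0 when the Raft leader is unknown
}

// errRejected mimics the early rejection that redirects the request
// that triggered the lease acquisition toward the known leader.
var errRejected = errors.New("lease request rejected; retry at the leader")

// maybeRejectLeaseRequest sketches the check: a follower that knows the
// current leader refuses to propose a lease request and points the
// caller at the leader, instead of proposing a doomed acquisition.
func (r *replica) maybeRejectLeaseRequest(isLeaseRequest bool) (redirectTo uint64, err error) {
	iAmTheLeader := r.leaderID == r.replicaID
	leaderKnown := r.leaderID != 0
	if isLeaseRequest && !iAmTheLeader && leaderKnown {
		return r.leaderID, errRejected
	}
	// The leader itself, or a replica that doesn't know of any leader,
	// is allowed to go ahead and propose.
	return 0, nil
}

func main() {
	follower := &replica{replicaID: 2, leaderID: 1}
	if to, err := follower.maybeRejectLeaseRequest(true); err != nil {
		fmt.Printf("redirecting lease acquisition to replica %d\n", to)
	}
}
```

Note that when no leader is known, the sketch still lets the proposal through; rejecting in that case would leave the range with no way to acquire a lease at all.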

Fixes #37906

As described in #37906, the badness caused by requests blocking behind a
doomed lease acquisition request could be reproduced with a 100-warehouse tpcc
workload (--no-wait) on a 3 node cluster. Turning off a node for 10
minutes and then turning it back on would result in the cluster being
wedged for a few minutes until all the snapshots are transferred. I've
verified that this patch fixes it.

Release note (bug fix): A bug causing queries sent to a
freshly-restarted node to sometimes hang for a long time while the node
catches up with replication has been fixed.
andreimatei committed Nov 13, 2020
1 parent 5e1a8c3 commit 8aa1c14
Showing 11 changed files with 559 additions and 52 deletions.
13 changes: 11 additions & 2 deletions pkg/kv/kvserver/client_raft_helpers_test.go
@@ -12,6 +12,7 @@ package kvserver_test

 import (
 	"context"
+	"fmt"

 	"github.com/cockroachdb/cockroach/pkg/kv/kvserver"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
@@ -22,7 +23,9 @@ import (
 )

 type unreliableRaftHandlerFuncs struct {
-	// If non-nil, can return false to avoid dropping a msg to rangeID.
+	// If non-nil, can return false to avoid dropping the msg to
+	// unreliableRaftHandler.rangeID. If nil, all messages pertaining to the
+	// respective range are dropped.
 	dropReq  func(*kvserver.RaftMessageRequest) bool
 	dropHB   func(*kvserver.RaftHeartbeat) bool
 	dropResp func(*kvserver.RaftMessageResponse) bool
@@ -47,6 +50,7 @@ func noopRaftHandlerFuncs() unreliableRaftHandlerFuncs {
 // unreliableRaftHandler drops all Raft messages that are addressed to the
 // specified rangeID, but lets all other messages through.
 type unreliableRaftHandler struct {
+	name    string
 	rangeID roachpb.RangeID
 	kvserver.RaftMessageHandler
 	unreliableRaftHandlerFuncs
@@ -68,9 +72,14 @@ func (h *unreliableRaftHandler) HandleRaftRequest(
 		}
 	} else if req.RangeID == h.rangeID {
 		if h.dropReq == nil || h.dropReq(req) {
+			var prefix string
+			if h.name != "" {
+				prefix = fmt.Sprintf("[%s] ", h.name)
+			}
 			log.Infof(
 				ctx,
-				"dropping r%d Raft message %s",
+				"%sdropping r%d Raft message %s",
+				prefix,
 				req.RangeID,
 				raft.DescribeMessage(req.Message, func([]byte) string {
 					return "<omitted>"
