kvserver: log when raft send/recv queue fills up
Inspired by cockroachlabs/support#1770.

If either the raft send or receive queue fills up, widespread outages
can occur as replication progress stalls. We have metrics that can
indicate this, but straightforward logging is also appropriate to draw
attention to the problem. This commit adds a warning for each queue,
logged at most once per second.

Touches cockroachdb#79755

Release justification: important logging improvement
Release note: None
tbg committed Aug 23, 2022
1 parent cf533b0 commit c1cd955
Showing 2 changed files with 11 additions and 0 deletions.
3 changes: 3 additions & 0 deletions pkg/kv/kvserver/raft_transport.go
@@ -505,6 +505,9 @@ func (t *RaftTransport) SendAsync(
 	case ch <- req:
 		return true
 	default:
+		if logRaftSendQueueFullEvery.ShouldLog() {
+			log.Warningf(t.AnnotateCtx(context.Background()), "raft send queue to n%d is full", toNodeID)
+		}
 		releaseRaftMessageRequest(req)
 		return false
 	}
8 changes: 8 additions & 0 deletions pkg/kv/kvserver/store_raft.go
@@ -32,6 +32,11 @@ import (
 	"go.etcd.io/etcd/raft/v3/raftpb"
 )

+var (
+	logRaftRecvQueueFullEvery = log.Every(1 * time.Second)
+	logRaftSendQueueFullEvery = log.Every(1 * time.Second)
+)
+
 type raftRequestInfo struct {
 	req  *kvserverpb.RaftMessageRequest
 	size int64 // size of req in bytes
@@ -305,6 +310,9 @@ func (s *Store) HandleRaftUncoalescedRequest(
 		// that dropping the request is safe. Raft will retry.
 		s.metrics.RaftRcvdDropped.Inc(1)
 		s.metrics.RaftRcvdDroppedBytes.Inc(size)
+		if logRaftRecvQueueFullEvery.ShouldLog() {
+			log.Warningf(ctx, "raft receive queue for r%d is full", req.RangeID)
+		}
 		return false
 	}
 	return enqueue