Now that Replica.Snapshot is safely isolated from the rest of the Replica (#6187), we can use etcd/raft's asynchronous snapshot feature. This will avoid blocking the processRaft goroutine during the expensive snapshot generation.
The first time Snapshot() is called, it should start a goroutine and immediately return raft.ErrSnapshotTemporarilyUnavailable. Raft will occasionally poll for results by calling Snapshot; we can return the result when we have it; until then we return ErrSnapshotTemporarilyUnavailable again (without starting new goroutines).
A few subtle points:
In rare cases raft may decide it doesn't need the snapshot, so we should be sure to discard snapshots that go unused for too long.
When expanding the replication factor (e.g. from 1 to 3 or 3 to 5) we may be able to reuse the same snapshot twice, although it's probably not worth optimizing for this case.
After starting the goroutine, it might be best to wait with a short timeout so that we can get the snapshot in a single attempt when it's small. This will be especially important to keep the tests fast.
Blocking the processRaft goroutine for too long is problematic. In
extreme cases it can cause heartbeats to be missed and new elections to
start (a major cause of cockroachdb#5970). This commit moves the work of snapshot
generation to an asynchronous goroutine.
Fixes cockroachdb#6204.