Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept any snapshot that allows replication #84

Closed
wants to merge 2 commits into from

Conversation

tbg
Copy link
Collaborator

@tbg tbg commented Jul 14, 2023

Prior to this commit, the leader would not take into account snapshots reported
by a follower unless they matched or exceeded the tracked PendingSnapshot index
(which is the leader's last index at the time of requesting the snapshot). This
is too inflexible: the leader should take into account any snapshot that
reconnects the follower to its log. This PR makes that change.

In doing so, it addresses long-standing problems that we've encountered in
CockroachDB. Unless you create the snapshot immediately and locally when raft
emits an MsgSnap, it's difficult/impossible to later synthesize a snapshot at
the requested index. It is possible to get one above the requested index which
raft always accepted, but CockroachDB delegates snapshots to followers who
might be behind on applying the log, and it is awkward to have wait for log
application to send the snapshot just to satisfy an overly strict condition in
raft. Additionally, CockroachDB also sends snapshots preemptively when adding a
new replica since there are qualitative differences between an initial snapshot
and one needed to reconnect to the log and one does not want to wait for raft
to round-trip to the follower to realize that a snapshot is needed. In this
case, the sent snapshot is commonly behind the PendingSnapshot since the leader
transitions the follower into StateProbe when a snapshot is already in flight.

Touches cockroachdb/cockroach#84242.
Touches cockroachdb/cockroach#87553.
Touches cockroachdb/cockroach#87554.
Touches cockroachdb/cockroach#97971.
See also https://github.com/cockroachdb/cockroach/blob/2b91c3829270eb512c5380201c26a3d838fc567a/pkg/kv/kvserver/raft_snapshot_queue.go#L131-L143.

tbg added 2 commits July 14, 2023 16:24
This adds a test that documents the following behavior:

It the leader is tracking a follower as StateSnapshot with PendingSnapshot equal
to, say, 100, and the follower applies a snapshot at a lower index that
reconnects the follower to the leader's log, then the leader will still ignore
this snapshot.
Prior to this commit, the leader would not take into account snapshots reported
by a follower unless they matched or exceeded the tracked PendingSnapshot index
(which is the leader's last index at the time of requesting the snapshot). This
is too inflexible: the leader should take into account any snapshot that
reconnects the follower to its log. This PR makes that change.

In doing so, it addresses long-standing problems that we've encountered in
CockroachDB. Unless you create the snapshot immediately and locally when raft
emits an MsgSnap, it's difficult/impossible to later synthesize a snapshot at
the requested index. It is possible to get one above the requested index which
raft always accepted, but CockroachDB delegates snapshots to followers who
might be behind on applying the log, and it is awkward to have wait for log
application to send the snapshot just to satisfy an overly strict condition in
raft. Additionally, CockroachDB also sends snapshots preemptively when adding a
new replica since there are qualitative differences between an initial snapshot
and one needed to reconnect to the log and one does not want to wait for raft
to round-trip to the follower to realize that a snapshot is needed. In this
case, the sent snapshot is commonly behind the PendingSnapshot since the leader
transitions the follower into StateProbe when a snapshot is already in flight.

Touches cockroachdb/cockroach#84242.
Touches cockroachdb/cockroach#87553.
Touches cockroachdb/cockroach#87554.
Touches cockroachdb/cockroach#97971.
See also https://github.com/cockroachdb/cockroach/blob/2b91c3829270eb512c5380201c26a3d838fc567a/pkg/kv/kvserver/raft_snapshot_queue.go#L131-L143.

Signed-off-by: Tobias Grieger <[email protected]>
@erikgrinaker
Copy link
Contributor

Replaced by #110.

@pav-kv pav-kv closed this Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants