kvserver: delegated snapshots must carry a min index #87581
Comments
This is the crux of the issue, but I don't see how it's possible. Raft remembers the index of the pending snapshot, but it doesn't seem to assign it to the peer's Match index (used for quorum calculations). Instead, it only assigns it to the peer's Next index when transitioning to Do you mind pointing at the code that makes this possible?
In the "complete delegated" snapshot branch - the truncated index is already added. I agree with @nvanbenschoten however that today it calls I'm curious why we start sending
Ah, good catch, Nathan! I misread that as moving This then also explains why we go to probe, and not immediately to replicate. We need the MsgAppResp for that, which actually provides the "proof" of what the latest log index is now. So if we want to bypass With all this, Nathan's suggestion to transfer the actual MsgAppResp with the snapshot response and to step it into the leader makes a ton of sense, since that way we avoid taking on any new duties w.r.t. correctness - we're just delivering a message Raft is sending anyway, to make sure it has been received when we consider the snapshot "done". We should look at how ugly this gets in practice. I think it's sensible to add
I don't think we do, in StateSnapshot raft just hangs tight. However, you do send one message to get into StateSnapshot. Initially, the follower is in StateProbe and the leader tries to append to it. That will get rejected, and now we're in StateSnapshot (since the rejection hint will be zero, i.e. below what the leader can catch the follower up from via the log). The follower sends an MsgAppResp upon applying the snapshot, which fast-tracks the leader back to StateReplicate. Maybe I misunderstood your question.
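The transitions discussed in the comments above can be sketched as a toy model in Go. This is not the etcd/raft tracker code: the `Progress` type and the method names `AppendRejected`, `SnapshotReported`, and `AppendAcked` are made up for illustration, and the indexes in `main` (first available log index 50, snapshot index 100) are arbitrary. The sketch only aims to show that reporting a snapshot moves Next but not Match, and that Match advances only once an MsgAppResp arrives.

```go
package main

import "fmt"

// ProgressState borrows the state names used in the discussion.
// Everything else in this file is a simplified, hypothetical model.
type ProgressState int

const (
	StateProbe ProgressState = iota
	StateReplicate
	StateSnapshot
)

func (s ProgressState) String() string {
	return [...]string{"StateProbe", "StateReplicate", "StateSnapshot"}[s]
}

// Progress is the leader's view of one follower.
type Progress struct {
	State           ProgressState
	Match, Next     uint64
	PendingSnapshot uint64
}

// AppendRejected models a rejected append with the given hint. If the
// follower needs entries below firstIndex (the oldest index the leader
// still has in its log), the leader falls back to a snapshot at snapIndex.
func (p *Progress) AppendRejected(hint, firstIndex, snapIndex uint64) {
	p.Next = hint + 1
	if p.Next < firstIndex {
		p.State = StateSnapshot
		p.PendingSnapshot = snapIndex
	}
}

// SnapshotReported models a successful snapshot report: the follower
// goes back to probing. Only Next moves past the pending snapshot
// index; Match is left untouched.
func (p *Progress) SnapshotReported() {
	if p.State != StateSnapshot {
		return
	}
	p.State = StateProbe
	p.Next = p.Match + 1
	if p.PendingSnapshot+1 > p.Next {
		p.Next = p.PendingSnapshot + 1
	}
	p.PendingSnapshot = 0
}

// AppendAcked models an MsgAppResp acknowledging index. This is the
// only place where Match advances, and it moves the follower to
// StateReplicate.
func (p *Progress) AppendAcked(index uint64) {
	if index <= p.Match {
		return
	}
	p.Match = index
	if index+1 > p.Next {
		p.Next = index + 1
	}
	if p.State == StateProbe || (p.State == StateSnapshot && index >= p.PendingSnapshot) {
		p.State = StateReplicate
		p.PendingSnapshot = 0
	}
}

func main() {
	p := &Progress{State: StateProbe, Next: 101}
	p.AppendRejected(0, 50, 100) // hint 0 is below the leader's log
	fmt.Println(p.State, "match:", p.Match, "next:", p.Next)
	p.SnapshotReported() // Next moves, Match does not
	fmt.Println(p.State, "match:", p.Match, "next:", p.Next)
	p.AppendAcked(100) // the MsgAppResp is the actual proof
	fmt.Println(p.State, "match:", p.Match, "next:", p.Next)
}
```

In this model, a lost MsgAppResp leaves the follower in StateProbe with Match unchanged, which matches the point made above that the MsgAppResp is what proves the follower's latest log index.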
Closes cockroachdb#87581. Release note: None
87614: sql/opt/norm: propagate kv errors from cast folding r=ajwerner a=ajwerner
Swallowing KV errors here leads to incorrect results. Writes can be missed and serializability can be silently violated. This comes up in the context of the randomized schema change testing. May deal with #85677 relates to #80764 Release note: None
87646: backupccl: deflake backup tests r=adityamaru a=adityamaru
See individual commits. Release note: None
87667: server: clean up some logging r=ajwerner a=ajwerner
For one, this fixes a format directive issue in the first line. It also stops rendering some arguments to strings directly so that redaction works correctly. Release note: None
87702: kvserver: explain our use of (*raftGroup).ReportSnapshot r=andrewbaptist a=tbg
Closes #87581. Release note: None
87713: ci: skip "integration"-style tests in testrace r=knz a=rickystewart
Closes #87700. Release note: None
Co-authored-by: Andrew Werner
Co-authored-by: adityamaru
Co-authored-by: Tobias Grieger
Co-authored-by: Ricky Stewart
Describe the problem
When a leader delegates a snapshot to a follower, it must send along a minimum index for the snapshot.
Otherwise, we risk the following:
(*rawNode).ReportSnapshot and this causes raft to believe that index 100 is now reflected on r1/3
It's worth looking at the first alternative as well, which I find appealing. The mismatch between how raft thinks snapshots work and how they really work in CRDB has concerned me at various times over the years. But I don't think we can do this for 22.2, so we should consider the more targeted fix above. I'm actually not sure if 22.2 delegates any snapshots, so depending on whether it does, this is either a current bug in need of a fix or something we need to fix before we actually delegate.
See #84242.
Alternatives
ReportSnapshot. In effect, acknowledging that at least in CRDB raft has much less control over the snapshot index than it believes. We would track the index at which we've actually sent the snapshots, hand that to raft, and remove any sketchy corner cases that way. Other users of raft may like the current behavior though, so we should allow an index of zero to behave like the index raft previously requested. While we're there, we should also let the follower transition straight into StateReplicate (see kv/kvserver: TestAdminRelocateRange failed #84242).
Worse alternatives:
Could send the snapshot at the actual index requested. This is actually not really an option since it precludes us from using delegated snapshots - you can't magically materialize a snapshot at a lower applied index; this effectively means undoing log entries that you may not even have any more and even if you do, reverting WriteBatches is not an operation we support, not to mention the numerous side effects entries can have such as splits, etc.
We could let the follower return the index to the leader, who can then avoid calling ReportSnapshot or manually massage the raft state (..somehow) to make it safe. (Sketchy)
We could let our impl of raft.Storage return an index of 10 (raftInitialIndex) from Snapshot(), in effect making sure that ReportSnapshot can never move the follower's Match above its true Match. (Bit sketchy)
We could stop calling ReportSnapshot and rely fully on the MsgAppResp. If the MsgAppResp got lost for any reason, the raft snap queue would (ultimately and possibly after a scanner cycle only) pick up the replica. (Not good)
Jira issue: CRDB-19425