storage: fix preemptive snapshots #7468
Conversation
Seems like this mostly works. I still need to track down the …
I went down a similar road investigating the handling of zero replica IDs -- it's a mess, and this change is making it worse. What are the semantics around the zero ID now? It's allowed, but only on snapshots (and means pre-emptive snapshot), but it's also required when the replica is not already initialized, because...reasons. I don't want to hold this change up, but we need to clarify this situation. Reading this diff, I can't tell that it's correct because it's delaying or suppressing errors, seemingly at random.

Reviewed 3 of 3 files at r1.

storage/replica_raftstorage.go, line 604 [r1]: errors.Wrap

storage/store.go, line 2084 [r1]: I think this is equivalent to …
I think this change will need to wait for @bdarnell's input regardless, so no worries about holding it up. Yes, the semantics of replica ID 0 are that it is allowed only on snapshots, and we only apply them preemptively if the replica is not part of the raft group yet. This isn't a large change. Which errors are you worried about being suppressed? Note that calling …

I'm still learning about the intricacies of the raft layer. It is very likely I'm doing something silly here (as witnessed by some of my recent PRs).

Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

storage/replica_raftstorage.go, line 604 [r1]: …
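As a rough sketch of the rule described above (not the actual storage package code): a replica whose ID is still zero rejects every raft message except a snapshot, which it applies preemptively, outside of any raft group. The `replica` type, `handleRaftMessage`, `applyPreemptiveSnapshot`, and `stepRaftGroup` below are hypothetical stand-ins; only the raftpb types are real.

```go
package storage

import (
	"errors"

	"github.com/coreos/etcd/raft/raftpb"
)

// replica is a stand-in for the real storage.Replica. A replicaID of zero
// means the replica has not been added to the raft group yet.
type replica struct {
	replicaID uint64
}

// handleRaftMessage sketches the zero-replica-ID rule: with no replica ID,
// only snapshots are accepted, and they are applied preemptively; once an ID
// is assigned, messages are stepped through the raft group as usual.
func (r *replica) handleRaftMessage(msg raftpb.Message) error {
	if r.replicaID == 0 {
		if msg.Type != raftpb.MsgSnap {
			return errors.New("replica ID 0 is only allowed for snapshots")
		}
		// Apply the snapshot without creating a raft group; the group is
		// created later, when the replica is actually added to the range.
		return r.applyPreemptiveSnapshot(msg)
	}
	return r.stepRaftGroup(msg)
}

// Placeholders for the real snapshot-application and raft-stepping logic.
func (r *replica) applyPreemptiveSnapshot(raftpb.Message) error { return nil }
func (r *replica) stepRaftGroup(raftpb.Message) error           { return nil }
```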
The complication I was referring to is the additional meaning of replica ID zero which was added in 22bbc91.

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

storage/replica_raftstorage.go, line 604 [r1]: …
Force-pushed eb2a486 to e6b7d66.
Review status: 2 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

storage/replica_raftstorage.go, line 604 [r1]: …
Force-pushed e6b7d66 to ec54e07.
Review status: 1 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

storage/store.go, line 2104 [r1]: …
Reviewed 2 of 2 files at r2.
Yes, this changes the semantics of replica ID 0. We definitely need @bdarnell to weigh in on whether this is kosher or whether another approach would be better (e.g. a new field indicating that a preemptive snapshot has been applied).

Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.
Haven't reviewed closely, but I share @tamird's worry that we're putting more complications and random exceptions into code paths that are already pretty tangled up and hard to reason about for anyone but @bdarnell. That said, I'm glad that we've got ~4 pairs of eyes on this part of the system now. I removed a bunch of complexity in #7310 (at the price of a missed release, sadly) and follow-ups, and I hope that this trend continues and that the complexity which must remain is more explicitly called out and documented. For example, I had to think about whether a snapshot contained a HardState, similar to how during #7310 I had to puzzle together what Raft state is written when, and in what situation. We should document that stuff more, and especially the reasoning behind it.

Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

storage/replica_raftstorage.go, line 602 [r2]: Discussed with @tamird yesterday, but also wanted to throw this out here: there was some subtlety about having on-disk state without an associated in-memory replica (…
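On the HardState question raised above: a raft snapshot itself does not carry a HardState (raftpb.Snapshot contains only Data and Metadata with Index, Term, and ConfState), so a replica that applies a snapshot outside of a raft group has to write one itself. A minimal sketch of deriving it from the snapshot metadata, using a hypothetical helper rather than the actual storage code:

```go
package storage

import "github.com/coreos/etcd/raft/raftpb"

// hardStateFromSnapshot derives a HardState for a replica that applied a
// snapshot without yet participating in the raft group. Term and Commit come
// from the snapshot metadata; Vote stays zero because this replica has never
// cast a vote.
func hardStateFromSnapshot(snap raftpb.Snapshot) raftpb.HardState {
	return raftpb.HardState{
		Term:   snap.Metadata.Term,
		Commit: snap.Metadata.Index,
	}
}
```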
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

storage/replica_raftstorage.go, line 602 [r2]: …
Force-pushed f9d4299 to df0526a.
This is ready for another look. The tests all pass. As far as I can tell from the logs, the preemptive snapshots are working. I'm taking a look at whether we can unit test that preemptive snapshots are being applied, and I'm going to be reading through the commits @tamird and @tschottdorf mentioned. Definitely still a novice in this area of the code.

Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.

storage/replica_raftstorage.go, line 602 [r2]: …
Reviewed 4 of 4 files at r3.
Btw, prior to this change the initial replication in a 3-node cluster was taking ~2-3 sec; after this change it takes ~1 sec.

Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
Force-pushed df0526a to 4345dfb.
I took a look through #4204 and didn't see anything worrisome about the new usage of …

Review status: 3 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.
To clarify, I wasn't specifically concerned about an interaction with #4204; I was pointing out that that PR introduced different special-casing of the zero replica ID, which, along with this PR, contributes to the overall confusion around this field.

Reviewed 1 of 1 files at r4.
Ok. I don't think there is a rush to merge this PR. Let's wait for @bdarnell to weigh in.

Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.
Force-pushed 44072b0 to 87468e9.
Related: #6144. It may be simpler in the long run to make these incomplete replicas a separate type, although as my comment below points out, there are now multiple kinds of incomplete replicas.

Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

storage/replica_raftstorage.go, line 604 [r5]: Yes, this is fine. The important thing about the persisted value of HardState.Term is that it is updated whenever the node casts a vote (and never moves backwards), so the node knows not to vote twice in the same term.

storage/store.go, line 2118 [r5]: There are two ways a replica can be uninitialized: it may not know its start and end keys, and it may not have a replica ID.
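A small illustration of the two comments above, with made-up types (`rangeDescriptor`, `replicaState`) standing in for the real ones:

```go
package storage

import "github.com/coreos/etcd/raft/raftpb"

type rangeDescriptor struct {
	StartKey, EndKey []byte
}

type replicaState struct {
	desc      rangeDescriptor
	replicaID uint64
	hardState raftpb.HardState
}

// The two flavors of "uninitialized": a replica may lack its key bounds, and
// it may (independently) lack a replica ID.
func (r *replicaState) isInitialized() bool { return len(r.desc.EndKey) > 0 }
func (r *replicaState) hasReplicaID() bool  { return r.replicaID != 0 }

// setHardState never lets the persisted term move backwards, which is what
// prevents a node from casting two votes in the same term after a restart.
func (r *replicaState) setHardState(hs raftpb.HardState) {
	if hs.Term < r.hardState.Term {
		return // never regress the persisted term
	}
	r.hardState = hs
}
```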
Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

storage/replica_raftstorage.go, line 604 [r5]: …
Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

storage/replica_raftstorage.go, line 602 [r2]: …
Force-pushed 87468e9 to 63b67e6.
Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.

storage/replica_raftstorage.go, line 602 [r2]: …
Allow Replicas to be created which do not have their Raft replica ID configured. These Replicas only allow snapshots to be applied. When applying snapshots to these replicas, we initialize the raft group state similarly to splitting a range. Fixes cockroachdb#7372.
Force-pushed 63b67e6 to 74388f3.
Review status: 1 of 4 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.

storage/replica_raftstorage.go, line 602 [r2]: …
Reviewed 1 of 2 files at r6, 2 of 2 files at r7.
I'm very much in favor of #6144. This code is too subtle, and leaving that subtlety implicit much longer can only lead to time wasted debugging.

Reviewed 2 of 4 files at r3, 2 of 2 files at r6, 2 of 2 files at r7.

storage/replica_raftstorage.go, line 602 [r2]: …
Heh, #6144 asks for an …

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.
Yeah, on further reflection, #6144 is unrelated. These non-member replicas do need to be more-or-less full-fledged replicas so they can receive log entries and process their addition to the group. I'm not seeing a good way to clean this up.

That should be OK. The replica data will be there, but if we're not a member it will be destroyed on restart. Then, if we are still a member, raft will send a (non-preemptive) snapshot to bring the replica back.
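A sketch of the restart-time cleanup described here; `shouldDestroyOnRestart` and its arguments are hypothetical, not the actual store startup code:

```go
package storage

// shouldDestroyOnRestart reports whether data left behind by a preemptive
// snapshot should be removed at startup: if this store no longer appears in
// the range's member list, the data is garbage. If the store is (re-)added
// later, raft will send a regular (non-preemptive) snapshot to restore it.
func shouldDestroyOnRestart(memberStoreIDs []uint64, storeID uint64) bool {
	for _, id := range memberStoreIDs {
		if id == storeID {
			return false // still a member; keep the replica data
		}
	}
	return true
}
```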
Allow Replicas to be created which do not have their Raft replica ID
configured. These Replicas only allow snapshots to be applied. When
applying snapshots to these replicas, we initialize the raft group state
similarly to splitting a range.
Fixes #7372.
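As a rough sketch of the commit message above (`newReplica` and `setReplicaID` are hypothetical stand-ins, not the real constructor): a Replica may be created with a zero replica ID, in which case raft group creation is deferred and only snapshot application is possible until a real ID is assigned.

```go
package storage

import "errors"

// replica is a stand-in for storage.Replica.
type replica struct {
	rangeID   uint64
	replicaID uint64 // 0: not yet a raft group member; snapshots only
}

// newReplica tolerates replicaID == 0. Such a replica defers creating its
// raft group and may only apply (preemptive) snapshots until setReplicaID is
// called with the ID assigned by the replica-change transaction.
func newReplica(rangeID, replicaID uint64) *replica {
	return &replica{rangeID: rangeID, replicaID: replicaID}
}

// setReplicaID upgrades a preemptively-snapshotted replica into a full raft
// group member; the real code would (re-)create the raft group here.
func (r *replica) setReplicaID(id uint64) error {
	if id == 0 {
		return errors.New("replica ID cannot be reset to 0")
	}
	r.replicaID = id
	return nil
}
```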