Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: always write a HardState #7598

Merged
merged 8 commits into from
Jul 12, 2016
Merged

storage: always write a HardState #7598

merged 8 commits into from
Jul 12, 2016

Conversation

tbg
Copy link
Member

@tbg tbg commented Jul 1, 2016

storage: always write a HardState

As discovered in
https://github.com/cockroachdb/cockroach/issues/6991#issuecomment-230054238,
it's possible that we apply a Raft snapshot without writing a corresponding
HardState since we write the snapshot in its own batch first and only then
write a HardState.

If that happens, the server is going to panic on restart: It will have a
nontrivial first index, but a committed index of zero (from the empty
HardState).

This change prevents us from applying a snapshot when there is no HardState
supplied along with it, except when applying a preemptive snapshot (in which
case we synthesize a HardState).

Ensure that a new HardState does not break promises made by an existing one
during preemptive snapshot application.

Fixes #7619.

storage: add a migration for missing HardState

See #6991. It's possible that the HardState is missing after a snapshot was
applied (so there is a TruncatedState). In this case, synthesize a HardState
(simply setting everything that was in the snapshot to committed). Having lost
the original HardState can theoretically mean that the replica was further
ahead or had voted, and so there's no guarantee that this will be correct. But
it will be correct in the majority of cases, and some state *has* to be
recovered.

To illustrate this in the scenario in #6991: There, we (presumably) have
applied an empty snapshot (no real data, but a Raft log which starts and
ends at index ten as designated by its TruncatedState). We don't have a
HardState, so Raft will crash because its Commit index zero isn't in line
with the fact that our Raft log starts only at index ten.

The migration sees that there is a TruncatedState, but no HardState. It will
synthesize a HardState with Commit:10 (and the corresponding Term from the
TruncatedState, which is five).

This change is Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 1, 2016

Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


storage/replica_raftstorage.go, line 515 [r1] (raw file):

// applySnapshot updates the replica based on the given snapshot and
// corresponding HardState. When the Replica's replicaID is zero, the

s/corresponding// - there's no verification of correspondence here AFAICT


storage/replica_raftstorage.go, line 537 [r1] (raw file):

  // Extract the updated range descriptor.
  desc := snapData.RangeDescriptor
  // Fill the reservation if there was any one for this range, regardless of

s/any//


storage/replica_raftstorage.go, line 609 [r1] (raw file):

  if replicaID == 0 {
      if !raft.IsEmptyHardState(hs) {
          return 0, errors.Errorf("cannot apply HardState on preemptive snapshot")

does this need to have intimate knowledge of preemptive snapshots? how about "cannot apply hard state to (some identification) which is not yet a member of any range"?

ditto below.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 1, 2016

Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks pending.


storage/store.go, line 846 [r2] (raw file):

          log.Fatal(errors.Wrap(err, "could not migrate TruncatedState"))
      }
      defer log.Warningf("migration: synthesized TruncatedState for %+v", desc)

does this want to log the truncated state that was synthesized?


storage/store.go, line 858 [r2] (raw file):

      hs.Commit = state.TruncatedState.Index
      hs.Term = state.TruncatedState.Term
      defer log.Warningf("migration: synthesized HardState for %+v", desc)

does this want to log the hard state that was synthesized?


storage/store.go, line 860 [r2] (raw file):

      defer log.Warningf("migration: synthesized HardState for %+v", desc)
  }

i think you forgot to write the hard state here.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 620 [r2] (raw file):

      if raft.IsEmptyHardState(hs) {
          return 0, errors.Errorf("cannot apply empty HardState on non-preemptive snapshot")
      }

Did you mean to be calling setHardState here? I must be missing something.


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 2, 2016

No, you're right - it's missing. I'm still working on this change- I think there's a bug in or around #7468 that shows up via #7600. I'll ping this issue when it's ready for a look.

@petermattis
Copy link
Collaborator

#7468 applies preemptive snapshots with a hard state. That was required to get it to work, though I wouldn't be surprised if it contained a bug.

@petermattis
Copy link
Collaborator

@tschottdorf: @cuongdo is going to pick up this PR.

@tbg
Copy link
Member Author

tbg commented Jul 5, 2016

@cuongdo note that the code here doesn't actually write the HardState (in applySnapshot). I don't think this is a high priority change at the moment (except perhaps for the migration), and we'll have to see where we end up wrt maintaining Raft state in general.

@tbg
Copy link
Member Author

tbg commented Jul 6, 2016

@cuongdo I have picked this change back up since it was useful for #7600.

@cuongdo
Copy link
Contributor

cuongdo commented Jul 6, 2016

OK, let me know if I can help, e.g. testing the migration stuff with the
"old" registration cluster data.

On Wed, Jul 6, 2016 at 8:23 AM Tobias Schottdorf [email protected]
wrote:

@cuongdo https://github.com/cuongdo I have picked this change back up
since it was useful for #7600
#7600.


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#7598 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/ABffpqUMDzrzNA-lYcWOgIeoz2y_YwM_ks5qS55FgaJpZM4JDiKU
.

@tbg
Copy link
Member Author

tbg commented Jul 6, 2016

PTAL, I updated/cleaned up the commits. Light on testing, but I wanted @bdarnell to take a look first. The migration could be a separate PR, but it uses the previous work, so I left it in. That, too, is untested (a prime candidate for testing would be the registration cluster's backup).

@bdarnell
Copy link
Contributor

bdarnell commented Jul 7, 2016

Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


storage/replica_raftstorage.go, line 517 [r9] (raw file):

// applySnapshot updates the replica based on the given snapshot and
// corresponding HardState. The supplied HardState must be empty if and only

Is it guaranteed that raft always gives us a new HardState when we apply a snapshot? I guess it would always need to advance Commit, but this seems like a delicate thing to rely on. I don't think we should be as prescriptive here and just say that if HardState is non-empty it will be written atomically with the snapshot. Synthesizing a new HardState (when needed for preemptive snapshots) should happen at a higher level.


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 7, 2016

Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


storage/replica_raftstorage.go, line 517 [r9] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Is it guaranteed that raft always gives us a new HardState when we apply a snapshot? I guess it would always need to advance Commit, but this seems like a delicate thing to rely on. I don't think we should be as prescriptive here and just say that if HardState is non-empty it will be written atomically with the snapshot. Synthesizing a new HardState (when needed for preemptive snapshots) should happen at a higher level.

Good point. Do we really need to synthesize one at a higher level for preemptive snapshots? If Raft didn't give us one, that necessarily means that whatever was there is correct and unchanged.

Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Jul 7, 2016

Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


storage/replica_raftstorage.go, line 517 [r9] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Good point. Do we really need to synthesize one at a higher level for preemptive snapshots? If Raft didn't give us one, that necessarily means that whatever was there is correct and unchanged.

If raft didn't give us one and raft was involved, then we don't need to synthesize one. We'd only need to synthesize a HardState for applySnapshot if we continue our current path of calling applySnapshot directly for preemptive snapshots, then we would need to synthesize a HardState there. If we create a dummy raft.RawNode in that case then we never have to synthesize a HardState and we can just write whatever raft gives us.

Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 7, 2016

Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


storage/replica_raftstorage.go, line 517 [r9] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If raft didn't give us one and raft was involved, then we don't need to synthesize one. We'd only need to synthesize a HardState for applySnapshot if we continue our current path of calling applySnapshot directly for preemptive snapshots, then we would need to synthesize a HardState there. If we create a dummy raft.RawNode in that case then we never have to synthesize a HardState and we can just write whatever raft gives us.

Yep, sorry. I was assuming the procedure suggested in #6144. I'll make the change you suggested.

Comments from Reviewable

@tbg tbg force-pushed the hard-state branch 3 times, most recently from 8f6d617 to 44ef8f2 Compare July 8, 2016 12:29
@tbg
Copy link
Member Author

tbg commented Jul 8, 2016

PTAL and if it looks good I'll add some testing.


Review status: 0 of 6 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.


storage/replica_raftstorage.go, line 515 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/corresponding// - there's no verification of correspondence here AFAICT

Even more important to mention that you must pass a corresponding HardState, no?

storage/replica_raftstorage.go, line 537 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/any//

Done.

storage/replica_raftstorage.go, line 609 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

does this need to have intimate knowledge of preemptive snapshots? how about "cannot apply hard state to (some identification) which is not yet a member of any range"?

ditto below.

I think for the time being it can't hurt to be very explicit about preemptive snapshots. Note that current `HEAD` already has awareness for this here.

storage/replica_raftstorage.go, line 620 [r2] (raw file):

Previously, petermattis (Peter Mattis) wrote…

Did you mean to be calling setHardState here? I must be missing something.

I meant to. Fixed.

storage/store.go, line 846 [r2] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

does this want to log the truncated state that was synthesized?

Done.

storage/store.go, line 858 [r2] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

does this want to log the hard state that was synthesized?

Done (by returning more descriptive error).

storage/store.go, line 860 [r2] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i think you forgot to write the hard state here.

Duh. Done.

Comments from Reviewable

@petermattis
Copy link
Collaborator

:lgtm: As I mention below, synthesizeHardState deserves a test. Should be straightforward.


Review status: 0 of 6 files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 629 [r12] (raw file):

  if !raft.IsEmptyHardState(hs) {
      if isPreemptive {
          return 0, errors.Errorf("cannot apply HardState on preemptive snapshot")

This error should include the hardstate via %+v. I think I would change the wording: s/cannot apply/unexpected/g.


storage/replica_raftstorage.go, line 639 [r12] (raw file):

          return 0, errors.Wrap(err, "unable to write synthesized HardState")
      }
  }

Isn't there a 3rd condition here: raft.IsEmptyHardState(hs) && !isPreemptive. Seems good to error in that case. If it is expected, please add an empty else branch containing a comment about why it is expected.


storage/replica_state.go, line 325 [r12] (raw file):

// synthesizeHardState synthesizes a HardState from the given ReplicaState and
// updates it from any existing on-disk HardState in the context of a snapshot,

s/updates in from//g


storage/replica_state.go, line 344 [r12] (raw file):

// receive a snapshot, which we prevent due to overlap with the still-existing
// left-hand side).
func synthesizeHardState(eng engine.ReadWriter, s storagebase.ReplicaState) error {

This method deserves an exhaustive test of all the possible error conditions and ways in which an existing hard state can be updated.


storage/store.go, line 862 [r12] (raw file):

      log.Fatal(errors.Wrap(err, "unable to load HardState"))
  }
  // Only update the HardState with nontrivial Commit field. We don't have

s/with/when there is a/g


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 8, 2016

Review status: 0 of 7 files reviewed at latest revision, 13 unresolved discussions, some commit checks pending.


storage/replica_raftstorage.go, line 629 [r12] (raw file):

Previously, petermattis (Peter Mattis) wrote…

This error should include the hardstate via %+v. I think I would change the wording: s/cannot apply/unexpected/g.

Done, PTAL.

storage/replica_raftstorage.go, line 639 [r12] (raw file):

Previously, petermattis (Peter Mattis) wrote…

Isn't there a 3rd condition here: raft.IsEmptyHardState(hs) && !isPreemptive. Seems good to error in that case. If it is expected, please add an empty else branch containing a comment about why it is expected.

No, that's what I originally had, but Ben pointed out (https://reviewable.io/reviews/cockroachdb/cockroach/7598#-KM58r4d4Of6blRjNT0v) that we shouldn't rely on Raft emitting a HardState update after a snapshot (though normally it will since the Commit index will have changed). It's not clear to me that we'll ever get one without, though.

storage/replica_state.go, line 325 [r12] (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/updates in from//g

Done.

storage/replica_state.go, line 344 [r12] (raw file):

Previously, petermattis (Peter Mattis) wrote…

This method deserves an exhaustive test of all the possible error conditions and ways in which an existing hard state can be updated.

Done.

storage/store.go, line 862 [r12] (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/with/when there is a/g

Done.

Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Jul 8, 2016

Reviewed 1 of 6 files at r10, 5 of 6 files at r20, 1 of 2 files at r26, 2 of 2 files at r27.
Review status: all files reviewed at latest revision, 16 unresolved discussions, some commit checks failed.


storage/replica_raftstorage.go, line 526 [r27] (raw file):

// which case it will be synthesized from any existing on-disk HardState
// appropriately. For a regular snapshot, a HardState may or may not be
// supplied, though in the common case it is (since it changes as a result of

s/it changes/the commit index changes/


storage/replica_state.go, line 324 [r27] (raw file):

}

// synthesizeHardState synthesizes a HardState from the given ReplicaState and

Comment that this must be called after s.TruncatedState has been updated.


storage/replica_state.go, line 328 [r27] (raw file):

// that the application of the snapshot does not violate Raft invariants.
//
// The HardState is Raft's view of term and acknowledged log entries, and so we

s/term/term, vote,/
s/acknowledged/committed/ The log itself is where acknowledged log entries live.


storage/replica_state.go, line 334 [r27] (raw file):

// state. For example, a preemptive snapshot might arrive late, after the Range
// has already been created and caught up by vanilla Raft. In that case it's
// very likely that we must _refuse_ the snapshot at this point since it would

Link to #7619. Your comment there said that this PR fixes that issue, but I don't see where we actually reject the snapshot.

I think this comment is going into too much detail. The salient points are that if there is an existing HardState, we must respect it, and we must not apply a snapshot that would move us backwards.


storage/replica_state.go, line 337 [r27] (raw file):

// otherwise violate Raft invariants. On the other hand, when the existing
// HardState is simply a replica which hasn't ever received any log entries but
// only managed to cast a vote or take a note of a Term compatible with the

What does "compatible with the snapshot" mean?


storage/replica_state.go, line 357 [r27] (raw file):

  if oldHS.Commit > newHS.Commit {
      return errors.Errorf("can't decrease HardState.Commit from %d to %d",

Is this intended to be the solution to #7619? It's not enough - we can't apply a snapshot that would cause us to discard acknowledged but uncommitted log entries.


storage/replica_state.go, line 366 [r27] (raw file):

      newHS.Term = oldHS.Term
  }
  // If the existing HardState voted, remember that.

s/voted/voted in this term/


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 8, 2016

Review status: 4 of 9 files reviewed at latest revision, 16 unresolved discussions, some commit checks pending.


storage/replica_raftstorage.go, line 526 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/it changes/the commit index changes/

Done.

storage/replica_state.go, line 324 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Comment that this must be called after s.TruncatedState has been updated.

Done.

storage/replica_state.go, line 328 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/term/term, vote,/
s/acknowledged/committed/ The log itself is where acknowledged log entries live.

Done.

storage/replica_state.go, line 334 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Link to #7619. Your comment there said that this PR fixes that issue, but I don't see where we actually reject the snapshot.

I think this comment is going into too much detail. The salient points are that if there is an existing HardState, we must respect it, and we must not apply a snapshot that would move us backwards.

Trimmed this down and moved some parts of it to `applySnapshot`.

storage/replica_state.go, line 337 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What does "compatible with the snapshot" mean?

Gone.

storage/replica_state.go, line 357 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Is this intended to be the solution to #7619? It's not enough - we can't apply a snapshot that would cause us to discard acknowledged but uncommitted log entries.

Good call. I added that and left a question about testing in the code for you.

storage/replica_state.go, line 366 [r27] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/voted/voted in this term/

Done.

Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Jul 9, 2016

Review status: 4 of 9 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 648 [r28] (raw file):

      }
  } else if isPreemptive {
      // Preemptive snapshots get special verifications (see #7619) of their

I was thinking that these checks would belong before the call to applySnapshot (just like the way that vanilla raft intercepts bad snapshots before we try to apply them), but i guess this works too as long as we don't touch our in-memory state until the batch commits.


storage/replica_raftstorage.go, line 651 [r28] (raw file):

      // last index and (necessarily synthesized) HardState.
      if snap.Metadata.Index < oldLastIndex {
          // TODO(tschottdorf): how exactly would this happen? Perhaps:

I can't come up with a way this could happen with a preemptive snapshot (in the scenario described here, the second snapshot would not have replica ID 0).

etcd/raft doesn't appear to have a check like this. I think they assume that snapshots only transfer the state machine and don't touch the logs. They do, however, drop any snapshots from older terms. You say below that we need to allow snapshots from old terms. Why?


Comments from Reviewable

@petermattis
Copy link
Collaborator

The testing of the applySnapshot changes appears weak. I see precisely 1 direct call to Replica.applySnapshot() from test code. Seems feasible to test all the various scenarios directly. Am I missing something?


Review status: 4 of 9 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 9, 2016

Reviewed 1 of 6 files at r10, 8 of 8 files at r19, 3 of 6 files at r20, 2 of 2 files at r27, 5 of 5 files at r28, 1 of 1 files at r29, 2 of 2 files at r30, 2 of 2 files at r31.
Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


storage/migration_test.go, line 58 [r30] (raw file):

  expTS := roachpb.RaftTruncatedState{Term: raftInitialLogTerm, Index: raftInitialLogIndex}
  if !reflect.DeepEqual(expTS, ts) {

comparable directly


storage/replica_raftstorage.go, line 583 [r28] (raw file):

  oldLastIndex, err := loadLastIndex(batch, desc.RangeID)
  if err != nil {
      return 0, errors.Wrap(err, "loading last index")

nit: this will read loading last index: boom which is weird - there's no mention of an error in the text


storage/replica_raftstorage.go, line 642 [r28] (raw file):

  if !raft.IsEmptyHardState(hs) {
      if isPreemptive {
          return 0, errors.Errorf("unexpected HardState %+v on preemptive snapshot", &hs)

throughout: taking a pointer shouldn't be necessary for %+v


storage/replica_raftstorage.go, line 656 [r28] (raw file):

          // - ReplicaID=0 receives older snapshot, truncates state
          // Should simulate that test.
          return 0, errors.Wrap(err,

what err are you wrapping here? seems wrong.


storage/replica_state_test.go, line 73 [r28] (raw file):

          if err := synthesizeHardState(batch, testState); ((err == nil) != (test.Err == "")) ||
              (err != nil && !testutils.IsError(err, test.Err)) {

no need for the nil check


storage/replica_state_test.go, line 84 [r28] (raw file):

              t.Fatal(err)
          }
          if !reflect.DeepEqual(hs, test.NewHS) {

can't these be compared directly?


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 9, 2016

Which applySnapshot changes do you mean? Not resetting the last index? I was waiting for @bdarnell's input.


Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 651 [r28] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I can't come up with a way this could happen with a preemptive snapshot (in the scenario described here, the second snapshot would not have replica ID 0).

etcd/raft doesn't appear to have a check like this. I think they assume that snapshots only transfer the state machine and don't touch the logs. They do, however, drop any snapshots from older terms. You say below that we need to allow snapshots from old terms. Why?

The SnapshotWins test fails (it drops a snapshot from term 5 in term ~8 and then times out). I'm going to look into that - agree that it shouldn't be necessary to let those throug.

storage/replica_state_test.go, line 73 [r28] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

no need for the nil check

`testutils.IsError(nil, )` is `false`.

Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 651 [r28] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

The SnapshotWins test fails (it drops a snapshot from term 5 in term ~8 and then times out). I'm going to look into that - agree that it shouldn't be necessary to let those throug.

Hmm, that's weird. Despite the name SnapshotWins (which refers to the fact that the snapshot message arrives first), no snapshot is actually applied in that test.

Comments from Reviewable

@petermattis
Copy link
Collaborator

I was referring to applySnapshot in general. It looks like it is only tested indirectly, but I can't see a reason for that. It's not necessary to address in this PR, but we should file an issue to thoroughly test all of the applySnapshot possibilities (preemptive vs non-preemptive, existing HardState vs empty HardState, etc).


Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 11, 2016

Yeah, that's a good point. I filed #7737 about that.

@bdarnell waiting for your comments (and/or LGTM).


Review status: 0 of 9 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.


storage/migration_test.go, line 58 [r30] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

comparable directly

Done.

storage/replica_raftstorage.go, line 583 [r28] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: this will read loading last index: boom which is weird - there's no mention of an error in the text

Done.

storage/replica_raftstorage.go, line 642 [r28] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

throughout: taking a pointer shouldn't be necessary for %+v

They will use different string representations - `&hs` will result in `migration_test.go:64: expected term:5 vote:0 commit:10 , got term:5 vote:0 commit:10` (i.e. proto compact string representation) and the `hs` in `{Term:5 Vote:0 Commit:10 XXX_unrecognized:[]}` (i.e. Go default struct repr).

storage/replica_raftstorage.go, line 651 [r28] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Hmm, that's weird. Despite the name SnapshotWins (which refers to the fact that the snapshot message arrives first), no snapshot is actually applied in that test.

The issue is that the split trigger would actually fail (remember that it wants `Term: 5`), and more generally that we use `synthesizeHardState` in contexts which are not just the application of a snapshot. I think we can be more focused here once we don't synthesize HardStates any more and "everything is a snapshot" (or we could now, but it's kind of a bottomless pit and I've got to draw the line somewhere).

I've taken out the TODO. Anything else you want done for #7619 in the meantime?


storage/replica_raftstorage.go, line 656 [r28] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

what err are you wrapping here? seems wrong.

Ew, indeed (this one is always nil I think). Fixed.

storage/replica_state_test.go, line 84 [r28] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can't these be compared directly?

Nope, the usual
type HardState struct {
    Term             uint64 `protobuf:"varint,1,opt,name=term" json:"term"`
    Vote             uint64 `protobuf:"varint,2,opt,name=vote" json:"vote"`
    Commit           uint64 `protobuf:"varint,3,opt,name=commit" json:"commit"`
    XXX_unrecognized []byte `json:"-"`
}

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 11, 2016

Reviewed 6 of 7 files at r33, 1 of 2 files at r35, 2 of 2 files at r36.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 642 [r28] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

They will use different string representations - &hs will result in migration_test.go:64: expected term:5 vote:0 commit:10 , got term:5 vote:0 commit:10 (i.e. proto compact string representation) and the hs in {Term:5 Vote:0 Commit:10 XXX_unrecognized:[]} (i.e. Go default struct repr).

Right, but you're using `%+v` here, so string representation doesn't matter, does it?

Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Jul 11, 2016

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 642 [r28] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Right, but you're using %+v here, so string representation doesn't matter, does it?

`%+v` uses `%s` if it's available. See https://play.golang.org/p/ANx1_grQq9

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 11, 2016

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 642 [r28] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

%+v uses %s if it's available. See https://play.golang.org/p/ANx1_grQq9

TIL

Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 651 [r28] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

The issue is that the split trigger would actually fail (remember that it wants Term: 5), and more generally that we use synthesizeHardState in contexts which are not just the application of a snapshot. I think we can be more focused here once we don't synthesize HardStates any more and "everything is a snapshot" (or we could now, but it's kind of a bottomless pit and I've got to draw the line somewhere).

I've taken out the TODO. Anything else you want done for #7619 in the meantime?

What do you mean the split trigger "wants `Term: 5`"? How does it fail (I can't reproduce this so I must be putting the check somewhere different than you did)? The split trigger should create the HardState with term 5 if it isn't already there, but it's fine if raft messages have already pushed the term forward on the uninitialized replica.

We should add a term check when applying preemptive snapshots before merging this PR or understand why we can't (maybe you already understand this and i'm just not getting it yet). This seems pretty important to me.


Comments from Reviewable

tbg added 8 commits July 12, 2016 04:02
As discovered in
cockroachdb#6991 (comment),
it's possible that we apply a Raft snapshot without writing a corresponding
HardState since we write the snapshot in its own batch first and only then
write a HardState.

If that happens, the server is going to panic on restart: It will have a
nontrivial first index, but a committed index of zero (from the empty
HardState).

This change prevents us from applying a snapshot when there is no HardState
supplied along with it, except when applying a preemptive snapshot (in which
case we synthesize a HardState).

Ensure that the new HardState and Raft log does not break promises made by an
existing one during preemptive snapshot application.

Fixes cockroachdb#7619.

storage: prevent loss of uncommitted log entries
See cockroachdb#6991. It's possible that the HardState is missing after a snapshot was
applied (so there is a TruncatedState). In this case, synthesize a HardState
(simply setting everything that was in the snapshot to committed). Having lost
the original HardState can theoretically mean that the replica was further
ahead or had voted, and so there's no guarantee that this will be correct. But
it will be correct in the majority of cases, and some state *has* to be
recovered.

To illustrate this in the scenario in cockroachdb#6991: There, we (presumably) have
applied an empty snapshot (no real data, but a Raft log which starts and
ends at index ten as designated by its TruncatedState). We don't have a
HardState, so Raft will crash because its Commit index zero isn't in line
with the fact that our Raft log starts only at index ten.

The migration sees that there is a TruncatedState, but no HardState. It will
synthesize a HardState with Commit:10 (and the corresponding Term from the
TruncatedState, which is five).
@tbg
Copy link
Member Author

tbg commented Jul 12, 2016

Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


storage/replica_raftstorage.go, line 651 [r28] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What do you mean the split trigger "wants Term: 5"? How does it fail (I can't reproduce this so I must be putting the check somewhere different than you did)? The split trigger should create the HardState with term 5 if it isn't already there, but it's fine if raft messages have already pushed the term forward on the uninitialized replica.

We should add a term check when applying preemptive snapshots before merging this PR or understand why we can't (maybe you already understand this and i'm just not getting it yet). This seems pretty important to me.

I put the check in `synthesizeHardState` (i.e. it refused to synthesize an initial HardState when there was already one with a higher term), but that's clearly the wrong place now that I think about it. Put the check here and added a simple test, PTAL.

Comments from Reviewable

@bdarnell
Copy link
Contributor

:lgtm:


Review status: 0 of 11 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tbg tbg merged commit fb6cc2e into cockroachdb:master Jul 12, 2016
@tbg tbg deleted the hard-state branch July 12, 2016 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants