Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: don't clobber HardState on splits #17051

Merged
merged 3 commits into from
Jul 18, 2017
Merged

Conversation

tbg
Copy link
Member

@tbg tbg commented Jul 17, 2017

Since the move to proposer-evaluated KV, we were potentially clobbering the
HardState on splits as we accidentally moved HardState synthesis upstream
of Raft as well. This change moves it downstream again.

Though not strictly necessary, writing lastIndex was moved as well. This is
cosmetic, though it aids @irfansharif's PR #16809, which moves lastIndex to
the Raft engine. After this PR, neither HardState nor last index keys are
added to the WriteBatch, so that pre-#16993 TruncateLog is the only
remaining command that does so (and it, too, won't keep doing that for
long).

Note that there is no migration concern.

Fixes #16749.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@@ -2212,3 +2212,131 @@ func TestPushTxnQueueDependencyCycleWithRangeSplit(t *testing.T) {
})
}
}

func TestMinimal(t *testing.T) {
Copy link
Contributor

@irfansharif irfansharif Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, TestMinimal was something I used for the quick hacky test. Can/Should be renamed (also breaks my hacky workflows).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, didn't actually mean to commit. Removed.

@tbg tbg force-pushed the split-clobbering branch from dfc1768 to ce3bc6c Compare July 17, 2017 18:53
@tbg tbg requested a review from bdarnell July 17, 2017 19:02
@tbg
Copy link
Member Author

tbg commented Jul 17, 2017

Hmm, the assertion I put in is obviously garbage. Let me rework that.

@tbg tbg force-pushed the split-clobbering branch from ce3bc6c to 7db82c4 Compare July 17, 2017 19:23
@tbg
Copy link
Member Author

tbg commented Jul 17, 2017

Cleaned up, PTAL.

@bdarnell
Copy link
Contributor

Are there really no migration concerns? When this goes in, leaders will stop emitting HardStates in their write batches for splits, but followers may not yet know that they need to do it themselves. I don't think that will work because we won't fill in the missing HardState values at read time.


Reviewed 5 of 14 files at r1, 1 of 2 files at r3, 8 of 8 files at r4.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/replica_state.go, line 489 at r4 (raw file):

// synthesizeHardState synthesizes a HardState from the given ReplicaState and
// any existing on-disk HardState in the context of a split, while verifying
// that the application of the snapshot does not violate Raft invariants. It

The remaining mentions of snapshot in this comment look stale.


pkg/storage/replica_state.go, line 498 at r4 (raw file):

	eng engine.ReadWriter,
	oldHS raftpb.HardState,
	truncState roachpb.RaftTruncatedState,

Instead of passing in truncState and raftAppliedIndex, maybe this should take oldHS, newHS raftpb.HardState. Or since this is only used for bootstrapping and splits, can we hard-code raftInitialLogTerm and raftInitialLogIndex? It doesn't look like we ever use this when the term and index might have other values.


pkg/storage/client_split_test.go, line 2216 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Oops, didn't actually mean to commit. Removed.

It's now added in one commit and removed in the next one. You can remove this file completely from both commits.


Comments from Reviewable

@tbg tbg force-pushed the split-clobbering branch from 7db82c4 to 515e711 Compare July 17, 2017 20:22
@tbg
Copy link
Member Author

tbg commented Jul 17, 2017

You're right, my statement is wrong. We'll have to migrate as per the usual. I think I'll send a quick PR to get some stubs into place so that we can start landing changes before the migration RFC is implemented. Modified the commit message to point out the migration concern.


Review status: 3 of 11 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/storage/replica_state.go, line 489 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The remaining mentions of snapshot in this comment look stale.

Done.


pkg/storage/replica_state.go, line 498 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Instead of passing in truncState and raftAppliedIndex, maybe this should take oldHS, newHS raftpb.HardState. Or since this is only used for bootstrapping and splits, can we hard-code raftInitialLogTerm and raftInitialLogIndex? It doesn't look like we ever use this when the term and index might have other values.

I like to avoid hard-coding raftInitialLog{Term,Index}. We don't use this code in much generality, but I prefer it as written, where you can (hope to) reason about its correctness without knowing the contexts in which it is called. I also don't mind that we pass a truncated state here as that is the source of the intial log term and index - it's a good reminder of how things tie together. In the absence of strong opinions, I'd just leave as is.


pkg/storage/client_split_test.go, line 2216 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It's now added in one commit and removed in the next one. You can remove this file completely from both commits.

Done.


Comments from Reviewable

@tbg tbg force-pushed the split-clobbering branch from 515e711 to 6413b91 Compare July 17, 2017 20:32
@tbg
Copy link
Member Author

tbg commented Jul 17, 2017

Added the "migration" as well.

@tbg tbg force-pushed the split-clobbering branch 3 times, most recently from fb0fc47 to 9550420 Compare July 17, 2017 23:38
@bdarnell
Copy link
Contributor

:lgtm:


Review status: 2 of 12 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 18, 2017

Reviewed 4 of 14 files at r1, 9 of 9 files at r5, 1 of 1 files at r6, 7 of 7 files at r7, 2 of 2 files at r8.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/below_raft_protos_test.go, line 80 at r8 (raw file):

		populatedConstructor: func(r *rand.Rand) proto.Message {
			n := r.Uint64()
			// NB: this would rot if HardState ever picked up more (relevant)

why not generate a new function like these other protos have?


pkg/storage/replica.go, line 4413 at r6 (raw file):

		newHS, err := loadHardState(ctx, r.store.Engine(), rResult.Split.RightDesc.RangeID)
		if err != nil {
			panic(err)

is this intentionally mixing log.Fatal and panic?


pkg/storage/replica_command.go, line 3079 at r8 (raw file):

			// won't write a HardState at all and is guaranteed to crash.
			if err := makeReplicaStateLoader(split.RightDesc.RangeID).synthesizeRaftState(ctx, batch); err != nil {
				return enginepb.MVCCStats{}, EvalResult{}, errors.Wrap(err, "unable to write initial Raft state")

s/write initial/synthesize/?


Comments from Reviewable

tbg added 3 commits July 18, 2017 10:04
This was previously in storage/engine, which is not a canonical place for it.
Motivated by cockroachdb#16749. Added an assertion that catches HardState clobbering. Now

```
make stressrace PKG=./pkg/storage/ TESTS=TestStoreRangeSplitRaceUninitializedRHS
```

fails immediately with

```
clobbered hard state: [Term: 8 != 9 Commit: 10 != 0]

previously: raftpb.HardState{
    Term:             0x9,
    Vote:             0x2,
    Commit:           0x0,
    XXX_unrecognized: nil,
}
overwritten with: raftpb.HardState{
    Term:             0x8,
    Vote:             0x2,
    Commit:           0xa,
    XXX_unrecognized: nil,
}
```

which is fixed in the next commit in this PR.
Since the move to proposer-evaluated KV, we were potentially clobbering the
HardState on splits as we accidentally moved HardState synthesis upstream of
Raft as well. This change moves it downstream again.

Though not strictly necessary, writing lastIndex was moved as well. This is
cosmetic, though it aids @irfansharif's PR cockroachdb#16809, which moves lastIndex to
the Raft engine. After this PR, neither HardState nor last index keys are
added to the WriteBatch, so that pre-cockroachdb#16993 `TruncateLog` is the only remaining
command that does so (and it, too, won't keep doing that for long).

Migration concerns: a lease holder running the new version will propose splits
that don't propose the HardState to Raft. A follower running the old version
will not write the HardState downstream of Raft. In combination, the HardState
would never get written, and would thus be incompatible with the
TruncatedState. Thus, while 1.0 might be around, we're still sending the
potentially dangerous HardState.

Fixes cockroachdb#16749.
@tbg
Copy link
Member Author

tbg commented Jul 18, 2017

Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/below_raft_protos_test.go, line 80 at r8 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why not generate a new function like these other protos have?

Wouldn't that entail mucking with the vendored proto? I guess we could upstream the gogoproto.populate option, but ... ok, I'll try. Not holding up this PR though. Lmk if you were thinking something else.


pkg/storage/replica.go, line 4413 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

is this intentionally mixing log.Fatal and panic?

absolutely not, made them both Fatalfs.


pkg/storage/replica_command.go, line 3079 at r8 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/write initial/synthesize/?

Done. (synthesize initial).


Comments from Reviewable

@tbg tbg force-pushed the split-clobbering branch from 9550420 to 1741cd3 Compare July 18, 2017 14:25
@tbg tbg merged commit 6c75ec3 into cockroachdb:master Jul 18, 2017
@tbg tbg deleted the split-clobbering branch July 18, 2017 14:41
@bdarnell
Copy link
Contributor

Review status: 3 of 12 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/below_raft_protos_test.go, line 80 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Wouldn't that entail mucking with the vendored proto? I guess we could upstream the gogoproto.populate option, but ... ok, I'll try. Not holding up this PR though. Lmk if you were thinking something else.

Instead of upstreaming a new gogoproto option, It would be simpler to use reflection on the HardState struct to make sure that it only has the expected fields.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Jul 19, 2017

Reviewed 1 of 8 files at r4, 9 of 9 files at r11.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/below_raft_protos_test.go, line 80 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Instead of upstreaming a new gogoproto option, It would be simpler to use reflection on the HardState struct to make sure that it only has the expected fields.

Ah, it's upstream. @bdarnell's suggestion SGTM.


Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants