storage: outside-of-raft snapshots put consistency in jeopardy #7619

tbg · 2016-07-05T15:51:19Z

When we send a snapshot around outside of Raft (preemptive snapshot), we don't check whether the state we're overwriting is ahead of the snapshot.
This could lead to a replica being created, acknowledging progress, and then being reset through a snapshot.

Perhaps canApplySnapshot is a good place to check that. The necessary information should be deducible from the respective HardStates.
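
A rough sketch of what such a comparison could look like. `HardState` here is a simplified stand-in with the three fields raft persists (`Term`, `Vote`, `Commit`), and `checkHardState` is a hypothetical helper, not an existing function in the codebase:

```go
package main

import (
	"errors"
	"fmt"
)

// HardState is a simplified stand-in for raft's persisted state.
// Assumption: only Term, Vote and Commit matter for this check.
type HardState struct {
	Term, Vote, Commit uint64
}

// checkHardState (hypothetical) returns an error if replacing oldHS with
// newHS would retract something the replica may already have promised.
func checkHardState(oldHS, newHS HardState) error {
	if newHS.Commit < oldHS.Commit {
		return errors.New("commit index would move backwards")
	}
	if newHS.Term < oldHS.Term {
		return errors.New("term would move backwards")
	}
	// A vote cast in the current term must survive the update.
	if newHS.Term == oldHS.Term && oldHS.Vote != 0 && newHS.Vote != oldHS.Vote {
		return errors.New("vote cast in the current term would be lost")
	}
	return nil
}

func main() {
	onDisk := HardState{Term: 5, Vote: 2, Commit: 10}
	// A snapshot that moves the commit index forward is acceptable.
	fmt.Println(checkHardState(onDisk, HardState{Term: 5, Vote: 2, Commit: 12}))
	// A snapshot that would move the commit index backwards is rejected.
	fmt.Println(checkHardState(onDisk, HardState{Term: 5, Vote: 2, Commit: 8}))
}
```

Whether a regression should be an error or should be repaired by carrying the old values forward is a design choice this sketch leaves open.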


bdarnell · 2016-07-05T16:00:14Z

Specifically, there are some checks on the snapshot metadata in raft.restore which we don't do for preemptive snapshots. If there is data on disk we should check the term and index of the snapshot and drop it if it's older than what we already have.
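
To make the term/index check concrete, here is a minimal sketch. It is loosely modeled on the kind of comparison described above and is not a copy of raft's code; `SnapshotMeta`, `LocalLog` and `shouldDrop` are hypothetical names, and the local log is reduced to a commit index plus a map from index to term:

```go
package main

import "fmt"

// SnapshotMeta holds the index and term of the last entry a snapshot covers.
type SnapshotMeta struct{ Index, Term uint64 }

// LocalLog is a simplified view of what is already on disk (assumption).
type LocalLog struct {
	Commit uint64            // highest index known to be committed
	Terms  map[uint64]uint64 // index -> term for entries still in the log
}

// shouldDrop (hypothetical) reports whether a snapshot is no newer than
// the local state, together with a reason.
func shouldDrop(local LocalLog, snap SnapshotMeta) (bool, string) {
	if snap.Index <= local.Commit {
		return true, "snapshot is not ahead of the local commit index"
	}
	if term, ok := local.Terms[snap.Index]; ok && term == snap.Term {
		return true, "local log already contains the snapshot's last entry"
	}
	return false, ""
}

func main() {
	local := LocalLog{Commit: 10, Terms: map[uint64]uint64{11: 5, 12: 5}}
	snaps := []SnapshotMeta{{Index: 8, Term: 4}, {Index: 12, Term: 5}, {Index: 20, Term: 6}}
	for _, snap := range snaps {
		drop, why := shouldDrop(local, snap)
		fmt.Printf("snapshot %+v: drop=%t %s\n", snap, drop, why)
	}
}
```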

Alternately, we could reorganize the checks in handleRaftMessage so we only call applySnapshot directly when we have no non-trivial local state, and pass the snapshot to r.raftGroup.Step otherwise to let raft make the decision.
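
A sketch of that routing decision, under the assumption that "no non-trivial local state" means an empty log and no persisted HardState. `replicaState`, `route` and the two route constants are hypothetical and only illustrate the branch, not the real handleRaftMessage:

```go
package main

import "fmt"

// replicaState is a simplified summary of what a replica has on disk.
type replicaState struct {
	lastIndex    uint64 // 0 means the log is empty (assumption)
	hasHardState bool   // whether a non-empty HardState was persisted
}

// trivial reports whether there is nothing on disk that a snapshot
// could clobber.
func (s replicaState) trivial() bool {
	return s.lastIndex == 0 && !s.hasHardState
}

type route string

const (
	applyDirectly   route = "apply snapshot directly"
	stepThroughRaft route = "hand snapshot to raft via Step"
)

// routeSnapshot (hypothetical) picks the path for an incoming snapshot.
func routeSnapshot(s replicaState) route {
	if s.trivial() {
		return applyDirectly
	}
	// Existing state: let raft decide whether the snapshot is acceptable.
	return stepThroughRaft
}

func main() {
	fmt.Println(routeSnapshot(replicaState{}))
	fmt.Println(routeSnapshot(replicaState{lastIndex: 12, hasHardState: true}))
}
```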

tbg · 2016-07-06T12:46:46Z

I know this is assigned to @tamird, but it ties in with stuff for #7600, so I'm currently poking around in it too (let's avoid crossing streams).

tamird · 2016-07-06T14:26:44Z

OK. I already have a WIP for this:
master...tamird:preemptive-snap-no-backwards

Ensure that a new HardState does not break promises made by an existing one during snapshot application. Update the HardState for both preemptive snapshots (where the update is required, both to bring the state up to date and to check for errors) and Raft-delivered snapshots (where it must never error out and should never have to update anything, since the supplied HardState is fresh out of Raft). Fixes cockroachdb#7619.

tbg · 2016-07-06T20:43:50Z

Snagging this issue from you, as I believe it's fixed in #7598 (which won't merge until a test for this has been added).

As discovered in cockroachdb#6991 (comment), it's possible that we apply a Raft snapshot without writing a corresponding HardState since we write the snapshot in its own batch first and only then write a HardState. If that happens, the server is going to panic on restart: It will have a nontrivial first index, but a committed index of zero (from the empty HardState). This change prevents us from applying a snapshot when there is no HardState supplied along with it, except when applying a preemptive snapshot (in which case we synthesize a HardState). Ensure that a new HardState does not break promises made by an existing one during preemptive snapshot application. Fixes cockroachdb#7619.
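
The rule described in this commit message can be sketched roughly as follows. All names are hypothetical, and the choice to synthesize the HardState from the snapshot's own index and term is an assumption made for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// HardState is a simplified stand-in for raft's persisted state.
type HardState struct {
	Term, Vote, Commit uint64
}

func (hs HardState) empty() bool { return hs == (HardState{}) }

// Snapshot carries only the metadata needed here.
type Snapshot struct{ Index, Term uint64 }

// hardStateForSnapshot (hypothetical) returns the HardState that should be
// written together with the snapshot, or an error if none is available.
func hardStateForSnapshot(snap Snapshot, supplied HardState, preemptive bool) (HardState, error) {
	if preemptive {
		// Assumption: commit up to the snapshot's last index, in its term.
		return HardState{Term: snap.Term, Commit: snap.Index}, nil
	}
	if supplied.empty() {
		return HardState{}, errors.New("refusing to apply Raft snapshot without a HardState")
	}
	return supplied, nil
}

func main() {
	snap := Snapshot{Index: 20, Term: 6}
	fmt.Println(hardStateForSnapshot(snap, HardState{}, true))
	fmt.Println(hardStateForSnapshot(snap, HardState{}, false))
	fmt.Println(hardStateForSnapshot(snap, HardState{Term: 6, Vote: 1, Commit: 20}, false))
}
```

A synthesized HardState would still need to be checked against any HardState already on disk before it is written.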

As discovered in cockroachdb#6991 (comment), it's possible that we apply a Raft snapshot without writing a corresponding HardState since we write the snapshot in its own batch first and only then write a HardState. If that happens, the server is going to panic on restart: it will have a nontrivial first index, but a committed index of zero (from the empty HardState). This change prevents us from applying a snapshot when there is no HardState supplied along with it, except when applying a preemptive snapshot (in which case we synthesize a HardState). Ensure that the new HardState and Raft log do not break promises made by the existing ones during preemptive snapshot application. Fixes cockroachdb#7619.

storage: prevent loss of uncommitted log entries

tbg added the C-bug label (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) Jul 5, 2016

petermattis assigned tamird Jul 5, 2016

bdarnell mentioned this issue Jul 6, 2016

storage: clobbering between uninitialized replica and RHS of a split #7600

Closed

tbg assigned tbg and unassigned tamird Jul 6, 2016

tbg mentioned this issue Jul 7, 2016

storage: Make uninitialized replicas distinct types #6144

Closed

bdarnell added a commit to bdarnell/cockroach that referenced this issue Jul 7, 2016

storage: Temporarily disable preemptive snapshots

f7734ba

Fixes cockroachdb#7659 Updates cockroachdb#7600 Updates cockroachdb#7619

bdarnell mentioned this issue Jul 7, 2016

storage: Temporarily disable preemptive snapshots #7674

Merged

bdarnell mentioned this issue Jul 8, 2016

storage: always write a HardState #7598

Merged

petermattis modified the milestone: Q3 Jul 11, 2016

petermattis added the S-1-stability label (Severe stability issues that can be fixed by upgrading, but usually don't resolve by restarting) Jul 11, 2016

tbg closed this as completed in bf44eb9 Jul 12, 2016

petermattis mentioned this issue Jul 13, 2016

storage: preemptive snapshots #7819

Closed

tamird mentioned this issue Jul 28, 2016

storage: update confusing comment #8106

Merged
