Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport-19.2: terminate nodes in minority on consistency failure #42149

Merged
merged 3 commits into from
Nov 19, 2019

Conversation

tbg
Copy link
Member

@tbg tbg commented Nov 4, 2019

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

tbg added 2 commits November 4, 2019 18:05
Previously, a consistency failure was always framed as a deviation from
the leaseholder. However, in practice a mismatch is always symmetric:
the leaseholder could be right and the follower wrong; the follower
might be right; and in fact the (usually 2+) followers may agree that
the leaseholder is wrong.

We've historically found that the majority was usually right, that is,
with very few exceptions (I remember a single one), if there is one
outlier replica, that replica is usually the problematic one, or more
generally, the SHA with the lowest multiplicity is usually the one
worth suspecting (we call this a "minority" in this context).

The consistency checker currently fatals the leaseholder
indiscriminately. This is an especially poor choice that was made out of
convenience. Killing the leaseholder, which may very well have the
correct data, forces the lease to move to a follower, often resulting in
the corrupt node taking over the lease and spreading the corruption
further. In a follow-up change, we'll want to kill the replicas
representing the SHA with the lowest multiplicity instead of defaulting
to the leaseholder.

Release note (general change): Improve the consistency checker's log output.
Following up on cockroachdb#41893, the consistency checker now instructs the
replicas corresponding to the SHA with the lowest multiplicity to
terminate.

Note that this change is a candidate for backporting into 19.2. An "old"
leaseholder will never populate the newly introduced field, and will
always terminate itself. A new leaseholder will populate the field, but
only new nodes will interpret it. In the worst case, the result will be
a consistency failure that does not lead to a fatal error (but collects
RocksDB checkpoints just the same). Since no significant time is usually
spent in a mixed-patch-releases, this is not a concern that outweighs
the benefit of terminating (heuristically) the "right" nodes.

Release note (general improvement): when the replicas within a range
have found to been corrupted, the outliers will be terminated.
Previously, the leaseholder replica would terminate, regardless of
which replicas disagreed with each other. This is expected to curb
the spreading of corrupted data better than the previous approach.
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Nov 4, 2019

If stats mismatches aren't severe (which has been our stance so far), I don't think we want to backport changes to crash on them. We can backport the fix for the known issue #42148 (although it'll go into 19.2.1 now), but I'd let stats mismatches continue without aborting in case we've missed some causes.

@tbg
Copy link
Member Author

tbg commented Nov 4, 2019

@bdarnell the PR is mislabeled. You need to s/stats mismatch/consistency failure/. Sorry, I hadn't noticed this until you pointed it out.

@tbg tbg changed the title backport-19.2: terminate nodes in minority on stats mismatch backport-19.2: terminate nodes in minority on consistency failure Nov 4, 2019
@bdarnell
Copy link
Contributor

bdarnell commented Nov 4, 2019

OK. If it's for data mismatches instead of stats mismatches then I don't object as much, but this is still a significant change to a little-exercised code path and I don't think it's a great candidate for backporting. I'd personally leave this for 20.1

@tbg
Copy link
Member Author

tbg commented Nov 4, 2019

Yeah, this isn't a slam dunk for backporting. There is a certain risk in this backport but also the current code tends to spread inconsistencies around the cluster, something that really bit us in the past. I'll bring this up in the weekly tomorrow.

@tbg
Copy link
Member Author

tbg commented Nov 11, 2019

@bdarnell when are you OK with merging this backport? Note that I'm planning to merge #42264 only a) after the 19.2 release or b) after this backport has landed, whichever happens earlier. This is because otherwise we'll get spurious test failures because we can't skip the roachtest on only one of the branches right now.

@bdarnell
Copy link
Contributor

Let's wait a few days after 19.2 is out to see if there is going to be a quick 19.2.1.

Many deployments auto-restart crashed nodes unconditionally. As a
result, we are often pulled into inconsistency failures relatively late,
which results in a loss of information (in particular if the source of
the problem turns out to be one of the nodes that did not actually
crash); if the node that crashes *is* the one with the problem,
restarting it lets it serve data again (until the next time the
consistency checker nukes it), which is bad as well.

This commit introduces a "death rattle" - if a node is told to terminate
as the result of a consistency check, it will write a marker file that
prevents subsequent restarts of the node with an informative message
alerting the operator that there is a serious problem that needs to be
addressed. We use the same strategy when a replica finds that its
internal invariants are violated.

Fixes cockroachdb#40442.

Release note (general change): nodes that have been terminated as the
result of a failed consistency check now refuse to restart, making it
more likely that the operator notices that there is a persistent issue
in a timely manner.
@tbg
Copy link
Member Author

tbg commented Nov 18, 2019

Cherry-picked #42401 into this branch. Looks like we're holding off on the merge for a bit longer, too.

@tbg tbg requested a review from a team as a code owner November 18, 2019 15:08
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can go ahead and merge now that the commit has been picked for 19.2.1.

@tbg tbg merged commit 84ea776 into cockroachdb:release-19.2 Nov 19, 2019
@tbg tbg deleted the backport19.2-41893-41902 branch November 19, 2019 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants