backport-19.2: terminate nodes in minority on consistency failure #42149

tbg · 2019-11-04T17:08:06Z

Backport:

1/1 commits from "storage: compute minority in failed consistency check" (storage: compute minority in failed consistency check #41893)
1/1 commits from "storage: terminate nodes in minority on stats mismatch" (storage: terminate nodes in minority on stats mismatch #41902)

Please see individual PRs for details.

/cc @cockroachdb/release

Previously, a consistency failure was always framed as a deviation from the leaseholder. However, in practice a mismatch is always symmetric: the leaseholder could be right and the follower wrong; the follower might be right; and in fact the (usually 2+) followers may agree that the leaseholder is wrong. We've historically found that the majority was usually right, that is, with very few exceptions (I remember a single one), if there is one outlier replica, that replica is usually the problematic one, or more generally, the SHA with the lowest multiplicity is usually the one worth suspecting (we call this a "minority" in this context). The consistency checker currently fatals the leaseholder indiscriminately. This is an especially poor choice that was made out of convenience. Killing the leaseholder, which may very well have the correct data, forces the lease to move to a follower, often resulting in the corrupt node taking over the lease and spreading the corruption further. In a follow-up change, we'll want to kill the replicas representing the SHA with the lowest multiplicity instead of defaulting to the leaseholder. Release note (general change): Improve the consistency checker's log output.

Following up on cockroachdb#41893, the consistency checker now instructs the replicas corresponding to the SHA with the lowest multiplicity to terminate. Note that this change is a candidate for backporting into 19.2. An "old" leaseholder will never populate the newly introduced field, and will always terminate itself. A new leaseholder will populate the field, but only new nodes will interpret it. In the worst case, the result will be a consistency failure that does not lead to a fatal error (but collects RocksDB checkpoints just the same). Since no significant time is usually spent in a mixed-patch-releases, this is not a concern that outweighs the benefit of terminating (heuristically) the "right" nodes. Release note (general improvement): when the replicas within a range have found to been corrupted, the outliers will be terminated. Previously, the leaseholder replica would terminate, regardless of which replicas disagreed with each other. This is expected to curb the spreading of corrupted data better than the previous approach.

cockroach-teamcity · 2019-11-04T17:08:15Z

This change is

bdarnell · 2019-11-04T17:10:37Z

If stats mismatches aren't severe (which has been our stance so far), I don't think we want to backport changes to crash on them. We can backport the fix for the known issue #42148 (although it'll go into 19.2.1 now), but I'd let stats mismatches continue without aborting in case we've missed some causes.

tbg · 2019-11-04T17:11:35Z

@bdarnell the PR is mislabeled. You need to s/stats mismatch/consistency failure/. Sorry, I hadn't noticed this until you pointed it out.

bdarnell · 2019-11-04T17:13:08Z

OK. If it's for data mismatches instead of stats mismatches then I don't object as much, but this is still a significant change to a little-exercised code path and I don't think it's a great candidate for backporting. I'd personally leave this for 20.1

tbg · 2019-11-04T18:09:35Z

Yeah, this isn't a slam dunk for backporting. There is a certain risk in this backport but also the current code tends to spread inconsistencies around the cluster, something that really bit us in the past. I'll bring this up in the weekly tomorrow.

tbg · 2019-11-11T11:21:53Z

@bdarnell when are you OK with merging this backport? Note that I'm planning to merge #42264 only a) after the 19.2 release or b) after this backport has landed, whichever happens earlier. This is because otherwise we'll get spurious test failures because we can't skip the roachtest on only one of the branches right now.

bdarnell · 2019-11-11T17:42:52Z

Let's wait a few days after 19.2 is out to see if there is going to be a quick 19.2.1.

Many deployments auto-restart crashed nodes unconditionally. As a result, we are often pulled into inconsistency failures relatively late, which results in a loss of information (in particular if the source of the problem turns out to be one of the nodes that did not actually crash); if the node that crashes *is* the one with the problem, restarting it lets it serve data again (until the next time the consistency checker nukes it), which is bad as well. This commit introduces a "death rattle" - if a node is told to terminate as the result of a consistency check, it will write a marker file that prevents subsequent restarts of the node with an informative message alerting the operator that there is a serious problem that needs to be addressed. We use the same strategy when a replica finds that its internal invariants are violated. Fixes cockroachdb#40442. Release note (general change): nodes that have been terminated as the result of a failed consistency check now refuse to restart, making it more likely that the operator notices that there is a persistent issue in a timely manner.

tbg · 2019-11-18T15:07:17Z

Cherry-picked #42401 into this branch. Looks like we're holding off on the merge for a bit longer, too.

bdarnell

We can go ahead and merge now that the commit has been picked for 19.2.1.

tbg added 2 commits November 4, 2019 18:05

tbg changed the title ~~backport-19.2: terminate nodes in minority on stats mismatch~~ backport-19.2: terminate nodes in minority on consistency failure Nov 4, 2019

tbg requested a review from a team as a code owner November 18, 2019 15:08

bdarnell reviewed Nov 18, 2019

View reviewed changes

tbg merged commit 84ea776 into cockroachdb:release-19.2 Nov 19, 2019

tbg deleted the backport19.2-41893-41902 branch November 19, 2019 12:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

backport-19.2: terminate nodes in minority on consistency failure #42149

backport-19.2: terminate nodes in minority on consistency failure #42149

tbg commented Nov 4, 2019

cockroach-teamcity commented Nov 4, 2019

bdarnell commented Nov 4, 2019

tbg commented Nov 4, 2019

bdarnell commented Nov 4, 2019

tbg commented Nov 4, 2019

tbg commented Nov 11, 2019

bdarnell commented Nov 11, 2019

tbg commented Nov 18, 2019

bdarnell left a comment

backport-19.2: terminate nodes in minority on consistency failure #42149

backport-19.2: terminate nodes in minority on consistency failure #42149

Conversation

tbg commented Nov 4, 2019

cockroach-teamcity commented Nov 4, 2019

bdarnell commented Nov 4, 2019

tbg commented Nov 4, 2019

bdarnell commented Nov 4, 2019

tbg commented Nov 4, 2019

tbg commented Nov 11, 2019

bdarnell commented Nov 11, 2019

tbg commented Nov 18, 2019

bdarnell left a comment

Choose a reason for hiding this comment