kvserver: simplify replicasByKey handling #74384

Closed
tbg opened this issue Jan 3, 2022 · 1 comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity X-stale

Comments

tbg (Member) commented Jan 3, 2022

Is your feature request related to a problem? Please describe.

In #73721 we found a bug in our replica handling that could lead to data corruption. The basic problem is that the Store maintains a mapping from keyspace to Replica/Placeholder, but code interacts with it directly, making it easy to corrupt.

In addition, (*Replica).Desc() provides the key bounds of the replica object within the btree, which is deadlock-prone (and also means that (*Replica).setDescLocked has an effect on the btree). This is all fairly cryptic; it would likely be better not to require a mutex when looking up a Replica in the tree, and to make any bounds adjustments explicit.
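The shape of that alternative can be sketched with a toy structure (hypothetical types, not the actual replicasByKey btree): the key bounds live in the tree entry itself, so a point lookup needs neither (*Replica).Desc() nor the replica's mutex.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical item in a keyspace-ordered structure, standing in
// for the Store's replicasByKey btree. Bounds are stored in the entry, so
// lookups never consult the replica (or its mutex) for its span.
type entry struct {
	start, end string // covers [start, end)
	rangeID    int
}

// byKey keeps entries sorted by start key; real code uses an interval btree.
type byKey []entry

// lookup finds the entry whose span contains key, if any.
func (m byKey) lookup(key string) (entry, bool) {
	// Find the first entry whose start key is > key, then step back one.
	i := sort.Search(len(m), func(i int) bool { return m[i].start > key })
	if i == 0 {
		return entry{}, false
	}
	if e := m[i-1]; key < e.end {
		return e, true
	}
	return entry{}, false
}

func main() {
	m := byKey{{"a", "c", 1}, {"c", "f", 2}, {"f", "z", 3}}
	e, ok := m.lookup("d")
	fmt.Println(e.rangeID, ok) // range 2 covers "d"
}
```

With bounds denormalized into the tree, changing a replica's descriptor no longer silently changes the tree; a bounds adjustment has to be an explicit tree mutation.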

Describe the solution you'd like

Mediate all mutations to the Store's mapping (including handling of uninitialized Replicas) through an API that enforces that Replica insertion or bounds changes go through a ReplicaPlaceholder. In particular,

  • shrinking or extending a replica in the btree must happen explicitly via a call through the API (and caller must properly synchronize it with changing r.Desc()).
  • split needs to create a placeholder for the RHS atomically with shrinking the left-hand side, then (atomically) upgrade the placeholder to the newly created right-hand side. (Or directly insert the right-hand side; whatever is cleaner).
  • merge needs to swap right-hand side for a placeholder that it then absorbs into the left-hand side. (Or directly if possible).
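A minimal sketch of what such a placeholder-mediated API could look like (names like insertPlaceholder, fillPlaceholder, and splitReplica are illustrative only, not the actual kvserver API): every mutation of the keyspace map happens in one critical section, so there is never a window in which a span is unowned.

```go
package main

import (
	"fmt"
	"sync"
)

type span struct{ start, end string } // covers [start, end)

type store struct {
	mu sync.Mutex
	// byKey maps a span to either a replica's range ID (>0) or a
	// placeholder (0). A real implementation would use an interval btree.
	byKey map[span]int
}

// insertPlaceholder reserves a span; it fails if the span is already taken.
func (s *store) insertPlaceholder(sp span) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.byKey[sp]; ok {
		return fmt.Errorf("span %v already occupied", sp)
	}
	s.byKey[sp] = 0
	return nil
}

// fillPlaceholder atomically upgrades a placeholder to a real replica.
func (s *store) fillPlaceholder(sp span, rangeID int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.byKey[sp]; !ok || id != 0 {
		return fmt.Errorf("no placeholder at %v", sp)
	}
	s.byKey[sp] = rangeID
	return nil
}

// splitReplica shrinks the LHS and installs a placeholder for the RHS in a
// single critical section, so no snapshot can sneak into the gap.
func (s *store) splitReplica(old span, splitKey string, lhsID int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.byKey, old)
	s.byKey[span{old.start, splitKey}] = lhsID
	s.byKey[span{splitKey, old.end}] = 0 // placeholder for the RHS
}

func main() {
	s := &store{byKey: map[span]int{{"a", "z"}: 1}}
	s.splitReplica(span{"a", "z"}, "m", 1)
	if err := s.fillPlaceholder(span{"m", "z"}, 2); err != nil {
		panic(err)
	}
	fmt.Println(s.byKey[span{"a", "m"}], s.byKey[span{"m", "z"}]) // 1 2
}
```

The merge case is the mirror image: swap the RHS for a placeholder, then absorb the placeholder's span into the LHS, again without ever leaving the span unowned.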

Describe alternatives you've considered

Additional context

Jira issue: CRDB-12057

@tbg tbg added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 3, 2022
tbg added a commit to tbg/cockroach that referenced this issue Jan 3, 2022
In cockroachdb#73721 we saw the following assertion fire:

> kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1  unable to
> remove placeholder: corrupted replicasByKey map: <nil> and [...]

This is because `MergeRange` removes from the Store's in-memory map the
right-hand side Replica before extending the left-hand side, leaving a
gap for a snapshot to sneak in. A similar problem exists when a snapshot
widens the existing range (i.e. the snapshot reflects the results of a
merge). This commit closes both gaps.

I verified the fix by calling the newly-introduced (but promptly
disabled, see comment within) `(*Store).maybeAssertNoHole` from
both `(*Store).MergeRange` and `applySnapshot`.
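The idea behind that assertion can be sketched in isolation (toy types and names, not the actual maybeAssertNoHole): check that a set of spans covers an interval with no gap, which fails exactly when the RHS has been removed before the LHS was extended.

```go
package main

import (
	"fmt"
	"sort"
)

type span struct{ start, end string } // covers [start, end)

// assertNoHole verifies that spans jointly cover [from, to) with no gaps.
func assertNoHole(spans []span, from, to string) error {
	sort.Slice(spans, func(i, j int) bool { return spans[i].start < spans[j].start })
	cur := from
	for _, sp := range spans {
		if sp.end <= cur {
			continue // entirely before the point we've already covered
		}
		if sp.start > cur {
			return fmt.Errorf("hole at [%s, %s)", cur, sp.start)
		}
		cur = sp.end
		if cur >= to {
			return nil
		}
	}
	return fmt.Errorf("hole at [%s, %s)", cur, to)
}

func main() {
	// RHS removed before the LHS was extended: [m, z) is momentarily unowned.
	fmt.Println(assertNoHole([]span{{"a", "m"}}, "a", "z"))
	// After the fix, the merged span is covered throughout.
	fmt.Println(assertNoHole([]span{{"a", "m"}, {"m", "z"}}, "a", "z"))
}
```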

The bug reproduced via this assertion, before this fix, in
`TestStoreRangeMergeTimestampCache` and
`TestChangeReplicasLeaveAtomicRacesWithMerge`, covering both the
snapshot and merge trigger cases. I'm not too happy to merge this
without the same kind of active test coverage, but since this will
require a backport, it's better not to rattle the cage too much.
I filed cockroachdb#74384 to clean this up fully.

Release note (bug fix): A bug was fixed that, in very rare cases, could
result in a node terminating with fatal error "unable to remove
placeholder: corrupted replicasByKey map". To avoid potential data
corruption, users affected by this crash should not restart the node,
but instead decommission it in absentia and have it rejoin the cluster
under a new NodeID.
tbg added a commit to tbg/cockroach that referenced this issue Jan 4, 2022
craig bot pushed a commit that referenced this issue Jan 4, 2022
73734: kvserver: avoid unprotected keyspace during MergeRange r=erikgrinaker a=tbg

Co-authored-by: Tobias Grieger <[email protected]>
gustasva pushed a commit to gustasva/cockroach that referenced this issue Jan 4, 2022
github-actions bot commented

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

github-actions bot closed this as not planned Dec 11, 2023