kv: readOnlyCmdMu not acquired by executeWriteBatch #46329

Closed
nvanbenschoten opened this issue Mar 19, 2020 · 2 comments · Fixed by #64471
Labels
A-kv Anything in KV that doesn't belong in a more specific category.
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently.

Comments

@nvanbenschoten (Member) commented:

#46234 noticed that unlike on the read-path (executeReadOnlyBatch), we don't synchronize with r.readOnlyCmdMu here. Is that ok? What if the replica is destroyed concurrently with a write? We won't be able to successfully propose as the lease will presumably have changed, but what if we hit an error during evaluation (e.g. a ConditionFailedError)?
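
To make the asymmetry concrete, here is a minimal, self-contained Go sketch of the interlock the read path takes and the write path skips. The types and field names below (replica, destroyed, and the two execute functions) are hypothetical stand-ins for illustration, not the actual kvserver code:

// Hedged sketch: a stand-alone illustration of the pattern under discussion,
// not CockroachDB's actual types. The real code lives in pkg/kv/kvserver.
package main

import (
	"errors"
	"fmt"
	"sync"
)

type replica struct {
	readOnlyCmdMu sync.RWMutex // held by reads; destruction takes the write lock
	mu            struct {
		sync.Mutex
		destroyed error // non-nil once the replica has been destroyed
	}
}

// executeReadOnlyBatch holds readOnlyCmdMu for the duration of evaluation, so
// a concurrent destroy (which takes the write lock) cannot remove the data out
// from under the read.
func (r *replica) executeReadOnlyBatch() error {
	r.readOnlyCmdMu.RLock()
	defer r.readOnlyCmdMu.RUnlock()
	r.mu.Lock()
	destroyed := r.mu.destroyed
	r.mu.Unlock()
	if destroyed != nil {
		return destroyed
	}
	return nil // evaluate the read against local state
}

// executeWriteBatch, per this issue, evaluates without any such interlock: if
// evaluation itself returns an error (e.g. a ConditionFailedError), nothing
// prevents it from having read state that a concurrent destruction already
// deleted.
func (r *replica) executeWriteBatch() error {
	return nil // evaluate and propose, with no readOnlyCmdMu synchronization
}

func main() {
	r := &replica{}
	r.mu.destroyed = errors.New("replica destroyed")
	fmt.Println(r.executeReadOnlyBatch(), r.executeWriteBatch())
}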

@tbg:

I don't have a good answer here. If everything went through Raft, I think this code:

for _, p := range r.mu.proposals {
    r.cleanupFailedProposalLocked(p)
    // NB: each proposal needs its own version of the error (i.e. don't try to
    // share the error across proposals).
    p.finishApplication(ctx, proposalResult{
        Err: roachpb.NewError(roachpb.NewAmbiguousResultError("removing replica")),
    })
}

would do it (assuming we check the destroy status under the same lock as we put new proposals in the map), but the failed CPut would skip the read. I think there's probably at least one bug here.

It's also striking how destroying the replica avoids the latch manager. You would assume that destroying the replica is something like

  1. set destroy status
  2. cancel all open proposals
  3. grab an "everything r/w" latch
  4. delete things

but this isn't how it works.

I look forward to when we unify the rw and ro paths, at which point we'll be forced to confront these head on.

@nvanbenschoten:

It's also striking how destroying the replica avoids the latch manager.

Yeah, I was wondering about this too. It would be very nice to flush out all requests using the latch manager, but I fear we might have some dependencies where a request can't make progress in raft until a replica is destroyed but the replica can't be destroyed (if it grabs latches) until the request releases its latches. Maybe that's just FUD.

@tbg:

but I fear we might have some dependencies where a request can't make progress in raft until a replica is destroyed but the replica can't be destroyed (if it grabs latches) until the request releases its latches.

I think that's a legitimate concern, but we just have to handle it. Roughly, step 1 above sets a destroy status that also prevents handleRaftReady from carrying out any more cycles. Step 2 cancels all proposals, including the raft-owned ones (which now makes sense since we know the state machine won't transition any more). This starts looking a lot like the current code, but by additionally grabbing the latches before deleting we serialize cleanly with in-flight reads, which addresses exactly the open questions we have here.
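
A rough, self-contained sketch of that teardown ordering, with a plain RWMutex standing in for an "everything r/w" latch; all names are illustrative, not the real kvserver or latch-manager API:

// Hedged sketch of the destruction ordering proposed above.
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

type proposal struct{ done chan error }

type replica struct {
	latches sync.RWMutex // stand-in for an "everything r/w" latch
	mu      struct {
		sync.Mutex
		destroyed error
		proposals map[int]*proposal
	}
}

func (r *replica) destroy(ctx context.Context) {
	// 1. Set the destroy status. New proposals (and further handleRaftReady
	//    cycles) check this under r.mu and bail out.
	r.mu.Lock()
	r.mu.destroyed = errors.New("removing replica")
	// 2. Cancel all open proposals, including raft-owned ones; the state
	//    machine will not transition any more.
	for id, p := range r.mu.proposals {
		p.done <- r.mu.destroyed
		delete(r.mu.proposals, id)
	}
	r.mu.Unlock()
	// 3. Grab the "everything" latch so in-flight reads drain before we touch
	//    on-disk state.
	r.latches.Lock()
	defer r.latches.Unlock()
	// 4. Only now delete the replica's data.
	fmt.Println("deleting replica data")
}

func main() {
	r := &replica{}
	r.mu.proposals = map[int]*proposal{1: {done: make(chan error, 1)}}
	r.destroy(context.Background())
}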

@nvanbenschoten nvanbenschoten added C-investigation Further steps needed to qualify. C-label will change. A-kv Anything in KV that doesn't belong in a more specific category. labels Mar 19, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 13, 2020
This PR cleans up some handling around replica destruction that scared me
when working on cockroachdb#46329 and cockroachdb#55293. Specifically, there were cases during
merges where the destroy status on a replica would not be set until after
that replica's data was destroyed. This was true of merges applied through
entry application, although in those cases, no user data was removed so it
seems mostly harmless. More concerning is that it was true of merges applied
through snapshot application. This was extra concerning because snapshot
application can delete user data if a subsumed range is only partially
covered (see `clearSubsumedReplicaDiskData`). So we were hypothetically
risking user-visible correctness problems, especially now that we allow
follower reads through on subsumed ranges as of cockroachdb#51594.

This PR patches up these issues and then adds a few extra assertions that
enforce stricter preconditions for the functions at play during replica
destruction. Specifically, we now assert that the destroy status is set to
`destroyReasonRemoved` _before_ calling `preDestroyRaftMuLocked`. We also
assert that if `RemoveOptions.DestroyData` is false when passed to
`removeInitializedReplicaRaftMuLocked`, then the destroy status is also set.

This unification allows us to remove the `ignoreDestroyStatus` flag on
`RemoveOptions`, whose meaning is now exactly the inverse of `DestroyData`.
In hindsight, pushing on why this extra flag was needed and what new
information it was conveying to the function could have potentially caught
these issues a little earlier.

I think we'll want to backport this to v20.2, though probably in the first
patch release because there is some risk here (especially without sufficient
time to bake on master) and we aren't aware of seeing any correctness issues
from the combination of the bug fixed here and cockroachdb#51594.

Release note (bug fix): A hypothesized bug that could allow a follower
read to miss data on a range in the middle of being merged away into its
left-hand neighbor was fixed. The bug seems exceedingly unlikely to have
materialized in practice.
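
For illustration, a minimal sketch of the kind of precondition assertion the commit message describes; the types and the preDestroy helper below are simplified, hypothetical stand-ins, not the actual preDestroyRaftMuLocked signature:

// Hedged sketch: insist that the destroy status is set before any data is
// cleared, so the destroy status can never lag behind the data removal.
package main

import "log"

type destroyReason int

const (
	destroyReasonAlive destroyReason = iota
	destroyReasonRemoved
)

type replica struct {
	destroyStatus destroyReason
}

// preDestroy is a stand-in for preDestroyRaftMuLocked: the caller must have
// already marked the replica as removed before the data is destroyed.
func (r *replica) preDestroy() {
	if r.destroyStatus != destroyReasonRemoved {
		log.Fatalf("destroy status not set to removed before destroying data: %v", r.destroyStatus)
	}
	// ... clear range data ...
}

func main() {
	r := &replica{destroyStatus: destroyReasonRemoved}
	r.preDestroy()
}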
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 14, 2020
craig bot pushed a commit that referenced this issue Oct 15, 2020
55477: kv: set destroy status before destroying data on subsumed replicas r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 12, 2020
@erikgrinaker erikgrinaker self-assigned this Apr 28, 2021
@erikgrinaker (Contributor) commented:

Confirmed. I adapted the test case from #64324 to do a conditional put on an existing value, supplying the current value of the target key as the expected value, and it failed because the request read nil instead.
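
To illustrate the failure mode, a self-contained sketch in which a conditional put is evaluated against storage that has already been cleared out from under it; a toy map stands in for the real engine, and conditionFailed is a simplified stand-in for ConditionFailedError:

// Hedged sketch of the observed failure: the caller supplies the correct
// expected value, yet the condition fails because the evaluation reads nil.
package main

import (
	"bytes"
	"fmt"
)

// conditionFailed mimics the shape of a ConditionFailedError: it carries the
// value the evaluation actually read at the key.
type conditionFailed struct{ actual []byte }

func (e *conditionFailed) Error() string {
	return fmt.Sprintf("unexpected value: %q", e.actual)
}

// evalConditionalPut writes value at key only if the current value equals
// expected.
func evalConditionalPut(store map[string][]byte, key string, value, expected []byte) error {
	if actual := store[key]; !bytes.Equal(actual, expected) {
		return &conditionFailed{actual: actual}
	}
	store[key] = value
	return nil
}

func main() {
	store := map[string][]byte{"k": []byte("v1")}
	// If the replica is destroyed underneath the request, the read sees an
	// empty store even though "k" logically still holds "v1".
	delete(store, "k")
	// Fails with a nil actual value despite the correct expected value.
	fmt.Println(evalConditionalPut(store, "k", []byte("v2"), []byte("v1")))
}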

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. and removed C-investigation Further steps needed to qualify. C-label will change. labels Apr 30, 2021
@knz knz added the S-0-corruption-or-data-loss Unrecoverable corruption, data loss, or other catastrophic issues that can’t be fixed by upgrading. label Apr 30, 2021
@erikgrinaker erikgrinaker added S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. and removed S-0-corruption-or-data-loss Unrecoverable corruption, data loss, or other catastrophic issues that can’t be fixed by upgrading. labels May 1, 2021
@tbg tbg added the GA-blocker label May 3, 2021
blathers-crl bot commented May 3, 2021:

Hi @tbg, please add branch-* labels to identify which branch(es) this release-blocker affects.


@tbg tbg removed the GA-blocker label May 3, 2021