kvserver: refactor replicaAppBatch for standalone log application #93266

tbg · 2022-12-08T16:46:18Z

This long (but individually small) sequence of commits moves (*replicaAppBatch).Stage close to the structure that was prototyped in #93265, where it has the following steps:

1. command checks (standalone)
2. testing interceptors (replica)
3. pre-add triggers (standalone)
4. pre-add triggers (replica)
5. add (to pebble batch, standalone)
6. post-add triggers (standalone)
7. post-add triggers (replica)

In standalone application (e.g. for #93244), we'd use an `apply.Batch` that is an `appBatch` instead of a `replicaAppBatch`, i.e. we'd skip all of the (replica) steps above.
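The step ordering above can be sketched roughly as follows. This is a minimal, hypothetical Go sketch, not the code in this PR: `cmd`, `batch`, the `log`/`steps` helpers and all method signatures are assumptions, and the (replica) steps are reduced to trace entries. It only illustrates how `replicaAppBatch` could embed `appBatch` and interleave its own steps, while a standalone caller would drive `appBatch` directly.

```go
package main

import "strings"

// cmd is a hypothetical stand-in for a decoded raft command.
type cmd struct{ id string }

// batch mirrors the shape of apply.Batch for this sketch only.
type batch interface{ Stage(c *cmd) }

// appBatch holds standalone state only: nothing here needs a *Replica.
type appBatch struct {
	mutations int      // commands added to the (pretend) pebble batch
	trace     []string // steps run so far, in order (illustration only)
}

func (b *appBatch) log(step string) { b.trace = append(b.trace, step) }
func (b *appBatch) steps() string   { return strings.Join(b.trace, ",") }

// The standalone steps. Bodies are placeholders that only record the step.
func (b *appBatch) assertAndCheckCommand(c *cmd) { b.log("checks") }
func (b *appBatch) runPreAddTriggers(c *cmd)     { b.log("preAdd") }
func (b *appBatch) addWriteBatch(c *cmd) {
	b.mutations++
	b.log("add")
}
func (b *appBatch) runPostAddTriggers(c *cmd) { b.log("postAdd") }

// Stage is what standalone application would run: standalone steps only.
func (b *appBatch) Stage(c *cmd) {
	b.assertAndCheckCommand(c)
	b.runPreAddTriggers(c)
	b.addWriteBatch(c)
	b.runPostAddTriggers(c)
}

// replicaAppBatch embeds appBatch and interleaves the (replica) steps.
type replicaAppBatch struct{ appBatch }

func (b *replicaAppBatch) Stage(c *cmd) {
	b.assertAndCheckCommand(c)   // command checks (standalone)
	b.log("testingInterceptors") // testing interceptors (replica)
	b.runPreAddTriggers(c)       // pre-add triggers (standalone)
	b.log("preAddReplica")       // pre-add triggers (replica)
	b.addWriteBatch(c)           // add (standalone)
	b.runPostAddTriggers(c)      // post-add triggers (standalone)
	b.log("postAddReplica")      // post-add triggers (replica)
}

// stageAll drives any batch implementation, roughly as an apply task would.
func stageAll(b batch, cmds []*cmd) {
	for _, c := range cmds {
		b.Stage(c)
	}
}
```

Because both types satisfy the same interface, the caller does not need to know which one it is driving; swapping in `appBatch` simply drops the (replica) steps.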

This PR doesn't get us all the way there: we still need to tease apart the post-add triggers (replica) step, which currently contains code that belongs in post-add triggers (standalone). This is best tackled in a separate PR since it's going to be quite a bit of work.

Touches #75729.

Epic: CRDB-220
Release note: None

cockroach-teamcity · 2022-12-08T16:46:49Z

pav-kv

Broadly LGTM, needs some polishing. Reviewed commit by commit.

This is a bit outside of the current workstream, but I noticed this method really didn't depend on the `*Replica` in a meaningful way; all it needs is a `*rangefeed.Filter`. By minimizing the dependency, this method is now much more amenable to being unit tested, and we can conceivably do so as a side effect of the unit testing that will go into the separate raft log. (Note that the `filter` is a performance optimization: putting in a filter that covers everything is valid behavior.) Something similar could be done with `handleLogicalOpLogRaftMuLocked`, but we leave it for now since that one is a tad more involved. Epic: CRDB-220 Release note: None
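A minimal sketch of the dependency narrowing described above, assuming a hypothetical `prevValFilter` interface in place of `*rangefeed.Filter` and a hypothetical `populatePrevVals` function; the real method name and signature may well differ. Because the function sees only a filter and a read callback, a test can drive it directly, and a match-all filter is shown to be a correct (if more expensive) choice.

```go
package main

// prevValFilter is a hypothetical stand-in for the rangefeed filter: it
// reports whether any registration wants the previous value of a key.
type prevValFilter interface {
	NeedPrevVal(key string) bool
}

// keySetFilter needs previous values only for the listed keys.
type keySetFilter map[string]bool

func (f keySetFilter) NeedPrevVal(key string) bool { return f[key] }

// matchAll needs previous values everywhere. This is always correct, just
// slower, since the filter is purely a performance optimization.
type matchAll struct{}

func (matchAll) NeedPrevVal(string) bool { return true }

// logicalOp is a hypothetical write recorded in the logical op log.
type logicalOp struct {
	Key     string
	PrevVal string
	HasPrev bool
}

// populatePrevVals depends only on a filter and a read function, not on a
// *Replica, so a unit test can call it directly.
func populatePrevVals(f prevValFilter, ops []logicalOp, read func(key string) string) {
	for i := range ops {
		if !f.NeedPrevVal(ops[i].Key) {
			continue // nobody asked for this key; skip the read
		}
		ops[i].PrevVal = read(ops[i].Key)
		ops[i].HasPrev = true
	}
}
```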

This further cleans up `(*replicaAppBatch).Stage`. Acquiring the split/merge lock can be done as a regular side effect of a command and doesn't require special ordering, despite what its previous location suggested. What's important is that we hold this lock by the time we actually modify either the right-hand side replica or any engine state related to it. But this happens later: first we'll commit the batch in `(replicaAppBatch).ApplyToStateMachine` and then, in `(stateMachine).ApplySideEffects`, the split/merge of the in-memory Replicas is carried out. Then we'll release the lock. Effectively everything between the first call to `c.Stage` and `c.ApplyToStateMachine` is one large critical section, so this could be done differently too. I think we might even get away with doing all the work related to the split/merge right in the post-add trigger, except it would probably cause lots of annoying problems with stats updates. A comment was added to reflect the above. Epic: CRDB-220 Release note: None
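The ordering argument can be illustrated with a hypothetical sketch in which the lock is taken as an ordinary post-add side effect and released only after the in-memory split in the side-effects step. `rhsReplica`, `splitMergeMu`, `sketchBatch` and the method names are assumptions for illustration, not the kvserver types.

```go
package main

import "sync"

// rhsReplica is a hypothetical right-hand side replica of a split.
type rhsReplica struct {
	splitMergeMu sync.Mutex // hypothetical lock guarding the RHS
	initialized  bool       // stands in for in-memory RHS state
}

// sketchBatch tracks whether this batch has acquired the RHS lock.
type sketchBatch struct {
	rhs    *rhsReplica
	locked bool
}

// runPostAddTriggers acquires the lock as an ordinary side effect of a split.
// Nothing related to the RHS has been modified yet at this point.
func (b *sketchBatch) runPostAddTriggers(isSplit bool) {
	if isSplit && !b.locked {
		b.rhs.splitMergeMu.Lock()
		b.locked = true
	}
}

// applyToStateMachine would commit the engine batch; elided in this sketch.
func (b *sketchBatch) applyToStateMachine() {}

// applySideEffects performs the in-memory split under the lock, then
// releases it.
func (b *sketchBatch) applySideEffects() {
	if !b.locked {
		return
	}
	b.rhs.initialized = true
	b.rhs.splitMergeMu.Unlock()
	b.locked = false
}
```

What matters in this model is only that the lock is held before `initialized` is touched; the exact point of acquisition within staging is free.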

GCHint is not a "trivial" state update[^1], so handling it as a trivial update was, while functionally correct, semantically incorrect.

[^1]: https://github.com/cockroachdb/cockroach/blob/6265cbb2cd6be60e18ebef0f176eab38954cd276/pkg/kv/kvserver/kvserverpb/proposer_kv.go#L52-L80

Epic: CRDB-220 Release note: None
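One common way to implement such a "trivial" check is to copy the update, clear the allowlisted fields, and compare what remains to the zero value. The sketch below assumes that shape (it is not taken from `proposer_kv.go`), and the fields of `replicaStateUpdate` are hypothetical; it shows why a GC hint, once off the allowlist, makes an update non-trivial.

```go
package main

// replicaStateUpdate is a hypothetical, trimmed-down replicated state update.
type replicaStateUpdate struct {
	StatsDelta int64   // MVCC stats delta: nearly every command carries one
	GCHint     *string // a GC hint change
	Lease      *string // a lease change
}

// isTrivial reports whether the update touches only allowlisted fields. It
// clears those fields on a copy and checks that nothing else remains set.
// GCHint is deliberately not on the allowlist.
func isTrivial(u replicaStateUpdate) bool {
	u.StatsDelta = 0 // allowlisted; u is a copy, so the caller is unaffected
	return u == replicaStateUpdate{}
}
```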

`replicaAppBatch` holds on to a shallow copy of `r.mu.State` and the contract is that it needs to allocate new memory whenever mutating one of the pointer fields. `Stats` is such a pointer field and it needs to be updated with each command, so to avoid allocating anew each time, we keep one around and re-use it for the lifetime of each Batch. Nothing about that has changed except that now we stash the field on `replicaStateMachine`, out of view of `replicaAppBatch`. Epic: CRDB-220 Release note: None
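The allocation-reuse contract can be sketched as below. All types are hypothetical stand-ins (`mvccStats` for `MVCCStats`, `replicaState` for the replica state, `stateMachine` for `replicaStateMachine`), and copying the stats back into the committed allocation on commit is an assumption about how the scratch memory becomes free for re-use.

```go
package main

// mvccStats is a hypothetical stand-in for enginepb.MVCCStats.
type mvccStats struct{ LiveBytes, KeyCount int64 }

// replicaState is a hypothetical stand-in for the replica's in-memory state.
// Stats is a pointer field, so a shallow copy shares it with the original.
type replicaState struct {
	LeaseAppliedIndex uint64
	Stats             *mvccStats
}

// stateMachine owns the committed state plus one scratch mvccStats that is
// re-used by every batch instead of allocating a new one each time.
type stateMachine struct {
	state        replicaState
	scratchStats mvccStats
}

// appBatch holds a shallow copy of the state and must never write through a
// pointer that it shares with stateMachine.state.
type appBatch struct{ state replicaState }

func (sm *stateMachine) newBatch() *appBatch {
	b := &appBatch{state: sm.state}   // shallow copy: Stats is still shared
	sm.scratchStats = *sm.state.Stats // copy the values into the scratch struct
	b.state.Stats = &sm.scratchStats  // the batch now mutates only the scratch
	return b
}

// stage applies one command's effects to the batch's private copy.
func (b *appBatch) stage(deltaBytes int64) {
	b.state.LeaseAppliedIndex++
	b.state.Stats.LiveBytes += deltaBytes
	b.state.Stats.KeyCount++
}

// commit copies the staged stats into the committed allocation, so the
// scratch struct is free again for the next batch.
func (sm *stateMachine) commit(b *appBatch) {
	committed := sm.state.Stats
	*committed = *b.state.Stats
	sm.state = b.state
	sm.state.Stats = committed
}
```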

tbg · 2022-12-21T12:34:16Z

Patched in the fixup commits and verified that `git diff @{upstream}` was unchanged. TFTR!

bors r=pavelkalinnikov

craig · 2022-12-21T14:10:30Z

Build succeeded: Bazel Essential CI (Cockroach)

In cockroachdb#94633, I introduced[^1] an assertion that attempted to catch cases in which we might otherwise accidentally end up applying a proposal twice. This assertion had a false positive. I was able to reproduce the failure within minutes via `./experiment.sh` in cockroachdb#97173 as of 33dcdef.

Better testing of these cases would be desirable. Unfortunately, while there is an abstraction over command application (`apply.Task`), most of the logic worth testing lives in `(*replicaAppBatch)`, which is essentially a `*Replica` with more moving parts attached. This does not lend itself well to unit testing. I took a run[^2][^3][^4] at making log application standalone late last year, but didn't have enough time to follow through. It would be desirable to do so at a later date, perhaps with the explicit goal of making interactions like the one discussed in this PR unit-testable.

[^1]: https://github.com/cockroachdb/cockroach/pull/94633/files#diff-50e458584d176deae52b20a7c04461b3e4110795c8c9a307cf7ee6696ba6bc60R238
[^2]: cockroachdb#93239
[^3]: cockroachdb#93266
[^4]: cockroachdb#93309

This assertion was previously trying to assert too much at a distance, and was not only incorrect but also inscrutable. It was mixing up two assertions, the first of which is sensible: if an entry is accepted, it must not be superseded by an inflight proposal. If this were violated, the superseding proposal could also apply, resulting in a failure of replay protection. This assertion remains as a stand-alone assertion.

The other half of the assertion was more confused: if an entry is rejected, it claimed that the entry couldn't also be superseded. The thinking was that if a superseding log entry exists, it might apply, and that would be bad since we had just told the waiting client that their proposal was rejected. This reasoning is incorrect, as the following example shows.

Consider the following initial situation:

    [lease seq is 1]
    log idx 99:  unrelated cmd at LAI 10000, lease seq = 1
    log idx 100: cmd X at LAI 10000, lease seq = 1

And next:

- a new lease enters the log at idx 101 (lease seq = 2)
- an identical copy of idx 100 enters the log at idx 102
- we apply idx 100, leading to a superseding reproposal at idx 103

resulting in the log:

    [lease seq is 1]
    log idx 99:  unrelated cmd at LAI 10000, lease seq = 1
    log idx 100: cmd X at LAI 10000, lease seq = 1
    log idx 101: lease seq = 2
    log idx 102: (same as idx 100)
    log idx 103: cmd X at LAI = 20000, lease seq = 1

During application of idx 102, we get a *permanent* rejection, and yet the entry is superseded (by the proposal at idx 103). This would erroneously trigger the assertion, even though this is a legal sequence of events with no detrimental outcomes: the superseding proposal will always have the same lease sequence as its superseded copies, so it will also fail.

I initially tried to soften the assertion only a *little bit*. Observing that the example above led to a *permanent* rejection, should we only require that a proposal (which in this assertion is always local) is not superseded if it got rejected due to its lease index (which implies that it passed the lease check)? It turns out that this is primarily an assertion on when superseded proposals are counted as "local" at this point in the code: if there were multiple copies of this rejected proposal in the current `appTask` (i.e. the current `CommittedEntries` slice handed to us for application by raft), then all copies are initially local, and a copy that successfully spawns a superseding proposal would be made non-local from that point on. On the face of it, all other copies in the same `appTask` would now hit the assertion (erroneously): they are local and they are rejected, so why don't they enter the branch?

The magic ingredient is that if an entry is superseded when we handle the lease index rejection, we also unlink the proposal from it. So these copies never enter this path, since they are not local at this point. For example, if these are the log entries to apply (all at valid lease seq):

    log idx 99:  unrelated cmd at LAI 10000
    log idx 100: cmd X at LAI 10000
    log idx 101: (identical copy of idx 100)

and idxs 99-101 are applied in one batch, then idx 100 would spawn a reproposal at a new lease applied index:

    log idx 99:  unrelated cmd at LAI 10000
    log idx 100: cmd X at LAI 10000 <- applied
    log idx 101: (identical copy of idx 100)
    log idx 102: cmd X at LAI 20000 <- not in current batch

When we apply 101, we observe an illegal lease index, but the proposal supersedes the entry, so we mark it as non-local and don't enter the branch that contains the assertion.

The above reasoning is very difficult to understand, and it happens too far removed from where the interesting state changes happen.

Also, for testing purposes it is interesting to introduce "errors" in the lease applied index assignment to artificially exercise these reproposal mechanisms. In doing so, these assertions can trip because the lease applied index assigned to a reproposal might accidentally (or intentionally!) match the existing lease applied index, in which case copies of the command in the same batch now *don't* consider themselves superseded.

The value of this testing outweighs the very limited benefit of this branch of the assertion. An argument could even be made that this assertion alone has negative benefit due to its complexity. We are removing it in this commit and will instead work towards simplifying the mechanisms that played a role in explaining the assertion.

Closes cockroachdb#94633.
Closes cockroachdb#97347.

No release note because unreleased (except perhaps in an alpha).

Epic: none
Release note: None
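The first example above can be replayed with a toy model, under stated assumptions: an entry fails permanently if its lease sequence differs from the current one, fails the lease index check if its LAI is not above the highest applied LAI, and the first LAI failure of a command appends one reproposal at a fresh LAI with the same lease sequence. As the example implies, idx 100 fails the LAI check because idx 99 already used LAI 10000. None of these names are kvserver APIs; this is a hypothetical sketch of the reasoning only.

```go
package main

// entry is a hypothetical, simplified raft log entry.
type entry struct {
	idx      int
	cmd      string // command identity; identical copies share it
	lai      uint64 // lease applied index assigned at proposal time
	leaseSeq int    // lease sequence the command was proposed under
	isLease  bool   // lease change: installs leaseSeq when applied
}

type outcome int

const (
	applied       outcome = iota
	rejectedLAI           // illegal lease index; may be reproposed
	rejectedLease         // permanent: proposed under an older lease
)

// applyState holds the replica state that the checks consult.
type applyState struct {
	leaseSeq int
	lai      uint64
}

// check classifies an entry and advances the state if it applies.
func (s *applyState) check(e entry) outcome {
	if e.isLease {
		s.leaseSeq = e.leaseSeq
		return applied
	}
	if e.leaseSeq != s.leaseSeq {
		return rejectedLease
	}
	if e.lai <= s.lai {
		return rejectedLAI
	}
	s.lai = e.lai
	return applied
}

// run applies ents in order. The first LAI rejection of a command appends a
// superseding reproposal (same lease seq, fresh LAI) at the tail of the log.
func run(s *applyState, ents []entry, freshLAI uint64) map[int]outcome {
	ents = append([]entry(nil), ents...) // don't mutate the caller's slice
	out := map[int]outcome{}
	reproposed := map[string]bool{}
	for i := 0; i < len(ents); i++ {
		e := ents[i]
		out[e.idx] = s.check(e)
		if out[e.idx] == rejectedLAI && !reproposed[e.cmd] {
			reproposed[e.cmd] = true
			r := e
			r.idx = ents[len(ents)-1].idx + 1
			r.lai = freshLAI
			ents = append(ents, r)
		}
	}
	return out
}
```

Running it on idxs 99-102 yields a permanent rejection for idx 102 while idx 103 supersedes it, and idx 103 is then rejected for the same lease reason, which is exactly the harmless case the removed assertion flagged.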

97564: kvserver: narrow down 'finishing a proposal with outstanding reproposal' r=pavelkalinnikov a=tbg

Closes #97102. Closes #97347. Closes #97447. Closes #97612.

No release note because unreleased (except perhaps in an alpha). Epic: none Release note: None Co-authored-by: Tobias Grieger

tbg requested review from pav-kv and a team December 9, 2022 07:38

tbg marked this pull request as ready for review December 9, 2022 07:38

tbg requested a review from a team as a code owner December 9, 2022 07:38

This was referenced Dec 9, 2022

kvserver: handle AddSST for standalone log application #93309

Merged

kvserver: decouple cmd checks in replicaAppBatch #93239

Merged

tbg force-pushed the standalone-app-batch-preflight branch from dc94ac5 to 76d6c47 December 16, 2022 11:26

tbg added 7 commits December 20, 2022 12:20

kvserver: inline migratedReplicatedResult

6ff7938

Epic: CRDB-220 Release note: None

kvserver: move apply-time write byte accounting

8b8e728

This streamlines it by moving it to the method that handles the other side effects. Epic: CRDB-220 Release note: None

kvserver: clarify some comments

3194835

Epic: CRDB-220 Release note: None

kvserver: rename some callees of replicaAppBatch.Stage

6d07fd4

Switch to simple `preAdd / add / postAdd` scheme. Epic: CRDB-220 Release note: None

kvserver: move addWriteBatch

3750147

The next commit will remove its dependency on `replicaAppBatch`. Epic: CRDB-220 Release note: None

kvserver: shift addWriteBatch to appBatch

ccb573b

This also gives appBatch the first field of its own, `mutations`. Epic: CRDB-220 Release note: None

kvserver: introduce runPreAddTriggersReplicaOnly

4daf378

It turns out that the one step previously in `runPreAddTriggers` belongs in it, too, so it was moved there. Epic: CRDB-220 Release note: None

tbg force-pushed the standalone-app-batch-preflight branch from 76d6c47 to a87932c December 20, 2022 11:22

pav-kv approved these changes Dec 20, 2022

pav-kv approved these changes Dec 21, 2022

tbg added 11 commits December 21, 2022 13:33

kvserver: wean replicaAppBatch off replicaStateMachine

21e5a17

It was holding a reference to it only to access stats. Might as well hold on to the stats directly. Epic: CRDB-220 Release note: None

kvserver: rename a field

bd740a5

`applyStats` is a better name; let's reserve `stats` for `MVCCStats`. Epic: CRDB-220 Release note: None

kvserver: remove redundant assertion

c51a313

This was checked already at this point, earlier in `c.Stage`. Epic: CRDB-220 Release note: None

kvserver: move run{Pre,Post}AddTriggers to appBatch

f81a9f9

No logic change, just re-anchoring these (currently noop) methods to `appBatch`, where they belong. Epic: CRDB-220 Release note: None

apply: clarify that StateMachine is not thread safe

0b2cdb1

Epic: CRDB-220 Release note: None

kvserver: move stats counters from replicaAppBatch to appBatch

7010e28

We want these stats counters during standalone log application as well, primarily so that they can be verified in unit testing. Epic: CRDB-220 Release note: None

kvserver: harmonize field names for apply stats

ba9e0e7

Mechanical, in preparation for sharing a struct among `applyCommittedEntriesStats` and the counters on `appBatch`. Epic: CRDB-220 Release note: None
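A hypothetical sketch of the shared-struct idea this rename prepares for: one counter struct embedded both in the batch and in the per-call totals, folded together with a `merge` method. The counter names, `merge`, and `stage` are assumptions for illustration, not the fields in the PR.

```go
package main

// appBatchStats is a hypothetical set of counters shared by the per-batch
// state and the per-call totals.
type appBatchStats struct {
	numMutations        int
	numEntriesProcessed int
	numEmptyEntries     int
}

// merge folds another set of counters into s.
func (s *appBatchStats) merge(o appBatchStats) {
	s.numMutations += o.numMutations
	s.numEntriesProcessed += o.numEntriesProcessed
	s.numEmptyEntries += o.numEmptyEntries
}

// appBatch counts as it stages, so a standalone unit test can inspect it.
type appBatch struct{ appBatchStats }

func (b *appBatch) stage(mutations int) {
	b.numEntriesProcessed++
	if mutations == 0 {
		b.numEmptyEntries++
	}
	b.numMutations += mutations
}

// applyCommittedEntriesStats aggregates across batches by embedding the same
// struct, plus counters that only make sense at that level.
type applyCommittedEntriesStats struct {
	appBatchStats
	numBatchesProcessed int
}
```

Embedding keeps the field names identical at both levels, which is presumably why harmonizing the names first is a prerequisite.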

tbg added 2 commits December 21, 2022 13:33

kvserver: introduce appBatchStats

7151234

Epic: CRDB-220 Release note: None

kvserver: add some TODOs about future renames, clarify some comments

edf6976

Epic: CRDB-220 Release note: None

tbg force-pushed the standalone-app-batch-preflight branch from 94f45da to edf6976 December 21, 2022 12:34

craig bot merged commit e23ef86 into cockroachdb:master Dec 21, 2022

tbg deleted the standalone-app-batch-preflight branch December 21, 2022 14:23

tbg mentioned this pull request Feb 20, 2023

kvserver: fix false positive in 'outstanding reproposal' assertion #97347

Closed

tbg mentioned this pull request Feb 23, 2023

kvserver: narrow down 'finishing a proposal with outstanding reproposal' #97564

Merged
