kvnemesis: add Raft application assertions #116319

Merged: 6 commits merged into cockroachdb:master from the replica-assert-double-apply branch on Jan 2, 2024

Conversation

@erikgrinaker (Contributor) commented Dec 13, 2023:

This adds basic kvnemesis assertions for Raft command application:

  • A request is only applied once, at a single log index and lease applied index across all replicas (i.e. no double-applies or replays).

  • Commands do not regress the Raft index/term, lease applied index, and closed timestamp (assuming no node restarts).

  • All replicas apply the same commands in the same order at the same positions.

Resolves #115771.
Touches #114421.
Epic: none
Release note: None
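
To make these assertions concrete, here is a minimal, self-contained Go sketch of the double-apply check. All type and function names are invented for illustration and do not correspond to the PR's actual apply.Asserter API.

package main

import "fmt"

// Illustrative stand-ins for kvserverbase.CmdIDKey, roachpb.RangeID, and
// roachpb.ReplicaID.
type cmdID string
type rangeID int64
type replicaID int32

// appliedAt is the position a command was applied at.
type appliedAt struct {
	raftIndex uint64
	lai       uint64 // lease applied index
}

// applyAsserter is a toy double-apply checker: each command must be applied
// at most once per replica, and at the same position on every replica.
type applyAsserter struct {
	applied map[rangeID]map[cmdID]appliedAt          // first observed position per command
	seen    map[rangeID]map[replicaID]map[cmdID]bool // commands each replica has applied
}

func newApplyAsserter() *applyAsserter {
	return &applyAsserter{
		applied: map[rangeID]map[cmdID]appliedAt{},
		seen:    map[rangeID]map[replicaID]map[cmdID]bool{},
	}
}

// Apply is called once for every command a replica applies.
func (a *applyAsserter) Apply(rid rangeID, repl replicaID, id cmdID, pos appliedAt) error {
	if a.applied[rid] == nil {
		a.applied[rid] = map[cmdID]appliedAt{}
		a.seen[rid] = map[replicaID]map[cmdID]bool{}
	}
	if a.seen[rid][repl] == nil {
		a.seen[rid][repl] = map[cmdID]bool{}
	}
	if a.seen[rid][repl][id] {
		return fmt.Errorf("r%d/%d: command %s applied twice (replay)", rid, repl, id)
	}
	a.seen[rid][repl][id] = true
	if prev, ok := a.applied[rid][id]; ok && prev != pos {
		return fmt.Errorf("r%d: command %s applied at %+v and %+v", rid, id, prev, pos)
	}
	a.applied[rid][id] = pos
	return nil
}

func main() {
	a := newApplyAsserter()
	fmt.Println(a.Apply(1, 1, "cmd-a", appliedAt{11, 5})) // <nil>
	fmt.Println(a.Apply(1, 2, "cmd-a", appliedAt{11, 5})) // <nil>: same position on another replica
	fmt.Println(a.Apply(1, 2, "cmd-a", appliedAt{12, 6})) // error: double-apply
}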

@erikgrinaker erikgrinaker requested a review from pav-kv December 13, 2023 13:44
@erikgrinaker erikgrinaker self-assigned this Dec 13, 2023
@cockroach-teamcity (Member): This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch from 03858ed to fd9cf09 on December 13, 2023 13:45
@pav-kv (Collaborator) left a comment:

Looking good.

Do we have an idea of what conditions increase the probability of double-apply situations? We could add a new kvnemesis test with tweaked probabilities for the things we throw into the mix.

I'm guessing though that the most interesting conditions are chaos things like slow connections and down nodes. The more lease/leader moves and snapshots, the merrier.

pkg/kv/kvserver/raft_apply_asserter.go: 4 resolved review threads (outdated)
mu syncutil.Mutex
// appliedCmds maps applied command IDs to the range ID and Raft log
// index at which they were applied.
appliedCmds map[kvserverbase.CmdIDKey]*appliedCmd
@pav-kv (Collaborator) commented Dec 13, 2023:

In addition, we could have per-(rangeID/replicaID) tracking. It's probably true that, within a (rangeID/replicaID), Apply is called sequentially for all the applied LAIs (except those that were skipped due to snapshots), and that the LAI always increases. We could assert on this.

Additionally, we could assert that the closed timestamp only increases, in order to catch something that could cause #70894.

Fine to add both assertions in separate commits/PRs.
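
As an illustration of the per-(rangeID, replicaID) monotonicity checks suggested here, a small Go sketch follows. It assumes a simplified apply state (an integer closed timestamp, no snapshot or restart handling) and invented names rather than CockroachDB's actual types.

package main

import "fmt"

// replicaKey identifies one replica of one range.
type replicaKey struct {
	rangeID   int64
	replicaID int32
}

// applyState is the last observed apply state of a replica (simplified:
// the closed timestamp is just an int64 instead of an hlc.Timestamp).
type applyState struct {
	lai      uint64 // lease applied index
	closedTS int64
}

// monotonicChecker asserts that, per replica, the LAI and the closed
// timestamp never regress across successive applies. A real implementation
// would have to account for snapshots and node restarts.
type monotonicChecker struct {
	last map[replicaKey]applyState
}

func (m *monotonicChecker) Apply(key replicaKey, next applyState) error {
	if m.last == nil {
		m.last = map[replicaKey]applyState{}
	}
	if prev, ok := m.last[key]; ok {
		if next.lai < prev.lai {
			return fmt.Errorf("%+v: LAI regression %d -> %d", key, prev.lai, next.lai)
		}
		if next.closedTS < prev.closedTS {
			return fmt.Errorf("%+v: closed timestamp regression %d -> %d", key, prev.closedTS, next.closedTS)
		}
	}
	m.last[key] = next
	return nil
}

func main() {
	var m monotonicChecker
	key := replicaKey{rangeID: 1, replicaID: 1}
	fmt.Println(m.Apply(key, applyState{lai: 5, closedTS: 100})) // <nil>
	fmt.Println(m.Apply(key, applyState{lai: 6, closedTS: 100})) // <nil>
	fmt.Println(m.Apply(key, applyState{lai: 6, closedTS: 90}))  // closed timestamp regression
}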

Collaborator:

These are "local" assertions, though, that can be checked by an individual replica, and we do have them in the code.

But maybe there is some kind of "global" assertion that would be more powerful here.

Collaborator:

The kind of "global" assertion that becomes possible is: after the test is done, piece together each range's log by joining it across its replicas, then check for LAI and CT regressions.
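
A sketch of that post-hoc global check in Go, under the simplifying assumption that each replica reports its applied entries after the test; the types and field names here are invented for the example.

package main

import (
	"fmt"
	"sort"
)

// entry is a single applied command as observed on some replica.
type entry struct {
	raftIndex uint64
	lai       uint64 // lease applied index
	closedTS  int64  // simplified closed timestamp
	cmdID     string
}

// checkGlobalLog joins the applied entries reported by all replicas of one
// range into a single log keyed by Raft index, asserting that (1) replicas
// agree on what was applied at each index, and (2) the LAI and closed
// timestamp never regress along the joined log.
func checkGlobalLog(perReplica map[int32][]entry) error {
	joined := map[uint64]entry{} // raftIndex -> entry
	for replicaID, entries := range perReplica {
		for _, e := range entries {
			if prev, ok := joined[e.raftIndex]; ok && prev != e {
				return fmt.Errorf("replica %d disagrees at index %d: %+v vs %+v",
					replicaID, e.raftIndex, prev, e)
			}
			joined[e.raftIndex] = e
		}
	}
	// Walk the joined log in Raft index order and check for regressions.
	indexes := make([]uint64, 0, len(joined))
	for idx := range joined {
		indexes = append(indexes, idx)
	}
	sort.Slice(indexes, func(i, j int) bool { return indexes[i] < indexes[j] })
	var lastLAI uint64
	var lastClosedTS int64
	for _, idx := range indexes {
		e := joined[idx]
		if e.lai < lastLAI {
			return fmt.Errorf("LAI regression at index %d: %d -> %d", idx, lastLAI, e.lai)
		}
		if e.closedTS < lastClosedTS {
			return fmt.Errorf("closed timestamp regression at index %d: %d -> %d",
				idx, lastClosedTS, e.closedTS)
		}
		lastLAI, lastClosedTS = e.lai, e.closedTS
	}
	return nil
}

func main() {
	err := checkGlobalLog(map[int32][]entry{
		1: {{raftIndex: 10, lai: 5, closedTS: 100, cmdID: "a"}, {raftIndex: 11, lai: 6, closedTS: 100, cmdID: "b"}},
		2: {{raftIndex: 11, lai: 6, closedTS: 100, cmdID: "b"}},
	})
	fmt.Println(err) // <nil>
}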

Collaborator:

Double-apply is not the only problem we should be catching. Replica inconsistency can also be caused by a single apply that should not have happened. A global join of all the applied commands for a range across all replicas can help spot such an illegal apply. The global LAI and CT regression check partially detects these cases.

We could also check that each lease epoch applies a contiguous span of commands.
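
A rough Go sketch of the "contiguous span" idea. The names are invented, and the real check would likely be phrased in terms of lease sequence numbers and would have to handle leases legitimately returning to the same holder.

package main

import "fmt"

// checkLeaseSpans asserts that, in the ordered sequence of commands applied
// by a range, each lease (identified here by an opaque int64, e.g. its
// sequence number) covers one contiguous run: once a different lease starts
// applying commands, an earlier one never reappears.
func checkLeaseSpans(leases []int64) error {
	seen := map[int64]bool{}
	var cur int64 = -1
	for i, l := range leases {
		if l == cur {
			continue // still within the current lease's run
		}
		if seen[l] {
			return fmt.Errorf("position %d: lease %d reappears after being superseded", i, l)
		}
		seen[l] = true
		cur = l
	}
	return nil
}

func main() {
	fmt.Println(checkLeaseSpans([]int64{1, 1, 2, 2, 3})) // <nil>
	fmt.Println(checkLeaseSpans([]int64{1, 2, 1}))       // error: lease 1 reappears
}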

erikgrinaker (author):

I agree that we should extend this to include more assertions, but I'm inclined to limit it to only double-applies for now to get something shipped. We can extend it later.

We can consider recording the full sequence of events and only asserting after the test completes, which is maybe more in line with what we want for other assertions. However, if we want to record the actual command (to ease debugging failures) we can end up storing a lot of data in memory. Maybe that's ok though.

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch from fd9cf09 to 110467c on December 15, 2023 11:54
@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch 2 times, most recently from 26a4d9a to acf42ec on December 15, 2023 13:16
@erikgrinaker (author):

Ok, I've added basic double-apply assertions, with snapshot handling. Does this seem reasonable for a first stab? Anything else we should be checking while we're at it?

@erikgrinaker (author):

I guess we could add LAI and CT tracking while we're at it, even though we do already have some code assertions for it. I'll have to refresh my memory on the invariants there, but "don't regress" seems easy enough. I might add that afterwards.

pkg/kv/kvserver/apply/asserter.go: 3 resolved review threads (outdated)
@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch from acf42ec to cc4ca2e on December 15, 2023 14:06
a.ensureRangeLocked(rangeID)

// Record the applied index.
a.appliedIndex[rangeID][replicaID] = index
@pav-kv (Collaborator) commented Dec 15, 2023:

What we could/should verify for a snapshot is:

  • There is at least one applied command at <= the snapshot's log index (i.e. someone, presumably a leader/leaseholder, has applied a command and then sent a snapshot that includes it; a snapshot can't materialize itself).
  • (exception: the initial snapshot index, I think it's 10 or something; need to dig it up)
  • The command with max RaftIndex <= snapshot's index is the last applied one. It always exists in our tracker (can be proved).
  • The LAI (and maybe some other metadata like CT) of this command matches what the snapshot state says. We can verify also that the MVCC stats match the cumulative MVCC stats of this prefix.

This effectively allows us to "forget" this snapshot. We matched it against a command, and assume that the snapshot represents a prefix ending with this command. There is seemingly no other information we can derive from a snapshot's metadata (like what other commands have been applied).
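
A minimal Go sketch of the snapshot check described above, with illustrative names; the real asserter would also have to special-case the initial snapshot and could additionally compare closed timestamps and MVCC stats.

package main

import "fmt"

// appliedEntry is a command some replica has already applied, as recorded by
// the asserter.
type appliedEntry struct {
	raftIndex uint64
	lai       uint64
}

// checkSnapshot verifies the snapshot assertions sketched above: some command
// at or below the snapshot's Raft index must already have been applied by
// someone (at least the sender), and the last such command's lease applied
// index must match the snapshot's.
func checkSnapshot(applied []appliedEntry, snapIndex, snapLAI uint64) error {
	var last *appliedEntry
	for i := range applied {
		e := &applied[i]
		if e.raftIndex <= snapIndex && (last == nil || e.raftIndex > last.raftIndex) {
			last = e
		}
	}
	if last == nil {
		return fmt.Errorf("snapshot at index %d covers no applied command", snapIndex)
	}
	if last.lai != snapLAI {
		return fmt.Errorf("snapshot LAI %d does not match LAI %d of last covered command at index %d",
			snapLAI, last.lai, last.raftIndex)
	}
	return nil
}

func main() {
	applied := []appliedEntry{{raftIndex: 11, lai: 5}, {raftIndex: 13, lai: 6}}
	fmt.Println(checkSnapshot(applied, 14, 6)) // <nil>
	fmt.Println(checkSnapshot(applied, 14, 7)) // error: LAI mismatch
	fmt.Println(checkSnapshot(applied, 10, 6)) // error: no applied command covered
}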

erikgrinaker (author):

I'll leave this for later -- we need to start getting some runtime in the nightlies to look for reapplication.

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch 2 times, most recently from 2560bbb to 6f49e8d on December 15, 2023 16:26
@pav-kv (Collaborator) commented Dec 15, 2023:

One detail about the kvserverbase.CmdIDKey: reproposals that bump the LAI are proposed under a different ID (after Tobi's refactoring). Previously they all used to share the same ID, as well as the corresponding proposals map slot.

To check at-most-once semantics for the entire proposal (including all its reproposals of both kinds), we need a canonical ID that is shared between all sub-proposals. The current implementation will only check at-most-once application for each LAI reproposal, but not for the entire group.

To fix that, we can introduce a canonical uber-proposal ID that all sub-proposals copy around and share. Or, with the current code, you could use the ID of the seedProposal as the canonical ID (but then we sort of have to trust it's correct; longer-term we should go back to having a canonical ID that's obviously correct).
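
A sketch of the canonical-ID bookkeeping this would require, with invented names and string IDs standing in for kvserverbase.CmdIDKey.

package main

import "fmt"

// canonicalIDs maps each sub-proposal's command ID back to the ID of its seed
// proposal, so that the at-most-once check applies to the whole reproposal
// group rather than to each reproposed ID in isolation.
type canonicalIDs struct {
	seedOf map[string]string // sub-proposal ID -> seed (canonical) ID
}

// RegisterProposal records a fresh proposal; it is its own seed.
func (c *canonicalIDs) RegisterProposal(id string) {
	if c.seedOf == nil {
		c.seedOf = map[string]string{}
	}
	c.seedOf[id] = id
}

// RegisterReproposal records a reproposal under a new ID, inheriting the seed
// of the proposal it reproposes.
func (c *canonicalIDs) RegisterReproposal(newID, reproposedID string) {
	c.seedOf[newID] = c.seedOf[reproposedID]
}

// Canonical returns the canonical ID to use for apply-time assertions.
func (c *canonicalIDs) Canonical(id string) string {
	return c.seedOf[id]
}

func main() {
	var c canonicalIDs
	c.RegisterProposal("p1")
	c.RegisterReproposal("p1-r1", "p1")
	c.RegisterReproposal("p1-r2", "p1-r1")
	// All three resolve to the same canonical ID, so applying any two of them
	// would be flagged as a double-apply by the asserter.
	fmt.Println(c.Canonical("p1"), c.Canonical("p1-r1"), c.Canonical("p1-r2")) // p1 p1 p1
}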

@erikgrinaker (author):

Good catch @pavelkalinnikov. I guess we'll need to also track proposals and reproposals.

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch 8 times, most recently from ada0ca8 to 7eb89d8 on December 19, 2023 12:33
@erikgrinaker erikgrinaker changed the title from "kvnemesis: add Raft double-apply assertions" to "kvnemesis: add Raft application assertions" on Dec 19, 2023
@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch 3 times, most recently from abf87dd to 6d75fcd on December 20, 2023 11:38
@erikgrinaker (author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pav-kv)


pkg/kv/kvserver/apply/asserter.go line 29 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you consider adding validation that a request is not skipped on some replicas, if it applied on other replicas?

I dropped it in the interest of time -- I was hoping to have something simple merged last week, so we could start to get some coverage for #114421. I have other things I need to wrap up before the holidays, so I'd prefer to get this in as-is and extend the assertions later, rather than letting this sit unmerged for the next couple of weeks.


pkg/kv/kvserver/apply/asserter.go line 37 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We have 5 hash-maps here that are all keyed on rangeID, each with one or more inner hash-maps as values. It's all a bit difficult to read. Consider at least splitting the top-level map off into a separate structure. Something like:

type Asserter struct {
    mu     syncutil.Mutex
    ranges map[roachpb.RangeID]rangeAsserter // per-range state, guarded by mu
}

type rangeAsserter struct {
    // proposedCmds maps proposed command IDs to the proposing replica.
    proposedCmds map[kvserverbase.CmdIDKey]roachpb.ReplicaID
    // ...
}

Done.


pkg/kv/kvserver/apply/asserter.go line 249 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This logic could make it clearer what exactly we're trying to check and how we're going to check it, both in the comment and in the name of this variable. Consider seedAndReproposedCmds for the name.

Rewrote the logic.

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch 2 times, most recently from 06bc9bc to 8063415 on December 21, 2023 12:20
@erikgrinaker (author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pav-kv)


pkg/kv/kvserver/apply/asserter.go line 29 at r12 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I dropped it in the interest of time -- I was hoping to have something simple merged last week, so we could start to get some coverage for #114421. I have other things I need to wrap up before the holidays, so I'd prefer to get this in as-is and extend the assertions later, rather than letting this sit unmerged for the next couple of weeks.

Rewrote this to track the applied Raft log, and added additional assertions, including that all replicas apply all commands.

Hoping we can get this merged today, before the holidays, since we need to get some runtime before shipping 23.2.

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch from 8063415 to b46264b on December 21, 2023 12:26
@pav-kv (Collaborator) left a comment:

Seems good to start with. I didn't review too closely; it would be good if Nathan takes another look.

pkg/kv/kvserver/apply/asserter.go: 4 resolved review threads (outdated)
Comment on lines 175 to 176
// INVARIANT: lease requests never set a LAI or closed timestamp. These have
// their own replay protection.
Collaborator:

What is this lease replay protection? The epoch number? TODO: add some assertions for regressions in this mechanism too?

Not for this PR, but I'm curious how the two mechanisms interact and don't step on each other's toes. Intuitively, it's not enough to have two independent mechanisms; they have to play well together. TODO: add some assertions to this effect?

@erikgrinaker (author) commented Jan 2, 2024:

It's a conditional put, predicated on the old lease record. Mentioned it in the comment.

Comment on lines 279 to 287
// INVARIANT: the Raft index must progress.
if entry.raftIndex <= s.raftIndex {
	fail("Raft index regression %d -> %d", s.raftIndex, entry.raftIndex)
}
Collaborator:

This is probably trivially true by construction, from the fact that stateAt(k) returns raftIndex <= k, and we're in the else branch in which this k < entry.raftIndex.

I.e. this is an invariant of this Asserter (rather than what we're testing), and we can probably replace it with a simpler one: that log[i].raftIndex == i (except when it's a gap/zero).

erikgrinaker (author):

It is, but figured we can keep it in as documentation anyway. Added a comment saying as much.

Comment on lines +283 to +291
// INVARIANT: the Raft term must not regress.
if entry.raftTerm < s.raftTerm {
	fail("Raft term regression %d -> %d", s.raftTerm, entry.raftTerm)
}
Collaborator:

This is provided by raft (the committed log entries have a non-decreasing term), so I'm not sure we need to test it. It kinda belongs to the abstraction layer below.

The quirk with this check is gaps. stateAt() returns some term at some index (skipping all the no-op gaps), so this check isn't testing this invariant for the entire raft log, just its applied subset.

Feel free to leave it in though, it doesn't harm to have an extra check.

erikgrinaker (author):

Yep, but that's the case for many other assertions here too (e.g. that all replicas apply the same command at the same index). We may as well assert these invariants, since we rely on them holding. I don't feel particularly strongly about it, but it doesn't really cost us anything either.

@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch from b46264b to 9c0a33c on January 2, 2024 14:51
This adds basic kvnemesis assertions for Raft command application:

* A request is only applied once, at a single log index and lease
  applied index across all replicas (i.e. no double-applies or replays).

* Commands do not regress the Raft index/term, lease applied index, and
  closed timestamp (assuming no node restarts).

* All replicas apply the same commands in the same order at the same
  positions.

Epic: none
Release note: None
@erikgrinaker erikgrinaker force-pushed the replica-assert-double-apply branch from 9c0a33c to 05cff8e on January 2, 2024 15:24
@nvanbenschoten nvanbenschoten requested a review from pav-kv January 2, 2024 15:25
@nvanbenschoten (Member) left a comment:

:lgtm:

Reviewed 2 of 2 files at r21, 2 of 3 files at r22, 2 of 2 files at r23, 4 of 4 files at r25, 1 of 1 files at r26, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker and @pav-kv)

@erikgrinaker (author):

TFTR!

bors r+

@craig (craig bot) commented Jan 2, 2024:

Build succeeded!

And happy new year! 🎉

@craig craig bot merged commit e218b13 into cockroachdb:master Jan 2, 2024
8 of 10 checks passed
blathers-crl bot commented Jan 2, 2024:

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from a9f4364 to blathers/backport-release-23.2-116319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

craig bot pushed a commit that referenced this pull request Sep 27, 2024
131503: apply: fix initial snapshot assertion r=arulajmani a=pav-kv

The broad assertion here is that all the entries composing a snapshot must have been applied by someone before (at least the sender of the snapshot has done so; otherwise it couldn't have sent this snapshot).

And then there is an additional assertion (which failed occasionally before this PR) that there can't be a snapshot without at least one applied entry. This is generally true, except that there are entries not registered by the "apply" stack: in particular, the empty entries that a raft leader appends at the start of its term. There is a comment to this effect in the `asserter.go` code:
 https://github.com/cockroachdb/cockroach/blob/5b0371b44009ac2ae3f58ea7ed95b290dd4e8227/pkg/kv/kvserver/apply/asserter.go#L254-L259

This commit modifies the snapshot assertion to correctly handle this case. It is possible that an initial snapshot sent from the leader to followers contains only this dummy entry, with no "real" proposals applied.

Fixes the false assertion failure in #118471 (comment)
Related to #116319

Co-authored-by: Pavel Kalinnikov <[email protected]>
Labels: backport-23.2.x (flags PRs that need to be backported to 23.2)
Projects: none yet
Development: successfully merging this pull request may close these issues:

  • kvserver: runtime assertions for double-application and duplicate proposals

4 participants