kvnemesis: add Raft application assertions #116319
Conversation
Force-pushed 03858ed to fd9cf09
Looking good.
Do we have an idea what conditions increase probability of double-apply situations? We could add a new kvnemesis test with tweaked probabilities for the things that we throw into the mix.
I'm guessing though that the most interesting conditions are chaos things like slow connections and down nodes. The more lease/leader moves and snapshots, the merrier.
```go
mu syncutil.Mutex
// appliedCmds is a map of applied command IDs to the range ID and Raft log
// index at which it was applied.
appliedCmds map[kvserverbase.CmdIDKey]*appliedCmd
```
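For illustration, a minimal sketch of what such an applied-command map enables: rejecting the same command ID applying twice on one replica, or applying at different log positions. All names and types here are hypothetical stand-ins, not the actual kvnemesis/apply code.

```go
package main

import "fmt"

// CmdID, RangeID, and ReplicaID are simplified stand-ins for the
// kvserverbase/roachpb types.
type CmdID string
type RangeID int64
type ReplicaID int32

// appliedCmd records where a command ID was observed applying.
type appliedCmd struct {
	rangeID   RangeID
	raftIndex uint64
	replicas  map[ReplicaID]bool // replicas that applied it
}

// Asserter tracks applied command IDs and flags double-applies.
type Asserter struct {
	appliedCmds map[CmdID]*appliedCmd
}

func NewAsserter() *Asserter {
	return &Asserter{appliedCmds: map[CmdID]*appliedCmd{}}
}

// Apply records one command application, returning an error if the command
// was already applied on this replica or at a different position.
func (a *Asserter) Apply(id CmdID, rangeID RangeID, replicaID ReplicaID, raftIndex uint64) error {
	c, ok := a.appliedCmds[id]
	if !ok {
		a.appliedCmds[id] = &appliedCmd{
			rangeID:   rangeID,
			raftIndex: raftIndex,
			replicas:  map[ReplicaID]bool{replicaID: true},
		}
		return nil
	}
	if c.rangeID != rangeID || c.raftIndex != raftIndex {
		return fmt.Errorf("command %s applied at r%d index %d and r%d index %d",
			id, c.rangeID, c.raftIndex, rangeID, raftIndex)
	}
	if c.replicas[replicaID] {
		return fmt.Errorf("command %s applied twice on replica %d", id, replicaID)
	}
	c.replicas[replicaID] = true
	return nil
}

func main() {
	a := NewAsserter()
	if err := a.Apply("cmd1", 1, 1, 10); err != nil {
		panic(err)
	}
	// The same command applying at the same position on another replica is fine.
	if err := a.Apply("cmd1", 1, 2, 10); err != nil {
		panic(err)
	}
	// A re-apply at a new index is a double-apply and must be caught.
	if err := a.Apply("cmd1", 1, 1, 11); err == nil {
		panic("expected double-apply to be caught")
	}
	fmt.Println("ok")
}
```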
In addition, we could have per-(rangeID/replicaID) tracking. It's probably true that within a (rangeID/replicaID), `Apply` is called sequentially for all the applied LAIs (except those that were skipped due to snapshots), and the LAI always increases. We could assert on this.
Additionally, we could assert that the closed timestamp only increases, in order to catch something that could cause #70894.
Fine to add both assertions in separate commits/PRs.
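A minimal sketch of such a per-(rangeID, replicaID) assertion — strictly increasing LAI (gaps allowed, e.g. from snapshots) and a non-regressing closed timestamp. Types and names are hypothetical, not the actual asserter code.

```go
package main

import "fmt"

// rangeReplica keys per-replica state; stand-ins for roachpb IDs.
type rangeReplica struct {
	rangeID   int64
	replicaID int32
}

// replicaState holds the last applied LAI and closed timestamp.
type replicaState struct {
	lai      uint64
	closedTS int64
}

// MonotonicAsserter checks that, within a (rangeID, replicaID), the lease
// applied index strictly increases and the closed timestamp never regresses.
type MonotonicAsserter struct {
	state map[rangeReplica]*replicaState
}

func NewMonotonicAsserter() *MonotonicAsserter {
	return &MonotonicAsserter{state: map[rangeReplica]*replicaState{}}
}

func (m *MonotonicAsserter) Apply(rangeID int64, replicaID int32, lai uint64, closedTS int64) error {
	key := rangeReplica{rangeID, replicaID}
	s, ok := m.state[key]
	if !ok {
		m.state[key] = &replicaState{lai: lai, closedTS: closedTS}
		return nil
	}
	if lai <= s.lai {
		return fmt.Errorf("r%d/%d: LAI regression %d -> %d", rangeID, replicaID, s.lai, lai)
	}
	if closedTS < s.closedTS {
		return fmt.Errorf("r%d/%d: closed timestamp regression %d -> %d",
			rangeID, replicaID, s.closedTS, closedTS)
	}
	s.lai, s.closedTS = lai, closedTS
	return nil
}

func main() {
	m := NewMonotonicAsserter()
	if err := m.Apply(1, 1, 5, 100); err != nil {
		panic(err)
	}
	// The LAI may skip ahead, e.g. when entries were absorbed by a snapshot.
	if err := m.Apply(1, 1, 7, 100); err != nil {
		panic(err)
	}
	if err := m.Apply(1, 1, 7, 90); err == nil {
		panic("expected LAI regression to be caught")
	}
	fmt.Println("ok")
}
```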
These are "local" assertions though that can be checked by an individual replica, we do have them in the code.
But maybe there is some kind of "global" assertion that would be more powerful here.
The kind of "global" assertion that becomes possible is: after the test is done, piece each rangeID log together by joining it across its replicas. Then check for LAI and CT regressions.
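A rough sketch of such a post-test global check, assuming each replica recorded its applied entries with their raft index, LAI, and closed timestamp (all types hypothetical): join the per-replica logs for a range, require agreement at each index, then scan the joined log for LAI and CT regressions.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one applied command as observed on some replica.
type entry struct {
	raftIndex uint64
	lai       uint64
	closedTS  int64
}

// joinAndCheck merges per-replica applied logs for a single range and checks
// the combined log for LAI and closed-timestamp regressions. Entries at the
// same raft index must be identical across replicas. (This sketch assumes
// every entry carries a LAI; lease requests, which don't, are ignored here.)
func joinAndCheck(replicaLogs map[int32][]entry) error {
	byIndex := map[uint64]entry{}
	for replicaID, log := range replicaLogs {
		for _, e := range log {
			if prev, ok := byIndex[e.raftIndex]; ok {
				if prev != e {
					return fmt.Errorf("replica %d: divergent entry at index %d", replicaID, e.raftIndex)
				}
				continue
			}
			byIndex[e.raftIndex] = e
		}
	}
	indexes := make([]uint64, 0, len(byIndex))
	for i := range byIndex {
		indexes = append(indexes, i)
	}
	sort.Slice(indexes, func(a, b int) bool { return indexes[a] < indexes[b] })
	var lastLAI uint64
	var lastCT int64
	for _, i := range indexes {
		e := byIndex[i]
		if e.lai <= lastLAI {
			return fmt.Errorf("LAI regression %d -> %d at index %d", lastLAI, e.lai, i)
		}
		if e.closedTS < lastCT {
			return fmt.Errorf("closed timestamp regression %d -> %d at index %d", lastCT, e.closedTS, i)
		}
		lastLAI, lastCT = e.lai, e.closedTS
	}
	return nil
}

func main() {
	logs := map[int32][]entry{
		1: {{raftIndex: 10, lai: 1, closedTS: 5}, {raftIndex: 11, lai: 2, closedTS: 6}},
		2: {{raftIndex: 11, lai: 2, closedTS: 6}, {raftIndex: 12, lai: 3, closedTS: 6}},
	}
	if err := joinAndCheck(logs); err != nil {
		panic(err)
	}
	fmt.Println("ok")
}
```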
Double-apply is not the only problem we should be catching. Replica inconsistency can also be caused by a single apply that should not have happened. A global join of all the applied commands for a range across all replicas can help spot such an illegal apply. The global LAI and CT regression check partially detects these cases.
We could also check each lease epoch applies a contiguous span of commands.
I agree that we should extend this to include more assertions, but I'm inclined to limit it to only double-applies for now to get something shipped. We can extend it later.
We can consider recording the full sequence of events and only asserting after the test completes, which is maybe more in line with what we want for other assertions. However, if we want to record the actual command (to ease debugging failures) we can end up storing a lot of data in memory. Maybe that's ok though.
Force-pushed fd9cf09 to 110467c
Force-pushed 26a4d9a to acf42ec
Ok, I've added basic double-apply assertions, with snapshot handling. Does this seem reasonable for a first stab? Anything else we should be checking while we're at it?

I guess we could add LAI and CT tracking while we're at it, even though we do already have some code assertions for it. I'll have to refresh my memory on the invariants there, but "don't regress" seems easy enough. I might add that afterwards.
Force-pushed acf42ec to cc4ca2e
pkg/kv/kvserver/apply/asserter.go
Outdated
```go
a.ensureRangeLocked(rangeID)

// Record the applied index.
a.appliedIndex[rangeID][replicaID] = index
```
What we could/should verify for a snapshot is:
- There is at least one applied command at <= log index (i.e. someone, presumably a leader/leaseholder, has applied a command, and then sent a snapshot that includes it; a snapshot can't materialize itself).
  - (exception: the initial snapshot index, I think it's 10 or something; need to dig it up)
- The command with max `RaftIndex` <= snapshot's index is the last applied one. It always exists in our tracker (can be proved).
- The LAI (and maybe some other metadata like CT) of this command matches what the snapshot state says. We can also verify that the MVCC stats match the cumulative MVCC stats of this prefix.
This effectively allows us to "forget" this snapshot. We matched it against a command, and assume that the snapshot represents a prefix ending with this command. There is seemingly no other information we can derive from a snapshot's metadata (like what other commands have been applied).
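The matching step described above could be sketched roughly as follows. Types are hypothetical simplifications, and the initial-snapshot exception mentioned earlier is deliberately left out.

```go
package main

import "fmt"

// applied is one observed command application, with its raft index and LAI.
type applied struct {
	raftIndex uint64
	lai       uint64
}

// checkSnapshot matches a snapshot against the applied log: it finds the
// applied command with the highest raft index at or below the snapshot's
// index, and asserts that the snapshot's LAI matches that command's LAI.
// A snapshot can't materialize itself, so such a command must exist
// (modulo the initial snapshot, which this sketch ignores).
func checkSnapshot(log []applied, snapIndex, snapLAI uint64) error {
	var last *applied
	for i := range log {
		if log[i].raftIndex <= snapIndex && (last == nil || log[i].raftIndex > last.raftIndex) {
			last = &log[i]
		}
	}
	if last == nil {
		return fmt.Errorf("no applied command at index <= %d: snapshot can't materialize itself", snapIndex)
	}
	if last.lai != snapLAI {
		return fmt.Errorf("snapshot LAI %d != LAI %d of last covered command (index %d)",
			snapLAI, last.lai, last.raftIndex)
	}
	return nil
}

func main() {
	log := []applied{{raftIndex: 10, lai: 3}, {raftIndex: 12, lai: 4}}
	// A snapshot at index 11 covers the command at index 10; LAIs must match.
	if err := checkSnapshot(log, 11, 3); err != nil {
		panic(err)
	}
	// A snapshot covering no applied command is illegal.
	if err := checkSnapshot(log, 9, 0); err == nil {
		panic("expected failure: no command covered")
	}
	fmt.Println("ok")
}
```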
I'll leave this for later -- we need to start getting some runtime in the nightlies to look for reapplication.
Force-pushed 2560bbb to 6f49e8d
One more detail: to check at-most-once semantics for the entire proposal (including all its reproposals of both kinds), we need a canonical ID that is shared between all sub-proposals. The current implementation will only check at-most-once application for each LAI reproposal, but not for the entire group. To fix that, we can introduce a canonical uber-proposal ID that all sub-proposals copy around and share. Or, with the current code, you could use the ID of the `seedProposal` as the canonical ID (but then we sort of have to trust it's correct; longer-term we should go back to having a canonical ID that's obviously correct).
Good catch @pavelkalinnikov. I guess we'll need to also track proposals and reproposals.
Force-pushed ada0ca8 to 7eb89d8
Force-pushed abf87dd to 6d75fcd
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pav-kv)
pkg/kv/kvserver/apply/asserter.go
line 29 at r12 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you consider adding validation that a request is not skipped on some replicas, if it applied on other replicas?
I dropped it in the interest of time -- I was hoping to have something simple merged last week, so we could start to get some coverage for #114421. I have other things I need to wrap up before the holidays, so I'd prefer to get this in as-is and extend the assertions later, rather than letting this sit unmerged for the next couple of weeks.
pkg/kv/kvserver/apply/asserter.go
line 37 at r12 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We have 5 hash-maps here that are all keyed on rangeID, each with one or more inner hash-maps as values. It's all a bit difficult to read. Consider at least splitting the top-level map off into a separate structure. Something like:
```go
type Asserter struct {
	mu     syncutil.Mutex
	ranges map[roachpb.RangeID]rangeAsserter
}

type rangeAsserter struct {
	proposedCmds map[kvserverbase.CmdIDKey]roachpb.ReplicaID
	// ...
}
```
Done.
pkg/kv/kvserver/apply/asserter.go
line 249 at r12 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This logic could make it clearer what exactly we're trying to check and how we're going to check it, both in the comment and in the name of this variable. Consider
seedAndReproposedCmds
for the name.
Rewrote the logic.
Force-pushed 06bc9bc to 8063415
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pav-kv)
pkg/kv/kvserver/apply/asserter.go
line 29 at r12 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I dropped it in the interest of time -- I was hoping to have something simple merged last week, so we could start to get some coverage for #114421. I have other things I need to wrap up before the holidays, so I'd prefer to get this in as-is and extend the assertions later, rather than letting this sit unmerged for the next couple of weeks.
Rewrote this to track the applied Raft log, and added additional assertions, including that all replicas apply all commands.
Hoping we can get this merged today, before the holidays, since we need to get some runtime before shipping 23.2.
Force-pushed 8063415 to b46264b
Seems good to start with. I didn't review too closely, good if Nathan takes another look.
pkg/kv/kvserver/apply/asserter.go
Outdated
```go
// INVARIANT: lease requests never set a LAI or closed timestamp. These have
// their own replay protection.
```
What is this lease replay protection? The epoch number? TODO: add some assertions for regressions in this mechanism too?

Not for this PR, but I'm curious how the two mechanisms interact and don't step on each other's toes. Intuitively, it's not enough to have two independent mechanisms; they have to play well together. TODO: add some assertions to this effect?
It's a conditional put, predicated on the old lease record. Mentioned it in the comment.
pkg/kv/kvserver/apply/asserter.go
Outdated
```go
// INVARIANT: the Raft index must progress.
if entry.raftIndex <= s.raftIndex {
	fail("Raft index regression %d -> %d", s.raftIndex, entry.raftIndex)
}
```
This is probably trivially true by construction, from the fact that `stateAt(k)` returns `raftIndex <= k`, and we're in the `else` branch in which `k < entry.raftIndex`.

I.e. this is an invariant of this `Asserter` (rather than what we're testing), and we can probably replace it with a simpler one: `log[i].raftIndex == i` (except when it's a gap/zero).
It is, but figured we can keep it in as documentation anyway. Added a comment saying as much.
```go
// INVARIANT: the Raft term must not regress.
if entry.raftTerm < s.raftTerm {
	fail("Raft term regression %d -> %d", s.raftTerm, entry.raftTerm)
}
```
This is provided by raft (the committed log entries have a non-decreasing term), not sure we need to test it. Kinda belongs to the abstraction layer below.
The quirk with this check is gaps. `stateAt()` returns some `term` at some `index` (skipping all the no-op gaps), so this check isn't testing this invariant for the entire Raft log, just its applied subset.
Feel free to leave it in though, it doesn't harm to have an extra check.
Yep, but that's the case for many other assertions here as well (e.g. that all replicas apply the same command at the same index). We may as well assert these invariants here as well, since we rely on them holding. Don't feel particularly strongly about it though, but it doesn't really cost us anything either.
Epic: none Release note: None
Force-pushed b46264b to 9c0a33c
This adds basic kvnemesis assertions for Raft command application:

* A request is only applied once, at a single log index and lease applied index across all replicas (i.e. no double-applies or replays).
* Commands do not regress the Raft index/term, lease applied index, and closed timestamp (assuming no node restarts).
* All replicas apply the same commands in the same order at the same positions.

Epic: none
Release note: None
Force-pushed 9c0a33c to 05cff8e
Reviewed 2 of 2 files at r21, 2 of 3 files at r22, 2 of 2 files at r23, 4 of 4 files at r25, 1 of 1 files at r26, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker and @pav-kv)
TFTR! bors r+
Build succeeded! And happy new year! 🎉
Encountered an error creating backports. Some common things that can go wrong:

You might need to create your backport manually using the backport tool.

error creating merge commit from a9f4364 to blathers/backport-release-23.2-116319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict

You may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
131503: apply: fix initial snapshot assertion r=arulajmani a=pav-kv

The broad assertion here is that all the entries composing a snapshot must have been applied by someone before (at least the sender of the snapshot has done it, otherwise it couldn't have sent this snapshot). And then there is an additional assertion (which failed occasionally before this PR) that there can't be a snapshot without at least one entry applied.

This is generally true, except that there are entries not registered by the "apply" stack: particularly, empty entries that a Raft leader appends at the start of its term. There is a comment to this effect in the `asserter.go` code: https://github.com/cockroachdb/cockroach/blob/5b0371b44009ac2ae3f58ea7ed95b290dd4e8227/pkg/kv/kvserver/apply/asserter.go#L254-L259

This commit modifies the snapshot assertion to correctly handle this case. It is possible that an initial snapshot sent from leader to followers contains only this dummy entry, and there were no "real" proposals applied.

Fixes the false assertion failure in #118471 (comment)
Related to #116319

Co-authored-by: Pavel Kalinnikov <[email protected]>
This adds basic kvnemesis assertions for Raft command application:

* A request is only applied once, at a single log index and lease applied index across all replicas (i.e. no double-applies or replays).
* Commands do not regress the Raft index/term, lease applied index, and closed timestamp (assuming no node restarts).
* All replicas apply the same commands in the same order at the same positions.
Resolves #115771.
Touches #114421.
Epic: none
Release note: None