
kvserver: runtime assertions for double-application and duplicate proposals #115771

Closed
erikgrinaker opened this issue Dec 7, 2023 · 25 comments · Fixed by #116319
Assignees: erikgrinaker
Labels
  • A-kv-replication: Relating to Raft, consensus, and coordination.
  • branch-release-23.2: Used to mark GA and release blockers, technical advisories, and bugs for 23.2.
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
  • T-kv: KV Team.

Comments

erikgrinaker (Contributor) commented Dec 7, 2023

In the context of #114421, we should ensure we have sufficient runtime assertions to detect double-application/replays of Raft proposals. This may already exist, but we should confirm. Ideally, this would be enabled in roachtests.

Jira issue: CRDB-34202

Epic CRDB-32846

erikgrinaker added the C-enhancement, A-kv-replication, and T-kv-replication labels on Dec 7, 2023

blathers-crl bot commented Dec 7, 2023

cc @cockroachdb/replication


blathers-crl bot commented Dec 7, 2023

Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

erikgrinaker added the branch-release-23.2 label on Dec 7, 2023
pav-kv (Collaborator) commented Dec 7, 2023

We have this flag tracking whether application has happened, and safety checks in a few places asserting that it is not true. I think this is scoped to one sub-proposal, though, not the entire proposal.

After the refactoring that made it possible to have multiple ProposalData per single "logical" proposal, this flag exists in multiple copies, so I'm not absolutely sure that setting it to true prevents other parts of the pipeline from seeing it as false (if they read from a copied ProposalData).
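To make the concern concrete, here's a toy illustration of the copy hazard (not the real kvserver types; only the field name v2SeenDuringApplication is borrowed from the code quoted further down):

```go
package main

import "fmt"

// proposalData is a stand-in for kvserver's ProposalData; the flag plays the
// role of v2SeenDuringApplication.
type proposalData struct {
	id      string
	applied bool
}

func main() {
	orig := &proposalData{id: "abc123"}

	// A reproposal that copies the struct by value snapshots the flag.
	reproposal := *orig

	// Application marks the original as seen...
	orig.applied = true

	// ...but the copy still reports false, so a check that happens to read
	// the copied ProposalData would not notice the earlier application.
	fmt.Println(orig.applied, reproposal.applied) // true false
}
```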

pav-kv (Collaborator) commented Dec 7, 2023

Which again leads me to conclude that the reproposals area needs cleaner lifetime tracking. There are things that are "common" across all the (re-)proposals of a single proposal, and things that are unique to each. The previous design put everything into a single mutable struct; the new design puts everything into separate semi-immutable structs but copies/moves the shared things between them. The design we need: a shared part for the "logical" proposal, and a set of per-reproposal structs that link into it.

The "shared" proposal struct then makes it possible to track the entire state of a single proposal rather than piece it together from multiple structs. It also makes invariant checks more reliable, like the "applied only once" check. Issue #115759 is a stepping stone towards such a restructure - we can first build a solid set of invariants in test-only environment, plus cover it with tests, and then slowly transition it to better code with inline assertions.

A previous problem we hit because of this copying: #108775.
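A rough sketch of the shape being proposed here; every name below is made up for illustration and none of it exists in kvserver today:

```go
package kvsketch

import "fmt"

// sharedProposal holds the state common to all (re-)proposals of one logical
// client proposal, including the single place that records application.
type sharedProposal struct {
	commandID string
	applied   bool // set exactly once, checked on every application attempt
}

// reproposal holds per-attempt state and links back to the shared part
// instead of copying it.
type reproposal struct {
	shared            *sharedProposal
	leaseAppliedIndex uint64
}

// markApplied enforces the "applied only once" invariant on the shared part,
// so it holds no matter which reproposal reaches application first.
func (r *reproposal) markApplied() error {
	if r.shared.applied {
		return fmt.Errorf("proposal %s applied twice", r.shared.commandID)
	}
	r.shared.applied = true
	return nil
}
```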

erikgrinaker self-assigned this on Dec 13, 2023
erikgrinaker (Contributor, Author) commented:

Downgrading to non-blocker: this is test coverage.

erikgrinaker (Contributor, Author) commented:

The "shared" proposal struct then makes it possible to track the entire state of a single proposal rather than piece it together from multiple structs. It also makes invariant checks more reliable, like the "applied only once" check. Issue #115759 is a stepping stone towards such a restructure - we can first build a solid set of invariants in test-only environment, plus cover it with tests, and then slowly transition it to better code with inline assertions.

That's a longer-term change though. We need something in the next week to try to track down #114421. Thoughts on this, @pavelkalinnikov?

erikgrinaker (Contributor, Author) commented Dec 13, 2023

So we already have this assertion:

// INVARIANT: a proposal is consumed (i.e. removed from the proposals map)
// the first time it comes up for application. (If the proposal fails due
// to an illegal LeaseAppliedIndex, a new proposal might be spawned to
// retry but that proposal will be unrelated as far as log application is
// concerned).
//
// INVARIANT: local proposals are not in the proposals map while being
// applied, and they never re-enter the proposals map either during or
// afterwards.
//
// (propBuf.{Insert,ReinsertLocked} ignores proposals that have
// v2SeenDuringApplicationSet to make this true).
if cmd.proposal.v2SeenDuringApplication {
	err := errors.AssertionFailedf("ProposalData seen twice during application: %+v", cmd.proposal)
	logcrash.ReportOrPanic(d.r.AnnotateCtx(cmd.ctx), &d.r.store.ClusterSettings().SV, "%v", err)
	// ...
}

I don't think that does what we want though, because as you say, it only applies to a single sub-(re)proposal, not the logical client proposal.

I know Tobi added a bunch of other assertions that were later removed because they didn't actually hold as invariants. I'll try to dig those up just to see what we've already tried.

pav-kv (Collaborator) commented Dec 13, 2023

In a general sense, asserting/detecting double-apply situations is a costly problem. It basically requires a global map of all the proposal IDs ever, and checking that an applied command is not yet in this map.

On the leaseholder, we can reduce the scope of this checking to a single lease epoch. The Replica.mu.proposals map sort of plays the role of "all the proposal IDs ever" (scoped to this lease epoch), but it forgets proposals after application. And that's the "problem": what if there is another application soon after we forgot the proposal?

So we can't use this simple map-based ID check. Instead, we must rely on maths (e.g. the INVARIANT comments quoted above).
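As a toy illustration of that gap (hypothetical code, not the real Replica machinery): once the map forgets an ID on first application, a second application of the same ID is indistinguishable from a command proposed elsewhere.

```go
package main

import "fmt"

func main() {
	// In-flight local proposals by command ID, loosely modelling
	// Replica.mu.proposals.
	proposals := map[string]bool{"cmd-1": true}

	apply := func(id string) {
		if !proposals[id] {
			// A duplicate of an already-forgotten proposal ends up here,
			// looking just like a command proposed by another node.
			fmt.Println("applying unknown command:", id)
			return
		}
		fmt.Println("applying local command:", id)
		delete(proposals, id) // forget on first application
	}

	apply("cmd-1") // applying local command: cmd-1
	apply("cmd-1") // duplicate, but nothing flags it
}
```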

erikgrinaker (Contributor, Author) commented:

Replica.mu.proposals also only tracks proposals that were submitted by the local node, so it won't catch double-application on followers. It also won't detect double-application via snapshot application.

pav-kv (Collaborator) commented Dec 13, 2023

The sort of things we rely on are along the lines of (see the sketch after this list):

  • there is always at most one appliable proposal in flight
  • we only repropose it with a new LAI if it came back from the application pipeline as a no-op (at which point there are exactly zero appliable proposals in flight).
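A toy model of those two invariants, purely illustrative; none of these types exist in kvserver:

```go
package kvsketch

import "errors"

type proposalState int

const (
	idle     proposalState = iota // no appliable attempt in flight
	inFlight                      // exactly one appliable attempt in flight
	applied                       // terminal: applied exactly once
)

// logicalProposal models one client proposal across all of its reproposals.
type logicalProposal struct{ state proposalState }

// propose asserts "at most one appliable proposal in flight".
func (p *logicalProposal) propose() error {
	if p.state != idle {
		return errors.New("invariant violation: appliable proposal already in flight")
	}
	p.state = inFlight
	return nil
}

// cameBackAsNoop records that the in-flight attempt came back as a no-op
// (e.g. an illegal LAI); only then may a reproposal with a new LAI be issued.
func (p *logicalProposal) cameBackAsNoop() { p.state = idle }

// apply asserts that the logical proposal is applied at most once.
func (p *logicalProposal) apply() error {
	if p.state == applied {
		return errors.New("invariant violation: proposal applied twice")
	}
	p.state = applied
	return nil
}
```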

pav-kv (Collaborator) commented Dec 13, 2023

> so it won't catch double-application on followers. It also won't detect double-application via snapshot application

Exactly. On the followers, detecting this is costly: we basically need to introduce a map keyed by some ID to detect dups. On the leaseholder this is somewhat free because we need this map for other purposes. But it isn't exactly the map that would detect all dups flawlessly, because we remove entries from it.

Snapshot application can happen on followers and (as I learned recently) on the leaseholder, so yeah, this additionally makes it impossible to observe all the applied IDs even on the leaseholder.

erikgrinaker (Contributor, Author) commented Dec 13, 2023

> asserting/detecting double-apply situations is a costly problem. It basically requires a global map of all the proposal IDs ever, and checking that an applied command is not yet in this map.

We could assume that double-apply is most likely to happen for recent operations, though, and keep a ring buffer of e.g. the last 100k command IDs. We could persist this ring buffer in the state machine so that it's propagated in snapshots.

Clearly, we could only do this in test builds, since it's likely going to be too costly otherwise. But it doesn't seem terribly complicated to implement something like this (famous last words).
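A minimal sketch of that idea, assuming a test-only build; the types, the eviction policy, and the buffer size are placeholders, and persisting the buffer in the state machine so it travels with snapshots is not shown:

```go
package main

import "fmt"

// recentApplies remembers the last N applied command IDs and complains if one
// of them shows up again. Test builds only; this is not production code.
type recentApplies struct {
	ring []string        // last N applied command IDs, oldest evicted first
	next int             // next slot to overwrite
	seen map[string]bool // membership index over the ring
}

func newRecentApplies(n int) *recentApplies {
	return &recentApplies{ring: make([]string, n), seen: make(map[string]bool, n)}
}

// record returns an error if cmdID was already applied within the last N
// applications.
func (r *recentApplies) record(cmdID string) error {
	if r.seen[cmdID] {
		return fmt.Errorf("command %s applied twice within the last %d applications", cmdID, len(r.ring))
	}
	if old := r.ring[r.next]; old != "" {
		delete(r.seen, old) // evict the oldest entry
	}
	r.ring[r.next] = cmdID
	r.seen[cmdID] = true
	r.next = (r.next + 1) % len(r.ring)
	return nil
}

func main() {
	ra := newRecentApplies(100000)
	fmt.Println(ra.record("abc")) // <nil>
	fmt.Println(ra.record("abc")) // error: command abc applied twice ...
}
```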

pav-kv (Collaborator) commented Dec 13, 2023

For test-only purposes (especially small/mid-size unit tests), we can just have a global map of all the proposals, track their lifetime transitions, and check all the invariants and semantic expectations. This is adjacent to #115759.

erikgrinaker (Contributor, Author) commented:

Sure, that covers the proposal lifecycle, but it doesn't fully address the double-apply problem, does it? Because of snapshots, lease transfers, etc.

pav-kv (Collaborator) commented Dec 13, 2023

In tests we can intercept all the apply events, both on leaseholders and followers. Every log index will be applied explicitly (not via a snapshot) at least once, so we can build a global picture of applied commands by joining the lifetimes across replicas.

erikgrinaker (Contributor, Author) commented:

I think we want something we can assert in roachtests and test clusters, which exercise a wider range of scenarios.

pav-kv (Collaborator) commented Dec 13, 2023

I guess the same technique can be used in a distributed environment. Periodically dump all the applied (index, command ID) pairs to a log/table/whatever. At the end of the test, do a global join of these and see if there are command ID dups.

pav-kv (Collaborator) commented Dec 13, 2023

If this "applied commands" is a SQL table, things could be detected timely by inserting batches of recently applied commands to this table, and having some clever key constraints that would detect dups for us.

erikgrinaker (Contributor, Author) commented Dec 13, 2023

Maybe we can build something like this into kvnemesis. It already does similar tracking and analysis for serializable transaction schedules.

Note that we need something here that can be built, merged, and backported in ~2 days, so let's not get too fancy.

pav-kv (Collaborator) commented Dec 13, 2023

Does kvnemesis use TestCluster? If so, we could inject a testing knob that dumps all the applied (command ID, raft index) pairs to us, which the test can then immediately put into a Go map[commandID]index to detect a commandID dup.

We basically need the TestingPostApplySideEffectsFilter interceptor from PR #114191 or some such (crucially, it has to be called on every replica that applies the command, not just the leaseholder; this guarantees that all applied commands get recorded irrespective of snapshots).
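A sketch of what the test-side check could look like, assuming some post-apply hook hands us (rangeID, commandID, raftIndex) on every replica that applies a command; the hook wiring and the names below are placeholders, not the actual knob from #114191:

```go
package kvtestsketch

import (
	"fmt"
	"sync"
)

type appliedKey struct {
	rangeID   int64
	commandID string
}

// applyTracker records the raft index at which each (rangeID, commandID) was
// applied. Every replica applying the same entry reports the same index, so a
// second, different index for the same key indicates a double-application.
type applyTracker struct {
	mu      sync.Mutex
	applied map[appliedKey]uint64
}

func newApplyTracker() *applyTracker {
	return &applyTracker{applied: map[appliedKey]uint64{}}
}

// onApply is what the hypothetical post-apply filter would call on every
// replica, leaseholder and followers alike.
func (t *applyTracker) onApply(rangeID int64, commandID string, raftIndex uint64) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	key := appliedKey{rangeID: rangeID, commandID: commandID}
	if prev, ok := t.applied[key]; ok && prev != raftIndex {
		return fmt.Errorf("r%d: command %s applied at raft indexes %d and %d",
			rangeID, commandID, prev, raftIndex)
	}
	t.applied[key] = raftIndex
	return nil
}
```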

erikgrinaker (Contributor, Author) commented Dec 13, 2023

Yes, it uses TestCluster. I think we already have what we need in TestingPostApplyFilter, assuming we can get the log index from ReplicatedEvalResult.State.RaftAppliedIndex or LeaseAppliedIndex or some such.

We'll also want to track this across splits/merges.

pav-kv (Collaborator) commented Dec 13, 2023

Command IDs (or proposal IDs; I need to check what they are called) are likely globally unique, so the across splits/merges aspect should come automatically. Just track all proposals globally.

erikgrinaker (Contributor, Author) commented Dec 13, 2023

> likely globally unique

They're not; they're a random 8-byte sequence, so there is a non-zero collision probability.

> the across splits/merges aspect should come automatically

If we're going to tie them to log indexes, the log indexes will change with splits/merges. So we'll probably need a range/index tuple.

But I don't know if it even makes sense to track command IDs across ranges, since proposals, logs, and applications are range-local by definition. I don't see how a proposal/command could leak to a different Raft group.

erikgrinaker (Contributor, Author) commented:

We are probably fine to assume it won't happen at the unit test scale. But to be safer we can mix the range ID into the key, so that we find dups for (rangeID, commandID).

If it ever becomes a problem, we can inject a different command ID generator that's guaranteed to be unique in unit tests, e.g. UUIDv1.

pav-kv (Collaborator) commented Dec 13, 2023

Apparently GitHub can double-apply proposals too. I sent a comment and it appeared twice in the timeline. So I removed the second one. Now there is none :)

I was saying that proposals can be carried across ranges (they stay with the LHS during splits and are inherited from the LHS during merges), but they will be associated with the same rangeID because the LHS keeps it.

The proposal ID being a random int64 can become a problem if there are too many proposals being tracked at the same time. The probability of a collision reaches ~50% if we reach ~5 billion proposals (see "birthday paradox").
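For the record, the standard birthday-bound approximation backs up that figure: with $N = 2^{64}$ possible IDs and $n$ proposals tracked at once,

$$P(\text{collision}) \approx 1 - e^{-n^2/(2N)}, \qquad P = \tfrac{1}{2} \;\text{at}\; n \approx \sqrt{2N\ln 2} \approx 5.1 \times 10^9.$$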

> We are probably fine to assume it won't happen at the unit test scale. But to be safer we can mix the range ID into the key, so that we find dups for (rangeID, commandID).

I am slightly concerned that this could happen in prod, though. We detect it and panic at insertion time. But since we're removing from this map too, it's possible for two different proposals to end up with the same ID over time. We're probably fine, since it won't happen within a short period. But.

Relying on Go's random number generator for crucial correctness properties seems incorrect by design (if a collision happens, we might ack an incorrect proposal or something). It would be better to use a UUID, a cryptographically secure hash of the command content, or (best of all) to identify commands by a (rangeID, lease epoch, LAI) tuple or similar and process them sequentially (#116020). A random ID is best used for sanity checking rather than relied upon.
