kvserver: runtime assertions for double-application and duplicate proposals #115771
cc @cockroachdb/replication
Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
We have this flag tracking whether the application has happened, and some safety checks that it does not happen more than once. After the refactoring which made it possible to have multiple structs per logical proposal, these checks only cover an individual (re-)proposal rather than the proposal as a whole.
Which again leads me to conclude that the reproposals area needs cleaner lifetime tracking. There are things "common" across all the (re-)proposals of a single proposal, and things that are unique. The previous design put everything into a single mutable struct; the new design puts everything into separate semi-immutable structs but copies/moves the shared things between them.

The design that we need: a shared part for the "logical" proposal, and a set of per-reproposal structs that link into it. The "shared" proposal struct then makes it possible to track the entire state of a single proposal rather than piece it together from multiple structs. It also makes invariant checks more reliable, like the "applied only once" check.

Issue #115759 is a stepping stone towards such a restructure: we can first build a solid set of invariants in a test-only environment, cover it with tests, and then slowly transition to better code with inline assertions. A previous problem that we had because of this copying: #108775.
Downgrading to non-blocker, since this is about test coverage.
That's a longer-term change though. We need something in the next week to try to track down #114421. Thoughts on this, @pavelkalinnikov?
So we already have this assertion: `cockroach/pkg/kv/kvserver/replica_application_decoder.go`, lines 95 to 109 at f580049.
I don't think that does what we want though, because as you say, it only applies to a sub-(re)proposal, not the logical client proposal. I know Tobi added a bunch of other assertions that were later removed because they weren't actually invariant. I'll try to dig that up just to see what we've already tried.
In a general sense, asserting/detecting double-apply situations is a costly problem. It basically requires a global map of all the proposal IDs ever, and checking that an applied command is not yet in this map. On the leaseholder, we can reduce the scope of this checking to a single lease epoch. But entries are removed from the proposals map over time, so we can't use this simple map-based ID checking approach. Instead, we must rely on maths (e.g. see the INVARIANT comments quoted above).
The sort of things that we rely on are along the lines of the INVARIANT comments mentioned above.
Exactly. On the followers, detecting this is costly: we basically need to introduce a map keyed by some ID to detect dups. On the leaseholder this is somewhat free because we need this map for other purposes. But it isn't exactly the map that would detect all dups flawlessly, because we remove entries from it. Snapshot application can happen on followers and (as I learned recently) on the leaseholder, so yeah, this additionally makes it impossible to observe all the applied IDs even on the leaseholder.
We could assume that double-apply is most likely to happen for recent operations though, and keep a ring buffer of e.g. the last 100k command IDs. We could persist this ring buffer in the state machine, such that it's propagated in snapshots. Clearly, we could only do this in test builds, since it's likely going to be too costly otherwise. But it doesn't seem terribly complicated to implement something like this (famous last words).
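A minimal sketch of the ring-buffer idea, assuming hypothetical `RecentApplies` and `CmdID` names (not actual CockroachDB identifiers); persisting the buffer in the state machine and propagating it via snapshots are left out:

```go
package recentapplies

import "fmt"

// CmdID stands in for CockroachDB's short random command ID strings.
type CmdID string

// RecentApplies remembers the last len(ring) applied command IDs and flags a
// command that shows up twice within that window.
type RecentApplies struct {
	ring []CmdID            // fixed-size ring of recently applied IDs
	next int                // next slot to overwrite
	seen map[CmdID]struct{} // membership index over the ring's contents
}

func New(size int) *RecentApplies {
	return &RecentApplies{
		ring: make([]CmdID, size),
		seen: make(map[CmdID]struct{}, size),
	}
}

// RecordApplied asserts that id has not been applied within the last
// len(ring) applications, then remembers it, evicting the oldest entry.
func (r *RecentApplies) RecordApplied(id CmdID) error {
	if _, ok := r.seen[id]; ok {
		return fmt.Errorf("command %x applied twice within the last %d applications", id, len(r.ring))
	}
	if old := r.ring[r.next]; old != "" {
		delete(r.seen, old)
	}
	r.ring[r.next] = id
	r.seen[id] = struct{}{}
	r.next = (r.next + 1) % len(r.ring)
	return nil
}
```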
For test-only purposes (esp. small-/mid-size unit tests), we can just have a global map of all the proposals, track their lifetime transitions, and check all the invariants and semantic expectations. This is adjacent to #115759.
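A test-only sketch of such a global tracker, assuming a made-up Proposed/Reproposed/Applied/Rejected lifecycle keyed by command ID; the real set of states and invariants would come out of #115759:

```go
package proposaltracker

import (
	"fmt"
	"sync"
)

// State is a hypothetical, simplified proposal lifecycle state.
type State int

const (
	Proposed State = iota
	Reproposed
	Applied
	Rejected
)

// Tracker is a process-global, test-only registry of proposal lifetimes.
type Tracker struct {
	mu     sync.Mutex
	states map[string]State // keyed by command ID
}

func New() *Tracker {
	return &Tracker{states: make(map[string]State)}
}

// Transition records a lifecycle transition and rejects disallowed ones, in
// particular applying the same command twice.
func (t *Tracker) Transition(cmdID string, to State) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	from, known := t.states[cmdID]
	switch {
	case to == Proposed && known:
		return fmt.Errorf("command %s proposed twice", cmdID)
	case to != Proposed && !known:
		return fmt.Errorf("command %s transitioned to %d before being proposed", cmdID, to)
	case to == Applied && from == Applied:
		return fmt.Errorf("command %s applied twice", cmdID)
	case from == Applied || from == Rejected:
		return fmt.Errorf("command %s transitioned to %d after terminal state %d", cmdID, to, from)
	}
	t.states[cmdID] = to
	return nil
}
```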
Sure, that covers the proposal lifecycle, but doesn't fully address the double-apply problem, does it? Because of snapshots, lease transfers, etc.
In tests we can intercept all the apply events, both on leaseholders and followers. Every log index will be applied explicitly (not via a snapshot) at least once, so we can build a global picture of applied commands by joining the lifetimes across replicas.
I think we want something we can assert in roachtests and test clusters, which exercise a wider range of scenarios.
I guess the same technique can be done in a distributed env. Periodically dump all the applied (index, command ID) pairs to a log/table/whatever. At the end of the test, do a global join of these things and see if there are command ID dups.
If this "applied commands" is a SQL table, things could be detected timely by inserting batches of recently applied commands to this table, and having some clever key constraints that would detect dups for us. |
Maybe we can build something like this into kvnemesis. It already does similar tracking and analysis for serializable transaction schedules. Note that we need something here that can be built, merged, and backported in ~2 days, so let's not get too fancy.
Does We basically need the
Yes, it uses We'll also want to track this across splits/merges.
Command IDs (or proposal IDs; I need to check what they are called) are likely globally unique, so the across-splits/merges aspect should come automatically. Just track all proposals globally.
They're not, they're a random 8-byte sequence, so there is a non-zero collision probability.
If we're going to tie them to log indexes, the log indexes will change with splits/merges. So we'll probably need a range/index tuple. But I don't know if it even makes sense to track command IDs across ranges, since proposals, logs, and applications are range-local by definition. I don't see how a proposal/command could leak to a different Raft group.
If it ever becomes a problem, we can inject a different command ID generator that's guaranteed to be unique in unit tests, e.g. UUIDv1.
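A small sketch of such an injectable generator: production keeps the cheap random 8-byte IDs, while tests swap in a variant that is collision-free by construction (here a node prefix plus a monotonic counter rather than UUIDv1); the `Generator` type and both constructors are hypothetical:

```go
package cmdid

import (
	"crypto/rand"
	"encoding/binary"
	"sync/atomic"
)

// Generator produces 8-byte command IDs.
type Generator func() [8]byte

// RandomID mirrors the status quo: 8 random bytes, unique only probabilistically.
func RandomID() [8]byte {
	var id [8]byte
	_, _ = rand.Read(id[:])
	return id
}

// NewSequentialGenerator returns a test-only generator whose IDs cannot
// collide within the process: a 16-bit node prefix plus a 48-bit counter.
func NewSequentialGenerator(nodePrefix uint16) Generator {
	var ctr uint64
	return func() [8]byte {
		n := atomic.AddUint64(&ctr, 1)
		var id [8]byte
		binary.BigEndian.PutUint64(id[:], uint64(nodePrefix)<<48|(n&0xFFFFFFFFFFFF))
		return id
	}
}
```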
Apparently GitHub can double-apply proposals too. I sent a comment and it appeared twice in the timeline, so I removed the second one. Now there is none :)

I was saying that proposals can be carried across ranges (they stay with the LHS during splits, and are inherited from the LHS during merges), but they will be associated with the same range ID because the LHS keeps it.

The proposal ID being a random int64 can become a problem if there were too many proposals that we were tracking at the same time. The probability of a collision is ~50% if we reach ~5 billion proposals (see "birthday paradox"). We are probably fine to assume it won't happen at unit-test scale. But to be safer we can mix the range ID into the key, so that we find dups for (rangeID, commandID).

I am slightly concerned that this can happen in prod though. We detect it and panic at insertion time, but since we're removing from this map too, it's possible for two different proposals to have had the same ID after a while. We're probably fine since it won't happen within a short period. But relying on Go's random number generator for crucial correctness properties seems incorrect by design (if a collision happens, we might ack an incorrect proposal or something). It would be better to use a UUID, a cryptographically secure hash of the command content, or (best of all) have commands identified by a (rangeID, lease epoch, LAI) tuple or similar, and processed sequentially [#116020]. A random ID is best used for sanity checking rather than relied upon.
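For reference, the ~5 billion figure follows from the standard birthday approximation p(n) ≈ 1 - exp(-n²/(2·2⁶⁴)); a quick sanity check:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Collision probability among n random 64-bit command IDs, using the
	// birthday-paradox approximation p(n) ≈ 1 - exp(-n^2 / (2 * 2^64)).
	space := math.Pow(2, 64)
	for _, n := range []float64{1e6, 1e8, 5.1e9} {
		p := 1 - math.Exp(-n*n/(2*space))
		fmt.Printf("n=%.1e  p(collision)≈%.2g\n", n, p)
	}
	// The 50% point is at n ≈ 1.1774 * sqrt(2^64) ≈ 5.1e9, i.e. roughly the
	// "~5 billion proposals" figure mentioned above.
	fmt.Printf("n at p=0.5: %.3g\n", 1.1774*math.Sqrt(space))
}
```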
In the context of #114421, we should ensure we have sufficient runtime assertions to detect double-application/replays of Raft proposals. This may already exist, but we should confirm. Ideally, this would be enabled in roachtests.
Jira issue: CRDB-34202
Epic CRDB-32846