kvserver: purge gc-able, unmigrated replicas during migrations #62838
Conversation
Fixes cockroachdb#58378. Fixes cockroachdb#62267.

Previously it was possible for us to have replicas in-memory, with pre-migrated state, even after a migration was finalized. This led to the kind of badness we were observing in cockroachdb#62267, where it appeared that a replica was not using the applied state key despite us having migrated into it (see TruncatedAndRangeAppliedState, introduced in cockroachdb#58088).

---

To see how, consider the following set of events:

- Say r42 starts off on n1, n2, and n3.
- n3 flaps and so we place a replica for r42 on n4.
- n3's replica, r42/3, is now GC-able, but still un-GC-ed.
- We run the applied state migration, first migrating all ranges into it and then purging outdated replicas.
- We should want to purge r42/3: it's un-migrated, and evaluating anything on it (say a lease request) is unsound because we've bumped version gates that tell the kvserver to always expect post-migration state.
- What happens when we try to purge r42/3? Prior to this PR, if it didn't have a replica version, we'd skip over it (!).
- Was it possible for r42/3 to not have a replica version? Shouldn't it have been accounted for when we migrated all ranges? No, and that's precisely why the migration infrastructure purges outdated replicas. The migrate request only returns once it's applied on all followers; in our example that wouldn't include r42/3, since it was no longer one.
- The stop-gap in cockroachdb#60429 made it so that we didn't GC r42/3, when we should've been doing the opposite. When iterating over a store's replicas for purging purposes, an empty replica version is fine and expected; we should interpret it as a signal that we're dealing with a replica that was never migrated (to even start using replica versions in the first place). Because it didn't have a valid replica version installed, we can infer that it's soon to be GC-ed (else we wouldn't have been able to finalize the applied state + replica version migration). See the sketch after this list.
- The conditions above made it possible for us to evaluate requests on replicas with migration state out-of-date relative to the store's version.
- Boom.

Release note: None
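A minimal Go sketch of the purge behavior described above, under stated assumptions: the types and names here (`Version`, `Replica`, `purgeOutdatedReplicas`, `enqueueGC`) are simplified stand-ins for illustration, not the actual kvserver API. The point it demonstrates is that an empty replica version must be treated as purgeable rather than skipped.

```go
package main

import "fmt"

// Version is a simplified stand-in for a replica/cluster version.
type Version struct {
	Major, Minor int32
}

// Empty reports whether no version was ever persisted for the replica.
func (v Version) Empty() bool { return v == (Version{}) }

// Less reports whether v is older than o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// Replica is a stand-in for a kvserver replica and its persisted version.
type Replica struct {
	RangeID   int64
	ReplicaID int32
	Version   Version // the zero value means "never migrated"
}

// purgeOutdatedReplicas queues every replica whose version is below the
// migration's version for replica GC. Crucially, an empty replica version is
// treated as purgeable: it signals a replica that predates replica versions
// entirely, which (once the migration is finalized) can only be a GC-able
// leftover.
func purgeOutdatedReplicas(replicas []Replica, migrationVersion Version, enqueueGC func(Replica)) {
	for _, r := range replicas {
		if !r.Version.Empty() && !r.Version.Less(migrationVersion) {
			continue // already migrated; nothing to purge
		}
		// The pre-PR behavior skipped the empty-version case here, leaving
		// un-migrated replicas (like r42/3 above) in memory.
		enqueueGC(r)
	}
}

func main() {
	migration := Version{Major: 21, Minor: 1}
	replicas := []Replica{
		{RangeID: 42, ReplicaID: 3},                                       // GC-able, never migrated
		{RangeID: 7, ReplicaID: 1, Version: Version{Major: 21, Minor: 1}}, // up to date
	}
	purgeOutdatedReplicas(replicas, migration, func(r Replica) {
		fmt.Printf("queueing r%d/%d for replica GC\n", r.RangeID, r.ReplicaID)
	})
}
```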
Nice work tracking this down! Really glad you took that shower. How confident are we that this fixes version-upgrade? What you're saying is plausible, I just remember looking at the logs and not being able to explain to myself why we were even replicaGC'ing that replica (there wasn't any prior rebalancing of it). Were you able to find out what we missed then? I am pretty sure we wouldn't replicaGC by accident, so there is likely an explanation, I just wonder why we weren't able to easily find the rebalances.
Very confident. Looking back at those logs, it's clear that the panicky replica should've already been GC-ed on the new node. When the applied state assertion fired, we were comparing against the store's cluster version, which had already been bumped past the migration that rolls all replicas over into using the applied state key (see the sketch below).
Looking at the range log for those ranges, the other replicas of that range existed elsewhere, with lease start times earlier than the node's startup timestamp. Basically, as this PR suggests, the migration infrastructure depends on the GC-able-but-not-yet-GC-ed replica having been purged by then. It wasn't GC-ed by accident; this replica was no longer part of the raft group.
I didn't really dig much further, but it wasn't the only range that exhibited this behavior. If this comes up again, I'll look at it in closer detail. In either 21.1 or 20.2 we're probably missing a log statement in some codepath.
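A hedged sketch of the violated expectation described above, using simplified stand-in types rather than the real kvserver code: once the store's cluster version is past the TruncatedAndRangeAppliedState migration, any replica still not using the applied state key trips the check, which is exactly the state a GC-able-but-not-yet-GC-ed replica is left in.

```go
package main

import "fmt"

// storeState is a stand-in for the bits of store state relevant here.
type storeState struct {
	// appliedStateMigrationFinalized is true once the store's cluster version
	// has been bumped past the TruncatedAndRangeAppliedState migration.
	appliedStateMigrationFinalized bool
}

// replicaState is a stand-in for a replica's in-memory, possibly
// pre-migration state.
type replicaState struct {
	rangeID              int64
	replicaID            int32
	usingAppliedStateKey bool // false for pre-migrated replicas like r42/3
}

// checkAppliedState mirrors the violated expectation: once the migration is
// finalized, every replica we evaluate on must be using the applied state
// key; one that isn't should have been purged before the gate was bumped.
func checkAppliedState(s storeState, r replicaState) error {
	if s.appliedStateMigrationFinalized && !r.usingAppliedStateKey {
		return fmt.Errorf(
			"r%d/%d: not using applied state key despite finalized migration; "+
				"replica should have been purged",
			r.rangeID, r.replicaID)
	}
	return nil
}

func main() {
	store := storeState{appliedStateMigrationFinalized: true}
	// r42/3: GC-able but not yet GC-ed, still carrying pre-migration state.
	stale := replicaState{rangeID: 42, replicaID: 3, usingAppliedStateKey: false}
	if err := checkAppliedState(store, stale); err != nil {
		fmt.Println("assertion fires:", err)
	}
}
```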
Reads funny out of context haha. bors r+
Build failed (retrying...)
Build failed (retrying...)
Build failed (retrying...)
Build succeeded