Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: add ReplicaID to the persistent raft state #75740

Closed
sumeerbhola opened this issue Jan 31, 2022 · 2 comments
Closed

kvserver: add ReplicaID to the persistent raft state #75740

sumeerbhola opened this issue Jan 31, 2022 · 2 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Jan 31, 2022

We currently do not store the ReplicaID when instantiating an uninitialized replica. We do know the ReplicaID when creating the corresponding uninitialized Replica object, since we pass it to newUnloadedReplica. Without the ReplicaID, if the node were to crash before the replica were initialized, the restarting node will have HardState without any knowledge of what ReplicaID it corresponds to. This creates difficulties with the ReplicaStorage design and can create a leak (see details below).

(copy-paste of conversation)
[sumeer] while implementing ReplicasStorage.Init I realized something I had missed earlier: for an UninitializedStateMachine replica, there is nothing in the engine that contains its ReplicaID.

  • We happen to luck out that we don’t need the ReplicaID for such a replica on the SplitReplica path (when this RangeID is the RHS of a split), since we never compare the ReplicaID of the RHS in the split with that of the uninitialized replica (we only compare the former with the RangeTombstone).
  • But this situation is peculiar in that after Init , ReplicasStorage has incomplete knowledge and would need a higher layer to provide this ReplicaID. How would a higher layer know the ReplicaID if there is no persistent record of it? Presumably, it would need to talk to other nodes before it figures this out, which means there could be an arbitrary time interval for which ReplicasStorage has 2 kinds of uninitialized replicas — those with a ReplicaID and those without.
  • Is there also a leak here? If the raft group has already removed this replica from this store, there isn’t going to be any raft activity. Someone needs to figure that out before we can clean up this state (i.e., we can’t unilaterally remove the HardState since the Term and Vote values may be relevant).

[tbg] Yeah, there’s always been something odd about the replicaID not getting stored. There’s definitely a leak, if you have an uninitialized replica and the node restarts, nothing will ever instantiate this replica in-memory again (unless it receives another message, which is doubtful at that point) but also if we did, we wouldn’t know the replicaID. So even if we tried, we couldn’t destroy this state, as we can’t ever be sure that it’s not still needed. This is definitely leaking today.
Seems kind of clear that we want the replicaID to be stored. Seems like it would have to be on the raft engine, parallel to the HardState (i.e. store ReplicaID whenever it changes, which should imply doing it on the first write to the HardState and never again since replicas can’t exist across replicaIDs).

cc: @tbg, @erikgrinaker

Epic: CRDB-220
Jira issue: CRDB-12811

@sumeerbhola sumeerbhola added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. labels Jan 31, 2022
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Feb 1, 2022
The RaftReplicaIDKey is an unreplicated range-id local key that
contains the ReplicaID of the replica whose HardState is represented
in the RaftHardStateKey. These two keys share the same lifetime,
and are removed atomically when we clear the range-id local keys
for a replica.

We currently do not utilize this information on node restart
to figure out whether we should cleanup stale uninitialized replicas.
Doing such cleanup can wait until we implement and start using
ReplicasStorage. The change here is meant to set us up to rely
on RaftReplicaID from the next release onwards.

Informs cockroachdb#75740

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Feb 3, 2022
The RaftReplicaIDKey is an unreplicated range-id local key that
contains the ReplicaID of the replica whose HardState is represented
in the RaftHardStateKey. These two keys share the same lifetime,
and are removed atomically when we clear the range-id local keys
for a replica.

We currently do not utilize this information on node restart
to figure out whether we should cleanup stale uninitialized replicas.
Doing such cleanup can wait until we implement and start using
ReplicasStorage. The change here is meant to set us up to rely
on RaftReplicaID from the next release onwards.

Informs cockroachdb#75740

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Feb 4, 2022
The RaftReplicaIDKey is an unreplicated range-id local key that
contains the ReplicaID of the replica whose HardState is represented
in the RaftHardStateKey. These two keys are removed atomically when
we clear the range-id local keys for a replica. See
store_create_replica.go for a detailed comment on correctness
and version compatibility.

We currently do not utilize this information on node restart
to figure out whether we should cleanup stale uninitialized replicas.
Doing such cleanup can wait until we implement and start using
ReplicasStorage. The change here is meant to set us up to rely
on RaftReplicaID from the next release onwards.

Informs cockroachdb#75740

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Feb 4, 2022
The RaftReplicaIDKey is an unreplicated range-id local key that
contains the ReplicaID of the replica whose HardState is represented
in the RaftHardStateKey. These two keys are removed atomically when
we clear the range-id local keys for a replica. See
store_create_replica.go for a detailed comment on correctness
and version compatibility.

We currently do not utilize this information on node restart
to figure out whether we should cleanup stale uninitialized replicas.
Doing such cleanup can wait until we implement and start using
ReplicasStorage. The change here is meant to set us up to rely
on RaftReplicaID from the next release onwards.

Informs cockroachdb#75740

Release note: None
craig bot pushed a commit that referenced this issue Feb 4, 2022
75761: keys,kvserver: introduce RaftReplicaID r=tbg a=sumeerbhola

The RaftReplicaIDKey is an unreplicated range-id local key that
contains the ReplicaID of the replica whose HardState is represented
in the RaftHardStateKey. These two keys are removed atomically when
we clear the range-id local keys for a replica. See
store_create_replica.go for a detailed comment on correctness
and version compatibility.

We currently do not utilize this information on node restart
to figure out whether we should cleanup stale uninitialized replicas.
Doing such cleanup can wait until we implement and start using
ReplicasStorage. The change here is meant to set us up to rely
on RaftReplicaID from the next release onwards.

Informs #75740

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
@tbg
Copy link
Member

tbg commented Mar 4, 2022

What's left here @sumeerbhola?

@sumeerbhola
Copy link
Collaborator Author

The writing of ReplicaID is done.

  • Using it in ReplicasStorage is of course not, and we don't need a specific issue for that since the implementation of ReplicaStorage.Init was blocked on this.
  • Once we implement ReplicasStorage, it will know the (rangeID, replicaID) of all the uninitialized replicas. Hooking that up to some cleanup mechanism will then be viable. We may want to have a separate issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

3 participants