clusterversion: introduce replication admission control v2 cluster versions #131102

Closed
kvoli opened this issue Sep 20, 2024 · 0 comments · Fixed by #131106
Labels
A-replication-admission-control-v2 Related to introduction of replication AC v2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

kvoli commented Sep 20, 2024

Add the two cluster versions described in the quorum replication flow control design:

While there is the possibility that some leader could run v1, there must be no v2 encoded entries in the Raft log, since the v1 implementation (even in v24.3) assumes v1 encoded entries. So the switch from v1 => v2 will employ two cluster versions: useRACv2withV1EntryEncoding and useRACv2Full.

The transition we have described above is when the leader sees cluster version useRACv2withV1EntryEncoding. As indicated by the name, the entry encoding will continue to be v1, and the cluster setting (discussed later) will be ignored, i.e., v2 is only applied to elastic traffic, and push mode is being used. The leader uses the v2 protocol.

When the leader sees cluster version useRACv2Full, it starts using v2 encoding since any future leader is also guaranteed to use the v2 protocol (the cluster setting also becomes relevant). Note that a future leader can be at cluster version useRACv2withV1EntryEncoding, and would use v1 encoding and the v2 protocol. That is, after some v2 encoded entries have been added to the raft log, there can be v1 encoded entries added.
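
The two-step transition quoted above can be sketched as a mapping from the cluster version a leader observes to the protocol and entry encoding it uses. This is only an illustrative sketch: the `ClusterVersion`, `Protocol`, and `Encoding` types, the `BaseVersion` constant, and the `leaderMode` function are hypothetical names introduced here, not the actual CockroachDB API, and the cluster setting mentioned in the design is deliberately left out.

```go
package main

import "fmt"

// ClusterVersion is a hypothetical stand-in for the real cluster version type.
// Versions are ordered: a larger value means a later gate has been crossed.
type ClusterVersion int

const (
	BaseVersion                 ClusterVersion = iota // before either RACv2 gate (assumed name)
	UseRACV2WithV1EntryEncoding                       // v2 protocol, v1 entry encoding
	UseRACV2Full                                      // v2 protocol, v2 entry encoding
)

// Protocol and Encoding are hypothetical labels for illustration only.
type Protocol string
type Encoding string

const (
	ProtocolV1 Protocol = "v1"
	ProtocolV2 Protocol = "v2"
	EncodingV1 Encoding = "v1"
	EncodingV2 Encoding = "v2"
)

// leaderMode returns the protocol and entry encoding a leader would use
// when it observes cluster version v, following the description above.
func leaderMode(v ClusterVersion) (Protocol, Encoding) {
	switch {
	case v >= UseRACV2Full:
		// Every future leader is guaranteed to speak the v2 protocol,
		// so v2 encoded entries are safe to write.
		return ProtocolV2, EncodingV2
	case v >= UseRACV2WithV1EntryEncoding:
		// A v1 leader may still exist, so keep writing v1 encoded entries.
		return ProtocolV2, EncodingV1
	default:
		return ProtocolV1, EncodingV1
	}
}

func main() {
	versions := []ClusterVersion{BaseVersion, UseRACV2WithV1EntryEncoding, UseRACV2Full}
	for _, v := range versions {
		p, e := leaderMode(v)
		fmt.Printf("version=%d protocol=%s encoding=%s\n", v, p, e)
	}
}
```

Under this sketch, v2 encoding is only ever paired with the v2 protocol, which is the property the two gates are meant to preserve.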

Jira issue: CRDB-42374

Epic CRDB-37515

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team A-replication-admission-control-v2 Related to introduction of replication AC v2 labels Sep 20, 2024
@kvoli kvoli self-assigned this Sep 20, 2024
@kvoli kvoli changed the title from "clusterversion: introduce replication admission control v2 cluster settings" to "clusterversion: introduce replication admission control v2 cluster versions" Sep 20, 2024
@exalate-issue-sync exalate-issue-sync bot changed the title from "clusterversion: introduce replication admission control v2 cluster versions" to "clusterversion: introduce replication admission control v2 cluster settings" Sep 20, 2024
@exalate-issue-sync exalate-issue-sync bot changed the title from "clusterversion: introduce replication admission control v2 cluster settings" to "clusterversion: introduce replication admission control v2 cluster versions" Sep 20, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue Sep 20, 2024
Introduce two new cluster version gates:

```
V24_3_UseRACV2WithV1EntryEncoding
V24_3_UseRACV2Full
```

Upon a range leader first encountering
`V24_3_UseRACV2WithV1EntryEncoding` via `handleRaftReadyRaftMuLocked`,
it will begin a new term using the replication flow control v2 protocol,
creating a `RangeController`, but will continue using the v1 entry
encoding, with raft still operating in push mode.

Upon a range leader first encountering `V24_3_UseRACV2Full`, it will continue
using the replication flow control v2 protocol, but will now switch to
using the V2 entry encoding.

Note that the necessary protocol migration at the leader, (base) =>
`V24_3_UseRACV2WithV1EntryEncoding`, occurs before any other calls in
`handleRaftReadyRaftMuLocked`.

The two version gates are necessary to ensure there are never v2 encoded
entries in the raft log while there is a possibility of a leader running
v1.

Resolves: cockroachdb#131102
Release note: None
craig bot pushed a commit that referenced this issue Sep 24, 2024
131106: clusterversion: introduce rac2 cluster version gates r=sumeerbhola a=kvoli

Move `EnabledWhenLeaderLevel` from `replica_rac2` to the parent package
`kvflowcontrol` and rename it to `V2EnabledWhenLeaderLevel` to reflect
the move to a shared v1/v2 package.

Also move the corresponding function `racV2EnabledWhenLeaderLevel` to
`kvflowcontrol`. `GetV2EnabledWhenLeaderLevel` will check if there are
testing knob overrides for the enabled level, and if not, continue
returning `V2NotEnabledWhenLeader`. Some commentary and todos are also
left around this function, for when we enable the protocol and,
separately, pull mode.

Resolves: #131102
Release note: None
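
The knob-override behavior described for `GetV2EnabledWhenLeaderLevel` might look roughly like the sketch below. The names `V2EnabledWhenLeaderLevel`, `V2NotEnabledWhenLeader`, and `GetV2EnabledWhenLeaderLevel` come from the commit message; everything else (the other two level constants, the `TestingKnobs` struct, its `OverrideV2EnabledWhenLeaderLevel` field, and the function signature) is an assumption made for illustration and may not match the real `kvflowcontrol` package.

```go
package main

import "fmt"

// V2EnabledWhenLeaderLevel mirrors the type named in the commit message.
type V2EnabledWhenLeaderLevel uint32

const (
	// V2NotEnabledWhenLeader is the default level named in the commit message.
	V2NotEnabledWhenLeader V2EnabledWhenLeaderLevel = iota
	// The two levels below are assumed names, added only so the
	// override path has something to return.
	V2EnabledWhenLeaderV1Encoding
	V2EnabledWhenLeaderV2Encoding
)

// TestingKnobs is a hypothetical knobs struct; the field name is an assumption.
type TestingKnobs struct {
	OverrideV2EnabledWhenLeaderLevel func() V2EnabledWhenLeaderLevel
}

// GetV2EnabledWhenLeaderLevel returns the testing override if one is set,
// and otherwise keeps returning V2NotEnabledWhenLeader. The signature here
// is simplified and assumed.
func GetV2EnabledWhenLeaderLevel(knobs *TestingKnobs) V2EnabledWhenLeaderLevel {
	if knobs != nil && knobs.OverrideV2EnabledWhenLeaderLevel != nil {
		return knobs.OverrideV2EnabledWhenLeaderLevel()
	}
	return V2NotEnabledWhenLeader
}

func main() {
	// No knobs: the protocol stays disabled at the leader.
	fmt.Println(GetV2EnabledWhenLeaderLevel(nil))

	// With an override, tests can force a specific level.
	knobs := &TestingKnobs{
		OverrideV2EnabledWhenLeaderLevel: func() V2EnabledWhenLeaderLevel {
			return V2EnabledWhenLeaderV1Encoding
		},
	}
	fmt.Println(GetV2EnabledWhenLeaderLevel(knobs))
}
```

Keeping the default at `V2NotEnabledWhenLeader` means the version gates can land without changing production behavior, while tests can still exercise the v2 paths through the override.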

131231: backupccl: fill incremental cluster id on alter schedule r=msbutler a=kev-cao

When altering the recurrence of a singleton full backup schedule created before v23.2, the corresponding new incremental schedule creation will fail due to a missing cluster ID. This patch ensures that the cluster ID is set when creating the incremental.

Fixes: #131127

Release note (bug fix): Fixed a bug introduced in v23.2.0 where creating a new incremental schedule via `ALTER SCHEDULE` on a full backup schedule created on an older version would fail.

Co-authored-by: Austen McClernon
Co-authored-by: Kevin Cao
@craig craig bot closed this as completed in f455bc3 Sep 24, 2024