kvserver: integration unit test for the replication flow control transition from v1 to v2. #130431

Closed
Tracked by #129276
kvoli opened this issue Sep 10, 2024 · 0 comments · Fixed by #130728
Labels: A-replication-admission-control-v2 (Related to introduction of replication AC v2), T-kv (KV Team)

kvoli commented Sep 10, 2024

There are currently no integration tests which exercise the transition of replication flow control from v1 to v2 (A-replication-admission-control-v2).

This issue is to add a suite of such tests.

Jira issue: CRDB-42049

Epic CRDB-37515

blathers-crl bot added the X-blathers-untriaged (blathers was unable to find an owner) label Sep 10, 2024
kvoli changed the title from "Integration unit test for the transition from v1 to v2." to "kvserver: integration unit test for the replication flow control transition from v1 to v2." Sep 10, 2024
kvoli added the T-kv (KV Team) and A-replication-admission-control-v2 (Related to introduction of replication AC v2) labels and removed the X-blathers-untriaged (blathers was unable to find an owner) label Sep 10, 2024
cockroachdb deleted a comment from blathers-crl bot Sep 10, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue Sep 20, 2024
Add `TestFlowControlV1ToV2Transition`, which ratchets up the enabled
version of replication flow control v2:

```
v1 protocol with v1 encoding =>
v2 protocol with v1 encoding =>
v2 protocol with v2 encoding
```
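
The three levels above form a one-way ratchet. A minimal Go sketch of that ordering follows, assuming the levels can be modeled as an ordered enum; `protocolLevel`, its constants, and `ratchet` are hypothetical names for illustration, not the kvflowcontrol API.

```go
package main

import "fmt"

// protocolLevel is a hypothetical ordered enumeration of the replication
// flow control enablement levels. Higher values are newer levels.
type protocolLevel int

const (
	v1ProtocolV1Encoding protocolLevel = iota // v1 protocol with v1 encoding
	v2ProtocolV1Encoding                      // v2 protocol with v1 encoding
	v2ProtocolV2Encoding                      // v2 protocol with v2 encoding
)

func (l protocolLevel) String() string {
	switch l {
	case v1ProtocolV1Encoding:
		return "v1 protocol with v1 encoding"
	case v2ProtocolV1Encoding:
		return "v2 protocol with v1 encoding"
	case v2ProtocolV2Encoding:
		return "v2 protocol with v2 encoding"
	default:
		return fmt.Sprintf("protocolLevel(%d)", int(l))
	}
}

// ratchet moves to the next level but refuses to go backwards, so the only
// possible path is v1/v1 => v2/v1 => v2/v2.
func ratchet(cur, next protocolLevel) (protocolLevel, error) {
	if next < cur {
		return cur, fmt.Errorf("cannot downgrade from %s to %s", cur, next)
	}
	return next, nil
}

func main() {
	lvl := v1ProtocolV1Encoding
	for _, next := range []protocolLevel{v2ProtocolV1Encoding, v2ProtocolV2Encoding} {
		var err error
		if lvl, err = ratchet(lvl, next); err != nil {
			panic(err)
		}
		fmt.Println(lvl)
	}
	// Attempting to go back to the first level is rejected.
	if _, err := ratchet(lvl, v1ProtocolV1Encoding); err != nil {
		fmt.Println("error:", err)
	}
}
```

Modeling the levels as ordered integers turns "only move forward" into a single comparison, which matches the "ratchets up" wording above.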

The test is structured to issue writes and wait for returned tokens
whenever the protocol transitions from v1 to v2 or the leader changes.

More specifically, the test takes the following steps:

```
(1) Start n1, n2, n3 with v1 protocol and v1 encoding.
(2) Upgrade n1 to v2 protocol with v1 encoding.
(3) Transfer the range lease to n2.
(4) Upgrade n2,n3 to v2 protocol with v1 encoding.
(5) Upgrade n1 to v2 protocol with v2 encoding.
(6) Transfer the range lease to n3.
(7) Upgrade n2,n3 to v2 protocol with v2 encoding.
```

Resolves: cockroachdb#130431
Release note: None
kvoli self-assigned this Sep 20, 2024
exalate-issue-sync bot assigned kvoli and unassigned kvoli Sep 20, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue Sep 20, 2024
Add `TestFlowControlV1ToV2Transition`, which ratchets up the enabled
version of replication flow control v2:

```
v1 protocol with v1 encoding =>
v2 protocol with v1 encoding =>
v2 protocol with v2 encoding
```

The test is structured to issue writes and wait for returned tokens
whenever the protocol transitions from v1 to v2 or the leader changes.

More specifically, the test takes the following steps:

```
(1) Start n1, n2, n3 with v1 protocol and v1 encoding.
(2) Upgrade n1 to v2 protocol with v1 encoding.
(3) Transfer the range lease to n2.
(4) Upgrade n2,n3 to v2 protocol with v1 encoding.
(5) Upgrade n1 to v2 protocol with v2 encoding.
(6) Transfer the range lease to n3.
(7) Upgrade n2,n3 to v2 protocol with v2 encoding.
```

Resolves: cockroachdb#130431
Resolves: cockroachdb#129276
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Sep 24, 2024
The `Processor` calls `isLeaderUsingV2ProcLocked` to determine which
store work queue admit method to call, opting for the v1 method if
`isLeaderUsingV2ProcLocked` returns false.

Update `isLeaderUsingV2ProcLocked` to correctly return false when the
local replica is the leader and running v1, having previously seen a
leader running v2.
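
A simplified Go sketch of the intended decision follows, assuming the processor tracks the local replica's protocol separately from the last remote leader it observed. The `processor` type and its fields are hypothetical. The `Locked` suffix suggests the real `isLeaderUsingV2ProcLocked` runs with a mutex held; this sketch omits locking.

```go
package main

import "fmt"

// processor is a hypothetical, heavily simplified stand-in for the rac2
// Processor. Field names are assumptions for illustration only.
type processor struct {
	// isLeader reports whether the local replica is the raft leader.
	isLeader bool
	// localUsingV2 reports whether the local replica runs the v2 protocol.
	localUsingV2 bool
	// leaderUsingV2 is the last observed protocol of a remote leader.
	leaderUsingV2 bool
}

// isLeaderUsingV2 decides which admit method to use. When the local replica
// is the leader, its own protocol is authoritative, so stale state from a
// previously observed v2 leader must not leak through.
func (p *processor) isLeaderUsingV2() bool {
	if p.isLeader {
		return p.localUsingV2
	}
	return p.leaderUsingV2
}

func (p *processor) admitMethod() string {
	if p.isLeaderUsingV2() {
		return "v2 admit"
	}
	return "v1 admit"
}

func main() {
	// A remote v2 leader was seen earlier; then the local replica, still
	// running v1, became the leader.
	p := &processor{leaderUsingV2: true}
	p.isLeader = true
	fmt.Println(p.admitMethod()) // v1 admit
}
```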

Part of: cockroachdb#130431
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Sep 24, 2024
Add `TestFlowControlV1ToV2Transition`, which ratchets up the enabled
version of replication flow control v2:

```
v1 protocol with v1 encoding =>
v2 protocol with v1 encoding =>
v2 protocol with v2 encoding
```

The test is structured to issue writes and wait for returned tokens
whenever the protocol transitions from v1 to v2 or the leader changes.

More specifically, the test takes the following steps:

```
(1) Start n1, n2, n3 with v1 protocol and v1 encoding.
(2) Upgrade n1 to v2 protocol with v1 encoding.
(3) Transfer the range lease to n2.
(4) Upgrade n2 to v2 protocol with v1 encoding.
(5) Upgrade n3 to v2 protocol with v1 encoding.
(6) Upgrade n1 to v2 protocol with v2 encoding.
(7) Transfer the range lease to n1.
(8) Upgrade n2,n3 to v2 protocol with v2 encoding.
(9) Transfer the range lease to n3.
```

Between steps, we issue writes, (un)block admission, and observe the
flow control metrics and vtables.

Resolves: cockroachdb#130431
Resolves: cockroachdb#129276
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Sep 25, 2024
craig bot pushed a commit that referenced this issue Sep 25, 2024
130728: kvserver: add rac2 v1 integration tests r=sumeerbhola a=kvoli

1st commit from #130619.
2nd-3rd commits from #131106.
4th-5th commits from #131107.
6th-7th commits from #131108.
8th commit from #131109.

---
Introduce several tests in `flow_control_integration_test.go`, mirroring
the existing tests but applied to the replication flow control v2
machinery.

The tests largely follow the same pattern as the existing v1 tests,
swapping in rac2 metrics and vtables.

The following tests are added:

```
TestFlowControlBasicV2
TestFlowControlRangeSplitMergeV2
TestFlowControlBlockedAdmissionV2
TestFlowControlAdmissionPostSplitMergeV2
TestFlowControlCrashedNodeV2
TestFlowControlRaftSnapshotV2
TestFlowControlRaftMembershipV2
TestFlowControlRaftMembershipRemoveSelfV2
TestFlowControlClassPrioritizationV2
TestFlowControlQuiescedRangeV2
TestFlowControlUnquiescedRangeV2
TestFlowControlTransferLeaseV2
TestFlowControlLeaderNotLeaseholderV2
TestFlowControlGranterAdmitOneByOneV2
```

These tests all have at least two variants:

```
V2EnabledWhenLeaderV1Encoding
V2EnabledWhenLeaderV2Encoding
```

When run under `V2EnabledWhenLeaderV1Encoding`, the tests use a separate
testdata file with a `_v1_encoding` suffix. A separate file is necessary
because, at the `V2EnabledWhenLeaderV1Encoding` protocol enablement level,
all entries subject to admission control are encoded as `raftpb.LowPri`,
regardless of their original priority, since we don't want to pay the
cost of deserializing the raft admission meta.

The v1 encoding variants retain the same comments as the v2 encoding
variants; however, any comments referring to regular tokens should be
read as referring to elastic tokens instead, for the reason above.
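
The two rules above (testdata file naming and priority handling under v1 encoding) can be sketched in Go as follows. The names `variant`, `testdataFile`, `encodedPriority`, and `tokenClass`, and the `basic` testdata basename, are hypothetical. The sketch assumes low-priority entries draw elastic tokens and all other priorities draw regular tokens, consistent with the note about reading "regular" as "elastic".

```go
package main

import "fmt"

// variant is a hypothetical stand-in for the two test variants.
type variant int

const (
	v2EnabledWhenLeaderV1Encoding variant = iota
	v2EnabledWhenLeaderV2Encoding
)

// priority is a simplified stand-in for raft entry priorities; only the
// distinction between low and higher priorities matters here.
type priority int

const (
	lowPri priority = iota
	normalPri
	highPri
)

// testdataFile appends the _v1_encoding suffix for the v1 encoding variant.
func testdataFile(base string, v variant) string {
	if v == v2EnabledWhenLeaderV1Encoding {
		return base + "_v1_encoding"
	}
	return base
}

// encodedPriority treats every entry as low priority under v1 encoding,
// because the admission metadata is not decoded at that level.
func encodedPriority(v variant, original priority) priority {
	if v == v2EnabledWhenLeaderV1Encoding {
		return lowPri
	}
	return original
}

// tokenClass maps a priority to the token pool it is assumed to draw from.
func tokenClass(p priority) string {
	if p == lowPri {
		return "elastic"
	}
	return "regular"
}

func main() {
	for _, v := range []variant{v2EnabledWhenLeaderV1Encoding, v2EnabledWhenLeaderV2Encoding} {
		fmt.Println(testdataFile("basic", v), tokenClass(encodedPriority(v, normalPri)))
	}
}
```

Running it prints `basic_v1_encoding elastic` followed by `basic regular`: the same normal-priority write lands in a different token pool depending on the variant, which is why the two variants need separate expected output.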

Two v1 tests are not ported over to v2:

```
TestFlowControlRaftTransportBreak
TestFlowControlRaftTransportCulled
```

These omitted tests would behave identically to `TestFlowControlCrashedNodeV2`,
because rac2 is less tightly coupled to the raft transport and instead operates
on replication states (e.g., `StateProbe`, `StateReplicate`).
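
A simplified Go model of why a broken or culled transport needs no dedicated test follows. It assumes the leader only tracks tokens for followers in `StateReplicate`, so any event that moves a follower out of that state releases its tokens in the same way. `sendStream` and the lowercase state constants are hypothetical and are not the raft tracker API.

```go
package main

import "fmt"

// replicaState mirrors the raft progress states named above, simplified.
type replicaState int

const (
	stateProbe replicaState = iota
	stateReplicate
	stateSnapshot
)

// sendStream is a hypothetical per-follower stream that holds flow tokens
// only while the follower is in StateReplicate.
type sendStream struct {
	state      replicaState
	heldTokens int64
}

// setState transitions the follower and returns any tokens released.
// Leaving StateReplicate for any reason (crashed node, broken or culled
// transport) releases everything the stream held.
func (s *sendStream) setState(next replicaState) (released int64) {
	if s.state == stateReplicate && next != stateReplicate {
		released = s.heldTokens
		s.heldTokens = 0
	}
	s.state = next
	return released
}

// deduct only tracks tokens for followers that are actively replicating.
func (s *sendStream) deduct(n int64) {
	if s.state == stateReplicate {
		s.heldTokens += n
	}
}

func main() {
	s := &sendStream{state: stateReplicate}
	s.deduct(1 << 20)
	// A crash, transport break, or culled connection all reach the leader
	// the same way in this model: the follower drops out of StateReplicate.
	fmt.Println("released:", s.setState(stateProbe)) // released: 1048576
}
```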

--- 

Add `TestFlowControlV1ToV2Transition`, which ratchets up the enabled
version of replication flow control v2:

```
v1 protocol with v1 encoding =>
v2 protocol with v1 encoding =>
v2 protocol with v2 encoding
```

The test is structured to issue writes and wait for returned tokens
whenever the protocol transitions from v1 to v2 or the leader changes.

More specifically, the test takes the following steps:

```
(1) Start n1, n2, n3 with v1 protocol and v1 encoding.
(2) Upgrade n1 to v2 protocol with v1 encoding.
(3) Transfer the range lease to n2.
(4) Upgrade n2 to v2 protocol with v1 encoding.
(5) Upgrade n3 to v2 protocol with v1 encoding.
(6) Upgrade n1 to v2 protocol with v2 encoding.
(7) Transfer the range lease to n1.
(8) Upgrade n2,n3 to v2 protocol with v2 encoding.
(9) Transfer the range lease to n3.
```

Between steps, we issue writes, (un)block admission, and observe the
flow control metrics and vtables.
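
The nine steps can be written as a table of per-node levels plus the current leaseholder, which makes it easy to check that no node ever downgrades. This is a hypothetical Go sketch: `step`, `transitionSteps`, and `monotonic` are illustrative names, and the sketch assumes n1 holds the lease after step (1), which the description does not state.

```go
package main

import "fmt"

// level is a hypothetical shorthand for a protocol/encoding combination.
type level string

const (
	v1v1 level = "v1 protocol, v1 encoding"
	v2v1 level = "v2 protocol, v1 encoding"
	v2v2 level = "v2 protocol, v2 encoding"
)

// order ranks the levels so downgrades can be detected.
var order = map[level]int{v1v1: 0, v2v1: 1, v2v2: 2}

// step describes the cluster state after applying one step of the test.
type step struct {
	desc        string
	levels      [3]level // levels[i] is the level of node n(i+1)
	leaseholder int      // 1-indexed node ID
}

func transitionSteps() []step {
	return []step{
		{"(1) start n1, n2, n3", [3]level{v1v1, v1v1, v1v1}, 1}, // assumed initial leaseholder
		{"(2) upgrade n1 to v2/v1", [3]level{v2v1, v1v1, v1v1}, 1},
		{"(3) transfer lease to n2", [3]level{v2v1, v1v1, v1v1}, 2},
		{"(4) upgrade n2 to v2/v1", [3]level{v2v1, v2v1, v1v1}, 2},
		{"(5) upgrade n3 to v2/v1", [3]level{v2v1, v2v1, v2v1}, 2},
		{"(6) upgrade n1 to v2/v2", [3]level{v2v2, v2v1, v2v1}, 2},
		{"(7) transfer lease to n1", [3]level{v2v2, v2v1, v2v1}, 1},
		{"(8) upgrade n2,n3 to v2/v2", [3]level{v2v2, v2v2, v2v2}, 1},
		{"(9) transfer lease to n3", [3]level{v2v2, v2v2, v2v2}, 3},
	}
}

// monotonic reports whether no node's level ever decreases across steps.
func monotonic(steps []step) bool {
	for i := 1; i < len(steps); i++ {
		for n := 0; n < 3; n++ {
			if order[steps[i].levels[n]] < order[steps[i-1].levels[n]] {
				return false
			}
		}
	}
	return true
}

func main() {
	for _, s := range transitionSteps() {
		// Placeholder: the real test issues writes, (un)blocks admission,
		// and checks metrics and vtables at this point.
		fmt.Printf("%-28s leaseholder n%d runs %s\n", s.desc, s.leaseholder, s.levels[s.leaseholder-1])
	}
	fmt.Println("monotonic:", monotonic(transitionSteps()))
}
```

Laid out this way, the interesting mixed states stand out: after step (3) a v1 leaseholder (n2) replicates to a v2 follower (n1), and after step (7) a v2-encoding leaseholder (n1) replicates to followers still on v1 encoding.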

Resolves: #130431
Resolves: #129276
Release note: None

131252: roachtest: port decommission/mixed-versions r=srosenberg,DarrylWong a=renatolabs

This commit ports the `decommission/mixed-versions` roachtest to use the `mixedversion` framework (instead of the old `newUpgradeTest` API). It also updates `acceptance/decommission-self` since both tests used shared functionality that needed to be updated. Prior to this commit, the acceptance test used the old upgrade test API even though it was not an upgrade test.

Fixes: #110531
Fixes: #110530

Release note: None

131364: upgrades: give test an additional core under remote exec r=rail a=rickystewart

This has been timing out.

Epic: none
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
craig bot closed this as completed in abbf477 Sep 25, 2024