Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: add e2e rac2 send queue functionality tests #132614

Open
Tracked by #133838
kvoli opened this issue Oct 14, 2024 · 2 comments
Open
Tracked by #133838

kvserver: add e2e rac2 send queue functionality tests #132614

kvoli opened this issue Oct 14, 2024 · 2 comments
Assignees
Labels
A-replication-admission-control-v2 Related to introduction of replication AC v2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@kvoli
Copy link
Collaborator

kvoli commented Oct 14, 2024

For #128040 we modified the existing TestFlowControl.*V2 e2e integration tests to also have a pull-mode (send queue) variation.

This issue is to add new tests which purposefully exhaust send tokens and assert on the send queue formation, force flushing and token deduction to prevent send queue formation.

Jira issue: CRDB-43177

Epic CRDB-42900

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team A-replication-admission-control-v2 Related to introduction of replication AC v2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 14, 2024
@kvoli kvoli self-assigned this Oct 14, 2024
@kvoli kvoli changed the title kvserver: add e2e rac2 pull mode tests kvserver: add e2e rac2 send queue functionality tests Oct 14, 2024
@kvoli
Copy link
Collaborator Author

kvoli commented Oct 24, 2024

Removing the GA blocker, I will still backport these tests to v24.3 using a testing change justification.

These pertain to disabled by default behavior in v24.3, hence the removal.

@kvoli kvoli removed GA-blocker branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 24, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 29, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 29, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 29, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 29, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
...

TODO(kvoli): commit details. For now, run w/:

```bash
./dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueue --show-logs --rewrite
```

Resolves: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
Introduce `TestFlowControlSendQueue`, which exercises:

1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
Introduce `TestFlowControlSendQueue`, which exercises:

1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   - Tokens should be returned for n1 and n2.
--   Block admission [n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 31, 2024
Introduce `TestFlowControlSendQueue`, which exercises:

1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   - Tokens should be returned for n1 and n2.
--   Block admission [n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 31, 2024
Introduce `TestFlowControlSendQueue`, which exercises:

1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   - Tokens should be returned for n1 and n2.
--   Block admission [n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 31, 2024
Introduce `TestFlowControlSendQueue`, which exercises:

1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   - Tokens should be returned for n1 and n2.
--   Block admission [n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 31, 2024
Introduce `TestFlowControlSendQueue`, which exercises:

1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   - Tokens should be returned for n1 and n2.
--   Block admission [n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 31, 2024
This commit introduces two testing knobs:

```
// OverrideBypassAdmitWaitForEval is used to override the behavior of the
// WaitForEval. When set to true, WaitForEval will return immediately and
// pretend that the request was admitted. Otherwise, when set to false, or
// unset, WaitForEval will behave normally.
OverrideBypassAdmitWaitForEval func() bool
// OverrideAlwaysRefreshSendStreamStats is used to override the behavior of
// the send stream stats refresh. When set to true, the send stream stats
// will always be refreshed on a HandleRaftEventRaftMuLocked call. Otherwise,
// when set to false, the default behavior will be used.
OverrideAlwaysRefreshSendStreamStats bool
```

Also, thread send queue precise size adjustments via a method which also
asserts that the size is consistent between the count and bytes.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 31, 2024
1. Send queue formation via send token exhaustion
2. Send queue force-flushing via a node disconnect
3. Send queue formation prevention via a testing knob to skip
   wait-for-eval

More specifically, the test has the following structure:

```sql
--   Start with three voters on [n1,n2,n3], where n1 is the leader+leaseholder.
--   Large regular write -16 MiB.
--   Allow admission [n1,n2].
--   - Tokens should be returned for n1 and n2.
--   Block admission [n2].
--   Regular write -1 MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2].
--   - Metrics should reflect send queue formation on n3.
--   Stop n2.
--   Regular write -1 MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect n3 being force flushed.
--   Allow admission [n1,n2,n3].
--   Start n2.
--   Add n4, n5, the voters now are [n1,n2,n3,n4,n5].
--   Block admission [n4,n5] (already blocked)
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Shouldn't be blocked on wait-for-eval because of quorum [n1,n2,n3]
--   - Metrics should reflect send queue formation on n4,n5.
--   Unblock admission [n4,n5].
--   - Wait for tokens to be returned.
--   Block admission [n2,n3,n4,n5].
--   Regular write -16 MiB.
--   Regular write -1  MiB.
--   - Blocks on wait-for-eval, however the test bypasses this instance.
--   - Metrics should reflect 2 streams being prevented from forming a send queue.
--   Allow admission [n1,n2,n3,n4,n5] (all).
--   Assert all tokens returned.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 18, 2024
Add a flow control integration test
`TestFlowControlSendQueueRangeRelocate`. The test has 6 variations,
either transferring or not transferring the lease ontop:

```
We three relocate variations (*=lh,^=send_queue):
- [n1*,n2 ,n3^,n4 ,n5] -> [n2 ,n3^,n4 ,n5 ,n6*] (transfer_lease)
  - The leader and leaseholder is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n4 ,n5 ,n6 ]
  - The replica with a send queue is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n3^,n4 ,n6 ]
  - The replica without a send queue is relocated.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 18, 2024
Add a flow control integration test
`TestFlowControlSendQueueRangeRelocate`. The test has 6 variations,
either transferring or not transferring the lease on-top:

```
We use three relocate variations (*=lh,^=send_queue):
- [n1*,n2 ,n3^,n4 ,n5] -> [n2 ,n3^,n4 ,n5 ,n6*] (transfer_lease)
  - The leader and leaseholder is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n4 ,n5 ,n6 ]
  - The replica with a send queue is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n3^,n4 ,n6 ]
  - The replica without a send queue is relocated.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 19, 2024
Add a new RACv2 integration test:
`TestFlowControlSendQueueManyInflight`, which attempts to do the
following:

```sql
-- We will exhaust the tokens across all streams while admission is blocked,
-- using a single 16 MiB (deduction, the write itself is small) write. Then,
-- we will write a thousand or so entries (@ 4KiB deduction) which should be
-- queued towards one of the replica send streams, while the other has a send
-- queue prevented from forming. Lastly, we will unblock admission and stress
-- the raft in-flights tracker as the queue is drained.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 19, 2024
Add a flow control integration test
`TestFlowControlSendQueueRangeRelocate`. The test has 6 variations,
either transferring or not transferring the lease on-top:

```
We use three relocate variations (*=lh,^=send_queue):
- [n1*,n2 ,n3^,n4 ,n5] -> [n2 ,n3^,n4 ,n5 ,n6*] (transfer_lease)
  - The leader and leaseholder is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n4 ,n5 ,n6 ]
  - The replica with a send queue is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n3^,n4 ,n6 ]
  - The replica without a send queue is relocated.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 25, 2024
Add a new RACv2 integration test:
`TestFlowControlSendQueueManyInflight`, which attempts to do the
following:

```sql
-- We will exhaust the tokens across all streams while admission is blocked,
-- using a single 16 MiB (deduction, the write itself is small) write. Then,
-- we will write a thousand or so entries (@ 4KiB deduction) which should be
-- queued towards one of the replica send streams, while the other has a send
-- queue prevented from forming. Lastly, we will unblock admission and stress
-- the raft in-flights tracker as the queue is drained.
```

Part of: cockroachdb#132614
Release note: None
craig bot pushed a commit that referenced this issue Nov 25, 2024
135331: kvserver: add TestFlowControlSendQueueManyInflight test r=sumeerbhola a=kvoli

Add a new RACv2 integration test:
`TestFlowControlSendQueueManyInflight`, which attempts to do the following:

```sql
-- We will exhaust the tokens across all streams while admission is blocked,
-- using a single 16 MiB (deduction, the write itself is small) write. Then,
-- we will write a thousand or so entries (@ 4KiB deduction) which should be
-- queued towards one of the replica send streams, while the other has a send
-- queue prevented from forming. Lastly, we will unblock admission and stress
-- the raft in-flights tracker as the queue is drained.
```

Part of: #132614
Release note: None

135758: sql/schemachanger: make create sequence revertible r=fqazi a=fqazi

Previously, the concept of reversibility was constant and decided inside the declarative schema changer opgen. While this works fine for simple single statement transactions more complex schema changes involving multiple objects may have different parameters. For example ADD COLUMN SERIAL needs to be able to revert plans after a sequence is created, since that sequence will not be accessible. To help support this patch does the following

1) Revertibility can now be either static or determined by the elements
   within a given plan.
2) Adding Column elements is now considered revertible if the object is
   newly created.
3) Sequence creation is now always considered revertible, since the
   object cannot be for other schema changes until the full transaction
   is complete.

Informs: #126900

Release note: None

135842: storage: bump table format of backup sstables r=itsbilal a=jbowens

**storage: remove SSTWriter.supportsRangeKeys**

Remove the SSTWriter's supportsRangeKeys field. All table formats in use
support range keys.

Epic: none
Release note: none

**storage: bump table format of backup sstables**

Bump the default table format of backup sstables from Pebblev2 to Pebblev4.
Additionally, when a cluster has columnar blocks enabled, allow backup sstables
to be constructed in Pebblev5 which uses column-oriented sstable blocks.

Close #135730.
Epic: [CRDB-37482](https://cockroachlabs.atlassian.net/browse/CRDB-37482)
Release note: none


135970: stats: fix broken memory accounting in SampleReservoir r=yuzefovich a=yuzefovich

This commit fixes the memory accounting that is done in the SampleReservoir in an edge case. In particular, previously, in `copyRow` we would update the destination row (which might be the one coming from the "samples" slice) directly when processing each datum one at a time; if we happen to reach the memory limit in the middle of the row, we would have a "corrupt" row stored in the samples, which later on could lead to incorrect adjustment of the memory account when evicting that row. In fact, we had a comment warning about this situation, but we probably didn't appreciate the memory accounting drift that could fail the stats collection job in extreme cases (and would trigger the "no bytes in account to release" sentry report).

This problem is now fixed by first calculating the "before" and "after" sizes of the modified row via copying it into the staging scratch area, then asking for the necessary memory reservation, and only if that is approved copying the row into the destination. This way at every point in time the memory accounting is precise. This extra copy per sampled row should be negligible in the grand scheme of things. Additionally, I don't think we need to clear out the scratch row after each call because the datums will be overwritten on the next call to `copyRow`, so we'll lose the references to datums shortly (if there is no next call, then the last row is kept in the samples, so we just double the count of references to the datums by not clearing the scratch).

Fixes: #128241.

Release note (bug fix): Table statistics collection in CockroachDB could previously run into `no bytes in account to release` errors in some edge cases (when the SQL memory budget, configured via `--max-sql-memory` flag, was close to being exhausted). The bug has been present since 21.2 and is now fixed.

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 26, 2024
Add a flow control integration test
`TestFlowControlSendQueueRangeRelocate`. The test has 6 variations,
either transferring or not transferring the lease on-top:

```
We use three relocate variations (*=lh,^=send_queue):
- [n1*,n2 ,n3^,n4 ,n5] -> [n2 ,n3^,n4 ,n5 ,n6*] (transfer_lease)
  - The leader and leaseholder is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n4 ,n5 ,n6 ]
  - The replica with a send queue is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n3^,n4 ,n6 ]
  - The replica without a send queue is relocated.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 26, 2024
Add a flow control integration test
`TestFlowControlSendQueueRangeRelocate`. The test has 6 variations,
either transferring or not transferring the lease on-top:

```
We use three relocate variations (*=lh,^=send_queue):
- [n1*,n2 ,n3^,n4 ,n5] -> [n2 ,n3^,n4 ,n5 ,n6*] (transfer_lease)
  - The leader and leaseholder is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n4 ,n5 ,n6 ]
  - The replica with a send queue is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n3^,n4 ,n6 ]
  - The replica without a send queue is relocated.
```

Part of: cockroachdb#132614
Release note: None
craig bot pushed a commit that referenced this issue Nov 26, 2024
135570: kvserver: add TestFlowControlSendQueueRangeRelocate test r=sumeerbhola a=kvoli

Add a flow control integration test
`TestFlowControlSendQueueRangeRelocate`. The test has 6 variations, either transferring or not transferring the lease ontop:

```
We three relocate variations (*=lh,^=send_queue):
- [n1*,n2 ,n3^,n4 ,n5] -> [n2 ,n3^,n4 ,n5 ,n6*] (transfer_lease)
  - The leader and leaseholder is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n4 ,n5 ,n6 ]
  - The replica with a send queue is relocated.
- [n1*,n2 ,n3^,n4 ,n5] -> [n1*,n2 ,n3^,n4 ,n6 ]
  - The replica without a send queue is relocated.
```

Part of: #132614
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 26, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeSplitMerge`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, split it, write a 1 MiB put to the
-- RHS range, merge the ranges, and write a 1 MiB put to the merged range. We
-- expect that at each stage where a send queue develops n1->s3, the send queue
-- will be flushed by the range merge and range split range operations.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 27, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeMigrate`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, migrate the range, and write a 1 MiB
-- put to the migrated range. We expect that the migration will trigger a force
-- flush of the send queue.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 27, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeMigrate`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, migrate the range, and write a 1 MiB
-- put to the migrated range. We expect that the migration will trigger a force
-- flush of the send queue.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 27, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeSplitMerge`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, split it, write a 1 MiB put to the
-- RHS range, merge the ranges, and write a 1 MiB put to the merged range. We
-- expect that at each stage where a send queue develops n1->s3, the send queue
-- will be flushed by the range merge and range split range operations.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 27, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeMigrate`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, migrate the range, and write a 1 MiB
-- put to the migrated range. We expect that the migration will trigger a force
-- flush of the send queue.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeSplitMerge`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, split it, write a 1 MiB put to the
-- RHS range, merge the ranges, and write a 1 MiB put to the merged range. We
-- expect that at each stage where a send queue develops n1->s3, the send queue
-- will be flushed by the range merge and range split range operations.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeMigrate`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, migrate the range, and write a 1 MiB
-- put to the migrated range. We expect that the migration will trigger a force
-- flush of the send queue.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeMigrate`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, migrate the range, and write a 1 MiB
-- put to the migrated range. We expect that the migration will trigger a force
-- flush of the send queue.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeSplitMerge`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, split it, write a 1 MiB put to the
-- RHS range, merge the ranges, and write a 1 MiB put to the merged range. We
-- expect that at each stage where a send queue develops n1->s3, the send queue
-- will be flushed by the range merge and range split range operations.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
This commit introduces a new field on `kvflowinspectpb.Handle`,
`ForceFlushIndex`.

`ForceFlushIndex` is an index up to (and including) which the range
controller running in pull mode must force-flush all send streams.

This is intended for use in live debugging and asserting on state in
testing.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new column `force_flush_index` to
`crdb_internal.kv_flow_control_handles_v2`.

The column value represents an index up to (and including) which the
range controller running in pull mode must force-flush all send streams.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add new echo testdata assertions on the `force_flush_index` of the test
ranges pre-split, post-split and post-merge, in
`TestFlowControlSendQueueRangeSplitMerge`.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeMigrate`.

This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, migrate the range, and write a 1 MiB
-- put to the migrated range. We expect that the migration will trigger a force
-- flush of the send queue.
```

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
This commit introduces a new field on `kvflowinspectpb.Handle`,
`ForceFlushIndex`.

`ForceFlushIndex` is an index up to (and including) which the range
controller running in pull mode must force-flush all send streams.

This is intended for use in live debugging and asserting on state in
testing.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add a new column `force_flush_index` to
`crdb_internal.kv_flow_control_handles_v2`.

The column value represents an index up to (and including) which the
range controller running in pull mode must force-flush all send streams.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add new echo testdata assertions on the `force_flush_index` of the test
ranges pre-split, post-split and post-merge, in
`TestFlowControlSendQueueRangeSplitMerge`.

Part of: cockroachdb#132614
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 2, 2024
Add new echo testdata assertions on the force_flush_index of the test
ranges pre-migrate and post-migrate in
`TestFlowControlSendQueueRangeMigrate`.

Part of: cockroachdb#132614
Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-replication-admission-control-v2 Related to introduction of replication AC v2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
None yet
Development

No branches or pull requests

2 participants