Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admission,replication: replication admission control for regular work #123509

Open
24 of 26 tasks
sumeerbhola opened this issue May 2, 2024 · 2 comments
Open
24 of 26 tasks
Assignees
Labels
A-admission-control A-kv-replication Relating to Raft, consensus, and coordination. A-replication-admission-control-v2 Related to introduction of replication AC v2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented May 2, 2024

The current replication AC is enabled only for elastic work, since it shapes traffic at the pace of the slowest replica store (which would be risky for regular (foreground) work). So regular work can (a) overload a store via follower writes, (b) break performance isolation by follower writes stealing tokens from work that needs to evaluate locally.

We desire replication AC to:

Internal design doc https://docs.google.com/document/d/1Qf6uteFRlbScLdWIrTfqgbrKRHfUsoMGatRLWYQgAi8

Tracked issues:

Jira issue: CRDB-38383

Epic CRDB-37515

@sumeerbhola sumeerbhola added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. A-admission-control labels May 2, 2024
Copy link

blathers-crl bot commented May 2, 2024

cc @cockroachdb/replication

@aadityasondhi
Copy link
Collaborator

@exalate-issue-sync exalate-issue-sync bot added T-kv KV Team and removed T-kv-replication labels Jun 28, 2024
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Aug 5, 2024
Informs cockroachdb#123509

Epic: none

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Aug 6, 2024
Informs cockroachdb#123509

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Aug 6, 2024
128307: raftpb: add Priority for use in RACv2 r=kvoli,pav-kv a=sumeerbhola

Informs #123509

Epic: none

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
@kvoli kvoli added the A-replication-admission-control-v2 Related to introduction of replication AC v2 label Aug 13, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Sep 17, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Sep 18, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Sep 18, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
kvoli pushed a commit to sumeerbhola/cockroach that referenced this issue Sep 23, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
kvoli pushed a commit to sumeerbhola/cockroach that referenced this issue Sep 23, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
kvoli pushed a commit to sumeerbhola/cockroach that referenced this issue Sep 24, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
kvoli pushed a commit to sumeerbhola/cockroach that referenced this issue Sep 25, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
kvoli pushed a commit to sumeerbhola/cockroach that referenced this issue Sep 25, 2024
The new raftEventForReplica encapsulates all that is needed to handle the
RaftEvent, including (re)creating a replicaSendStream. The complexity is
encapsulated in the construction of this event, which also pays attention
to regressions, jumps, and other anomalies, all of which cause the
existing replicaSendStream (if any) to be recreated. The recreaton is a
significant simplification from the prototype, which was doing
unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since
there is a send-queue, we separately track the eval.tokensDeducted counts.
Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs cockroachdb#123509

Release note: None
craig bot pushed a commit that referenced this issue Sep 25, 2024
129547: kv,storage: add new OriginTimestamp option to ConditionalPut r=tbg a=stevendanna

To improve the performance of the logical replication ingestion
processor, we would like to be able to manually generate KV batches
rather than writing via SQL.  However, we also only want to write an
ingested row if it is clearly newer than an existing row.

Two new ConditionaPutRequest parameters allow for this:

- `OriginTimestamp`: When set, any existing value must be older than
  this timestamp. The time comparison is made against the
  OriginTimestamp field in the values MVCCValueHeader if it is set or
  the keys MVCCTimestamp if it isn't.

- `ShouldWinOriginTimestampTie`: When true, indicates that the
  proposed value should be accepted even the inbound OriginTimestamp is
  exactly equal to the existing value's timestamp.

Additionally, two new error states have been added to
ConditionFailedError:

- `OriginTimestampOlderThan`: An error with this value set indicates
  that the ConditionalPutRequest failed because its OriginTimestamp was
  too old.

- `HadNewerOriginTimestamp`: When set, this indicates that while the
  expected value did not match the existing value, the provided origin
  timestamp was newer.

The expectation is that callers who receive an error with
OriginTimestampOlderThan may abort their transaction and not attempt a
retry because their proposed value is too old.

A caller who gets an error with HadNewerOriginTimestamp may choose to
abort their transaction but retry with the value provided in the
ActualValue field. Note a value expectation mismatch is likely for
our proposed caller who is constructing its expected values using
rangefeed events from another table.

Epic: none
Release note: None

130908: rac2: send-queue tracking in push mode r=kvoli a=sumeerbhola

The new raftEventForReplica encapsulates all that is needed to handle the RaftEvent, including (re)creating a replicaSendStream. The complexity is encapsulated in the construction of this event, which also pays attention to regressions, jumps, and other anomalies, all of which cause the existing replicaSendStream (if any) to be recreated. The recreaton is a significant simplification from the prototype, which was doing unnecessarily complicated work in fixing up the state.

The send-queue tracking is simple, [indexToSend, nextRaftIndex). Since there is a send-queue, we separately track the eval.tokensDeducted counts. Tracker.UntrackGE is no longer needed, which is another simplification.

Epic: CRDB-37515

Informs #123509

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 14, 2024
Add `String()` and `SafeFormat()` methods for `RangeSendStreamStats` and
`ReplicaSendStreamStats`.

Part of: cockroachdb#123509
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 14, 2024
Previously, we would only `VEventf(ctx, 5, ...)` when a replica was
being excluded due to having a send queue, or not being in
`StateReplicate`.

Now also:
- `V(4)` log when a replica is included due to missing stats.
- `V(6)` log when a replica is included due to passing stats.

Part of: cockroachdb#123509
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 14, 2024
Add `String()` and `SafeFormat()` methods for `RangeSendStreamStats` and
`ReplicaSendStreamStats`.

Part of: cockroachdb#123509
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 14, 2024
Previously, we would only `VEventf(ctx, 5, ...)` when a replica was
being excluded due to having a send queue, or not being in
`StateReplicate`.

Now also:
- `V(4)` log when a replica is included due to missing stats.
- `V(6)` log when a replica is included due to passing stats.

Part of: cockroachdb#123509
Release note: None
craig bot pushed a commit that referenced this issue Oct 24, 2024
132603: allocatorimpl: vlog on all excl. repl due to catchup conditions  r=sumeerbhola a=kvoli

Previously, we would only `VEventf(ctx, 5, ...)` when a replica was
being excluded due to having a send queue, or not being in
`StateReplicate`.

Now also:
- `V(4)` log when a replica is included due to missing stats.
- `V(6)` log when a replica is included due to passing stats.

Also, stop shadowing the range stat declaration with the replica
stat declaration within the loop.

Part of: #123509
Release note: None

133144: log: add unsafeWrapper in log package r=aa-joshi a=aa-joshi

Previously, we are using `redact.Unsafe()` to mark a particular parameter as
unsafe. We are updating log redaction flow as part of CRDB-37533. This change
introduces `unsafeWrapper` which implements `SafeFormatter`. The wrapper is
introduced so that future work can properly infer whether objects which are
being logged are safe or unsafe. This is a `no-op` change which is necessary in
order to mark redaction operation from within CRDB.

Epic: CRDB-37533
Part of: CRDB-42344
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Akshay Joshi <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Oct 25, 2024
Add `String()` and `SafeFormat()` methods for `RangeSendStreamStats` and
`ReplicaSendStreamStats`.

Part of: #123509
Release note: None
blathers-crl bot pushed a commit that referenced this issue Oct 25, 2024
Previously, we would only `VEventf(ctx, 5, ...)` when a replica was
being excluded due to having a send queue, or not being in
`StateReplicate`.

Now also:
- `V(4)` log when a replica is included due to missing stats.
- `V(6)` log when a replica is included due to passing stats.

Part of: #123509
Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.

Release note (ops change):
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 30, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
kvoli added a commit to kvoli/cockroach that referenced this issue Nov 26, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
kvoli added a commit to kvoli/cockroach that referenced this issue Dec 4, 2024
With testing now in place, this commit changes the default cluster
setting value of `kvadmission.flow_control.mode` from
`apply_to_elastic`, to `apply_to_all`.

Now by default, regular work will be subject to replication admission
control, whereby a quorum will be allowed to proceed, queuing entries
to any non-quorum required replica, when send tokens are exhausted.

For more details, see cockroachdb#123509.

Resolves: cockroachdb#133838
Release note (performance improvement): Regular writes are now subject
to admission control by default, meaning that non-quorum required
replicas may not be told about new writes from the leader if they are
unable to keep up. This brings a large performance improvement during
instances where there is a large backlog of replication work towards a
subset of node(s), such as node restarts. The setting can be reverted to
the <=24.3 default by setting `kvadmission.flow_control.mode` to
`apply_to_elastic`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admission-control A-kv-replication Relating to Raft, consensus, and coordination. A-replication-admission-control-v2 Related to introduction of replication AC v2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

5 participants