
kv: replan rangefeeds with chronic closed ts lag #8

Merged
merged 397 commits into master on Dec 19, 2024

Conversation

mohini-crl
Owner

When a rangefeed's closed timestamp lags behind the current time, any writes that have occurred in-between will not be emitted. This is problematic in cases where the lag is significant and chronic, as consumers (changefeeds, logical data replication, physical cluster replication) are likewise delayed in their processing. Observing a rangefeed with a chronically lagging closed timestamp will become relatively more likely with quorum replication flow control, as entries are deliberately queued, instead of being sent, to stores which do not have sufficient send tokens.

This commit (re)introduces the concept of cancelling lagging rangefeeds, so that they may be replanned and retried on another replica. The other replica may also have this issue; however, there should be at least a quorum of voting replicas with a similar closed timestamp that would be suitable.

The replanning on a different replica is already handled by existing machinery. This commit introduces an observer which generates a signal indicating that the rangefeed should be cancelled. The signal also encapsulates the existing logic to nudge a rangefeed.

The criteria for cancelling a rangefeed are influenced by two thresholds, defined as cluster settings:

kv.rangefeed.lagging_closed_timestamp_cancel_multiple
(default = 20 x closed ts target duration = 60s)
kv.rangefeed.lagging_closed_timestamp_cancel_min_lagging_duration
(default = 60s)

When a replica's closed timestamp has sustained lag greater than:

kv.rangefeed.lagging_closed_timestamp_cancel_multiple * kv.closed_timestamp.target_duration

For at least:

kv.rangefeed.lagging_closed_timestamp_cancel_min_lagging_duration

duration, the rangefeed will be cancelled and then re-planned on the client. This can be visualized in the following diagram, where an initial spike over the lag threshold is recovered from, so the rangefeed isn't cancelled; the second time the lag exceeds the threshold, it is sustained for longer than the duration threshold, so the rangefeed is cancelled for replanning:

lag=0          ─────────────────────────────────────────────────────

observed lag   ─────────┐
                        │
                        │
                        │     ┌───────┐
lag threshold  ─────────┼─────┼───────┼──────────────────────────────
                        │     │       └───┐
                        │     │           └─────┐
                        └─────┘                 └──────┐
                                                       └────────────
                                      ◄────────────────────────────►
                                         exceeds duration threshold
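
For illustration, here is a minimal Go sketch of the sustained-lag check described above, assuming the observer is periodically polled with the current lag. The type and field names (`laggingObserver`, `exceedsSince`, etc.) are hypothetical, not the actual CockroachDB implementation:

```go
package main

import (
	"fmt"
	"time"
)

// laggingObserver is a hypothetical sketch of the sustained-lag check:
// cancellation requires the lag to exceed the threshold continuously for at
// least the minimum duration.
type laggingObserver struct {
	targetDuration    time.Duration // kv.closed_timestamp.target_duration
	cancelMultiple    float64       // ..._cancel_multiple
	cancelMinDuration time.Duration // ..._cancel_min_lagging_duration

	exceedsSince time.Time // zero while the lag is below the threshold
}

// observe records the current closed timestamp lag and reports whether the
// rangefeed should be cancelled for replanning.
func (o *laggingObserver) observe(now time.Time, lag time.Duration) bool {
	threshold := time.Duration(o.cancelMultiple * float64(o.targetDuration))
	if lag <= threshold {
		// Lag recovered below the threshold; reset the sustained-lag clock.
		o.exceedsSince = time.Time{}
		return false
	}
	if o.exceedsSince.IsZero() {
		o.exceedsSince = now
	}
	return now.Sub(o.exceedsSince) >= o.cancelMinDuration
}

func main() {
	o := laggingObserver{
		targetDuration:    3 * time.Second,
		cancelMultiple:    20, // 20 x 3s = 60s lag threshold
		cancelMinDuration: 60 * time.Second,
	}
	start := time.Now()
	// A brief spike over the threshold does not trigger cancellation...
	fmt.Println(o.observe(start, 90*time.Second))                      // false
	fmt.Println(o.observe(start.Add(30*time.Second), 2*time.Second))   // false (recovered)
	// ...but lag sustained over the threshold for >= 60s does.
	fmt.Println(o.observe(start.Add(60*time.Second), 90*time.Second))  // false (clock restarts)
	fmt.Println(o.observe(start.Add(125*time.Second), 90*time.Second)) // true
}
```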

Note that we could also prevent accepting a rangefeed registration if the lag were sufficient; however, the behavior change here applies only to lag which has been observed to be sustained over time. Without historical data, we cannot apply identical decision logic at registration time.

Fixes: cockroachdb#136214
Release note: None

andy-kimball and others added 30 commits December 6, 2024 16:46
inMemoryLock wraps a read-write lock, adding support for reentrancy and
ownership tracking. This will be used to provide per-partition locking
for the InMemoryStore.

Epic: CRDB-42943

Release note: None
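
As a rough sketch of that idea (hypothetical type, not the actual inMemoryLock implementation), a reentrant write lock with explicit owner tracking might look like the following; read-locking is omitted for brevity:

```go
package main

import "sync"

// reentrantLock sketches a write lock that the current owner may re-acquire
// without deadlocking. Owners identify themselves explicitly (for example by
// transaction), since Go does not expose goroutine identity.
type reentrantLock struct {
	mu    sync.Mutex   // guards owner/depth bookkeeping
	rw    sync.RWMutex // the underlying read-write lock being wrapped
	owner any          // current write-lock owner, nil if none
	depth int          // reentrancy depth for the current owner
}

func (l *reentrantLock) Lock(owner any) {
	l.mu.Lock()
	if l.owner == owner && l.depth > 0 {
		l.depth++ // already held by this owner: just bump the depth
		l.mu.Unlock()
		return
	}
	l.mu.Unlock()
	l.rw.Lock()
	l.mu.Lock()
	l.owner, l.depth = owner, 1
	l.mu.Unlock()
}

func (l *reentrantLock) Unlock(owner any) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.owner != owner || l.depth == 0 {
		panic("unlock by non-owner")
	}
	l.depth--
	if l.depth == 0 {
		l.owner = nil
		l.rw.Unlock()
	}
}

func main() {
	var l reentrantLock
	txn := "txn-1"
	l.Lock(txn)
	l.Lock(txn) // reentrant acquisition by the same owner
	l.Unlock(txn)
	l.Unlock(txn)
}
```
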
Previously, the in-memory store only supported locking the entire store
using a read-write lock. This meant that split/merge transactions could
never run concurrently with insert or search transactions. In addition,
even insert and search operations were serialized with a mutex.

This PR adds per-partition locking capability to the in-memory store. Now,
each partition has its own read-write lock. Split, merge, and insert
operations acquire a write lock before modifying a partition. Search
operations can run in parallel on a partition.

While only one split or merge operation can run at a time, it can run in
parallel with insert and search operations. Because of this, and because
insert and search operations acquire at most one partition lock at a time,
deadlocks are not possible.

Epic: CRDB-42943

Release note: None
136784: vecstore: add per-partition locking to InMemoryStore r=drewkimball,mw5h a=andy-kimball

Previously, the in-memory store only supported locking the entire store
using a read-write lock. This meant that split/merge transactions could
never run concurrently with insert or search transactions. In addition,
even insert and search operations were serialized with a mutex.

This PR adds per-partition locking capability to the in-memory store. Now,
each partition has its own read-write lock. Split, merge, and insert
operations acquire a write lock before modifying a partition. Search
operations can run in parallel on a partition.

While only one split or merge operation can run at a time, it can run in
parallel with insert and search operations. Because of this, and because
insert and search operations acquire at most one partition lock at a time,
deadlocks are not possible.

Epic: CRDB-42943

Release note: None
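
To make the locking scheme concrete, here is a simplified Go sketch (hypothetical `partitionStore`/`partition` types, not the vecstore code) where each partition carries its own read-write lock and at most one partition lock is held at a time:

```go
package main

import "sync"

// partition is a hypothetical partition with its own read-write lock, so
// searches on one partition can run in parallel with inserts on another.
type partition struct {
	mu   sync.RWMutex
	vecs [][]float32
}

// partitionStore sketches per-partition locking: the store-level mutex only
// guards the partition map, never the partition contents.
type partitionStore struct {
	mu         sync.Mutex
	partitions map[int]*partition
}

func (s *partitionStore) getPartition(id int) *partition {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.partitions == nil {
		s.partitions = make(map[int]*partition)
	}
	p, ok := s.partitions[id]
	if !ok {
		p = &partition{}
		s.partitions[id] = p
	}
	return p
}

// Insert acquires only the target partition's write lock.
func (s *partitionStore) Insert(id int, vec []float32) {
	p := s.getPartition(id)
	p.mu.Lock()
	defer p.mu.Unlock()
	p.vecs = append(p.vecs, vec)
}

// Search acquires only the target partition's read lock, so concurrent
// searches of the same partition do not block each other.
func (s *partitionStore) Search(id int, visit func([][]float32)) {
	p := s.getPartition(id)
	p.mu.RLock()
	defer p.mu.RUnlock()
	visit(p.vecs)
}

func main() {
	var s partitionStore
	s.Insert(1, []float32{1, 2, 3})
	s.Search(1, func(vecs [][]float32) { _ = len(vecs) })
}
```

Because each operation holds at most one partition lock, no lock-ordering cycle is possible, mirroring the deadlock argument above.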


Co-authored-by: Andrew Kimball <[email protected]>
136809: CODEOWNERS: fine tune roachtest package-level ownership r=tbg,rafiss,herkolategan a=srosenberg

Previously, `/pkg/cmd/roachtest/` was being matched as the catch-all, resulting in noisy notifications. Now, testeng gets notified only when the actual framework code is updated.

Epic: none

Release note: None

Co-authored-by: Stan Rosenberg <[email protected]>
Remove 24.3 gates and corresponding code that is now obsolete.

Epic: none
Release note: None
136076: sql: check for multiple mutations to the same table by triggers r=DrewKimball a=DrewKimball

#### sql: refactor some cascade/trigger logic

This commit refactors some of the logic shared between cascades and
AFTER triggers. This will make the following commit easier to understand.

Epic: None

Release note: None

#### sql: check for multiple mutations to the same table by triggers

There are currently some situations where a query that modifies the
same table in multiple locations may cause index corruption (cockroachdb#70731).
To avoid this, we disallow query structures that may lead to a problematic
combination of mutations.

Triggers require special handling to make this check work, because they
can execute arbitrary SQL statements, which can mutate a table directly
or through routines, FK cascades, or other triggers. BEFORE triggers on
the main query "just work" because they are built as UDF invocations as
part of the main query. AFTER triggers and BEFORE triggers fired on
cascades are more difficult, because they are planned lazily only if
the post-query has rows to process.

This commit adds logic to track invalid mutations for both types of
triggers. We now propagate the "ancestor" mutated tables whenever planning
a post-query, so that any triggers planned as part of the post-query
can detect conflicting mutations. See the "After Triggers" section in
`statement_tree.go` for additional explanation.
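
As a conceptual Go sketch of that propagation (hypothetical `mutationTracker` type; the real logic lives in statement_tree.go and the optbuilder), the "ancestor" mutated tables are threaded into each post-query plan and checked before a new mutation is added:

```go
package main

import (
	"errors"
	"fmt"
)

// mutationTracker sketches the idea described above: every table mutated by
// an "ancestor" statement is carried into the plan of a post-query (cascade
// or AFTER trigger), and a second mutation of the same table is rejected.
type mutationTracker struct {
	ancestorMutated map[string]bool
}

// withAncestors returns a tracker for a post-query that inherits the tables
// already mutated by its ancestors.
func (t mutationTracker) withAncestors(tables ...string) mutationTracker {
	child := mutationTracker{ancestorMutated: map[string]bool{}}
	for tbl := range t.ancestorMutated {
		child.ancestorMutated[tbl] = true
	}
	for _, tbl := range tables {
		child.ancestorMutated[tbl] = true
	}
	return child
}

// checkMutation errors out if the post-query would mutate a table that an
// ancestor statement already mutated.
func (t mutationTracker) checkMutation(table string) error {
	if t.ancestorMutated[table] {
		return errors.New("multiple mutations of table " + table + " are not allowed")
	}
	return nil
}

func main() {
	root := mutationTracker{ancestorMutated: map[string]bool{}}
	// The main query mutates table "a"; its AFTER trigger is planned with "a"
	// recorded as an ancestor mutation.
	trigger := root.withAncestors("a")
	fmt.Println(trigger.checkMutation("b")) // <nil>: a different table is fine
	fmt.Println(trigger.checkMutation("a")) // error: conflicting mutation
}
```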

Informs cockroachdb#70731

Release note (bug fix): Previously, it was possible to cause index
corruption using AFTER-triggers that fire within a routine. In order
for the bug to manifest, both the AFTER-trigger and the statement that
invokes the routine must mutate the same row of a table with a mutation
other than `INSERT`.

Co-authored-by: Drew Kimball <[email protected]>
The test was written in 2015 and was providing no value anymore. It was
also the only test keeping the Node InternalServer mock. This commit
deletes both.

Epic: None
Release note: None
Updated the regression.diffs file to reflect changes due to improved
TRIGGER and collation support. Filed issues for the panic and trigger
misbehavior.

Fixes: cockroachdb#132515
Informs: cockroachdb#135311
Informs: cockroachdb#135131

Release note: None
135313: roachtest: update regression.diffs r=mw5h a=mw5h

Updated the regression.diffs file to reflect changes due to improved TRIGGER and collation support. Filed issues for the panic and trigger misbehavior.

Fixes: cockroachdb#132515
Informs: cockroachdb#135311
Informs: cockroachdb#135131

Release note: None

Co-authored-by: Matt White <[email protected]>
135335: roachtest: increase kv.closed_timestamp.target_duration for copyfrom r=mw5h a=mw5h

Per PR cockroachdb#117091, the roachtest copyfrom test frequently runs in an overloaded environment, causing transactions to time out. That PR attempted to add a retry loop around the transaction, but the test is failed by a log watcher anyway.

Consulting the CRDB documentation on "Transaction Retry Error Reference" reveals a suggested course of action of increasing kv.closed_timestamp.target_duration, with some caveats regarding AS OF SYSTEM TIME and changefeeds that we don't care about for the purposes of this test. The most recent test failure indicates a 40s overage, so we set the target_duration to 60s (up from a default of 3s) and hope for the best.

Fixes: cockroachdb#133797
Release note: None

Co-authored-by: Matt White <[email protected]>
Move all Store methods that take a Txn parameter to the Txn interface. This
cleans up the interface and prepares the way to have a Txn implementation
that buffers up changes rather than actually writing them to the store.
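
Schematically (hypothetical method names, not the real vecstore interfaces), the refactor moves per-transaction operations from the store onto the transaction, which is what makes a buffering Txn possible:

```go
package main

// Before the refactor, store methods took the transaction as a parameter
// (sketched with hypothetical names):
//
//	type Store interface {
//		BeginTransaction() Txn
//		GetPartition(txn Txn, key string) ([]byte, error)
//		SetPartition(txn Txn, key string, value []byte) error
//	}

// After the refactor, the per-transaction operations live on Txn itself.
type Store interface {
	BeginTransaction() Txn
}

type Txn interface {
	GetPartition(key string) ([]byte, error)
	SetPartition(key string, value []byte) error
	Commit() error
}

// bufferedTxn is one possible Txn implementation enabled by the new shape:
// writes are staged in memory and only applied to the base Txn at Commit
// time, rather than being written through immediately.
type bufferedTxn struct {
	base   Txn
	staged map[string][]byte
}

func (t *bufferedTxn) GetPartition(key string) ([]byte, error) {
	if v, ok := t.staged[key]; ok {
		return v, nil
	}
	return t.base.GetPartition(key)
}

func (t *bufferedTxn) SetPartition(key string, value []byte) error {
	if t.staged == nil {
		t.staged = map[string][]byte{}
	}
	t.staged[key] = value
	return nil
}

func (t *bufferedTxn) Commit() error {
	for k, v := range t.staged {
		if err := t.base.SetPartition(k, v); err != nil {
			return err
		}
	}
	return t.base.Commit()
}

func main() {}
```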

Epic: CRDB-42943

Release note: None
This is a purely mechanical commit that moves Txn code into a new file.

Epic: CRDB-42943

Release note: None
Rather than crashing the whole process, we can use an assertion error
for this. The log.Fatal was initially added in 5d205ed, in a time
far before we were careful about avoiding node crashes.

Release note: None
…tRegressExpiration

This commit deflakes
TestLeaseRequestFromExpirationToEpochOrLeaderDoesNotRegressExpiration by
establishing the node liveness record before pausing node liveness
heartbeats.

Fixes: cockroachdb#136549

Release Note: None
136849: rpc: increase timeout for TestClockOffsetInPingRequest r=miraradeva a=andrewbaptist

Previously this test had the `RPCHeartbeatInterval` and `RPCHeartbeatTimeout` both set at 100ms. Normally the `RPCHeartbeatTimeout` is set at 3x the interval. During race builds this provides enough extra cushion that the test should pass.

Epic: none
Fixes: cockroachdb#136703

Release note: None

Co-authored-by: Andrew Baptist <[email protected]>
Until now, all stack traces were treated as untyped strings and hence
redacted. Stack traces don't need to be redacted, because they don't
contain any PII.

This commit introduces a new function, Stack(), in the debugutil package
as a replacement for the standard debug.Stack(). The new function wraps
the stack trace contents in a []byte-aliased type "SafeStack" that
implements the SafeValue interface, ensuring it is not redacted. This
function can be used as a drop-in replacement for debug.Stack().
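
In outline, the wrapper looks like the sketch below. The safe-value marker interface is declared locally here to keep the example self-contained; the real code relies on the redaction package's interface:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// SafeStack is a []byte alias for stack traces. Implementing a "safe value"
// marker interface tells the redaction machinery that the contents are
// PII-free and must not be redacted.
type SafeStack []byte

// SafeValue is the marker method identifying the type as redaction-safe.
func (SafeStack) SafeValue() {}

// Stack is a drop-in replacement for debug.Stack() that returns the
// redaction-safe type instead of a plain []byte.
func Stack() SafeStack {
	return SafeStack(debug.Stack())
}

func main() {
	// Callers switch from debug.Stack() to Stack() and keep the same usage.
	fmt.Printf("%s\n", Stack())
}
```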

Part of: CRDB-15292
Epic: CRDB-37533
Release note: None
A previous commit introduced the debugutil.Stack() function which
wraps the debug.Stack() output in "SafeStack" type to make sure that it
does not get redacted.

This commit replaces all the current uses of debug.Stack() with
debugutil.Stack(), with one exception: there is still a call to
debug.Stack() in pkg/util/log/clog.go. Changing this is not that
straightforward, but it's OK to keep it for now because that call does
not go through the redaction flow.

Part of: CRDB-15292
Epic: CRDB-37533
Release note: None
Some of the previous commits addressed the unnecessary redaction of
stack traces by replacing uses of debug.Stack() with debugutil.Stack().
The new function ensures proper handling of redaction.

This commit future-proofs the solution by adding a linter rule to block
future uses of debug.Stack(). The linter error will prompt users to use
debugutil.Stack() instead.

Part of: CRDB-15292
Epic: CRDB-37533
Release note: None
136869: authors: add <sadaf-crl> to authors r=sadaf-crl a=sadaf-crl

Epic: None
Release note: None

Co-authored-by: Sadaf Qureshi <[email protected]>
136288: pkg/util/debugutil: introduce debug.Stack() wrapper r=dhartunian a=arjunmahishi

Until now, all stack traces were treated as untyped strings and hence redacted. Stack traces don't need to be redacted because they don't contain any PII. This PR fixes that with the following commits:

1. Introduce a new function `Stack()` in the `debugutil` package as a replacement for the standard `debug.Stack()`. This new function wraps the stack trace contents in `redact.Safe()` to make sure that it does not get redacted.

2. Replace all the current uses of `debug.Stack()` across the codebase with `debugutil.Stack()`

3. Add a linter rule to block future uses of `debug.Stack()`. The linter error will nudge users to use `debugutil.Stack()` instead.

Part of: CRDB-15292
Epic: CRDB-37533
Release note: None

Co-authored-by: Arjun Mahishi <[email protected]>
The current implementation of the resize operation fails after
creating new VMs. This PR addresses the errors encountered when
joining a new node to the cluster. The changes include:
- Reads the cockroach start flags from a config file passed to the operation.
- Introduces a `workload-cluster` flag which enables the operation to
  perform certain actions like updating certificates on workload nodes.

Sample yaml config for roachtest operation:
```yaml
startopts:
  roachprodopts:
    storecount: 4
    enablefluentsink: true
    walfailover: among-stores
    sqlport: 26257
```

Epic: none

Release note: None
136787: roachtest: fix resize operation in run-operation command r=vidit-bhat a=shailendra-patel

The current implementation of the resize operation fails after creating new VMs. This PR addresses the errors encountered when joining a new node to the cluster. The changes include:
- Reads the cockroach start flags from a config file passed to the operation.
- Introduces a `workload-cluster` flag which enables the operation to perform certain actions like updating certificates on workload nodes.

Sample yaml config for roachtest operation:
```yaml
startopts:
  roachprodopts:
    storecount: 4
    enablefluentsink: true
    walfailover: among-stores
    sqlport: 26257
```

Epic: none

Release note: None

Co-authored-by: Shailendra Patel <[email protected]>
136797: rangefeed: track testing caughtUp state with an atomic r=andrewbaptist a=stevendanna

The caughtUp variable is used in tests to ensure that the rangefeed has sent all of its messages to the sender. However, it was subject to a race condition in which the testing-only blocking write path would set caughtUp to false _after_ the event had been handled, because this testing variable and other variables shared the same mutex.

We increment the counter on every push into the buffer and decrement it after the stream send has succeeded. In the blocking case that was buggy before, it is possible that the counter is temporarily -1, but this doesn't pose a problem.
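
The shape of that change, as a hedged sketch (hypothetical names, not the rangefeed package's actual fields):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// testSender sketches tracking "caught up" state with an atomic counter
// instead of a mutex-protected boolean: the counter goes up when an event is
// pushed into the buffer and down only after the send has completed, so tests
// can wait for it to drain back to <= 0.
type testSender struct {
	unsent atomic.Int64
	send   func(ev string) // the underlying (possibly blocking) stream send
}

func (s *testSender) push(ev string) {
	s.unsent.Add(1) // incremented on every push into the buffer
	s.send(ev)
	s.unsent.Add(-1) // decremented only after the send succeeded
}

// caughtUp reports whether every pushed event has been sent. In a blocking
// write path the counter can transiently read -1, which is harmless for the
// "has everything been sent" question tests ask.
func (s *testSender) caughtUp() bool {
	return s.unsent.Load() <= 0
}

func main() {
	s := &testSender{send: func(string) {}}
	s.push("checkpoint")
	fmt.Println(s.caughtUp()) // true once the send has completed
}
```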

Epic: none

Fixes cockroachdb#136544

Release note: None

Co-authored-by: Steven Danna <[email protected]>
The functions Get() and GetWithBuf() internally call runtime.Stack() to
fetch the calling go routine's stack trace.

This commit changes the return type of both functions from []byte to
debugutil.SafeStack (introduced in cockroachdb#136288). This ensures that the
stack traces generated using these functions are not redacted.

Part of: CRDB-15292
Epic: CRDB-37533
Release note: None
136926: sql/server: Fix license disable bug for single node clusters r=spilchen a=spilchen

License enforcement is intended to be disabled for single-node setups. However, when starting CockroachDB with start-single-node, license enforcement was only disabled on the initial startup. Subsequent restarts of the cluster did not correctly disable licensing. This fix addresses and resolves that issue.

This should be backported all the way back to 23.1.

Epic: None
Release note (bug fix): Fixed an issue where license enforcement was not consistently disabled for single-node clusters started with start-single-node, ensuring proper behavior on cluster restarts.

Co-authored-by: Matt Spilchen <[email protected]>
Previously, the mutex in buffered registration was exposed as a sync.Locker
interface, which is not a common practice in our codebase. This patch changes it
to wrap syncutil.Mutex directly.

Release note: none
Epic: none
137007: kvserver/rangefeed: change sync.Locker to sync.Mutex r=stevendanna a=wenyihu6

Previously, the mutex in buffered registration was exposed as a sync.Locker
interface, which is not a common practice in our codebase. This patch changes it
to wrap syncutil.Mutex directly.

Release note: none
Epic: none

Co-authored-by: Wenyi Hu <[email protected]>
136942: vecstore: move methods from Store to Txn r=mw5h a=andy-kimball

Move all Store methods that take a Txn parameter to the Txn interface. This
cleans up the interface and prepares the way to have a Txn implementation
that buffers up changes rather than actually writing them to the store.

Epic: CRDB-42943

Release note: None


136968: kv: delete TestSendToOneClient r=nvanbenschoten a=nvanbenschoten

The test was written in 2015 and was providing no value anymore. It was also the only test keeping the `Node` `InternalServer` mock in use. This commit deletes both.

Epic: None
Release note: None

Co-authored-by: Andrew Kimball <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
This patch adds unbufferedRegistration.

UnbufferedRegistration is like BufferedRegistration but uses BufferedSender
to buffer live raft updates instead of using a fixed-size channel and a
dedicated per-registration goroutine to volley events to the underlying gRPC
stream. Instead, there is only one BufferedSender for each incoming
node.MuxRangefeed gRPC call. BufferedSender is responsible for buffering and
sending its updates to the underlying gRPC stream in a dedicated goroutine,
O(node).
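
Conceptually (a simplified sketch with hypothetical names, not the kvserver/rangefeed code), the buffering looks like this: many registrations append to one shared, growable buffer, and a single per-stream goroutine drains it to the gRPC stream:

```go
package main

import (
	"fmt"
	"sync"
)

// bufferedSender sketches the idea above: registrations never block on the
// stream; one goroutine per stream (not per registration) forwards events.
type bufferedSender struct {
	mu     sync.Mutex
	cond   *sync.Cond
	buf    []string
	closed bool
}

func newBufferedSender() *bufferedSender {
	s := &bufferedSender{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// sendBuffered is called by every registration; it only appends to the buffer.
func (s *bufferedSender) sendBuffered(ev string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buf = append(s.buf, ev)
	s.cond.Signal()
}

// run is the single per-stream goroutine that forwards buffered events.
func (s *bufferedSender) run(streamSend func(string)) {
	for {
		s.mu.Lock()
		for len(s.buf) == 0 && !s.closed {
			s.cond.Wait()
		}
		if s.closed && len(s.buf) == 0 {
			s.mu.Unlock()
			return
		}
		ev := s.buf[0]
		s.buf = s.buf[1:]
		s.mu.Unlock()
		streamSend(ev) // the stream send happens outside the lock
	}
}

func (s *bufferedSender) close() {
	s.mu.Lock()
	s.closed = true
	s.cond.Broadcast()
	s.mu.Unlock()
}

func main() {
	s := newBufferedSender()
	done := make(chan struct{})
	go func() {
		s.run(func(ev string) { fmt.Println("sent:", ev) })
		close(done)
	}()
	s.sendBuffered("event-1")
	s.sendBuffered("event-2")
	s.close()
	<-done
}
```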

Resolved: cockroachdb#110432

Release note: none

Co-authored-by: Steven Danna <[email protected]>
andyyang890 and others added 29 commits December 13, 2024 13:50
This patch marks `jobspb.ResolvedSpan_BoundaryType` as a SafeValue
so that it won't be redacted in the logs.

Release note: None
137383: roachtest: use new ruby version for activerecord test r=rafiss a=rafiss

The latest version of this package needs ruby3.3 in order to pick up this change:
ruby/ruby@7e5c662

Otherwise, these tests fail with:
```
ArgumentError: wrong number of arguments (given 2, expected 0..1)
    /usr/local/lib/ruby/3.1.0/random/formatter.rb:213:in `alphanumeric'
    /usr/local/lib/ruby/gems/3.1.0/bundler/gems/rails-6cfa24f9dfdd/activesupport/lib/active_support/core_ext/securerandom.rb:20:in `base58'
    ...
```

fixes cockroachdb#136562
fixes cockroachdb#136876
fixes cockroachdb#136560
Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
TestChangefeedIdleness checks that the changefeed job state correctly
moves in and out of the idle state. This change fixes a race condition
when validating that jobs move out of idle state after an insert, since
it's possible that the job becomes active and then idle again before the
test is able to validate that the state was not idle.

Epic: none

Fixes: cockroachdb#134304
Fixes: cockroachdb#133660

Release note: None
This is implemented using the same approach we took to validate type
changes in ALTER COLUMN TYPE when the size of the type changed.

Release note (bug fix): Previously, if a VIRTUAL computed column was
added and it was a fixed-size type such as VARCHAR(2), the computed
values would not be checked to make sure they were not too large for the
type. Now this validation is performed, which prevents an invalid
computed column definition from being added to a table.
This resolves a TODO that required logic to validate computed columns to
be in place.

Release note: None
This is just a regression test to make sure the behavior is as expected,
since I was working on related validations for DEFAULT and computed
columns.

Release note: None
137375: jobspb: mark ResolvedSpan_BoundaryType as SafeValue r=rharding6373 a=andyyang890

This patch marks `jobspb.ResolvedSpan_BoundaryType` as a SafeValue
so that it won't be redacted in the logs.

Informs cockroachdb#128597

Release note: None

137410: kvserver: deflake TestReplicaCircuitBreaker_RangeFeed r=iskettaneh a=iskettaneh

This commit extends the timeout for the function tc.UntripsSoon(). I noticed that with leader leases, this function could take a little longer than before to finish successfully. Especially if we just restart the server.

Fixes: cockroachdb#137372

Release note: None

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Ibrahim Kettaneh <[email protected]>
It was timing out, though it completes within 6s on an M1 macbook.

Fixes cockroachdb#137380

Epic: none

Release note: None
134331: changefeedccl: eliminate race in test validation r=wenyihu6 a=rharding6373

TestChangefeedIdleness checks that the changefeed job state correctly moves in and out of the idle state. This change fixes a race condition when validating that jobs move out of idle state after an insert, since it's possible that the job becomes active and then idle again before the test is able to validate that the state was not idle.

Epic: none

Fixes: cockroachdb#134304
Fixes: cockroachdb#133660

Release note: None

137311: roachtest: remove RequiresLicense test spec r=srosenberg a=DarrylWong

As of cockroachdb#135326, roachprod will now generate an
enterprise license if one is not supplied, making
the test spec no longer needed.

Release note: none
Epic: none
Fixes: none

Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: DarrylWong <[email protected]>
137299: schemachanger,scbuild: add transient constraint to validate computed column  r=rafiss a=rafiss

fixes cockroachdb#81698

### schemaexpr: add assignment cast to computed column if types differ

In 4235a7a we changed default
expressions to use an assignment cast if the type differs. This made it
so values that are inserted during a backfill are validated to fit
within the size constraint of the data type (e.g. VARCHAR(2)).

This patch does the same for computed columns, which also need the same
validation.

Release note (bug fix): Previously, if a STORED computed column was
added and it was a fixed-size type such as VARCHAR(2), the computed
values would not be checked to make sure they were not too large for the
type. Now this validation is performed, which prevents an invalid
computed column definition from being added to a table.

### schemaexpr: remove unused evalCtx parameter

### scbuild: add transient constraint to validate computed column

This is implemented using the same approach we took to validate type
changes in ALTER COLUMN TYPE when the size of the type changed.

Release note (bug fix): Previously, if a VIRTUAL computed column was
added and it was a fixed-size type such as VARCHAR(2), the computed
values would not be checked to make sure they were not too large for the
type. Now this validation is performed, which prevents an invalid
computed column definition from being added to a table.

###  optbuilder: use assignment cast for computed columns

This resolves a TODO that required logic to validate computed columns to
be in place.

Co-authored-by: Rafi Shamim <[email protected]>
See cockroachdb/logtags#4

Epic: none
Release note: None
There can be stale MsgApps even when the replica is in StateReplicate,
and these are not relevant and serve only to confuse the downstream
code.

Fixes cockroachdb#136322

Epic: CRDB-37515

Release note: None
This commit introduces a custom context type that can be used to more
efficiently associate values to contexts.

Instead of using arbitrary objects as keys, the keys are a small set
of integers. Any key must be globally registered first, and there is a
limited number that can be registered.

The `fastValuesCtx` stores *all* values for these keys, so traversing
the context hierarchy is not necessary. We also provide a way to add
multiple values at the same time, which saves on allocations.

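As an illustration of the technique (hypothetical API, not the actual ctxutil package), a fast-values context can store every registered value in one small array on the wrapper, so lookups never walk the parent chain:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

const maxFastKeys = 8 // keys are small registered integers, limited in number

// FastKey is an index into the per-context value array. Keys must be
// registered globally, once, before use.
type FastKey int

var registeredKeys atomic.Int32

func RegisterFastKey() FastKey {
	k := registeredKeys.Add(1) - 1
	if int(k) >= maxFastKeys {
		panic("too many fast keys registered")
	}
	return FastKey(k)
}

// fastValuesCtx stores *all* fast values in one array, so a lookup never has
// to traverse the context parent chain the way context.Value does.
type fastValuesCtx struct {
	context.Context
	values [maxFastKeys]any
}

func (c *fastValuesCtx) Value(key any) any {
	if k, ok := key.(FastKey); ok {
		return c.values[k]
	}
	return c.Context.Value(key)
}

// WithFastValues sets several values with a single wrapper allocation,
// copying forward any values already set by an enclosing fastValuesCtx.
func WithFastValues(parent context.Context, kvs map[FastKey]any) context.Context {
	c := &fastValuesCtx{Context: parent}
	if p, ok := parent.(*fastValuesCtx); ok {
		c.values = p.values
	}
	for k, v := range kvs {
		c.values[k] = v
	}
	return c
}

func main() {
	traceKey := RegisterFastKey()
	userKey := RegisterFastKey()
	ctx := WithFastValues(context.Background(), map[FastKey]any{
		traceKey: "trace-123",
		userKey:  "alice",
	})
	fmt.Println(ctx.Value(traceKey), ctx.Value(userKey))
}
```
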
I looked at before and after profiles in sysbench. Before:
 - CPU usage:
  context.value 0.6%
  context.WithValue 0.3%
  total 0.9%
 - Allocs:
  context.WithValue: 3.5%

After:
 - CPU usage:
  context.value + ctxutil.FastValue 0.5%
  context.WithValue 0.1%
  ctxutil.WithFastValue(s) 0.1%
  total 0.7%
 - Allocs:
  context.WithValue: 0.2%
  context.WithFastValue: 0.6%
  ctxutil.FastValuesBuilder.Finish: 1.4%
  total 2.2%

I will investigate improving on this, perhaps with providing a
pre-allocated context in codepaths where this is possible.

Informs: cockroachdb#136581
Release note: None
137359: changefeedccl/kvfeed: add log for table that encountered schema change r=rharding6373 a=andyyang890

This patch adds a log documenting the table(s) that encountered schema
changes and caused the kv feed to restart/exit, which will be useful
for debugging.

Fixes cockroachdb#136624
Closes cockroachdb#134963

Release note: None

137432: docgen: remove unused BEGIN diagram r=yuzefovich a=taroface

cockroachdb#137135 added `begin_stmt` to the .bzl files for SQL diagram generation. However, this diagram is no longer used, and doesn't seem to be generated by `docgen` upon local testing. Removing `begin_stmt` and `begin_stmt.bnf` worked in cockroachdb#137111, so this PR applies the same update to `master` and backports to 24.1-3. (cc `@rickystewart`)

Epic: none
Release note: None

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Ryan Kuo <[email protected]>
Implicit `FOR UPDATE` locks are now added to the reads in
`INSERT .. ON CONFLICT`, `UPSERT`, `UPDATE`, and `DELETE` queries when
those mutations use generic query plans.

Fixes cockroachdb#137352

Release note: None
137447: sql/catalog: remove isMutable flag toggle in getObjectPrefix r=mgartner a=mgartner

Epic: None

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
137265: sql/tests: fix secondary index key in sysbenchKV.DeleteInsert r=nvanbenschoten a=nvanbenschoten

`sysbenchKV.DeleteInsert` was previously re-inserting the secondary index key that it had just deleted, instead of inserting the secondary index key that corresponds to the new primary index key.

To prevent this kind of bug in the future, we add some checks for secondary index integrity, where appropriate.

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Split between each table and index to ensure cross-range transactions.

Epic: None
Release note: None
137469: sql/tests: manually split ranges in `BenchmarkSysbench/KV` r=nvanbenschoten a=nvanbenschoten

Split between each table and index to ensure cross-range transactions.

Epic: None
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
137440: kvserver: make TestRaftReceiveQueuesEnforceMaxLenConcurrency cheaper r=miraradeva a=sumeerbhola

It was timing out, though it completes within 6s on an M1 macbook.

Fixes cockroachdb#137380

Epic: none

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
137360: opt: add implicit FOR UPDATE locks for generic query plans r=yuzefovich a=mgartner

Implicit `FOR UPDATE` locks are now added to the reads in
`INSERT .. ON CONFLICT`, `UPSERT`, `UPDATE`, and `DELETE` queries when
those mutations use generic query plans.

Fixes cockroachdb#137352

Release note: None


Co-authored-by: Marcus Gartner <[email protected]>
137344: ctxutil: introduce fast values r=RaduBerinde a=RaduBerinde

#### go.mod: update logtag dependency

See cockroachdb/logtags#4

Epic: none
Release note: None

#### serverident: unexport ServerIdentificationContextKey

Epic: none
Release note: None

#### ctxutil: introduce fast values

This commit introduces a custom context type that can be used to more
efficiently associate values to contexts.

Instead of using arbitrary objects as keys, the keys are a small set
of integers. Any key must be globally registered first, and there is a
limited number that can be registered.

The `fastValuesCtx` stores *all* values for these keys, so traversing
the context hierarchy is not necessary. We also provide a way to add
multiple values at the same time, which saves on allocations.

I looked at before and after profiles in sysbench. Before:
 - CPU usage:
  context.value 0.6%
  context.WithValue 0.3%
  total 0.9%
 - Allocs:
  context.WithValue: 3.5%

After:
 - CPU usage:
  context.value + ctxutil.FastValue 0.5%
  context.WithValue 0.1%
  ctxutil.WithFastValue(s) 0.1%
  total 0.7%
 - Allocs:
  context.WithValue: 0.2%
  context.WithFastValue: 0.6%
  ctxutil.FastValuesBuilder.Finish: 1.4%
  total 2.2%

I will investigate improving on this, perhaps with providing a
pre-allocated context in codepaths where this is possible.

Informs: cockroachdb#136581
Informs: cockroachdb#135904
Release note: None

Co-authored-by: Radu Berinde <[email protected]>
137455: rac2: ignore MsgApps when constructing RaftEvent in MsgAppPull mode r=pav-kv a=sumeerbhola

There can be stale MsgApps even when the replica is in StateReplicate, and these are not relevant and serve only to confuse the downstream code.

Fixes cockroachdb#136322

Epic: CRDB-37515

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
When a rangefeed's closed timestamp lags behind the current time, any
writes that have occurred in-between will not be emitted. This is
problematic in cases where the lag is significant and chronic, as
consumers (changefeeds, logical data replication, physical cluster
replication) are likewise delayed in their processing. Observing a
rangefeed with a chronically lagging closed timestamp will become relatively
more likely with quorum replication flow control, as entries are
deliberately queued, instead of being sent, to stores which do not have
sufficient send tokens.

This commit (re)introduces the concept of cancelling lagging rangefeeds,
so that they may be replanned and retried on another replica. The other
replica may also have this issue; however, there should be at least a
quorum of voting replicas with a similar closed timestamp that would be
suitable.

The replanning on a different replica is already handled by existing
machinery. This commit introduces an observer which generates a signal
indicating that the rangefeed should be cancelled. The signal also
encapsulates the existing logic to nudge a rangefeed.

The criteria for cancelling a rangefeed are influenced by two thresholds,
defined as cluster settings:

```
kv.rangefeed.lagging_closed_timestamp_cancel_multiple
(default = 20 x closed ts target duration = 60s)
```

```
kv.rangefeed.lagging_closed_timestamp_cancel_min_lagging_duration
(default = 60s)
```

When a replica's closed timestamp has sustained lag greater than:

```
kv.rangefeed.lagging_closed_timestamp_cancel_multiple * kv.closed_timestamp.target_duration
```

For at least:

```
kv.rangefeed.lagging_closed_timestamp_cancel_min_lagging_duration
```

duration, the rangefeed will be cancelled and then re-planned on the
client. This can be visualized in the following diagram, where an
initial spike over the lag threshold is recovered from, so the rangefeed
isn't cancelled; the second time the lag exceeds the threshold, it is
sustained for longer than the duration threshold, so the rangefeed is
cancelled for replanning:

```
lag=0          ─────────────────────────────────────────────────────

observed lag   ─────────┐
                        │
                        │
                        │     ┌───────┐
lag threshold  ─────────┼─────┼───────┼──────────────────────────────
                        │     │       └───┐
                        │     │           └─────┐
                        └─────┘                 └──────┐
                                                       └────────────
                                      ◄────────────────────────────►
                                         exceeds duration threshold
```

Note that we could also prevent accepting a rangefeed registration if
the lag were sufficient; however, the behavior change here applies only
to lag which has been observed to be sustained over time. Without
historical data, we cannot apply identical decision logic at
registration time.

Fixes: cockroachdb#136214
Release note: None
@mohini-crl mohini-crl merged commit 43ebaa0 into master Dec 19, 2024
1 of 15 checks passed

Successfully merging this pull request may close these issues.

kvserver: mitigate chronic lagging rangefeeds due to rac2 send queues