forked from cockroachdb/cockroach
kv: replan rangefeeds with chronic closed ts lag #8
Merged
inMemoryLock wraps a read-write lock, adding support for reentrancy and ownership tracking. This will be used to provide per-partition locking for the InMemoryStore. Epic: CRDB-42943 Release note: None
Previously, the in-memory store only supported locking the entire store using a read-write lock. This meant that split/merge transactions could never run concurrently with insert or search transactions. In addition, even insert and search operations were serialized with a mutex. This PR adds per-partition locking capability to the in-memory store. Now, each partition has its own read-write lock. Split, merge, and insert operations acquire a write lock before modifying a partition. Search operations can run in parallel on a partition. While only one split or merge operation can run at a time, it can run in parallel with insert and search operations. Because of this, and because insert and search operations acquire at most one partition lock at a time, deadlocks are not possible. Epic: CRDB-42943 Release note: None
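The locking scheme described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual vecstore implementation: each partition gets its own RWMutex, so searches on one partition run in parallel with each other, while a split, merge, or insert write-locks only the partition it touches.

```go
package main

import "sync"

// partitionLocks maintains one read-write lock per partition.
type partitionLocks struct {
	mu    sync.Mutex // guards the map of per-partition locks
	locks map[int]*sync.RWMutex
}

func newPartitionLocks() *partitionLocks {
	return &partitionLocks{locks: make(map[int]*sync.RWMutex)}
}

// get returns the lock for a partition, creating it on first use.
// Callers then RLock for searches and Lock for splits, merges, and inserts.
func (p *partitionLocks) get(partition int) *sync.RWMutex {
	p.mu.Lock()
	defer p.mu.Unlock()
	l, ok := p.locks[partition]
	if !ok {
		l = &sync.RWMutex{}
		p.locks[partition] = l
	}
	return l
}
```

Because insert and search operations acquire at most one partition lock at a time, a lock-ordering deadlock cannot arise between them, which is the property the commit message relies on.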
136784: vecstore: add per-partition locking to InMemoryStore r=drewkimball,mw5h a=andy-kimball Previously, the in-memory store only supported locking the entire store using a read-write lock. This meant that split/merge transactions could never run concurrently with insert or search transactions. In addition, even insert and search operations were serialized with a mutex. This PR adds per-partition locking capability to the in-memory store. Now, each partition has its own read-write lock. Split, merge, and insert operations acquire a write lock before modifying a partition. Search operations can run in parallel on a partition. While only one split or merge operation can run at a time, it can run in parallel with insert and search operations. Because of this, and because insert and search operations acquire at most one partition lock at a time, deadlocks are not possible. Epic: CRDB-42943 Release note: None Co-authored-by: Andrew Kimball <[email protected]>
136809: CODEOWNERS: fine tune roachtest package-level ownership r=tbg,rafiss,herkolategan a=srosenberg Previously, `/pkg/cmd/roachtest/` was being matched as the catch-all, resulting in noisy notifications. Now, testeng gets notified only when the actual framework code is updated. Epic: none Release note: None Co-authored-by: Stan Rosenberg <[email protected]>
Remove 24.3 gates and corresponding code that is now obsolete. Epic: none Release note: None
136076: sql: check for multiple mutations to the same table by triggers r=DrewKimball a=DrewKimball #### sql: refactor some cascade/trigger logic This commit refactors some of the logic shared between cascades and AFTER triggers. This will make the following commit easier to understand. Epic: None Release note: None #### sql: check for multiple mutations to the same table by triggers There are currently some situations where a query that modifies the same table in multiple locations may cause index corruption (cockroachdb#70731). To avoid this, we disallow query structures that may lead to a problematic combination of mutations. Triggers require special handling to make this check work, because they can execute arbitrary SQL statements, which can mutate a table directly or through routines, FK cascades, or other triggers. BEFORE triggers on the main query "just work" because they are built as UDF invocations as part of the main query. AFTER triggers and BEFORE triggers fired on cascades are more difficult, because they are planned lazily only if the post-query has rows to process. This commit adds logic to track invalid mutations for both types of triggers. We now propagate the "ancestor" mutated tables whenever planning a post-query, so that any triggers planned as part of the post-query can detect conflicting mutations. See the "After Triggers" section in `statement_tree.go` for additional explanation. Informs cockroachdb#70731 Release note (bug fix): Previously, it was possible to cause index corruption using AFTER-triggers that fire within a routine. In order for the bug to manifest, both the AFTER-trigger and the statement that invokes the routine must mutate the same row of a table with a mutation other than `INSERT`. Co-authored-by: Drew Kimball <[email protected]>
The test was written in 2015 and was providing no value anymore. It was also the only test keeping the Node InternalServer mock. This commit deletes both. Epic: None Release note: None
Updated the regression.diffs file to reflect changes due to improved TRIGGER and collation support. Filed issues for the panic and trigger misbehavior. Fixes: cockroachdb#132515 Informs: cockroachdb#135311 Informs: cockroachdb#135131 Release note: None
135313: roachtest: update regression.diffs r=mw5h a=mw5h Updated the regression.diffs file to reflect changes due to improved TRIGGER and collation support. Filed issues for the panic and trigger misbehavior. Fixes: cockroachdb#132515 Informs: cockroachdb#135311 Informs: cockroachdb#135131 Release note: None Co-authored-by: Matt White <[email protected]>
135335: roachtest: increase kv.closed_timestamp.target_duration for copyfrom r=mw5h a=mw5h Per PR cockroachdb#117091, the roachtest copyfrom test frequently runs in an overloaded environment, causing transactions to time out. That PR attempted to add a retry loop around the transaction, but the test is failed by a log watcher anyway. Consulting the CRDB documentation on "Transaction Retry Error Reference" reveals a suggested course of action of increasing kv.closed_timestamp.target_duration, with some caveats regarding AS OF SYSTEM TIME and changefeeds that we don't care about for the purposes of this test. The most recent test failure indicates a 40s overage, so we set the target_duration to 60s (up from a default of 3s) and hope for the best. Fixes: cockroachdb#133797 Release note: None Co-authored-by: Matt White <[email protected]>
Move all Store methods that take a Txn parameter to the Txn interface. This cleans up the interface and prepares the way to have a Txn implementation that buffers up changes rather than actually writing them to the store. Epic: CRDB-42943 Release note: None
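The shape of this refactor can be sketched as below, with hypothetical, simplified signatures rather than the actual vecstore API: methods that previously lived on Store and took a Txn parameter move onto the Txn interface itself, which leaves room for a Txn implementation that buffers changes.

```go
package main

type PartitionKey uint64
type Partition struct{}

// Before: the store is the entry point and threads the txn through.
type TxnBefore interface{}
type StoreBefore interface {
	GetPartition(txn TxnBefore, key PartitionKey) (*Partition, error)
}

// After: the txn is the entry point; an implementation can buffer writes
// and flush them at commit time instead of writing through to the store.
type TxnAfter interface {
	GetPartition(key PartitionKey) (*Partition, error)
}

// bufferingTxn sketches the motivating implementation: it caches partitions
// it has touched rather than hitting the store on every call.
type bufferingTxn struct {
	partitions map[PartitionKey]*Partition
}

func (t *bufferingTxn) GetPartition(key PartitionKey) (*Partition, error) {
	if p, ok := t.partitions[key]; ok {
		return p, nil
	}
	p := &Partition{} // stand-in for a read from the underlying store
	t.partitions[key] = p
	return p, nil
}
```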
This is a purely mechanical commit that moves Txn code into a new file. Epic: CRDB-42943 Release note: None
Rather than crashing the whole process, we can use an assertion error for this. The log.Fatal was initially added in 5d205ed, in a time far before we were careful about avoiding node crashes. Release note: None
…tRegressExpiration This commit deflakes TestLeaseRequestFromExpirationToEpochOrLeaderDoesNotRegressExpiration by establishing the node liveness record before pausing node liveness heartbeats. Fixes: cockroachdb#136549 Release Note: None
136849: rpc: increase timeout for TestClockOffsetInPingRequest r=miraradeva a=andrewbaptist Previously this test had the `RPCHeartbeatInterval` and `RPCHeartbeatTimeout` both set at 100ms. Normally the `RPCHeartbeatTimeout` is set at 3x the interval. During race builds this provides enough extra cushion that the test should pass. Epic: none Fixes: cockroachdb#136703 Release note: None Co-authored-by: Andrew Baptist <[email protected]>
Until now, all stack traces were treated as untyped strings and hence redacted. Stack traces don't need to be redacted because they don't contain any PII. This commit introduces a new function, Stack(), in the debugutil package as a replacement for the standard debug.Stack(). The new function wraps the stack trace contents in a []byte-aliased type "SafeStack" that implements the SafeValue interface, ensuring it is not redacted. This function can be used as a drop-in replacement for debug.Stack(). Part of: CRDB-15292 Epic: CRDB-37533 Release note: None
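The mechanism is small enough to sketch in full. This is a simplified version of the debugutil approach: the real type implements the SafeValue marker interface from cockroachdb/redact, which the redaction machinery checks before deciding whether to redact a value.

```go
package main

import "runtime/debug"

// SafeStack aliases []byte and is marked "safe" so redaction passes it
// through verbatim; stack traces contain no PII.
type SafeStack []byte

// SafeValue marks SafeStack as redaction-safe (the marker method the
// redact package looks for).
func (SafeStack) SafeValue() {}

// Stack is a drop-in replacement for debug.Stack that returns the
// redaction-safe type instead of a raw []byte.
func Stack() SafeStack {
	return SafeStack(debug.Stack())
}
```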
A previous commit introduced the debugutil.Stack() function, which wraps the debug.Stack() output in the "SafeStack" type to make sure that it does not get redacted. This commit replaces all the current uses of debug.Stack() with debugutil.Stack(), with one exception: there is still a call to debug.Stack() in pkg/util/log/clog.go. Changing that one is not straightforward, but it is OK to keep it for now because it does not go through the redaction flow. Part of: CRDB-15292 Epic: CRDB-37533 Release note: None
Some of the previous commits addressed the unnecessary redaction of stack traces by replacing uses of debug.Stack() with debugutil.Stack(). The new function ensures proper handling of redaction. This commit future-proofs the solution by adding a linter rule to block future uses of debug.Stack(). The linter error will prompt users to use debugutil.Stack() instead. Part of: CRDB-15292 Epic: CRDB-37533 Release note: None
136869: authors: add <sadaf-crl> to authors r=sadaf-crl a=sadaf-crl Epic: None Release note: None Co-authored-by: Sadaf Qureshi <[email protected]>
136288: pkg/util/debugutil: introduce debug.Stack() wrapper r=dhartunian a=arjunmahishi Until now, all stack traces were treated as untyped strings and hence redacted. Stack traces don't need to be redacted because they don't contain any PII. This PR fixes that with the following commits: 1. Introduce a new function `Stack()` in the `debugutil` package as a replacement for the standard `debug.Stack()`. This new function wraps the stack trace contents in `redact.Safe()` to make sure that it does not get redacted. 2. Replace all the current uses of `debug.Stack()` across the codebase with `debugutil.Stack()` 3. Add a linter rule to block future uses of `debug.Stack()`. The linter error will nudge users to use `debugutil.Stack()` instead. Part of: CRDB-15292 Epic: CRDB-37533 Release note: None Co-authored-by: Arjun Mahishi <[email protected]>
The current implementation of the resize operation fails after creating new VMs. This PR addresses the errors encountered when joining a new node to the cluster. The changes include:
- Reads cockroach start flags from a config file.
- Introduces a `workload-cluster` flag that enables the operation to perform certain actions, such as updating certificates on workload nodes.

Sample yaml config for roachtest operation:
```yaml
startopts:
  roachprodopts:
    storecount: 4
    enablefluentsink: true
    walfailover: among-stores
    sqlport: 26257
```
Epic: none Release note: None
136787: roachtest: fix resize operation in run-operation command r=vidit-bhat a=shailendra-patel The current implementation of the resize operation fails after creating new VMs. This PR addresses the errors encountered when joining a new node to the cluster. The changes include:
- Reads cockroach start flags from a config file.
- Introduces a `workload-cluster` flag that enables the operation to perform certain actions, such as updating certificates on workload nodes.

Sample yaml config for roachtest operation:
```yaml
startopts:
  roachprodopts:
    storecount: 4
    enablefluentsink: true
    walfailover: among-stores
    sqlport: 26257
```
Epic: none Release note: None Co-authored-by: Shailendra Patel <[email protected]>
136797: rangefeed: track testing caughtUp state with an atomic r=andrewbaptist a=stevendanna The caughtUp variable is used in tests to ensure that the rangefeed has sent all of its messages to the sender. However, it was subject to a race condition in which the testing-only blocking write path would set caughtUp to false _after_ the event had been handled, because this testing variable and other variables shared the same mutex. We increment the counter on every push into the buffer and decrement it after the stream send has succeeded. In the blocking case that was buggy before, it is possible that the counter is temporarily -1, but this doesn't pose a problem. Epic: none Fixes cockroachdb#136544 Release note: None Co-authored-by: Steven Danna <[email protected]>
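The increment/decrement scheme above can be sketched with an atomic counter (hypothetical names, not the actual rangefeed code): push bumps the counter, a completed send drains it, and "caught up" means the counter has reached zero or, in the previously-buggy blocking path, momentarily -1, which is harmless for this purpose.

```go
package main

import "sync/atomic"

// caughtUpCounter tracks events buffered but not yet sent on the stream.
type caughtUpCounter struct {
	n atomic.Int64
}

// push is called on every push into the buffer.
func (c *caughtUpCounter) push() { c.n.Add(1) }

// sent is called after the stream send has succeeded.
func (c *caughtUpCounter) sent() { c.n.Add(-1) }

// caughtUp reports whether every pushed event has been sent. The counter can
// transiently read -1 in the blocking path, so <= 0 is the right test.
func (c *caughtUpCounter) caughtUp() bool { return c.n.Load() <= 0 }
```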
The functions Get() and GetWithBuf() internally call runtime.Stack() to fetch the calling goroutine's stack trace. This commit changes the return type of both functions from []byte to debugutil.SafeStack (introduced in cockroachdb#136288). This ensures that the stack traces generated using these functions are not redacted. Part of: CRDB-15292 Epic: CRDB-37533 Release note: None
136926: sql/server: Fix license disable bug for single node clusters r=spilchen a=spilchen License enforcement is intended to be disabled for single-node setups. However, when starting CockroachDB with start-single-node, license enforcement was only disabled on the initial startup. Subsequent restarts of the cluster did not correctly disable licensing. This fix addresses and resolves that issue. This should be backported all the way back to 23.1. Epic: None Release note (bug fix): Fixed an issue where license enforcement was not consistently disabled for single-node clusters started with start-single-node, ensuring proper behavior on cluster restarts. Co-authored-by: Matt Spilchen <[email protected]>
Previously, the mutex in buffered registration was exposed as a sync.Locker interface, which is not a common practice in our codebase. This patch changes it to wrap syncutil.Mutex directly. Release note: none Epic: none
137007: kvserver/rangefeed: change sync.Locker to sync.Mutex r=stevendanna a=wenyihu6 Previously, the mutex in buffered registration was exposed as a sync.Locker interface, which is not a common practice in our codebase. This patch changes it to wrap syncutil.Mutex directly. Release note: none Epic: none Co-authored-by: Wenyi Hu <[email protected]>
136942: vecstore: move methods from Store to Txn r=mw5h a=andy-kimball Move all Store methods that take a Txn parameter to the Txn interface. This cleans up the interface and prepares the way to have a Txn implementation that buffers up changes rather than actually writing them to the store. Epic: CRDB-42943 Release note: None 136968: kv: delete TestSendToOneClient r=nvanbenschoten a=nvanbenschoten The test was written in 2015 and was providing no value anymore. It was also the only test keeping the `Node` `InternalServer` mock in use. This commit deletes both. Epic: None Release note: None Co-authored-by: Andrew Kimball <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
This patch adds unbufferedRegistration. UnbufferedRegistration is like BufferedRegistration but uses BufferedSender to buffer live raft updates, instead of using a fixed-size channel and a dedicated per-registration goroutine to volley events to the underlying gRPC stream. Instead, there is only one BufferedSender for each incoming node.MuxRangefeed gRPC call. BufferedSender is responsible for buffering and sending its updates to the underlying gRPC stream in a dedicated goroutine O(node). Resolved: cockroachdb#110432 Release note: none Co-authored-by: Steven Danna <[email protected]>
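The O(node) sender can be sketched as below, with hypothetical names rather than the actual rangefeed code: one goroutine drains a shared queue of events for all registrations on a node.MuxRangefeed call, instead of one fixed-size channel plus a dedicated goroutine per registration.

```go
package main

// event stands in for a rangefeed event tagged with its stream.
type event struct {
	streamID int64
	payload  string
}

// bufferedSender owns the single sender goroutine for a node's mux call.
type bufferedSender struct {
	queue chan event  // bounded here for the sketch
	out   func(event) // stands in for the underlying gRPC stream send
	done  chan struct{}
}

func newBufferedSender(out func(event)) *bufferedSender {
	s := &bufferedSender{
		queue: make(chan event, 128),
		out:   out,
		done:  make(chan struct{}),
	}
	go s.run() // the single per-node sender goroutine
	return s
}

func (s *bufferedSender) run() {
	defer close(s.done)
	for ev := range s.queue {
		s.out(ev)
	}
}

// send buffers an event; any registration on the node can call this.
func (s *bufferedSender) send(ev event) { s.queue <- ev }

// stop flushes the queue and waits for the sender goroutine to exit.
func (s *bufferedSender) stop() {
	close(s.queue)
	<-s.done
}
```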
This patch marks `jobspb.ResolvedSpan_BoundaryType` as a SafeValue so that it won't be redacted in the logs. Release note: None
137383: roachtest: use new ruby version for activerecord test r=rafiss a=rafiss The latest version of this package needs ruby3.3 in order to pick up this change: ruby/ruby@7e5c662 Otherwise, these tests fail with: ``` ArgumentError: wrong number of arguments (given 2, expected 0..1) /usr/local/lib/ruby/3.1.0/random/formatter.rb:213:in `alphanumeric' /usr/local/lib/ruby/gems/3.1.0/bundler/gems/rails-6cfa24f9dfdd/activesupport/lib/active_support/core_ext/securerandom.rb:20:in `base58' ... ``` fixes cockroachdb#136562 fixes cockroachdb#136876 fixes cockroachdb#136560 Release note: None Co-authored-by: Rafi Shamim <[email protected]>
TestChangefeedIdleness checks that the changefeed job state correctly moves in and out of the idle state. This change fixes a race condition when validating that jobs move out of idle state after an insert, since it's possible that the job becomes active and then idle again before the test is able to validate that the state was not idle. Epic: none Fixes: cockroachdb#134304 Fixes: cockroachdb#133660 Release note: None
This is implemented using the same approach we took to validate type changes in ALTER COLUMN TYPE when the size of the type changed. Release note (bug fix): Previously, if a VIRTUAL computed column was added and it was a fixed-size type such as VARCHAR(2), the computed values would not be checked to make sure they were not too large for the type. Now this validation is performed, which prevents an invalid computed column definition from being added to a table.
This resolves a TODO that required logic to validate computed columns to be in place. Release note: None
Release note: None
This is just a regression test to make sure the behavior is as expected, since I was working on related validations for DEFAULT and computed columns. Release note: None
Release note: None
137375: jobspb: mark ResolvedSpan_BoundaryType as SafeValue r=rharding6373 a=andyyang890 This patch marks `jobspb.ResolvedSpan_BoundaryType` as a SafeValue so that it won't be redacted in the logs. Informs cockroachdb#128597 Release note: None 137410: kvserver: deflake TestReplicaCircuitBreaker_RangeFeed r=iskettaneh a=iskettaneh This commit extends the timeout for the function tc.UntripsSoon(). I noticed that with leader leases, this function could take a little longer than before to finish successfully. Especially if we just restart the server. Fixes: cockroachdb#137372 Release note: None Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Ibrahim Kettaneh <[email protected]>
It was timing out, though it completes within 6s on an M1 macbook. Fixes cockroachdb#137380 Epic: none Release note: None
Release note: None
134331: changefeedccl: eliminate race in test validation r=wenyihu6 a=rharding6373 TestChangefeedIdleness checks that the changefeed job state correctly moves in and out of the idle state. This change fixes a race condition when validating that jobs move out of idle state after an insert, since it's possible that the job becomes active and then idle again before the test is able to validate that the state was not idle. Epic: none Fixes: cockroachdb#134304 Fixes: cockroachdb#133660 Release note: None 137311: roachtest: remove RequiresLicense test spec r=srosenberg a=DarrylWong As of cockroachdb#135326, roachprod will now generate an enterprise license if one is not supplied, making the test spec no longer needed. Release note: none Epic: none Fixes: none Co-authored-by: rharding6373 <[email protected]> Co-authored-by: DarrylWong <[email protected]>
137299: schemachanger,scbuild: add transient constraint to validate computed column r=rafiss a=rafiss fixes cockroachdb#81698 ### schemaexpr: add assignment cast to computed column if types differ In 4235a7a we changed default expressions to use an assignment cast if the type differs. This made it so values that are inserted during a backfill are validated to fit within the size constraint of the data type (e.g. VARCHAR(2)). This patch does the same for computed columns, which also need the same validation. Release note (bug fix): Previously, if a STORED computed column was added and it was a fixed-size type such as VARCHAR(2), the computed values would not be checked to make sure they were not too large for the type. Now this validation is performed, which prevents an invalid computed column definition from being added to a table. ### schemaexpr: remove unused evalCtx parameter ### scbuild: add transient constraint to validate computed column This is implemented using the same approach we took to validate type changes in ALTER COLUMN TYPE when the size of the type changed. Release note (bug fix): Previously, if a VIRTUAL computed column was added and it was a fixed-size type such as VARCHAR(2), the computed values would not be checked to make sure they were not too large for the type. Now this validation is performed, which prevents an invalid computed column definition from being added to a table. ### optbuilder: use assignment cast for computed columns This resolves a TODO that required logic to validate computed columns to be in place. Co-authored-by: Rafi Shamim <[email protected]>
See cockroachdb/logtags#4 Epic: none Release note: None
Epic: none Release note: None
There can be stale MsgApps even when the replica is in StateReplicate, and these are not relevant and serve only to confuse the downstream code. Fixes cockroachdb#136322 Epic: CRDB-37515 Release note: None
This commit introduces a custom context type that can be used to more efficiently associate values to contexts. Instead of using arbitrary objects as keys, the keys are a small set of integers. Any key must be globally registered first, and there is a limited number that can be registered. The `fastValuesCtx` stores *all* values for these keys, so traversing the context hierarchy is not necessary. We also provide a way to add multiple values at the same time, which saves on allocations. I looked at before and after profiles in sysbench. Before: - CPU usage: context.value 0.6% context.WithValue 0.3% total 0.9% - Allocs: context.WithValue: 3.5% After: - CPU usage: context.value + ctxutil.FastValue 0.5% context.WithValue 0.1% ctxutil.WithFastValue(s) 0.1% total 0.7% - Allocs: context.WithValue: 0.2% context.WithFastValue: 0.6% ctxutil.FastValuesBuilder.Finish: 1.4% total 2.2% I will investigate improving on this, perhaps by providing a pre-allocated context in codepaths where this is possible. Informs: cockroachdb#136581 Release note: None
137359: changefeedccl/kvfeed: add log for table that encountered schema change r=rharding6373 a=andyyang890 This patch adds a log documenting the table(s) that encountered schema changes and caused the kv feed to restart/exit, which will be useful for debugging. Fixes cockroachdb#136624 Closes cockroachdb#134963 Release note: None 137432: docgen: remove unused BEGIN diagram r=yuzefovich a=taroface cockroachdb#137135 added `begin_stmt` to the .bzl files for SQL diagram generation. However, this diagram is no longer used, and doesn't seem to be generated by `docgen` upon local testing. Removing `begin_stmt` and `begin_stmt.bnf` worked in cockroachdb#137111, so this PR applies the same update to `master` and backports to 24.1-3. (cc `@rickystewart)` Epic: none Release note: None Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Ryan Kuo <[email protected]>
Implicit `FOR UPDATE` locks are now added to the reads in `INSERT .. ON CONFLICT`, `UPSERT`, `UPDATE`, and `DELETE` queries when those mutations use generic query plans. Fixes cockroachdb#137352 Release note: None
137447: sql/catalog: remove isMutable flag toggle in getObjectPrefix r=mgartner a=mgartner Epic: None Release note: None Co-authored-by: Marcus Gartner <[email protected]>
137265: sql/tests: fix secondary index key in sysbenchKV.DeleteInsert r=nvanbenschoten a=nvanbenschoten `sysbenchKV.DeleteInsert` was previously re-inserting the secondary index key that it had just deleted, instead of inserting the secondary index key that corresponds to the new primary index key. To prevent this kind of bug in the future, we add some checks for secondary index integrity, where appropriate. Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
Split between each table and index to ensure cross-range transactions. Epic: None Release note: None
137469: sql/tests: manually split ranges in `BenchmarkSysbench/KV` r=nvanbenschoten a=nvanbenschoten Split between each table and index to ensure cross-range transactions. Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
137440: kvserver: make TestRaftReceiveQueuesEnforceMaxLenConcurrency cheaper r=miraradeva a=sumeerbhola It was timing out, though it completes within 6s on an M1 macbook. Fixes cockroachdb#137380 Epic: none Release note: None Co-authored-by: sumeerbhola <[email protected]>
137360: opt: add implicit FOR UPDATE locks for generic query plans r=yuzefovich a=mgartner Implicit `FOR UPDATE` locks are now added to the reads in `INSERT .. ON CONFLICT`, `UPSERT`, `UPDATE`, and `DELETE` queries when those mutations use generic query plans. Fixes cockroachdb#137352 Release note: None Co-authored-by: Marcus Gartner <[email protected]>
137344: ctxutil: introduce fast values r=RaduBerinde a=RaduBerinde #### go.mod: update logtag dependency See cockroachdb/logtags#4 Epic: none Release note: None #### serverident: unexport ServerIdentificationContextKey Epic: none Release note: None #### ctxutil: introduce fast values This commit introduces a custom context type that can be used to more efficiently associate values to contexts. Instead of using arbitrary objects as keys, the keys are a small set of integers. Any key must be globally registered first, and there is a limited number that can be registered. The `fastValuesCtx` stores *all* values for these keys, so traversing the context hierarchy is not necessary. We also provide a way to add multiple values at the same time, which saves on allocations. I looked at before and after profiles in sysbench. Before: - CPU usage: context.value 0.6% context.WithValue 0.3% total 0.9% - Allocs: context.WithValue: 3.5% After: - CPU usage: context.value + ctxutil.FastValue 0.5% context.WithValue 0.1% ctxutil.WithFastValue(s) 0.1% total 0.7% - Allocs: context.WithValue: 0.2% context.WithFastValue: 0.6% ctxutil.FastValuesBuilder.Finish: 1.4% total 2.2% I will investigate improving on this, perhaps by providing a pre-allocated context in codepaths where this is possible. Informs: cockroachdb#136581 Informs: cockroachdb#135904 Release note: None Co-authored-by: Radu Berinde <[email protected]>
137455: rac2: ignore MsgApps when constructing RaftEvent in MsgAppPull mode r=pav-kv a=sumeerbhola There can be stale MsgApps even when the replica is in StateReplicate, and these are not relevant and serve only to confuse the downstream code. Fixes cockroachdb#136322 Epic: CRDB-37515 Release note: None Co-authored-by: sumeerbhola <[email protected]>
When a rangefeed's closed timestamp lags behind the current time, any writes that have occurred in-between will not be emitted. This is problematic in cases where the lag is significant and chronic, as consumers (changefeeds, logical data replication, physical cluster replication) are likewise delayed in their processing. Observing a rangefeed with a chronic lagging closed timestamp will become relatively more likely with quorum replication flow control, as entries are deliberately queued, instead of being sent, to stores which do not have sufficient send tokens.

This commit (re)introduces the concept of cancelling lagging rangefeeds, so that they may be replanned and retried on another replica. The other replica may also have this issue; however, there should be at least a quorum of voting replicas with a similar closed timestamp that would be suitable. The replanning on a different replica is handled already by existing machinery. This commit introduces an observer which generates a signal indicating that the rangefeed should be cancelled. The signal also encapsulates the existing logic to nudge a rangefeed.

The criteria for cancelling a rangefeed are influenced by two thresholds, defined as cluster settings:
```
kv.rangefeed.lagging_closed_timestamp_cancel_multiple
(default = 20 x closed ts target duration = 60s)

kv.rangefeed.lagging_closed_timestamp_cancel_min_lagging_duration
(default = 60s)
```
When a replica's closed timestamp has sustained lag greater than:
```
kv.rangefeed.lagging_closed_timestamp_cancel_multiple * kv.closed_timestamp.target_duration
```
for at least the `kv.rangefeed.lagging_closed_timestamp_cancel_min_lagging_duration` duration, the rangefeed will be cancelled and then re-planned on the client. This can be visualized in the following diagram, where there is an initial spike over the lag threshold, which is recovered from, so the rangefeed wouldn't be cancelled. The second drop below the lag threshold is sustained for longer than the duration threshold, so the rangefeed is then cancelled for replanning:
```
lag=0         ─────────────────────────────────────────────────────
observed lag  ─────────┐
                       │
                       │
                       │     ┌───────┐
lag threshold ─────────┼─────┼───────┼──────────────────────────────
                       │     │       └───┐
                       │     │           └─────┐
                       └─────┘                 └──────┐
                                                      └────────────
                       ◄────────────────────────────►
                        exceeds duration threshold
```
Note we could also prevent accepting a rangefeed registration if the lag were sufficient; however, the behavior change here applies only to lag which has been observed to be sustained over time. Without historical data, we cannot apply identical decision logic on registration. Fixes: cockroachdb#136214 Release note: None