-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: add leader fortification assertions #125352
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
A-leader-leases
Related to the introduction of leader leases
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
arulajmani
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-replication
Relating to Raft, consensus, and coordination.
T-kv
KV Team
A-leader-leases
Related to the introduction of leader leases
labels
Jun 7, 2024
cc @cockroachdb/replication |
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Aug 16, 2024
This patch adds conceptual de-fortification to raft. De-fortification amounts to resetting the leadEpoch or simply forgetting fortification support that was previously provided. Currently, there are two ways a peer may de-fortify: 1. If it learns about a new leader at a higher term. 2. If the leader transfers leadership to it. By adding this explicit de-fortification, we're able to then assert that a peer only ever bumps its term if its no longer supporting a fortified leader. A bunch of datadriven tests that were manually campaigning need to be modified with this new assertion in place. Informs cockroachdb#125351 Informs cockroachdb#125352 Release note: None
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Aug 21, 2024
This patch adds conceptual de-fortification to raft. De-fortification amounts to resetting the leadEpoch or simply forgetting fortification support that was previously provided. Currently, there are two ways a peer may de-fortify: 1. If it learns about a new leader at a higher term. 2. If the leader transfers leadership to it. By adding this explicit de-fortification, we're able to then assert that a peer only ever bumps its term if its no longer supporting a fortified leader. A bunch of datadriven tests that were manually campaigning need to be modified with this new assertion in place. Informs cockroachdb#125351 Informs cockroachdb#125352 Release note: None
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Aug 22, 2024
This patch adds conceptual de-fortification to raft. De-fortification amounts to resetting the leadEpoch or simply forgetting fortification support that was previously provided. Currently, there are two ways a peer may de-fortify: 1. If it learns about a new leader at a higher term. 2. If the leader transfers leadership to it. By adding this explicit de-fortification, we're able to then assert that a peer only ever bumps its term if its no longer supporting a fortified leader. A bunch of datadriven tests that were manually campaigning need to be modified with this new assertion in place. Informs cockroachdb#125351 Informs cockroachdb#125352 Release note: None
hanpen24
added a commit
to hanpen24/cockroach
that referenced
this issue
Sep 7, 2024
There was a bug where comments did not show up in the output of the show create all schemas command. The issue was that we did not have any code to get the comment and add it to the create schema descriptor. Fixes: #127979 Release note (bug fix): SHOW CREATE ALL SCHEMAS now shows corresponding schema comments in it's output. insights: ensure releasing to Insight pool clears slice Ensure that when we return objects into the Insight pool that we release the statmeents in the slice. This fixes an issue in the logic that returned this object to the pool which did not nil the `Statements` slice, making it possible for these slices to grow in capacity in the node's lifetime, holding onto garbage Statement objects. This is a lead up to #128199 which will likely remove this pool and reuse the existing statmentBuf pool. Epic: none Release note: None insights: add testing knobs Add insights specific testing knobs. We'll add some knobs in later commts. Epic: none Release note: None sqlstats/insights: free memory allocated per session on session close The insights locking registry buffers statement insight objects by session id until receiving a transaction insight, when the buffer is emptied. These buffers can leak if the session is closed midway through a transaction since the registry will never receive a transaction event for the session. This commit ensures we clear any memory allocated in insights for a session by sending an event to clear the container's session entry, if it exists, on session close. A testing knob was added, OnCloseSession, which is called when the locking registry clears a session. Epic: none Fixes: #128213 Release note (bug fix): Fixes a memory leak where statement insight objects may leak if the session is closed without the transaction finishing. insights: move insights testing knobs from sqlstats Move some insights testing knobs that were on the sqlstats testing knobs to insights pkg. Epic: none Release note: None cli: update cluster tag in tsdump upload Previously, we are uploading tsdump data to datadog with cluster name with tag "cluster". However, it would coincide with metrics emitted with "cluster" as tag. This was resulting in a large drop down list to filter out cluster in datadog dashboard. This change updates tag from "cluster" to "cluster_label" with tsdump datadog upload. This would result in smaller cluster list in datadog dashboard. Epic: None Release note: None sqlliveness: handle AmbiguousResultError gracefully on insert Previously, when inserting a new SQL liveness session, we extended the expiry between each InitPut to ensure that an expired session was never inserted. This worked well, but if an ambiguous result error was encountered, the row could already exist in the table with the old timestamp leading to a condition failed error. To address this, this patch will also assign a new session ID if we encounter an ambiguous result error. This will ensure that we do not collide with the old session ID. Fixes: #127301 Release note (bug fix): Starting up nodes could fail with: "could not insert session ...: unexpected value", if an ambigous result error was hit inserting into the sqlliveness table. roachtest: mixedversion documentation updates * comments were added to indicate that callers should not close the connection pools returned by helper functions. * documentation for `InMixedVersion` and `AfterUpgradeFinalized` were updated to be more explicit about when the hooks passed to those functions can be called. Epic: none Release note: None roachtest: include step ID in failures to get debug information This commit updates the `mixedversion` framework so that we include the step ID we were preparing to run when an error is found gathering debug information (current binary and cluster versions on the nodes). This makes it a little easier to understand these errors and at what point in th test they happened. Epic: none Release note: None logictest: ignore "Can't find decompressor for snappy" error This error appears to be a rare infrastructural flake. Since it's hard to investigate, we will ignore it rather than trying to track down the cause. Release note: None roachtest: deflake `admission-control/disk-bandwidth-limiter` on azure We tried fixing it earlier but turns out the roachtest mount config is different than the roachprod. Fixes #129022. Release note: None workload/schemachange: be more lenient for DROP DEFAULT error code Since 5eedebc529adedc33b93ca5146aed82479a9c40e was backported to older branches, but does not yet appear in a release, the workload should be able to handle _not_ seeing an error code for the case of dropping a column default for a computed column. Release note: None admission,kvserver: plumbing for RACv2 callbacks The plumbing changes allow for both RACv1 and RACv2 info to flow through admission control queues and be provided in the callback. The admittedLogEntryAdaptor demuxes to the relevant callback handler. kvserver.storesForRACv2 will further route to the relevant Replica -- this code will be added once Replica has a replica_rac2.Processor member. Informs #128309 Epic: CRDB-37515 Release note: None build: update errcheck Update kisielk/errcheck package. Epic: none Release note: None workflows: enable CI for GitHub merge queue builds Epic: none Release note: None Part of: DEVINF-1127 backupccl: remove not null flag from SQLInstanceID field in SSP spec This flag is a noop for repeated fields and was producing a warning during proto generation. Release note: none kvcoord: safe format replica slice Introduce a `SafeFormat` method on `ReplicaSlice`, which can be used to print out the replica slice in logging. Informs: #129031 Release note: None kvcoord: log replica order on nearest routing policy Previously, we would log an event when routing to the nearest replica, which would then show up in traces and could be used to determine the routing policy of a query. It is also useful to know how the nearest replica was selected i.e., latency, locality or by health. Log the event after sorting by the replicas' distance to the sender and include the replica order. Informs: #129031 Release note: None sql: don't allow VOID parameters for UDFs Release note (bug fix): Function input parameters can no longer be of the VOID type. roachtest: upgrade hibernate version under test This fixes an issue with a missing dependency as well. See https://github.com/hibernate/hibernate-orm/commit/e4a0b6988f84e85e427484e65321e9583080ccb5 Release note: None crosscluster/logical: add fallback to SQL to kv writer Release note: none. Epic: none. kvserver: misc partial wiring for RACv2 This adds a bunch of small wiring changes: - raftScheduler: is passed a non-nil PiggybackMsgReader. - Scheduling piggybacked responses: raftScheduler accepts a PiggybackedAdmittedResponseScheduler to process incoming piggybacked responses, which is implemented by storesForRACv2, since it needs to route to the relevant Replica. storesForRACv2 also enqueues a range for processing if it received a piggybacked response. This is done via a small change to raftScheduler to add EnqueueRACv2PiggybackAdmitted which are processed via processRACv2PiggybackedAdmitted, which is plumbed through Store to the Replica to replica_rac2.Processor. - replicaForRACv2 is trivially implemented by Replica. - raftNodeForRACv2 provides a very incomplete implementation of RaftNode, with todos. - The todo to implement storesForRACv2.AdmittedLogEntry is done, and routes to the Processor in Replica. - Replica has two new members, raftMu.flowControlLevel, and flowControlV2, which is the Processor implementation. - Except for eval integration, all methods of Processor are integrated. The eval integration will be a later PR. Informs #129129 Fixes #128309 Informs #128756 Epic: CRDB-37515 Release note: None dev: add temporary directory in sandbox for `compose` Since `--[no]incompatible_sandbox_hermetic_tmp`'s default flipped to `true` in Bazel 7, `compose` tests broke as we stage the built artifacts in a temporary directory. If this wasn't set in `.bazelrc.user`, it needs to be added here or else the test will fail. Closes: #128227 Epic: CRDB-17171 Release note: None rac2: implement wait for eval helper `WaitForEval` (inside `token_counter.go`) will be used to implement the `RangeController.WaitForEval(..)`, providing the ability to wait for: 1. A quorum of streams to have available tokens 2. All required streams to have available tokens Note usage of this helper function is deferred to a later commit, along with support for waiting on send stream state as an added condition. Informs: #128910 Release note: None logictest: deflake cross_version_tenant_backup This makes the test less flaky by making sure finalization is complete before beginning the RESTORE. Release note: None master: Update pkg/testutils/release/cockroach_releases.yaml Update pkg/testutils/release/cockroach_releases.yaml with recent values. Epic: None Release note: None Release justification: test-only updates kvserver/rangefeed: change p.Register to use registration interface Previously, we introduced a registration interface to abstract the implementation details of buffered and unbuffered registration. This patch updates the p.Register function to use this interface as well. Part of: #126560 Release note: none kvserver/rangefeed: move perRangeEventSink to rangefeed pacakge Previously, perRangeEventSink was defined in pkg/server/node.go. This patch relocates it to the rangefeed package to facilitate future commits testing within the rangefeed package itself. Part of: #126560 Release note: none roachprod: reuse start script if start is called with --restart=true Previously, irrespective of the restart flag value we ended up overwriting the startup script which led to defaulting the values of other flags passed previously to the start command. This was inadequate because when restarting, i.e., `restart=true`, it makes more sense that we would reuse the existing startup script by default. This patch changes this behaviour. Epic: none Fixes: #125085 Release note: None roachtest: use spot VM for all tests Currently, we are using spot VMs only to run benchmark tests. The reason for the low adoption of spot VM is because of the test failures due to VM preemption. But, with the recent changes, we have better control on VM preemption changes where if there is a test failure due to preemption, the test is run on an on-demand VM the next time. Also, the current failed tests that are run on spot VM is 148 out of 1093 = 13.5%. If I consider failure out of total tests run is 148 out of 2508 = 5.9%. So, this PR removes the condition to use spot VMs only for benchmark tests and changes the probability to 75%. Fixes: #127236 Epic: None ui: display very large ranges in problematic ranges This commit adds a column in the DB Console's page problematic ranges that displays the ranges that are too large. The threshold is set to be: 8 * the current range max size. Fixes: #127843 Epic: None Release note (ui change): Adding too large ranges to the problematic ranges page in DB Console. If a range is larger than twice the max range size, it will show up in the problematic ranges page. roachtest: multi-region/mixed-version support for shared-process We enable required features on tenants before setting up the TPCC workload on the cluster. Informs: #127378 Release note: None jobs: avoid CTEs in crdb_internal.system_jobs query The CTE in the query used for crdb_internal.system_jobs can prevent a number of useful query optimizations. Informs #122687 Release note (performance improvement): Further improves the performance of job-system related queries. jobs: further simplify crdb_internal.system_jobs query Epic: None Release note: None roachtest: update rebalance-load mixed-version tests for shared-process In this commit, we update the `rebalance/by-load` mixed-version tests so that they can run on shared-process deployments. As usual, we need to enable some features on tenants before initializing the workload. In addition, we pass the virtual cluster cookie when fetching system metrics. Informs: #127378 Release note: None lease: fix goroutine starvation caused by test cluster setting When a test overrides sql.catalog.descriptor_lease_duration to 0, then it was possible for the test to starve itself by creating a timer with zero duration that is repeatedly reset. The bug occurred since after refreshTimer expired, it would call the jitteredLeaseDuration helper to get the new duration for the timer. If a test had overriden sql.catalog.descriptor_lease_duration to 0, this would cause the timer to keep being reset to 0 and starve other goroutines. Release note: None kvserver: deflake TestRangefeedCheckpointsRecoverFromLeaseExpiration This commit does two things to deflake the test: 1) Takes the MinExpiration into consideration when advancing the clock to cause the epoch lease to expire. This is important after the feature was enabled in: ba758c08023949ad4be87bab1762db10fdaa4f0e 2) Waits for N1's view of N2's lease expiration to match N2's view. This is important in the rare case where N1 tries to increase N2's epoch, but it has a stale view of the lease expiration time. Fixes: #124178, #123551 Epic: None Release note: None go.mod: bump Pebble to 3d14906a0e0c Changes: * [`3d14906a`](https://github.com/cockroachdb/pebble/commit/3d14906a) wal: fix reader buffer reuse * [`c2749cc7`](https://github.com/cockroachdb/pebble/commit/c2749cc7) colblk: use crlib/crbytes.CommonPrefix * [`f90b350e`](https://github.com/cockroachdb/pebble/commit/f90b350e) docs/rfc: add RFC for point tombstone density compaction heuristic * [`1a5295f8`](https://github.com/cockroachdb/pebble/commit/1a5295f8) block: move block handle encoding and decoding * [`0e8a11a5`](https://github.com/cockroachdb/pebble/commit/0e8a11a5) metamorphic: no synthetic suffix with overlapping RangeKeySets Release note (bug fix): Fix bug in the public preview WAL failover feature that could prevent a node from starting if it crashed during a failover. Epic: none. kv: fix comment on QueryTxnResponse.QueriedTxn This commit fixes a comment in the QueryTxnResponse.QueriedTxn field. It also addresses a related TODO. Epic: None Release note: None crosscluster, sql: implement SHOW LOGICAL REPLICATION JOBS Implementation of `SHOW LOGICAL REPLICATION JOBS`. An LDR job description column will be added in a follow-up PR. Rows returned by `SHOW LOGICAL REPLICATION JOBS` should include the following columns: - job_id: ID of an LDR job - status: job status - targets: names of tables being replicated in an LDR job - replicated_time: ingestion frontier timestamp An optional clause `WITH DETAILS` can be included in the query. If `WITH DETAILS` option is included, there should be an additional column in the rows returned by the query: - replication_start_time: initial timestamp from which the replication producer job will begin streaming MVCC revisions Epic: CRDB-40869 Release note: None changefeedccl: add more logging around core changefeed completion Release note: None ui: remove unnecessary metric name prefixes Metrics keyed by nodeID and prefixed with long strings become very difficult to read on the dashboard pages since they take up too much horizontal space. In all of these cases, the context provided by the graph title and Y axis label provide enough context to omit verbose legend labels. Resolves: #128046 Release note (ui change): some metric charts on the overview and replication pages have more terse legends to facilitate easier browsing. roachprod: azure GC supports cleanup for multiple subscriptions Previously Azure GC assumed that only one subscription ID would be used at a time. However the current structure of our Azure subscriptions makes a distinction between user "adhoc" roachprod clusters and nightly "infra" clusters. As a result, roachprod clusters can live in two different subscriptions. This change adds the azure-subscription-names flag to support cleaning up in multiple Azure subscriptions. Additionally it also switches the default subscription to e2e-adhoc, which is what users should use. kvclient: improve logging in OptimizeReplicaOrder Epic: none Release note: None changefeedccl: enable vmodule for TestAlterChangefeedAddTargetFamily This patch enables verbose logging (vmodule `helpers_test=1`) for `TestAlterChangefeedAddTargetFamily` so that we can get more information about what, if any, messages that the changefeed sees before the test times out. Release note: None roachprod: gcloud mig template per zone Previously, one instance template, for a managed instance group cluster, was created and reused for every zone a managed instance group cluster spanned over. In this change, the logic is updated to create a different template for each zone. This is useful to extend template flexibility to support per zone configurations. The delete cluster logic now tracks instance templates by the cluster label instead of by name. The GC logic has also been updated to account for multiple instance templates that point to the same cluster, in the case of an empty cluster. In this case we'll only add one "empty" VM to destroy the empty cluster. Epic: None Release note: None roachprod: per zone spot instance config Previously, if managed instance group clusters opted to use spot instances the whole cluster would be on spot instances. This change allows a new flag to be passed to specify which zones should use spot instances. If this flag is passed it is expected that the flag enabling spot instances for the whole cluster should not be passed (results in an error), so that only the select zones use spot instances. Epic: None Release note: None crosscluster/logical: set admission header on kv writes Release note: none. Epic: none. kv/bulk: check gc threshold in TestDuplicateHandling TestDuplicateHandling failed recently because an unexpected number of keys were read. This _may_ have been caused by the gc threshold advancing past old revisions of certain test keys. This patch ensures the test will fail on a gc error instead. If the test fails again due to an unexpected number keys read, the gc story does not explain how this incorrect read could occur. Fixes #126929 Epic: none crosscluster/logical: set the origin ts in kv writer Release note: none. Epic: none. Epic: None Release note: None server: add latest telemetry contact timestamp The `Reporter` struct in the `diagnostics` package now maintains a `LastSuccessfulTelemetryPing` field that's updated with the value of `timeutil.Now()` after every response from the telemetry cluster. This time is updated regardless of the status code of the response. Any error from the HTTP client (this includes timeouts, DNS errors, etc.) will skip updating the timestamp. Resolves: CRDB-41231 Epic: CRDB-40209 Release note: None roachtest: enable assertions in nightlies without metamorphic constants In #114618 we disabled runtime assertions and metamorphic constants due to their flakiness. Here we reenable them but with metamorphic constants disabled, as they are the more disruptive of the two. We also change the framework to take in a --metamorphic-cockroach-ea-probability instead of hardcoding it. roachtest: route runtime assertion build timeouts to test-eng As we determine the stability of runtime assertions, temporarily route all timeout failures to test-eng to reduce noise for other teams. raft: Set MustSync to true if the Commit index changes This commit sets the MustSync field to true if the Commit index changes. This won't change the current behaviour where we sync the HardState fields if the MsgStorageAppend has any HardState fields set. However, this commit sets the MustSync to stay consistent with the other fields. Future work could be done to remove the MustSync field all together. Syncing the commit index shouldn't cause too much overhead because in the normal case, advancing the commit index on a follower is piggybacked on top of sending entries to the follower (which requires syncing anyways). References: #125266 Epic: None Release note: None workflows: update trigger for merge queue event The trigger for the merge queue events for CI was not set up properly. This is the proper way it should be done according to the documentation. Epic: none Part of: DEVINF-1127 Release note: None Release justification: Non-production code changes crosscluster: use src table ID to look up table names Changes in this PR: - Use source table IDs as keys for the map we use to look up fully qualified table names. This fixes the bug where we use destination table IDs as keys and attempt to look up table names with source table IDs. - Use destination table ID to create dlq table names - Minor refactoring Epic: none Release note: none ui: nits to LDR metrics page - A few graph renames - Add tooltip to Replication Lag graph Epic: none Release note: none logical: Optionally filter rangefeed events to avoid replicating TTL deletes Some of our users want the ability to avoid replicating TTL deletes from one side of LDR to the other. This commit adds that ability to filter rangefeed events when the bit is set on the other side. For testing, ttljob_test.go does a lot of job scheduling wrangling to get the row TTL job to run, and that feels unecessary for a unit test on this. This code would benefit from a roachtest that explictly tests the TTL delete functionality with admission control loosened Release note: none Epic: CRDB-41223 roachprod: rename `MaxWait` stop option to `GracePeriod` This better mirrors the terminology we use in our documentation: https://www.cockroachlabs.com/docs/stable/node-shutdown#termination-grace-period Epic: none Release note: None roachprod: stop forcefully after grace period This changes the `Stop` method on roachprod to stop the cockroach process forcefully if the process does not terminate after the provided `gracePeriod`. This brings back the behaviour old behaviour of `StopCockroachGracefullyOnNode` without the need for a custom function for it. We adopt this behaviour because it is the documented procedure on the public documentation, and it's also how process managers/orchestrators (such as Kubernetes) operate. Informs: #129103 Informs: #121455 Release note: None roachtest: update pg_regress diffs Fixes #127556 Release note: None fs,kvserver: fix categorized disk stats metric This patch fixes a bug where the `vfs.DiskWriteStatsCollector` was incorrectly being passed into the disk health check FS in pebble. Fixes #129159. Release note: None sql: remove computed column logic from heuristic planner Vestigial logic in the `UPDATE` code path from the days of the heuristic query planner has been removed. As an added bonus to simplifying the code, this should reduce some allocations. Release note: None sql: rename `enforceLocalColumnConstraints` The `enforceLocalColumnConstraints` function has been renamed to `enforceNotNullConstraints` to clarify its behavior. Release note: None sql: remove column mapping during `UPDATE` New values to write during an `UPDATE` always come directly after the fetch values in the source row, i.e., the input row of an update operator, and they are ordered the same as `ru.UpdateCols`. Therefore, there is no need to map the source values. Instead, the update values can simply be sliced from the source row. This commit removes this unnecessary mapping logic and support data structurres, including the `updateValues` buffer in `updateRun`, `sourceSlot`, and `scalarSlot`. This simplifies the implementation and will reduce allocations. Release note: None server: make telemetry timestamp atomic Since the telemetry send timestamp will be accessed concurrently, we need to make it atomic. Part of: CRDB-41231 Epic: CRDB-40209 Release note: None rac2: implement wait for eval Implement `RangeController.WaitForEval(..)`, which will be used to wait on a configurable number of streams to have available tokens. Note that the functionality is currently incomplete without replica state management, done in `HandleRaftReady`. Resolves: #128910 Release note: None sql: deflake TestConcurrentSchemaChanges Previously, TestConcurrentSchemaChanges was intermittently failing because the connection pool was not correctly being used. Some of operations like selecting if the declarative or legacy schema changer should be used were incorrectly being run on the main connection. To address this, this patch uses the dedicated connections per-thread. Fixes: #129220 Release note: None raft: do not forget the leader within the same term Addresses a TODO. This wasn't causing any issues just yet, but having this safety is nice as the code evolves, given we're relying on it for leader fortification. Epic: none Release note: None raft: reset lead epoch whenever the term is bumped LeadEpoch is only relavent within a term. We should forget the lead epoch in addition to the leader when the term is bumped, as the lead epoch isn't meaningful without an associated leader. Epic: none Release note: None raft: add conceptual de-fortification and some assertions This patch adds conceptual de-fortification to raft. De-fortification amounts to resetting the leadEpoch or simply forgetting fortification support that was previously provided. Currently, there are two ways a peer may de-fortify: 1. If it learns about a new leader at a higher term. 2. If the leader transfers leadership to it. By adding this explicit de-fortification, we're able to then assert that a peer only ever bumps its term if its no longer supporting a fortified leader. A bunch of datadriven tests that were manually campaigning need to be modified with this new assertion in place. Informs https://github.com/cockroachdb/cockroach/issues/125351 Informs https://github.com/cockroachdb/cockroach/issues/125352 Release note: None roachtest: extract draining into a function Previously if the test failed between the start of draining and the stopping of the node, it could leave the cluster with the iptables rule in place. This caused problems for rerunning the test. This change extracts the drainWithIpTables function out and cleans up the iptables rule in a defer. Epic: none Release note: None roachtest: close the db connections Close the connections to avoid a leak on success. Epic: none Release note: None roachtest: don't run SQL against the draining node After #126506 the test incorrectly included the draining node as part of the nodes the workload were run against. This caused the statistics to be off by a factor of 5/6. Note there is a separate issue related to the QPS dropping as part of the drain which is a different issue that is masked by this issue. Epic: none Fixes: #129027 Fixes: #128439 Fixes: #128168 Fixes: #126986 Fixes: #125731 Release note: None roachtest: disable the gossip limit for test This test will fail without it. Figure out why.. Epic: none Release note: None util: add connection metric tracking utility Adds a utility to cidrLookup to created a DialContext that is tracked based on the lookup. This works for any third-party libraries that expose a way to set the DialContext. Epic: none Release note: None backupccl: break read and write metrics by bucket Previously the read and write metrics were aggregate across all containers or buckets. In some situations it is useful to break the statistics out to detemine how much data is going to each. This PR allows breaking the data out for the three main clouds. Epic: CRDB-41142 Release note: None crosscluster/logical: rename kv fallback metric Release note: none. Epic: none. crosscluster/logical: only use kv writes for inserts See comment inline. Release note: none. Epic: none. system: add new table_metadata system table Adds a new system table, `table_metadata` for holding table level statatistics and metadata. This table is intended to be used as a cache and will be periodically updated via a new job. Epic: CRDB-37558 Release note: None roachprod: update roachprod-gc.yaml with azure-subscription-names flag This enables roachprod to clean up multiple subscriptions. roachtest: skip multitenant/distsql Informs #128366 Release note: None roachtest: skip running `admission-control/disk-bandwidth-limiter` azure This test is flaky on Azure due to different disk mounting behavior. Previous attempts at deflaking have not helped. This patch will skip running this test on the azure environment. Fixes #129300. Release note: None kvserver: introduce a hard cap on range sizes This patch introduces a new cluster setting called `kv.range.range_size_hard_cap`. It's configured in bytes and places an absolute value cap on how large we'll allow ranges to get before backpressuring writes. We default it to 8GB. This setting is inteded to be the last defense against absurdly large ranges. In particular, it guards against risks where backpressure is otherwise disabled due to other cluster settings. See `kv.range.backpressure_byte_tolerance` for an explanation of how this is possible. Informs https://github.com/cockroachdb/cockroach/issues/128082 Release note (ops change): this patch introduces a new cluster setting, `kv.range.range_size_hard_cap`, which provides a hard cap on how large we'll let a range get before applying backpressure. It's defaulted to 8 GB, which is 16 times the default max range size. Users who have manually changed the max range size >= 8 GB may either have to increase this cap or reduce their range size. roachtest: update `tpcc/mixed-headroom` for shared-process This commit updates the TPCC mixed-version test so that it works on shared-process deployments. As usual, we enable split/scatter on tenants before initializing the workloads. Epic: none Release note: None sql/logictest: fix incorrect operator in test This is a follow-up to #129013 which changes the incorrectly used similarity operator, `%`, to the regex match operator, `~`. Release note: None sql: support automatic partial statistic collections We previously triggered full stat refreshes when approximately 20% of a table's rows were changed. This commit introduces additional partial statistic 'at extremes' refreshes that occur automatically in new `TypeAutoCreatePartialStats` jobs when roughly 5% of rows change. Various new cluster and table settings to configure the frequency of auto partial stats collections have been introduced, which mimic existing settings for automatic full stats collection (see release note for details). Table settings take precedence over their corresponding cluster settings if set. Partial stats will never be collected automatically in cases where automatic full stats wouldn't happen, such as when auto full stat collection settings are disabled or tables on which auto stats collections aren't allowed. Automatic partial stat refreshes follow a probability-based model similar to automatic full stats to approximate the desired refresh frequency. We previously decided to trigger a full stat refresh with a probability proportional to the number of rows affected by mutations on a table. This has been extended to make a second decision on whether to trigger a partial stat refresh with a higher likelihood whenever we decide not to trigger a full refresh. Unlike auto full stats, multiple `TypeAutoCreatePartialStats` jobs can run at once in a cluster, as long as they are on different tables. We don't start these jobs if any manual stat collection job (full or partial), any auto full stat collection job, or an auto partial stat collection job on the same table is running. We also don't stop any new stats jobs from running if there is an auto partial stats job running in the cluster (except for new auto partial stats jobs on the same table as one that is running). Epic: CRDB-19449 Release note (sql change): Partial statistics can now be automatically collected at the extremes of indexes when a certain fraction and minimum number of rows are stale (5% and 100 by default). These can be configured with new table and cluster settings, and the feature is disabled by default. The new cluster and table settings are: - `sql.stats.automatic_partial_collection .enabled`/`sql_stats_automatic_partial_collection_enabled` - Defaults to `false` - `sql.stats.automatic_partial_collection .min_stale_rows`/`sql_stats_automatic_partial_collection_min_stale_rows` - Defaults to `100` - `sql.stats.automatic_partial_collection .fraction_stale_rows `/`sql_stats_automatic_partial_collection_fraction_stale_rows` - Defaults to `0.05` Co-authored-by: Faizaan Madhani <[email protected]> scpb: modify IndexZoneConfig element This commit modifies the IndexZoneConfig element that will become relevant for supporting `ALTER INDEX ... CONFIGURE ZONE` in the DSC. Release note: None schemachanger/scexec: add support for writing subzone configs This patch adds support for writing subzone configs. Informs: #117574 Fixes: #128594 Release note: None schemachanger: add idx zone config opgen rule This patch adds an opgen rule for idx zone configs in the ADD direction. Informs: #117574 Release note: None sql: add syntax for supporting security definer This patch adds the syntax needed for supporting security definer. Right now, if a routine is specified with `[EXTERNAL] SECURITY {INVOKER | DEFINER}`, it is a no-op. If the security mode is not specified, the default is `SECURITY INVOKER`. This also means that syntax for altering a function supports the above options as well: ``` alter function f(int) external security definer; ``` Fixes: #128844 Epic: CRDB-37477 Release note: None server: allow VIEWACTIVITY users access to /_admin/v1/stmtbundle This commit gives users with the VIEWACTIVITY system grant access to `/_admin/v1/stmtbundle` on the admin server. This allows these users to download stmt bundles via DB console. Epic: none Part of: #121301 Release note (ui change): Users with VIEWACTIVITY can download stmt bundles from DB console. crosscluster/logical: reduce log spam from replanning Epic: none Release note: None ui: use status server apis for stmt bundle ops Since 23.1 we've been using the sql over http api to view, request and cancel stmt bundles for fingerprints. This caused a regression in the sql permissions required to use these features. Prior to using sql over http we only required `VIEWACTIVITY` or `VIEWACTIVITYREDACTED` for these operations. Now `VIEWSYSTEMTABLE` is also required since the sql over http api direclty reads from system.statement_diagnostics_requests as the db console user. Instead of creating a view and builtins on top of the system table, let's simply revert to the existing http apis on the status server that are already properly gated. This change reintroduces the following APIs to DB Console - `POST /_status/stmtdiagreports` to request statement bundles - `GET /_status/stmtdiagreports` to list statement bundles - `POST /_status/stmtdiagreports/cancel` to cancel statement bundles Limitations: This PR does not carry over the added behaviour in 23.1 where we prevent users from making multiple statement bundle reqs for a particular fingerprint. This is not really a common case in the UI since we show 'Waiting' in the overview page for the row when a stmt bundle is in progress for a fingerprint, which prevents another request for the same fingerprint. Epic: none Fixes: #121301 Release note (bug fix): Users with VIEWACTIVITY sql priv can now request, view and cancel stmt bundles in DB console. ui/e2e: add e2e testing for statement bundles This commit adds some e2e cypress tests for statement bundle features in DB Console: * add tests for requesting, cancelling and viewing bundles for privileged (VIEWACTIVITY, ADMIN) and non-privileged users * reformat cypress login command to use a new `users` fixture that defines users by sql privilege * add cypress command getUserWithExactPrivileges which retrieves users from the fixture with the provided privileges Release note: None Part of: #121301 crosscluster/logical: wait for row processors to exit before closing them Release note: none. Epic: none. opt: remove documentation of old flags In #85935 we introduced the `set` flag to set session settings in optimizer tests and a few flags that were already controlling session settings were removed. The documentation of those old flags has been removed. Release note: None opt: disable `GenerateParameterizedJoin` when forcing custom plans The exploration rule `GenerateParameterizedJoin` is now disabled when the `plan_cache_mode` session setting is set to `force_custom_plan`. This prevents possible regressions in plans for this mode in the rare case that a stable function is not folded during normalization (due to an error encountered during folding). In most cases, the check for placeholders and stable functions in `GenerateParameterizedJoin` is sufficient for preventing the rule from firing when a generic query plan is not being built—before optimizing a custom plan placeholders are always replaced with constants and stable functions are usually folded to constants. Release note: None sql: fix minor typo in plan_opt.go Release note: None Revert "sql: override StmtTimeout to 0 for background jobs" This reverts commit 178e453d336e9385a26e2cae75c9de912d98ea5a. sql: override StmtTimeout to 0 for sessionless IE For queries issued by a sessionless internal executor, we inherit values from cluster settings. This means that setting a cluster-wide statement timeout will impact processes that use a sessionless internal executor: background jobs, db console, cloud sql shell, etc. This behavior doesn't make sense in most cases; so, this patch will override the statement timeout to 0s (i.e. none). For the cloud sql shell, this change makes sense as there is a default payload request timeout applied to these queries (5s). Fixes: #126261 Release note (bug fix): Internally issued queries that are not initiated within a SQL session no longer respect a statement timeout. This includes: background jobs, queries issued by the db console that perform introspection, and the cloud sql shell. crosscluster: add conflictResolutionType column This PR adds a column named `conflict_resolution_type` to rows output by the `SHOW LOGICAL REPLICATION JOBS WITH DETAILS` command. `conflict_resolution_type` can be either one of the following string values: - LWW - UDF - DLQ Epic: CRDB-40869 Fixes: #128960 Release note: None kvserver: implement basic raft info methods for RACv2 Epic: none Release note: none tpcc/large-schema-benchmark: increase txn retry limit Previously, this roachtest could intermittently fail due to retry errors on the system.descriptors table during import. To address this, this patch bumps up the txn retry limit to stabilize the test. Fixes: #128251, #129170 Release note: None scripts/ldr: use max-rate on workload Release note: none. Epic: none. build: update golang.org/x/tools and fix some lint failures Update `golang.org/x/tools` and fix new `lintonbuild` failures where we were passing an arbitrary string as a format string. Epic: none Release note: None logical: skip TestLogicalJobResiliency under deadlock This test is having issues with LDR under deadlock, probably due to scattering the table prior to setting up LDR. The other unit tests are fine due to deadlock because they scatter after creating the stream, but that's not possible with this test. Release note: none Fixes: #128972 rangefeed: move rangefeed test helpers into separate file This PR moves some rangefeed test helpers into a separate file so some can be exported. Epic: None Release notes: None rangefeed: updates to rangefeed test helpers Creates a new, exported test helper method GenerateRandomizedSpans that can generate N non-overlapping spans. Also exports GenerateRandomizedTs for use in other tests, and provides an argument to provide the maximum number of nanoseconds the timestamp should have. Epic: None Release notes: None cli: remove accidental checked in file Previously, testDump.raw file was committed as part of #127822. This change removes unwanted checked in file. Epic: None Release note: None roachtest: actually run backup-restore/mixed-version in shared-process An egregious oversight. Epic: none Release note: None crosscluster/logical: use UDF as an "applier" Previously, we were using the UDF as a fallback when an insert hit a conflict. We've subsequently decided that we think running the UDF on every proposed row is more useful. Fixes #129036 Release note: none changefeedccl: minor code move in coreChangefeed Release note: None roachprod: fix cpu reuse check when using workload nodes The reuse cluster check did not account for tests that use the workload node spec, which may have less cpus than the rest of the cluster. Fixes: #129440 Release note: none crosscluster/logical: cancel producer job when consumer cancels Previously, we were leaving the producer job in place when the consumer job was canceled. The producer job would time out after 24 hours. However, until then, the PTS record on the source would remain. In some ways this is nice as it allows the user to resume the stream from a cursor in the case where they have the need to cancel and resume the job for some reason. However, our goal is that there are no such cases (the user should simply need to PAUSE/ALTER/RESUME in such cases). Release note: None Epic: none crosscluster/logical: cleanup logging in tests Nearly every caller of this function was logging before calling it. Here we move that log statement inside the helper function. Epic: none Release note: None crosscluster/logical: cleanup producer job lookup in tests The LDR job has a reference to the producer job, so we can look it up directly rather than relying on listing the jobs in the source cluster. Epic: none Release note: None crosscluster/logical: add mode option for kv vs sql writer Release note: none. Epic: none. crosscluster/logical: skip TestLogicalStreamIngestionJobWithFallbackUDF Epic: none Release note: None changefeedccl: only update pts if behind checkpoint A changefeed updates its PTS if a checkpoint happens and the time interval changefeed.protect_timestamp_interval has passed. However, whenever the changefeed restarts the interval is lost. Previously, we would conservatively update the PTS, but this could lead to overwhelming the PTS and other system tables. This PR introduces a cluster setting `changefeed.protect_timestamp.lag`, which controls how much the changefeed PTS should lag behind the high watermark. Before updating the PTS, the changefeed reads the PTS record and if the old PTS is within `changefeed.protect_timestamp.lag` it skips updating the PTS record. If more than `changefeed.protect_timestamp.lag` has passed between the old PTS and the high watermark, the PTS is updated. Epic: None Fixes: #129509 Release note (enterprise change): Only update the PTS if `changefeed.protect_timestamp.lag` has passed between the last PTS and the changefeed high watermark. sql: deprecate enforce_home_region_follower_reads_enabled Informs: #113765 Release note (sql change): Session variable `enforce_home_region_follower_reads_enabled` is now deprecated, and will be removed in a future release. (Note that related session variable `enforce_home_region` is _not_ deprecated.) kvserver: fix the debugging message for large range Previously the debug message was printing the min size rather than the max size. Epic: none Release note: None roachtest: allow large max range size After the introduction of #129450, there is a default max range size of 8GiB. This test explicitly attempts to set the ranges to 10GiB and fails during setup. This commit increases the max range size for this test. Fixes: #129556 Fixes: #129552 Fixes: #129551 Fixes: #129544 Fixes: #129542 Fixes: #129540 Fixes: #129541 Fixes: #129539 Epic: none Release note: None storage: add additional trace statements for GC; fix counting Currently, when performing a GC of point or range keys, we log a trace statement at completion of the method. If an error is encountered, it would be useful to surface it. Update the point and range key GC methods to print any error encountered during the GC. Improve the trace statements for both the point and range key methods, indicating the number of incoming point / range keys, and the number of resulting point / range keys that were removed, along with the time taken to perform the GC. Add additional trace statements for the range key method, logging the range key fragments that were removed. Fix the `count` in `MVCCGarbageCollectRangeKeys`, capturing the count of range key fragments that were GC'd. Fix #129419. Release note: None. kvserver: add rangekey count to mvccc gc trace statement An batch pertaining to a `GCRequest` can contain zero or more point or range keys. The non-zero case corresponds to the GC of the individual point or range keys, while the zero case relates to bumping the GC threshold. To aid in debugging, add the range key count to the existing trace statement. This makes it simpler to tell what each batch is doing. Release note: None. team: remove replication from teams.yaml The `replication` team was merged into `kv`. Update the `TEAMS.yaml` files (parts `ii` and `iii` from [this doc][1]). [1]: https://cockroachlabs.atlassian.net/wiki/spaces/devinf/pages/3692167169/Merging+teams+into+KV+-+2024-06#GitHub Release note: None. roachtest: move replication into kv; remove replication The Replication team was merged with KV. Update the owners of various tests from `OwnerReplication` to `OwnerKV`. Release note: None. storage: improve code comments in range_key_clear test Update the diagram reflecting the test setup, which was missing a span and a tombstone. Add diagrams reflecting the "after" state of each operation. This makes the data-driven test easier to read. Release note: None. crosscluster/logical: hide key_value_bytes column in dlq The key_value_bytes column can really only be used for internal debugging, so might as well hide it from users running SELECT *. Release note: none roachtest: add PCR mixed version test This patch adds a mixed version PCR roachtest which begins PCR on 24.2, launches a light tpcc workload on the source tenant, upgrades both clusters in a somewhat coordinated dance, runs cutover, and fingeprints both clusters. The test asserts that the stream can catch up at various states as well. Some follow up work includes: - allow upgrades as early as 24.1 and 23.2 - version skipping from 24.1 to 24.3 - running the initial scan and cutover in destination side mixed version states. Epic: none Release note: None replica_rac2: promptly return tokens on descriptor change Epic: CRDB-37515 Release note: None stats: add support for most common values in histograms Fixes #71828 Release note (sql change): Added a new cluster setting to control whether most common values are collected as part of histogram collection for use by the optimizer. The setting is called sql.stats.histogram_buckets.include_most_common_values.enabled. When enabled, the histogram collection logic will ensure that the most common sampled values are represented as histogram bucket upper bounds. Since histograms in CockroachDB track the number of elements equal to the upper bound in addition to the number of elements less, this allows the optimizer to identify the most common values in the histogram and better estimate the rows processed by a query plan. To set the number of most common values to include in a histogram, a second setting sql.stats.histogram_buckets.max_fraction_most_common_values was added. Currently, the default is 0.1, or 10% of the number of buckets. So with a 200 bucket histogram, by default, at most 20 buckets may be adjusted to include a most common value (or "heavy hitter") as the upper bound. kvserver: wiring of RACv2 to eval, and v1 => v2 transition code The replicaFlowControlIntegration object for v1, which is a member of Replica.mu, is destroyed and replaced by a noop implementation. The admissionDemuxHandle implements the new ReplicationAdmissionHandle and handles a switch from v1 to v2 during the wait. At raft log entry encoding time, for the entries that were subject to replication admission control, we look at the latest value of the EnabledWhenLeaderLevel to decide whether we can use the v2 encoding. Fixes #129129 Fixes #128756 Epic: CRDB-37515 Release note: None logictest: re-enable read committed logic tests These were accidentally all skipped when abda1f931fc69a5ee3edf153af73061864ec26f9 was merged. Now we run them all again by fixing the logic test generation script. Release note: None orchestration: released CockroachDB version 24.2.0. Next version: 24.2.1 Release note: None Epic: None Release justification: non-production (release infra) change. server: wrap ctx err on cancelation when iterating over nodes When iterating over all nodes, if the context is done causing the iteration to cancel, wrap the context error to include it for consumers of the error. Epic: None Fixes: #129531 Release note: None roachtest: set min supported version to 23.2 in tpcc/mixed-headroom When trying to use `workload import fixtures`, we need to be in a 23.2 release. This command fails in a lot of 23.1 releases as it was not supported (unimplemented error). Fixes: #129632 Release note: None go.mod: bump Pebble to de41143371aa Changes: * [`de411433`](https://github.com/cockroachdb/pebble/commit/de411433) tool: fix up `wal dump`, add `wal dump-merged` * [`9b79e826`](https://github.com/cockroachdb/pebble/commit/9b79e826) vfs: make Unwrap part of FS, fix metamorphic restart op * [`f8a32c98`](https://github.com/cockroachdb/pebble/commit/f8a32c98) vfs: convert MemFS tests to datadriven * [`3d1f61aa`](https://github.com/cockroachdb/pebble/commit/3d1f61aa) sstable: convert RawWriter into an interface * [`583859f8`](https://github.com/cockroachdb/pebble/commit/583859f8) colblk: add index block writer, reader, iterator * [`de9087ca`](https://github.com/cockroachdb/pebble/commit/de9087ca) sstable: genericize iterators over index block iterator type * [`1b4021bc`](https://github.com/cockroachdb/pebble/commit/1b4021bc) colblk: store shared and bundle prefix length * [`99abcf76`](https://github.com/cockroachdb/pebble/commit/99abcf76) colblk: add UnsafeOffsets.At2 and improve UnsafeUints.At * [`55fcc6e9`](https://github.com/cockroachdb/pebble/commit/55fcc6e9) colblk: special case zero bitmaps * [`8ec212a4`](https://github.com/cockroachdb/pebble/commit/8ec212a4) colblk: use regular slice in PrefixBytesIter * [`94561af6`](https://github.com/cockroachdb/pebble/commit/94561af6) colblk: add some minor comments * [`d4348d08`](https://github.com/cockroachdb/pebble/commit/d4348d08) lint: remove golint * [`aa96305a`](https://github.com/cockroachdb/pebble/commit/aa96305a) sstable: fix nil pointer in twoLevelIterator.NextPrefix * [`9f97b7bf`](https://github.com/cockroachdb/pebble/commit/9f97b7bf) sstable: fix TestBlockProperties[BoundLimited] to use testkeys.Comparer * [`de53a981`](https://github.com/cockroachdb/pebble/commit/de53a981) rowblk: add IndexIter type * [`aa32f0b0`](https://github.com/cockroachdb/pebble/commit/aa32f0b0) db: fix WAL offset in replayWAL errors Release note: none. Epic: none. build: cleanup some stale files in `build` directory `build/variables.mk` is no longer used. For anyone who wants to use the `make` build machinery, `make` will auto-generate the file. We also `gitignore` it. We also delete `werror.sh` which is no longer used. Epic: none Release note: None Release justification: Non-production code changes sql: align the behavior of 'infinity' timestamp with Postgres Previously, "SELECT 'infinity'::timestamp" would return a concrete timestamp instead of "infinity", which is incompatible with Postgres, causing issue #41564. This commit addresses this issue by: - Returning "infinity" for 'infinity'::timestamp, supporting both text and binary formats. This commit also updates the cmp/eval behavior of "infinity" timestamps. The updated behavior is the same as Postgres: - "infinity" is always larger than other timestamps, and "infinity" equals itself. - "-infinity" is always smaller than other timestamps, and "-infinity" equals itself. - "infinity"/"-infinity" add or subtract any duration results in itself. Fixes: #41564 Release note (bug fix): Fixed a bug where 'infinity'::timestamp returned a different result than Postgres. raft: do not campaign if supporting fortified leader This patch ensures that peers that have promised fortification support to a leader do not campaign unless StoreLiveness indicates support for the leader's epoch has expired. This patch only affects campaigning (either directly or in response to the randomized election timeout elapsing) -- in a subsequent patch, we'll handle voting. Note that once StoreLiveness indicates support for the leader has expired, we aim to act swiftly. In particular, care is taken to ensure a follower doesn't need to wait out its election timeout before campaigning. We do preserve randomness to prevent hung elections though. Informs https://github.com/cockroachdb/cockroach/issues/125351 Release note: None roachtest: fix log message in test Fix erroneous log message in previous commit. Epic: none Release note: None kv: don't campaign on non-epoch lease redirect Informs #125254. Doing so is not needed and will be a no-op if the follower is fortified anyway (see next commit). Release note: None kv: comment that manual campaign is no-op for fortified followers Informs #125254. This commit adds a comment mentioning that calls to campaignLocked will be no-ops (as of #129158) for fortified followers. Release note: None kvserver: provide tenant id to Processor in OnDescChangedLocked The tenant id is known at the time when the RangeDescriptor is first set. We simply pass the tenant id with every call to OnDescChangedLocked. Informs #129129 Epic: CRDB-37515 Release note: None crosscluster/logical: check ctx cancel on err/retry/dlq paths We don't want to keep trying batches or DLQ things if execution has been canceled. Release note: none. Epic: none. master: Update pkg/testutils/release/cockroach_releases.yaml Update pkg/testutils/release/cockroach_releases.yaml with recent values. Epic: None Release note: None Release justification: test-only updates workflows: disable cockroach-microbench-ci step for pull requests This change disbales the CI step for pull requests but this still runs for push step into the target branch. Also the count of the microbenchmarks are increased to get more robust results. Epic: none Release note: None workflow: increase test timeout in cockroach microbench ci The step started timing out after https://github.com/cockroachdb/cockroach/pull/129390 because test.count was increased to 10 and the timeout used was 900 seconds. This change bumps it to 1800 seconds (i.e. 30 min) Epic: none Release note: None drtprod: add ttl jobs for ldr cluster. To prevent disk capacity issue configured ttl jobs on tables. Epic: None Release note: None rac2: add token tracker This commit introduces the token `Tracker`, which will be used to track token deductions for in-flight entries at a given `RaftPriority`. Resolves: #128026 Release note: None rpc: add source and destination to the peer labels For certain tooling it is important to differentiate between the locality tag of the local node from the locality tag of the remote node. By adding both the local and the destination, it allows those tools to understand the source and destination of the connections. Epic: CRDB-41138 Release note: None restore: allow empty virtual sst in online restore Previously, only restore would fast fail if any backup spans were empty because pebble could not read from an empty virtual sst. Pebble has gotten smarter, so we can remove this check. Fixes #129610 Release note: none sql: add REPEATABLE READ to parser help text This commit adds "REPEATABLE READ" to the list of supported isolation levels in the parser help text. The isolation level is still behind a cluster setting, but it is not worth the effort to gate this help text on the value of that setting. Epic: None Release note: None sql: add basic parsing for all isolation levels This commit extends the parser tests to cover all isolation levels. Epic: None Release note: None sql: rename snapshot isolation enabled setting This commit renames the setting that enables snapshot/repeatable read isolation from `sql.txn.snapshot_isolation.enabled` to `sql.txn.repeatable_read_isolation.enabled`. This is part of a larger change to favor the term "repeatable read" over "snapshot" at the SQL level. Epic: None Release note: None licenseccl: new license types Epic: CRDB-39988 Closes: CRDB-40069 Release note: none roachtest/grafana: add SQL metrics to CDC dashboard This patch adds a row to the CDC roachtest grafana dashboard with a few SQL metrics (SQL Statements, SQL Byte Traffic, Service Latency) that are of interest while doing performance testing. This addition is valuable because the TestEng Grafana instance currently only works with GCE roachtests, and there are times when we need to run tests on other cloud platforms like AWS. Release note: None upgrades: set exclude_data_from_backup on non-system tenants This commit adds a new migration to set `exclude_data_from_backup` on tenant system tables with low GC TTL, as was done for the system tenant in #120144. Fixes: #129643 Release note: None ui: fix `dev gen js` Since we switched to Bazel 7, `bazel` is more rigorous about disallowing directories as outputs. This means that the `crdb-api-client` implementation, which was always pretty much incorrect, started entirely failing to build. To address this we add code generation to list `.proto` files and use that instead of trying to use the directory `dist` as an output file which Bazel does not want. Closes #129369. Epic: none Release note: None Release justification: Non-production code changes roachtest: mark pgjdbc 'infinity' timestamp tests as passing These pass now that 63f8a13f10f286457ea74ef198589278d6d3d654 has been merged. Release note: None base, licenseccl: Revise resolution strategy for LicenseTTL Previously, for resolving the cluster's license expiry we relied on a metric whose value was regularly ticked down by an async task. This change seeks to only the resolve value at read time by reading directly from the cluster settings when its requested. Release note: None Fixes: CRDB-40592 Epic: CRDB-8035 rac2,replica_rac2: implement Processor.AdmitForEval - RangeController.WaitForEval is changed to additionally return a waited bool, so that it conforms to the callers expectation that (false, nil) will be returned when the RangeController is closed during waiting. This allows the caller to wait on the local store (before eval) and not completely bypass all AC. - Processor.AdmitForEval consults the existing kvadmission.flow_control.mode cluster setting to decide whether a request should be subject to replication AC, and if yes, forwards to the RangeController. Fixes #129522 Epic: CRDB-37515 Release note: None go.mod: bump Pebble to a70d5b3c41b8 and update MemFS usage Changes: * [`a70d5b3c`](https://github.com/cockroachdb/pebble/commit/a70d5b3c) vfs: redesign MemFS strict mode Release note: none. Epic: none. sql: add "local-repeatable-read" logic test config This commit adds a "local-repeatable-read" logic test configuration, which runs all logic tests with a default transaction isolation level of REPEATABLE READ. This provides REPEATABLE READ with significantly more test coverage. Epic: None Release note: None sql: scaffolding for db metadata update job This commit creates a new forever running BG job called UpdateCachedTableMetadata. This is just some scaffolding - the job doesn't do anything right now but it will be used to populate the system table storing cached table metadata used by obs surfaces. Epic: CRDB-37558 Fixes: #127891 Release note: None kvserver,rac2: add and plumb RaftEvent This change updates the RaftEvent structure to reflect the semantics of a raft storage write, and plumbs it into RACv2 code in a few places. Epic: none Release note: none scjob: treat descriptor not found as permanent job error Retrying a schema change after hitting a descriptor not found error just means the job will keep being unable to find it. We mark them as permanent now to avoid causing infinite retries. Note that most errors already cause the schema change job to fail, but if the job is non-cancelable (which it might be in the PostCommit phase), then it needs to be marked in order to not be retried. Release note: None sql: fix checkClusterSettingValuesAreEquivalent for final release fence versions When the `internal` field of a clusterversion is `0`, which it would be for all final cluster versions, the corresponding fence version will have an internal version of `-1`. The logic in checkClusterSettingValuesAreEquivalent did not account for this. It was assuming that one could always decrement the internal version by 1 in order to get the predecessor version. Now, we use ListBetween instead, which does not need this assumption. Release note (bug fix): Fixed a rare bug in SHOW CLUSTER SETTING that could cause it to fail with an error like `timed out: value differs between local setting and KV`. roachpb,clusterversion: add delegating method for FenceVersion Release note: None roachpb: allow negative internal version to roundtrip parse+format Release note: None admission: rename ttl-low-pri to bulk-low-pri Release note (ops change): The lowest admission control priority for the storage layer has been renamed from 'ttl-low-pri' to 'bulk-low-pri' in a handful of metrics. Epic: none. crosscluster/logical: use bulk-low-pri for LDR Release note: none. Epic: none. admission: elastic limit all bulk-low-pri requests Previously bulk low pri (formerly called ttl low pri) requests were only subject to elastic limiting if they were also read-only requests. However writing requests such as those performed by LDR also benefit from elastic limiting. Release note: none. Epic: none. opt/props: add MaxFrequency() function to props.Histogram This commit adds a new function to props.Histogram called MaxFrequency, which returns the maximum value of NumEq across all histogram buckets. This function will be useful to calculate an estimate for the upper bound of the row count for joins. Epic: None Release note: None crosscluster/logical: store resolved options as values Epic: none Release note: none crosscluster/logical: rename FilterRangefeed to IgnoreCDCIgnoredTTLDeletes This clarifies how bespoke the cdc ignored ttl delete logic is. Epic: None Release note: none cloud/externalconn: show pg urls Release note: none. Epic: none. sql, schemachanger: disallow ADD FK if referenced table is locked Release note (bug fix): Fixed a bug where the schema_locked table parameter did not prevent a table from being referenced by a foreign key. roachtest: Update readme stage command While running through the readme I ran into two issues related to the step which fetches a linux binary: 1. The binary version is no longer available at the location listed (21.1.5) 2. The architecture must be specified, otherwise roachtest will not accept it This change addresses this items. Epic: none Release note: None server/license: store cluster init time in the KV system keyspace We need to store cluster-level metadata for license policy enforcement. This metadata must be protected from modification, so it cannot be stored in any system table. This PR introduces a new field in the system keyspace to store the cluster's initialization start time, which will be used in a follow-up PR to track the grace period for clusters without a license. The metadata needs to be stored at the cluster level to align with the license's scope. A new struct, License.Enforcer, is responsible for writing and caching the new timestamp. This struct will be extended in future PRs to check for throttling in clusters that violate our license policies. This update will be backported to versions 23.1, 23.2, 24.1, and 24.2. Epic: CRDB-39988 Informs: CRDB-40064 Release note: None raft: export log mark type Epic: none Release note: none crosscluster/physical: defensively check errCh in span config event stream Select does not choose which channel to read from based on some order, but the span config event stream must return as soon as there is an error. For this reason, this patch rechecks the errCh after select chooses to read from the data channel. Informs #128865 Release note: none replica_rac2: lift routing from admit callback args Epic: none Release note: none sql: don't consider repeatable read to snapshot an upgrade One isolation level is not stronger than th…
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 22, 2024
Informs cockroachdb#125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * cockroachdb#127349 * cockroachdb#127348 Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 22, 2024
Informs cockroachdb#125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * cockroachdb#127349 * cockroachdb#127348 Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 23, 2024
Informs cockroachdb#125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * cockroachdb#127349 * cockroachdb#127348 Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 25, 2024
Informs cockroachdb#125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * cockroachdb#127349 * cockroachdb#127348 Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Oct 28, 2024
Informs cockroachdb#125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * cockroachdb#127349 * cockroachdb#127348 Release note: None
craig bot
pushed a commit
that referenced
this issue
Oct 28, 2024
133169: raft: assert on HardState field modifications r=nvanbenschoten a=nvanbenschoten Informs #125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * #127349 * #127348 The 2nd, 3rd, and 4th commits in the PR address these issues in raft by not assuming that a `MsgSnap` as always coming from the leader of `Message.Term`. Release note: None 133473: sql,kvflowcontrol: include sum of (eval|send) tokens deducted r=sumeerbhola a=kvoli Populate two new `kvflowinspectpb.ConnectedStream` fields: ```go // TotalEvalDeductedTokens represents the total number of tokens deducted for // evaluation, for this connected stream. Only populated on versions >= 24.3. TotalEvalDeductedTokens int64 // TotalSendDeductedTokens represents the total number of tokens deducted // before sending, for this connected stream. Only populated on versions >= // 24.3. TotalSendDeductedTokens int64 ``` These may be useful to gander at the tokens deducted, which are otherwise not tracked. Also, for a quick check. Note that RACv1 does not populate these variables, as it is soon to be deprecated and these fields don't directly relate, unlike RACv2. --- Update `crdb_internal.kv_flow_control_handles_v2` to include: ```sql total_eval_deducted_tokens INT NOT NULL, total_send_deducted_tokens INT NOT NULL, ``` Corresponding to the total number of `eval` and `send` tokens deducted by the connected stream, in RACv2. Note the `handles_v1` virtual table remains unchanged. The DDL is now: ```sql CREATE TABLE crdb_internal.kv_flow_control_handles_v2 ( range_id INT NOT NULL, tenant_id INT NOT NULL, store_id INT NOT NULL, total_tracked_tokens INT NOT NULL, total_eval_deducted_tokens INT NOT NULL, total_send_deducted_tokens INT NOT NULL, INDEX(range_id) ); ``` Resolves: #132777 Release note: None 133497: build: upgrade `rules_go` and `gazelle` r=rail,tbg,stevendanna a=rickystewart We also drop a [commit on top of our `rules_go` fork](RaduBerinde/rules_go@b2411a8) that is in opposition to #102619. Closes #102619 Closes #133289 Closes #133464 Epic: CRDB-17171 Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 28, 2024
Informs #125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * #127349 * #127348 Release note: None
blathers-crl bot
pushed a commit
that referenced
this issue
Oct 28, 2024
Informs #125352. This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways. Some bugs that this change could catch: - term regressions - vote changes - lead changes in a single term - leadEpoch regressions The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP: * #127349 * #127348 Release note: None
We've added a bunch of assertions as we've done the leader leases work, so I'll close this out. I'm sure there will be more that we add here in the future, but this seems like the correct stopping point. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
A-leader-leases
Related to the introduction of leader leases
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Describe the problem
Part of #123847
A leader should never be surprised when it loses fortification support and we should be able to make some assertions using this. There may be more, but one that comes to mind is that it should never hear of a newer term if it still thinks its fortified and hasn't transferred leadership away.
Jira issue: CRDB-39403
Epic CRDB-37522
The text was updated successfully, but these errors were encountered: