Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: add leader fortification assertions #125352

Closed
arulajmani opened this issue Jun 7, 2024 · 2 comments
Closed

raft: add leader fortification assertions #125352

arulajmani opened this issue Jun 7, 2024 · 2 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-leader-leases Related to the introduction of leader leases C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@arulajmani
Copy link
Collaborator

arulajmani commented Jun 7, 2024

Describe the problem

Part of #123847

A leader should never be surprised when it loses fortification support and we should be able to make some assertions using this. There may be more, but one that comes to mind is that it should never hear of a newer term if it still thinks its fortified and hasn't transferred leadership away.

Jira issue: CRDB-39403

Epic CRDB-37522

@arulajmani arulajmani added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. T-kv KV Team A-leader-leases Related to the introduction of leader leases labels Jun 7, 2024
Copy link

blathers-crl bot commented Jun 7, 2024

cc @cockroachdb/replication

@arulajmani arulajmani self-assigned this Aug 16, 2024
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 16, 2024
This patch adds conceptual de-fortification to raft. De-fortification
amounts to resetting the leadEpoch or simply forgetting fortification
support that was previously provided.

Currently, there are two ways a peer may de-fortify:
1. If it learns about a new leader at a higher term.
2. If the leader transfers leadership to it.

By adding this explicit de-fortification, we're able to then assert that
a peer only ever bumps its term if its no longer supporting a fortified
leader. A bunch of datadriven tests that were manually campaigning need
to be modified with this new assertion in place.

Informs cockroachdb#125351
Informs cockroachdb#125352

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 21, 2024
This patch adds conceptual de-fortification to raft. De-fortification
amounts to resetting the leadEpoch or simply forgetting fortification
support that was previously provided.

Currently, there are two ways a peer may de-fortify:
1. If it learns about a new leader at a higher term.
2. If the leader transfers leadership to it.

By adding this explicit de-fortification, we're able to then assert that
a peer only ever bumps its term if its no longer supporting a fortified
leader. A bunch of datadriven tests that were manually campaigning need
to be modified with this new assertion in place.

Informs cockroachdb#125351
Informs cockroachdb#125352

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Aug 22, 2024
This patch adds conceptual de-fortification to raft. De-fortification
amounts to resetting the leadEpoch or simply forgetting fortification
support that was previously provided.

Currently, there are two ways a peer may de-fortify:
1. If it learns about a new leader at a higher term.
2. If the leader transfers leadership to it.

By adding this explicit de-fortification, we're able to then assert that
a peer only ever bumps its term if its no longer supporting a fortified
leader. A bunch of datadriven tests that were manually campaigning need
to be modified with this new assertion in place.

Informs cockroachdb#125351
Informs cockroachdb#125352

Release note: None
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024
hanpen24 added a commit to hanpen24/cockroach that referenced this issue Sep 7, 2024
There was a bug where comments did not show up in
the output of the show create all schemas command.
The issue was that we did not have any code to get
the comment and add it to the create schema descriptor.

Fixes: #127979

Release note (bug fix): SHOW CREATE ALL SCHEMAS
now shows corresponding schema comments in it's output.

insights: ensure releasing to Insight pool clears slice

Ensure that when we return objects into the Insight pool
that we release the statmeents in the slice. This fixes
an issue in the logic that returned this object to the pool
which did not nil the `Statements` slice, making it possible
for these slices to grow in capacity in the node's lifetime,
holding onto garbage Statement objects.

This is a lead up to #128199 which will likely remove
this pool and reuse the existing statmentBuf pool.

Epic: none
Release note: None

insights: add testing knobs

Add insights specific testing knobs. We'll add some
knobs in later commts.

Epic: none
Release note: None

sqlstats/insights: free memory allocated per session on session close

The insights locking registry buffers statement insight objects
by session id until receiving a transaction insight, when the
buffer is emptied. These buffers can leak if the session is closed
midway through a transaction since the registry will never
receive a transaction event for the session. This commit ensures
we clear any memory allocated in insights for a session by
sending an event to clear the container's session entry, if it
exists, on session close.

A testing knob was added, OnCloseSession, which is called
when the locking registry clears a session.

Epic: none
Fixes: #128213

Release note (bug fix): Fixes a memory leak where statement
insight objects may leak if the session is closed without
the transaction finishing.

insights: move insights testing knobs from sqlstats

Move some insights testing knobs that were on the
sqlstats testing knobs to insights pkg.

Epic: none

Release note: None

cli: update cluster tag in tsdump upload

Previously, we are uploading tsdump data to datadog with cluster name with tag
"cluster". However, it would coincide with metrics emitted with "cluster" as
tag. This was resulting in a large drop down list to filter out cluster in
datadog dashboard. This change updates tag from "cluster" to "cluster_label"
with tsdump datadog upload. This would result in smaller cluster list in
datadog dashboard.

Epic: None
Release note: None

sqlliveness: handle AmbiguousResultError gracefully on insert

Previously, when inserting a new SQL liveness session, we extended the
expiry between each InitPut to ensure that an expired session was never
inserted. This worked well, but if an ambiguous result error was
encountered, the row could already exist in the table with the old
timestamp leading to a condition failed error.  To address this, this
patch will also assign a new session ID if we encounter an ambiguous
result error. This will ensure that we do not collide with the old
session ID.

Fixes: #127301
Release note (bug fix): Starting up nodes could fail with:
"could not insert session ...: unexpected value", if an ambigous
result error was hit inserting into the sqlliveness table.

roachtest: mixedversion documentation updates

* comments were added to indicate that callers should not close the
  connection pools returned by helper functions.
* documentation for `InMixedVersion` and `AfterUpgradeFinalized` were
  updated to be more explicit about when the hooks passed to those
  functions can be called.

Epic: none

Release note: None

roachtest: include step ID in failures to get debug information

This commit updates the `mixedversion` framework so that we include
the step ID we were preparing to run when an error is found gathering
debug information (current binary and cluster versions on the
nodes). This makes it a little easier to understand these errors and
at what point in th test they happened.

Epic: none

Release note: None

logictest: ignore "Can't find decompressor for snappy" error

This error appears to be a rare infrastructural flake. Since it's hard
to investigate, we will ignore it rather than trying to track down the
cause.

Release note: None

roachtest: deflake `admission-control/disk-bandwidth-limiter` on azure

We tried fixing it earlier but turns out the roachtest mount config is
different than the roachprod.

Fixes #129022.

Release note: None

workload/schemachange: be more lenient for DROP DEFAULT error code

Since 5eedebc529adedc33b93ca5146aed82479a9c40e was backported to older
branches, but does not yet appear in a release, the workload should be
able to handle _not_ seeing an error code for the case of dropping a
column default for a computed column.

Release note: None

admission,kvserver: plumbing for RACv2 callbacks

The plumbing changes allow for both RACv1 and RACv2 info to flow through
admission control queues and be provided in the callback.
The admittedLogEntryAdaptor demuxes to the relevant callback handler.
kvserver.storesForRACv2 will further route to the relevant Replica --
this code will be added once Replica has a replica_rac2.Processor member.

Informs #128309

Epic: CRDB-37515

Release note: None

build: update errcheck

Update kisielk/errcheck package.

Epic: none
Release note: None

workflows: enable CI for GitHub merge queue builds

Epic: none

Release note: None
Part of: DEVINF-1127

backupccl: remove not null flag from SQLInstanceID field in SSP spec

This flag is a noop for repeated fields and was producing a warning during
proto generation.

Release note: none

kvcoord: safe format replica slice

Introduce a `SafeFormat` method on `ReplicaSlice`, which can be used to
print out the replica slice in logging.

Informs: #129031
Release note: None

kvcoord: log replica order on nearest routing policy

Previously, we would log an event when routing to the nearest replica,
which would then show up in traces and could be used to determine the
routing policy of a query.

It is also useful to know how the nearest replica was selected i.e.,
latency, locality or by health. Log the event after sorting by the
replicas' distance to the sender and include the replica order.

Informs: #129031
Release note: None

sql: don't allow VOID parameters for UDFs

Release note (bug fix): Function input parameters can no longer be of
the VOID type.

roachtest: upgrade hibernate version under test

This fixes an issue with a missing dependency as well. See
https://github.com/hibernate/hibernate-orm/commit/e4a0b6988f84e85e427484e65321e9583080ccb5

Release note: None

crosscluster/logical: add fallback to SQL to kv writer

Release note: none.
Epic: none.

kvserver: misc partial wiring for RACv2

This adds a bunch of small wiring changes:
- raftScheduler: is passed a non-nil PiggybackMsgReader.
- Scheduling piggybacked responses: raftScheduler accepts a
  PiggybackedAdmittedResponseScheduler to process incoming piggybacked
  responses, which is implemented by storesForRACv2, since it needs to
  route to the relevant Replica. storesForRACv2 also enqueues a range
  for processing if it received a piggybacked response. This is done
  via a small change to raftScheduler to add
  EnqueueRACv2PiggybackAdmitted which are processed via
  processRACv2PiggybackedAdmitted, which is plumbed through Store to
  the Replica to replica_rac2.Processor.
- replicaForRACv2 is trivially implemented by Replica.
- raftNodeForRACv2 provides a very incomplete implementation of RaftNode,
  with todos.
- The todo to implement storesForRACv2.AdmittedLogEntry is done, and
  routes to the Processor in Replica.
- Replica has two new members, raftMu.flowControlLevel, and
  flowControlV2, which is the Processor implementation.
- Except for eval integration, all methods of Processor are integrated.
  The eval integration will be a later PR.

Informs #129129
Fixes #128309
Informs #128756

Epic: CRDB-37515

Release note: None

dev: add temporary directory in sandbox for `compose`

Since `--[no]incompatible_sandbox_hermetic_tmp`'s default flipped to
`true` in Bazel 7, `compose` tests broke as we stage the built artifacts
in a temporary directory. If this wasn't set in `.bazelrc.user`, it
needs to be added here or else the test will fail.

Closes: #128227
Epic: CRDB-17171
Release note: None

rac2: implement wait for eval helper

`WaitForEval` (inside `token_counter.go`) will be used to implement the
`RangeController.WaitForEval(..)`, providing the ability to wait for:

1. A quorum of streams to have available tokens
2. All required streams to have available tokens

Note usage of this helper function is deferred to a later commit, along
with support for waiting on send stream state as an added condition.

Informs: #128910
Release note: None

logictest: deflake cross_version_tenant_backup

This makes the test less flaky by making sure finalization is complete
before beginning the RESTORE.

Release note: None

master: Update pkg/testutils/release/cockroach_releases.yaml

Update pkg/testutils/release/cockroach_releases.yaml with recent values.

Epic: None
Release note: None
Release justification: test-only updates

kvserver/rangefeed: change p.Register to use registration interface

Previously, we introduced a registration interface to abstract the
implementation details of buffered and unbuffered registration. This patch
updates the p.Register function to use this interface as well.

Part of: #126560
Release note: none

kvserver/rangefeed: move perRangeEventSink to rangefeed pacakge

Previously, perRangeEventSink was defined in pkg/server/node.go. This patch
relocates it to the rangefeed package to facilitate future commits testing
within the rangefeed package itself.

Part of: #126560
Release note: none

roachprod: reuse start script if start is called with --restart=true

Previously, irrespective of the restart flag value we
ended up overwriting the startup script which led to
defaulting the values of other flags passed previously
to the start command. This was inadequate because when restarting,
i.e., `restart=true`, it makes more sense that we would reuse the
existing startup script by default. This patch changes this behaviour.

Epic: none
Fixes: #125085
Release note: None

roachtest: use spot VM for all tests

Currently, we are using spot VMs only to run benchmark tests. The reason
for the low adoption of spot VM is because of the test failures due to
VM preemption.
But, with the recent changes, we have better control on VM preemption
changes where if there is a test failure due to preemption, the test is
run on an on-demand VM the next time.
Also, the current failed tests that are run on spot VM is 148 out of 1093 = 13.5%.
If I consider failure out of total tests run is 148 out of 2508 = 5.9%.

So, this PR removes the condition to use spot VMs only for benchmark tests and changes
the probability to 75%.

Fixes: #127236
Epic: None

ui: display very large ranges in problematic ranges

This commit adds a column in the DB Console's page problematic ranges
that displays the ranges that are too large. The threshold is set
to be: 8 * the current range max size.

Fixes: #127843

Epic: None

Release note (ui change): Adding too large ranges to the problematic ranges
page in DB Console. If a range is larger than twice the max range
size, it will show up in the problematic ranges page.

roachtest: multi-region/mixed-version support for shared-process

We enable required features on tenants before setting up the TPCC
workload on the cluster.

Informs: #127378

Release note: None

jobs: avoid CTEs in crdb_internal.system_jobs query

The CTE in the query used for crdb_internal.system_jobs can prevent a
number of useful query optimizations.

Informs #122687

Release note (performance improvement): Further improves the
performance of job-system related queries.

jobs: further simplify crdb_internal.system_jobs query

Epic: None
Release note: None

roachtest: update rebalance-load mixed-version tests for shared-process

In this commit, we update the `rebalance/by-load` mixed-version tests
so that they can run on shared-process deployments. As usual, we need
to enable some features on tenants before initializing the
workload. In addition, we pass the virtual cluster cookie when
fetching system metrics.

Informs: #127378

Release note: None

lease: fix goroutine starvation caused by test cluster setting

When a test overrides sql.catalog.descriptor_lease_duration to 0, then
it was possible for the test to starve itself by creating a timer with
zero duration that is repeatedly reset.

The bug occurred since after refreshTimer expired, it would call the
jitteredLeaseDuration helper to get the new duration for the timer. If a
test had overriden sql.catalog.descriptor_lease_duration to 0, this
would cause the timer to keep being reset to 0 and starve other
goroutines.

Release note: None

kvserver: deflake TestRangefeedCheckpointsRecoverFromLeaseExpiration

This commit does two things to deflake the test:

1) Takes the MinExpiration into consideration when advancing the clock
to cause the epoch lease to expire. This is important after the feature
was enabled in: ba758c08023949ad4be87bab1762db10fdaa4f0e

2) Waits for N1's view of N2's lease expiration to match N2's view.
This is important in the rare case where N1 tries to increase N2's
epoch, but it has a stale view of the lease expiration time.

Fixes: #124178, #123551

Epic: None

Release note: None

go.mod: bump Pebble to 3d14906a0e0c

Changes:

 * [`3d14906a`](https://github.com/cockroachdb/pebble/commit/3d14906a) wal: fix reader buffer reuse
 * [`c2749cc7`](https://github.com/cockroachdb/pebble/commit/c2749cc7) colblk: use crlib/crbytes.CommonPrefix
 * [`f90b350e`](https://github.com/cockroachdb/pebble/commit/f90b350e) docs/rfc: add RFC for point tombstone density compaction heuristic
 * [`1a5295f8`](https://github.com/cockroachdb/pebble/commit/1a5295f8) block: move block handle encoding and decoding
 * [`0e8a11a5`](https://github.com/cockroachdb/pebble/commit/0e8a11a5) metamorphic: no synthetic suffix with overlapping RangeKeySets

Release note (bug fix): Fix bug in the public preview WAL failover feature that
could prevent a node from starting if it crashed during a failover.
Epic: none.

kv: fix comment on QueryTxnResponse.QueriedTxn

This commit fixes a comment in the QueryTxnResponse.QueriedTxn field.
It also addresses a related TODO.

Epic: None
Release note: None

crosscluster, sql: implement SHOW LOGICAL REPLICATION JOBS

Implementation of `SHOW LOGICAL REPLICATION JOBS`. An LDR job
description column will be added in a follow-up PR.

Rows returned by `SHOW LOGICAL REPLICATION JOBS` should include
the following columns:
- job_id: ID of an LDR job
- status: job status
- targets: names of tables being replicated in an LDR job
- replicated_time: ingestion frontier timestamp

An optional clause `WITH DETAILS` can be included in the query.
If `WITH DETAILS` option is included, there should be an additional
column in the rows returned by the query:
- replication_start_time: initial timestamp from which the
replication producer job will begin streaming MVCC revisions

Epic: CRDB-40869
Release note: None

changefeedccl: add more logging around core changefeed completion

Release note: None

ui: remove unnecessary metric name prefixes

Metrics keyed by nodeID and prefixed with long strings become very
difficult to read on the dashboard pages since they take up too much
horizontal space. In all of these cases, the context provided by the
graph title and Y axis label provide enough context to omit verbose
legend labels.

Resolves: #128046

Release note (ui change): some metric charts on the overview and
replication pages have more terse legends to facilitate easier
browsing.

roachprod: azure GC supports cleanup for multiple subscriptions

Previously Azure GC assumed that only one subscription ID would
be used at a time. However the current structure of our Azure
subscriptions makes a distinction between user "adhoc" roachprod
clusters and nightly "infra" clusters. As a result, roachprod
clusters can live in two different subscriptions.

This change adds the azure-subscription-names flag to support
cleaning up in multiple Azure subscriptions.

Additionally it also switches the default subscription to
e2e-adhoc, which is what users should use.

kvclient: improve logging in OptimizeReplicaOrder

Epic: none

Release note: None

changefeedccl: enable vmodule for TestAlterChangefeedAddTargetFamily

This patch enables verbose logging (vmodule `helpers_test=1`) for
`TestAlterChangefeedAddTargetFamily` so that we can get more information
about what, if any, messages that the changefeed sees before the test
times out.

Release note: None

roachprod: gcloud mig template per zone

Previously, one instance template, for a managed instance group cluster, was
created and reused for every zone a managed instance group cluster spanned over.
In this change, the logic is updated to create a different template for each
zone. This is useful to extend template flexibility to support per zone
configurations.

The delete cluster logic now tracks instance templates by the cluster label
instead of by name.

The GC logic has also been updated to account for multiple instance templates
that point to the same cluster, in the case of an empty cluster. In this case
we'll only add one "empty" VM to destroy the empty cluster.

Epic: None
Release note: None

roachprod: per zone spot instance config

Previously, if managed instance group clusters opted to use spot instances the
whole cluster would be on spot instances. This change allows a new flag to be
passed to specify which zones should use spot instances. If this flag is passed
it is expected that the flag enabling spot instances for the whole cluster
should not be passed (results in an error), so that only the select zones use
spot instances.

Epic: None
Release note: None

crosscluster/logical: set admission header on kv writes

Release note: none.
Epic: none.

kv/bulk: check gc threshold in TestDuplicateHandling

TestDuplicateHandling failed recently because an unexpected number of keys were
read. This _may_ have been caused by the gc threshold advancing past old
revisions of certain test keys. This patch ensures the test will fail on a gc
error instead. If the test fails again due to an unexpected number keys read,
the gc story does not explain how this incorrect read could occur.

Fixes #126929

Epic: none

crosscluster/logical: set the origin ts in kv writer

Release note: none.
Epic: none.

Epic: None
Release note: None

server: add latest telemetry contact timestamp

The `Reporter` struct in the `diagnostics` package now maintains a
`LastSuccessfulTelemetryPing` field that's updated with the value of
`timeutil.Now()` after every response from the telemetry cluster.

This time is updated regardless of the status code of the response.
Any error from the HTTP client (this includes timeouts, DNS errors,
etc.) will skip updating the timestamp.

Resolves: CRDB-41231
Epic: CRDB-40209
Release note: None

roachtest: enable assertions in nightlies without metamorphic constants

In #114618 we disabled runtime assertions and metamorphic constants due
to their flakiness.

Here we reenable them but with metamorphic constants disabled, as they
are the more disruptive of the two. We also change the framework to take
in a --metamorphic-cockroach-ea-probability instead of hardcoding it.

roachtest: route runtime assertion build timeouts to test-eng

As we determine the stability of runtime assertions, temporarily
route all timeout failures to test-eng to reduce noise for other
teams.

raft: Set MustSync to true if the Commit index changes

This commit sets the MustSync field to true if the Commit index
changes. This won't change the current behaviour where we sync the
HardState fields if the MsgStorageAppend has any HardState fields set.
However, this commit sets the MustSync to stay consistent with the
other fields. Future work could be done to remove the MustSync field
all together.

Syncing the commit index shouldn't cause too much overhead because
in the normal case, advancing the commit index on a follower is
piggybacked on top of sending entries to the follower (which
requires syncing anyways).

References: #125266

Epic: None

Release note: None

workflows: update trigger for merge queue event

The trigger for the merge queue events for CI was not set up properly.
This is the proper way it should be done according to the documentation.

Epic: none
Part of: DEVINF-1127
Release note: None
Release justification: Non-production code changes

crosscluster: use src table ID to look up table names

Changes in this PR:
- Use source table IDs as keys for the map we use to look
up fully qualified table names. This fixes the bug where
we use destination table IDs as keys and attempt to look
up table names with source table IDs.
- Use destination table ID to create dlq table names
- Minor refactoring

Epic: none
Release note: none

ui: nits to LDR metrics page

- A few graph renames
- Add tooltip to Replication Lag graph

Epic: none

Release note: none

logical: Optionally filter rangefeed events to avoid replicating TTL deletes

Some of our users want the ability to avoid replicating TTL deletes from one
side of LDR to the other. This commit adds that ability to filter rangefeed
events when the bit is set on the other side. For testing, ttljob_test.go
does a lot of job scheduling wrangling to get the row TTL job to run, and
that feels unecessary for a unit test on this. This code would benefit
from a roachtest that explictly tests the TTL delete functionality with
admission control loosened

Release note: none
Epic: CRDB-41223

roachprod: rename `MaxWait` stop option to `GracePeriod`

This better mirrors the terminology we use in our documentation:

https://www.cockroachlabs.com/docs/stable/node-shutdown#termination-grace-period

Epic: none

Release note: None

roachprod: stop forcefully after grace period

This changes the `Stop` method on roachprod to stop the cockroach
process forcefully if the process does not terminate after the
provided `gracePeriod`. This brings back the behaviour old behaviour
of `StopCockroachGracefullyOnNode` without the need for a custom
function for it.

We adopt this behaviour because it is the documented procedure on the
public documentation, and it's also how process
managers/orchestrators (such as Kubernetes) operate.

Informs: #129103
Informs: #121455

Release note: None

roachtest: update pg_regress diffs

Fixes #127556

Release note: None

fs,kvserver: fix categorized disk stats metric

This patch fixes a bug where the `vfs.DiskWriteStatsCollector` was
incorrectly being passed into the disk health check FS in pebble.

Fixes #129159.

Release note: None

sql: remove computed column logic from heuristic planner

Vestigial logic in the `UPDATE` code path from the days of the heuristic
query planner has been removed. As an added bonus to simplifying the
code, this should reduce some allocations.

Release note: None

sql: rename `enforceLocalColumnConstraints`

The `enforceLocalColumnConstraints` function has been renamed to
`enforceNotNullConstraints` to clarify its behavior.

Release note: None

sql: remove column mapping during `UPDATE`

New values to write during an `UPDATE` always come directly after the
fetch values in the source row, i.e., the input row of an update
operator, and they are ordered the same as `ru.UpdateCols`. Therefore,
there is no need to map the source values. Instead, the update values
can simply be sliced from the source row.

This commit removes this unnecessary mapping logic and support data
structurres, including the `updateValues` buffer in `updateRun`,
`sourceSlot`, and `scalarSlot`. This simplifies the implementation and
will reduce allocations.

Release note: None

server: make telemetry timestamp atomic

Since the telemetry send timestamp will be accessed concurrently, we
need to make it atomic.

Part of: CRDB-41231
Epic: CRDB-40209
Release note: None

rac2: implement wait for eval

Implement `RangeController.WaitForEval(..)`, which will be used to wait
on a configurable number of streams to have available tokens. Note that
the functionality is currently incomplete without replica state
management, done in `HandleRaftReady`.

Resolves: #128910
Release note: None

sql: deflake TestConcurrentSchemaChanges

Previously, TestConcurrentSchemaChanges was intermittently
failing because the connection pool was not correctly being
used. Some of operations like selecting if the declarative or
legacy schema changer should be used were incorrectly being
run on the main connection. To address this, this patch uses
the dedicated connections per-thread.

Fixes: #129220

Release note: None

raft: do not forget the leader within the same term

Addresses a TODO. This wasn't causing any issues just yet, but having
this safety is nice as the code evolves, given we're relying on it for
leader fortification.

Epic: none

Release note: None

raft: reset lead epoch whenever the term is bumped

LeadEpoch is only relavent within a term. We should forget the lead
epoch in addition to the leader when the term is bumped, as the lead
epoch isn't meaningful without an associated leader.

Epic: none

Release note: None

raft: add conceptual de-fortification and some assertions

This patch adds conceptual de-fortification to raft. De-fortification
amounts to resetting the leadEpoch or simply forgetting fortification
support that was previously provided.

Currently, there are two ways a peer may de-fortify:
1. If it learns about a new leader at a higher term.
2. If the leader transfers leadership to it.

By adding this explicit de-fortification, we're able to then assert that
a peer only ever bumps its term if its no longer supporting a fortified
leader. A bunch of datadriven tests that were manually campaigning need
to be modified with this new assertion in place.

Informs https://github.com/cockroachdb/cockroach/issues/125351
Informs https://github.com/cockroachdb/cockroach/issues/125352

Release note: None

roachtest: extract draining into a function

Previously if the test failed between the start of draining and the
stopping of the node, it could leave the cluster with the iptables rule
in place. This caused problems for rerunning the test. This change
extracts the drainWithIpTables function out and cleans up the iptables
rule in a defer.

Epic: none

Release note: None

roachtest: close the db connections

Close the connections to avoid a leak on success.

Epic: none

Release note: None

roachtest: don't run SQL against the draining node

After #126506 the test incorrectly included the draining node as part of
the nodes the workload were run against. This caused the statistics to
be off by a factor of 5/6.

Note there is a separate issue related to the QPS dropping as part of
the drain which is a different issue that is masked by this issue.

Epic: none
Fixes: #129027
Fixes: #128439
Fixes: #128168
Fixes: #126986
Fixes: #125731

Release note: None

roachtest: disable the gossip limit for test

This test will fail without it. Figure out why..

Epic: none

Release note: None

util: add connection metric tracking utility

Adds a utility to cidrLookup to created a DialContext that is tracked
based on the lookup. This works for any third-party libraries that
expose a way to set the DialContext.

Epic: none

Release note: None

backupccl: break read and write metrics by bucket

Previously the read and write metrics were aggregate across all
containers or buckets. In some situations it is useful to break the
statistics out to detemine how much data is going to each. This PR
allows breaking the data out for the three main clouds.

Epic: CRDB-41142

Release note: None

crosscluster/logical: rename kv fallback metric

Release note: none.
Epic: none.

crosscluster/logical: only use kv writes for inserts

See comment inline.

Release note: none.
Epic: none.

system: add new table_metadata system table

Adds a new system table, `table_metadata` for
holding table level statatistics and metadata.
This table is intended to be used as a cache
and will be periodically updated via a new
job.

Epic: CRDB-37558
Release note: None

roachprod: update roachprod-gc.yaml with azure-subscription-names flag

This enables roachprod to clean up multiple subscriptions.

roachtest: skip multitenant/distsql

Informs #128366

Release note: None

roachtest: skip running `admission-control/disk-bandwidth-limiter` azure

This test is flaky on Azure due to different disk mounting behavior.
Previous attempts at deflaking have not helped. This patch will skip
running this test on the azure environment.

Fixes #129300.

Release note: None

kvserver: introduce a hard cap on range sizes

This patch introduces a new cluster setting called
`kv.range.range_size_hard_cap`. It's configured in bytes and places an
absolute value cap on how large we'll allow ranges to get before
backpressuring writes. We default it to 8GB.

This setting is inteded to be the last defense against absurdly large
ranges. In particular, it guards against risks where backpressure is
otherwise disabled due to other cluster settings. See
`kv.range.backpressure_byte_tolerance` for an explanation of how this
is possible.

Informs https://github.com/cockroachdb/cockroach/issues/128082

Release note (ops change): this patch introduces a new cluster setting,
`kv.range.range_size_hard_cap`, which provides a hard cap on how large
we'll let a range get before applying backpressure. It's defaulted to
8 GB, which is 16 times the default max range size. Users who have
manually changed the max range size >= 8 GB may either have to increase
this cap or reduce their range size.

roachtest: update `tpcc/mixed-headroom` for shared-process

This commit updates the TPCC mixed-version test so that it works on
shared-process deployments. As usual, we enable split/scatter on
tenants before initializing the workloads.

Epic: none

Release note: None

sql/logictest: fix incorrect operator in test

This is a follow-up to #129013 which changes the incorrectly used
similarity operator, `%`, to the regex match operator, `~`.

Release note: None

sql: support automatic partial statistic collections

We previously triggered full stat refreshes when approximately 20% of a
table's rows were changed. This commit introduces additional partial
statistic 'at extremes' refreshes that occur automatically in new
`TypeAutoCreatePartialStats` jobs when roughly 5% of rows change.

Various new cluster and table settings to configure the frequency of
auto partial stats collections have been introduced, which mimic
existing settings for automatic full stats collection (see release note
for details).

Table settings take precedence over their corresponding cluster settings
if set. Partial stats will never be collected automatically in cases
where automatic full stats wouldn't happen, such as when auto full stat
collection settings are disabled or tables on which auto stats
collections aren't allowed.

Automatic partial stat refreshes follow a probability-based model
similar to automatic full stats to approximate the desired refresh
frequency. We previously decided to trigger a full stat refresh with a
probability proportional to the number of rows affected by mutations on
a table. This has been extended to make a second decision on whether to
trigger a partial stat refresh with a higher likelihood whenever we
decide not to trigger a full refresh.

Unlike auto full stats, multiple `TypeAutoCreatePartialStats` jobs can
run at once in a cluster, as long as they are on different tables.
We don't start these jobs if any manual stat collection job (full or
partial), any auto full stat collection job, or an auto partial stat
collection job on the same table is running. We also don't stop any
new stats jobs from running if there is an auto partial stats job
running in the cluster (except for new auto partial stats jobs on the
same table as one that is running).

Epic: CRDB-19449

Release note (sql change): Partial statistics can now be automatically
collected at the extremes of indexes when a certain fraction and
minimum number of rows are stale (5% and 100 by default). These can
be configured with new table and cluster settings, and the feature is
disabled by default. The new cluster and table settings are:

- `sql.stats.automatic_partial_collection
.enabled`/`sql_stats_automatic_partial_collection_enabled`
  - Defaults to `false`
- `sql.stats.automatic_partial_collection
.min_stale_rows`/`sql_stats_automatic_partial_collection_min_stale_rows`
  - Defaults to `100`
- `sql.stats.automatic_partial_collection
.fraction_stale_rows
`/`sql_stats_automatic_partial_collection_fraction_stale_rows`
  - Defaults to `0.05`

Co-authored-by: Faizaan Madhani <[email protected]>

scpb: modify IndexZoneConfig element

This commit modifies the IndexZoneConfig element
that will become relevant for supporting
`ALTER INDEX ... CONFIGURE ZONE` in the DSC.

Release note: None

schemachanger/scexec: add support for writing subzone configs

This patch adds support for writing subzone configs.

Informs: #117574
Fixes: #128594
Release note: None

schemachanger: add idx zone config opgen rule

This patch adds an opgen rule for idx zone configs
in the ADD direction.

Informs: #117574
Release note: None

sql: add syntax for supporting security definer

This patch adds the syntax needed for supporting security
definer. Right now, if a routine is specified with
`[EXTERNAL] SECURITY {INVOKER | DEFINER}`, it is a no-op.
If the security mode is not specified, the default is
`SECURITY INVOKER`.

This also means that syntax for altering a function
supports the above options as well:
```
alter function f(int) external security definer;
```

Fixes: #128844
Epic: CRDB-37477

Release note: None

server: allow VIEWACTIVITY users access to /_admin/v1/stmtbundle

This commit gives users with the VIEWACTIVITY system grant
access to `/_admin/v1/stmtbundle` on the admin server. This
allows these users to download stmt bundles via DB console.

Epic: none
Part of: #121301

Release note (ui change): Users with VIEWACTIVITY can
download stmt bundles from DB console.

crosscluster/logical: reduce log spam from replanning

Epic: none
Release note: None

ui: use status server apis for stmt bundle ops

Since 23.1 we've been using the sql over http api to view,
request and cancel stmt bundles for fingerprints.  This
caused a regression in the sql permissions required to
use these features. Prior to using sql over http we only
required `VIEWACTIVITY` or `VIEWACTIVITYREDACTED` for these
operations. Now `VIEWSYSTEMTABLE` is also required since the
sql over http api direclty reads from
system.statement_diagnostics_requests as the db console user.

Instead of creating a view and builtins on top of the system
table, let's simply revert to the existing http apis on the
status server that are already properly gated.

This change reintroduces the following APIs to DB Console
- `POST /_status/stmtdiagreports` to request statement bundles
- `GET /_status/stmtdiagreports` to list statement bundles
- `POST /_status/stmtdiagreports/cancel` to cancel statement bundles

Limitations:
This PR does not carry over the added behaviour in 23.1 where we
prevent users from making multiple statement bundle reqs for a
particular fingerprint. This is not really a common case in the
UI since we show 'Waiting' in the overview page for the row
when a stmt bundle is in progress for a fingerprint, which prevents
another request for the same fingerprint.

Epic: none
Fixes: #121301

Release note (bug fix): Users with VIEWACTIVITY sql priv can now
request, view and cancel stmt bundles in DB console.

ui/e2e: add e2e testing for statement bundles

This commit adds some e2e cypress tests for statement
bundle features in DB Console:
* add tests for requesting, cancelling and viewing bundles
for privileged (VIEWACTIVITY, ADMIN) and non-privileged
users
* reformat cypress login command to use a new `users`
fixture that defines users by sql privilege
* add cypress command getUserWithExactPrivileges which
retrieves users from the fixture with the provided
privileges

Release note: None

Part of: #121301

crosscluster/logical: wait for row processors to exit before closing them

Release note: none.
Epic: none.

opt: remove documentation of old flags

In #85935 we introduced the `set` flag to set session settings in
optimizer tests and a few flags that were already controlling session
settings were removed. The documentation of those old flags has been
removed.

Release note: None

opt: disable `GenerateParameterizedJoin` when forcing custom plans

The exploration rule `GenerateParameterizedJoin` is now disabled when
the `plan_cache_mode` session setting is set to `force_custom_plan`.
This prevents possible regressions in plans for this mode in the rare
case that a stable function is not folded during normalization (due to
an error encountered during folding). In most cases, the check for
placeholders and stable functions in `GenerateParameterizedJoin` is
sufficient for preventing the rule from firing when a generic query plan
is not being built—before optimizing a custom plan placeholders are
always replaced with constants and stable functions are usually folded
to constants.

Release note: None

sql: fix minor typo in plan_opt.go

Release note: None

Revert "sql: override StmtTimeout to 0 for background jobs"

This reverts commit 178e453d336e9385a26e2cae75c9de912d98ea5a.

sql: override StmtTimeout to 0 for sessionless IE

For queries issued by a sessionless internal executor,
we inherit values from cluster settings. This means
that setting a cluster-wide statement timeout will
impact processes that use a sessionless internal
executor: background jobs, db console, cloud sql
shell, etc. This behavior doesn't make sense in most
cases; so, this patch will override the statement timeout
to 0s (i.e. none).

For the cloud sql shell, this change makes sense as there
is a default payload request timeout applied to these
queries (5s).

Fixes: #126261
Release note (bug fix): Internally issued queries that are not
initiated within a SQL session no longer respect a statement timeout.
This includes: background jobs, queries issued by the db
console that perform introspection, and the cloud sql shell.

crosscluster: add conflictResolutionType column

This PR adds a column named `conflict_resolution_type` to rows
output by the `SHOW LOGICAL REPLICATION JOBS WITH DETAILS`
command. `conflict_resolution_type` can be either one of the
following string values:

- LWW
- UDF
- DLQ

Epic: CRDB-40869
Fixes: #128960
Release note: None

kvserver: implement basic raft info methods for RACv2

Epic: none
Release note: none

tpcc/large-schema-benchmark: increase txn retry limit

Previously, this roachtest could intermittently fail due to retry errors
on the system.descriptors table during import. To address this, this
patch bumps up the txn retry limit to stabilize the test.

Fixes: #128251, #129170

Release note: None

scripts/ldr: use max-rate on workload

Release note: none.
Epic: none.

build: update golang.org/x/tools and fix some lint failures

Update `golang.org/x/tools` and fix new `lintonbuild` failures
where we were passing an arbitrary string as a format string.

Epic: none
Release note: None

logical: skip TestLogicalJobResiliency under deadlock

This test is having issues with LDR under deadlock, probably due to
scattering the table prior to setting up LDR. The other unit tests
are fine due to deadlock because they scatter after creating the stream,
but that's not possible with this test.

Release note: none
Fixes: #128972

rangefeed: move rangefeed test helpers into separate file

This PR moves some rangefeed test helpers into a separate file so some
can be exported.

Epic: None

Release notes: None

rangefeed: updates to rangefeed test helpers

Creates a new, exported test helper method GenerateRandomizedSpans that
can generate N non-overlapping spans. Also exports GenerateRandomizedTs
for use in other tests, and provides an argument to provide the maximum
number of nanoseconds the timestamp should have.

Epic: None

Release notes: None

cli: remove accidental checked in file

Previously, testDump.raw file was committed as part of #127822. This change
removes unwanted checked in file.

Epic: None
Release note: None

roachtest: actually run backup-restore/mixed-version in shared-process

An egregious oversight.

Epic: none

Release note: None

crosscluster/logical: use UDF as an "applier"

Previously, we were using the UDF as a fallback when an insert hit a
conflict. We've subsequently decided that we think running the UDF on
every proposed row is more useful.

Fixes #129036
Release note: none

changefeedccl: minor code move in coreChangefeed

Release note: None

roachprod: fix cpu reuse check when using workload nodes

The reuse cluster check did not account for tests that use
the workload node spec, which may have less cpus than the
rest of the cluster.

Fixes: #129440
Release note: none

crosscluster/logical: cancel producer job when consumer cancels

Previously, we were leaving the producer job in place when the
consumer job was canceled. The producer job would time out after 24
hours. However, until then, the PTS record on the source would remain.

In some ways this is nice as it allows the user to resume the stream
from a cursor in the case where they have the need to cancel and
resume the job for some reason.  However, our goal is that there are
no such cases (the user should simply need to PAUSE/ALTER/RESUME in
such cases).

Release note: None
Epic: none

crosscluster/logical: cleanup logging in tests

Nearly every caller of this function was logging before calling
it. Here we move that log statement inside the helper function.

Epic: none
Release note: None

crosscluster/logical: cleanup producer job lookup in tests

The LDR job has a reference to the producer job, so we can look it up
directly rather than relying on listing the jobs in the source
cluster.

Epic: none
Release note: None

crosscluster/logical: add mode option for kv vs sql writer

Release note: none.
Epic: none.

crosscluster/logical: skip TestLogicalStreamIngestionJobWithFallbackUDF

Epic: none
Release note: None

changefeedccl: only update pts if behind checkpoint

A changefeed updates its PTS if a checkpoint happens and the
time interval changefeed.protect_timestamp_interval has passed. However,
whenever the changefeed restarts the interval is lost. Previously, we
would conservatively update the PTS, but this could lead to overwhelming
the PTS and other system tables.

This PR introduces a cluster setting `changefeed.protect_timestamp.lag`,
which controls how much the changefeed PTS should lag behind the high
watermark. Before updating the PTS, the changefeed reads the PTS record
and if the old PTS is within `changefeed.protect_timestamp.lag` it skips
updating the PTS record. If more than `changefeed.protect_timestamp.lag`
has passed between the old PTS and the high watermark, the PTS is
updated.

Epic: None
Fixes: #129509

Release note (enterprise change): Only update the PTS if
`changefeed.protect_timestamp.lag` has passed between the last PTS and
the changefeed high watermark.

sql: deprecate enforce_home_region_follower_reads_enabled

Informs: #113765

Release note (sql change): Session variable
`enforce_home_region_follower_reads_enabled` is now deprecated, and will
be removed in a future release. (Note that related session variable
`enforce_home_region` is _not_ deprecated.)

kvserver: fix the debugging message for large range

Previously the debug message was printing the min size rather than the
max size.

Epic: none

Release note: None

roachtest: allow large max range size

After the introduction of #129450, there is a default max range size of
8GiB. This test explicitly attempts to set the ranges to 10GiB and fails
during setup. This commit increases the max range size for this test.

Fixes: #129556
Fixes: #129552
Fixes: #129551
Fixes: #129544
Fixes: #129542
Fixes: #129540
Fixes: #129541
Fixes: #129539

Epic: none

Release note: None

storage: add additional trace statements for GC; fix counting

Currently, when performing a GC of point or range keys, we log a trace
statement at completion of the method. If an error is encountered, it
would be useful to surface it.

Update the  point and range key GC methods to print any error
encountered during the GC.

Improve the trace statements for both the point and range key methods,
indicating the number of incoming point / range keys, and the number of
resulting point / range keys that were removed, along with the time
taken to perform the GC.

Add additional trace statements for the range key method, logging the
range key fragments that were removed.

Fix the `count` in `MVCCGarbageCollectRangeKeys`, capturing the count of
range key fragments that were GC'd.

Fix #129419.

Release note: None.

kvserver: add rangekey count to mvccc gc trace statement

An batch pertaining to a `GCRequest` can contain zero or more point or
range keys. The non-zero case corresponds to the GC of the individual
point or range keys, while the zero case relates to bumping the GC
threshold.

To aid in debugging, add the range key count to the existing trace
statement. This makes it simpler to tell what each batch is doing.

Release note: None.

team: remove replication from teams.yaml

The `replication` team was merged into `kv`.

Update the `TEAMS.yaml` files (parts `ii` and `iii` from [this doc][1]).

[1]: https://cockroachlabs.atlassian.net/wiki/spaces/devinf/pages/3692167169/Merging+teams+into+KV+-+2024-06#GitHub

Release note: None.

roachtest: move replication into kv; remove replication

The Replication team was merged with KV.

Update the owners of various tests from `OwnerReplication` to `OwnerKV`.

Release note: None.

storage: improve code comments in range_key_clear test

Update the diagram reflecting the test setup, which was missing a span
and a tombstone.

Add diagrams reflecting the "after" state of each operation. This makes
the data-driven test easier to read.

Release note: None.

crosscluster/logical: hide key_value_bytes column in dlq

The key_value_bytes column can really only be used for internal debugging, so
might as well hide it from users running SELECT *.

Release note: none

roachtest: add PCR mixed version test

This patch adds a mixed version PCR roachtest which begins PCR on 24.2,
launches a light tpcc workload on the source tenant, upgrades both clusters in
a somewhat coordinated dance, runs cutover, and fingeprints both clusters. The
test asserts that the stream can catch up at various states as well.

Some follow up work includes:
- allow upgrades as early as 24.1 and 23.2
- version skipping from 24.1 to 24.3
- running the initial scan and cutover in destination side mixed version
  states.

Epic: none

Release note: None

replica_rac2: promptly return tokens on descriptor change

Epic: CRDB-37515

Release note: None

stats: add support for most common values in histograms

Fixes #71828

Release note (sql change): Added a new cluster setting to control
whether most common values are collected as part of histogram collection
for use by the optimizer. The setting is called
sql.stats.histogram_buckets.include_most_common_values.enabled. When enabled,
the histogram collection logic will ensure that the most common sampled values
are represented as histogram bucket upper bounds. Since histograms in
CockroachDB track the number of elements equal to the upper bound in addition
to the number of elements less, this allows the optimizer to identify the most
common values in the histogram and better estimate the rows processed by a
query plan. To set the number of most common values to include in a histogram,
a second setting sql.stats.histogram_buckets.max_fraction_most_common_values
was added. Currently, the default is 0.1, or 10% of the number of buckets. So
with a 200 bucket histogram, by default, at most 20 buckets may be adjusted to
include a most common value (or "heavy hitter") as the upper bound.

kvserver: wiring of RACv2 to eval, and v1 => v2 transition code

The replicaFlowControlIntegration object for v1, which is a member of
Replica.mu, is destroyed and replaced by a noop implementation.

The admissionDemuxHandle implements the new ReplicationAdmissionHandle
and handles a switch from v1 to v2 during the wait.

At raft log entry encoding time, for the entries that were subject to
replication admission control, we look at the latest value of the
EnabledWhenLeaderLevel to decide whether we can use the v2 encoding.

Fixes #129129
Fixes #128756

Epic: CRDB-37515

Release note: None

logictest: re-enable read committed logic tests

These were accidentally all skipped when
abda1f931fc69a5ee3edf153af73061864ec26f9 was merged.

Now we run them all again by fixing the logic test generation script.

Release note: None

orchestration: released CockroachDB version 24.2.0. Next version: 24.2.1

Release note: None
Epic: None
Release justification: non-production (release infra) change.

server: wrap ctx err on cancelation when iterating over nodes

When iterating over all nodes, if the context is done causing the
iteration to cancel, wrap the context error to include it for consumers
of the error.

Epic: None
Fixes: #129531

Release note: None

roachtest: set min supported version to 23.2 in tpcc/mixed-headroom

When trying to use `workload import fixtures`, we need to be in a 23.2
release. This command fails in a lot of 23.1 releases as it was not
supported (unimplemented error).

Fixes: #129632

Release note: None

go.mod: bump Pebble to de41143371aa

Changes:

 * [`de411433`](https://github.com/cockroachdb/pebble/commit/de411433) tool: fix up `wal dump`, add `wal dump-merged`
 * [`9b79e826`](https://github.com/cockroachdb/pebble/commit/9b79e826) vfs: make Unwrap part of FS, fix metamorphic restart op
 * [`f8a32c98`](https://github.com/cockroachdb/pebble/commit/f8a32c98) vfs: convert MemFS tests to datadriven
 * [`3d1f61aa`](https://github.com/cockroachdb/pebble/commit/3d1f61aa) sstable: convert RawWriter into an interface
 * [`583859f8`](https://github.com/cockroachdb/pebble/commit/583859f8) colblk: add index block writer, reader, iterator
 * [`de9087ca`](https://github.com/cockroachdb/pebble/commit/de9087ca) sstable: genericize iterators over index block iterator type
 * [`1b4021bc`](https://github.com/cockroachdb/pebble/commit/1b4021bc) colblk: store shared and bundle prefix length
 * [`99abcf76`](https://github.com/cockroachdb/pebble/commit/99abcf76) colblk: add UnsafeOffsets.At2 and improve UnsafeUints.At
 * [`55fcc6e9`](https://github.com/cockroachdb/pebble/commit/55fcc6e9) colblk: special case zero bitmaps
 * [`8ec212a4`](https://github.com/cockroachdb/pebble/commit/8ec212a4) colblk: use regular slice in PrefixBytesIter
 * [`94561af6`](https://github.com/cockroachdb/pebble/commit/94561af6) colblk: add some minor comments
 * [`d4348d08`](https://github.com/cockroachdb/pebble/commit/d4348d08) lint: remove golint
 * [`aa96305a`](https://github.com/cockroachdb/pebble/commit/aa96305a) sstable: fix nil pointer in twoLevelIterator.NextPrefix
 * [`9f97b7bf`](https://github.com/cockroachdb/pebble/commit/9f97b7bf) sstable: fix TestBlockProperties[BoundLimited] to use testkeys.Comparer
 * [`de53a981`](https://github.com/cockroachdb/pebble/commit/de53a981) rowblk: add IndexIter type
 * [`aa32f0b0`](https://github.com/cockroachdb/pebble/commit/aa32f0b0) db: fix WAL offset in replayWAL errors

Release note: none.
Epic: none.

build: cleanup some stale files in `build` directory

`build/variables.mk` is no longer used. For anyone who wants to use the
`make` build machinery, `make` will auto-generate the file. We also
`gitignore` it. We also delete `werror.sh` which is no longer used.

Epic: none
Release note: None
Release justification: Non-production code changes

sql: align the behavior of 'infinity' timestamp with Postgres

Previously, "SELECT 'infinity'::timestamp" would return a concrete
timestamp instead of "infinity", which is incompatible with
Postgres, causing issue #41564.

This commit addresses this issue by:

- Returning "infinity" for 'infinity'::timestamp, supporting both text
  and binary formats.

This commit also updates the cmp/eval behavior of "infinity" timestamps.
The updated behavior is the same as Postgres:

- "infinity" is always larger than other timestamps, and "infinity"
  equals itself.
- "-infinity" is always smaller than other timestamps, and "-infinity"
  equals itself.
- "infinity"/"-infinity" add or subtract any duration results in
  itself.

Fixes: #41564

Release note (bug fix): Fixed a bug where 'infinity'::timestamp
returned a different result than Postgres.

raft: do not campaign if supporting fortified leader

This patch ensures that peers that have promised fortification support
to a leader do not campaign unless StoreLiveness indicates support for
the leader's epoch has expired. This patch only affects campaigning
(either directly or in response to the randomized election timeout
elapsing) -- in a subsequent patch, we'll handle voting.

Note that once StoreLiveness indicates support for the leader has
expired, we aim to act swiftly. In particular, care is taken to
ensure a follower doesn't need to wait out its election timeout before
campaigning. We do preserve randomness to prevent hung elections though.

Informs https://github.com/cockroachdb/cockroach/issues/125351

Release note: None

roachtest: fix log message in test

Fix erroneous log message in previous commit.

Epic: none

Release note: None

kv: don't campaign on non-epoch lease redirect

Informs #125254.

Doing so is not needed and will be a no-op if the follower is fortified
anyway (see next commit).

Release note: None

kv: comment that manual campaign is no-op for fortified followers

Informs #125254.

This commit adds a comment mentioning that calls to campaignLocked will
be no-ops (as of #129158) for fortified followers.

Release note: None

kvserver: provide tenant id to Processor in OnDescChangedLocked

The tenant id is known at the time when the RangeDescriptor is first set.
We simply pass the tenant id with every call to OnDescChangedLocked.

Informs #129129

Epic: CRDB-37515

Release note: None

crosscluster/logical: check ctx cancel on err/retry/dlq paths

We don't want to keep trying batches or DLQ things if execution has been canceled.

Release note: none.
Epic: none.

master: Update pkg/testutils/release/cockroach_releases.yaml

Update pkg/testutils/release/cockroach_releases.yaml with recent values.

Epic: None
Release note: None
Release justification: test-only updates

workflows: disable cockroach-microbench-ci step for pull requests

This change disbales the CI step for pull requests but this still runs
for push step into the target branch. Also the count of the
microbenchmarks are increased to get more robust results.

Epic: none

Release note: None

workflow: increase test timeout in cockroach microbench ci

The step started timing out after
https://github.com/cockroachdb/cockroach/pull/129390 because test.count
was increased to 10 and the timeout used was 900 seconds. This change
bumps it to 1800 seconds (i.e. 30 min)

Epic: none

Release note: None

drtprod: add ttl jobs for ldr cluster.

To prevent disk capacity issue configured ttl jobs on tables.

Epic: None

Release note: None

rac2: add token tracker

This commit introduces the token `Tracker`, which will be used to track
token deductions for in-flight entries at a given `RaftPriority`.

Resolves: #128026
Release note: None

rpc: add source and destination to the peer labels

For certain tooling it is important to differentiate between the
locality tag of the local node from the locality tag of the remote node.
By adding both the local and the destination, it allows those tools to
understand the source and destination of the connections.

Epic: CRDB-41138

Release note: None

restore: allow empty virtual sst in online restore

Previously, only restore would fast fail if any backup spans were empty because
pebble could not read from an empty virtual sst. Pebble has gotten smarter, so
we can remove this check.

Fixes #129610

Release note: none

sql: add REPEATABLE READ to parser help text

This commit adds "REPEATABLE READ" to the list of supported isolation
levels in the parser help text. The isolation level is still behind a
cluster setting, but it is not worth the effort to gate this help text
on the value of that setting.

Epic: None
Release note: None

sql: add basic parsing for all isolation levels

This commit extends the parser tests to cover all isolation levels.

Epic: None
Release note: None

sql: rename snapshot isolation enabled setting

This commit renames the setting that enables snapshot/repeatable read
isolation from `sql.txn.snapshot_isolation.enabled` to
`sql.txn.repeatable_read_isolation.enabled`. This is part of a larger
change to favor the term "repeatable read" over "snapshot" at the SQL
level.

Epic: None
Release note: None

licenseccl: new license types

Epic: CRDB-39988
Closes: CRDB-40069
Release note: none

roachtest/grafana: add SQL metrics to CDC dashboard

This patch adds a row to the CDC roachtest grafana dashboard with a few
SQL metrics (SQL Statements, SQL Byte Traffic, Service Latency) that
are of interest while doing performance testing. This addition is
valuable because the TestEng Grafana instance currently only works with
GCE roachtests, and there are times when we need to run tests on other
cloud platforms like AWS.

Release note: None

upgrades: set exclude_data_from_backup on non-system tenants

This commit adds a new migration to set `exclude_data_from_backup` on
tenant system tables with low GC TTL, as was done for the system
tenant in #120144.

Fixes: #129643

Release note: None

ui: fix `dev gen js`

Since we switched to Bazel 7, `bazel` is more rigorous about disallowing
directories as outputs. This means that the `crdb-api-client`
implementation, which was always pretty much incorrect, started entirely
failing to build. To address this we add code generation to list
`.proto` files and use that instead of trying to use the directory
`dist` as an output file which Bazel does not want.

Closes #129369.

Epic: none
Release note: None
Release justification: Non-production code changes

roachtest: mark pgjdbc 'infinity' timestamp tests as passing

These pass now that 63f8a13f10f286457ea74ef198589278d6d3d654 has been
merged.

Release note: None

base, licenseccl: Revise resolution strategy for LicenseTTL

Previously, for resolving the cluster's license expiry we relied on a metric
whose value was regularly ticked down by an async task. This change seeks to
only the resolve value at read time by reading directly from the cluster
settings when its requested.

Release note: None

Fixes: CRDB-40592
Epic: CRDB-8035

rac2,replica_rac2: implement Processor.AdmitForEval

- RangeController.WaitForEval is changed to additionally return a waited
  bool, so that it conforms to the callers expectation that (false, nil)
  will be returned when the RangeController is closed during waiting.
  This allows the caller to wait on the local store (before eval) and
  not completely bypass all AC.
- Processor.AdmitForEval consults the existing
  kvadmission.flow_control.mode cluster setting to decide whether a
  request should be subject to replication AC, and if yes, forwards to
  the RangeController.

Fixes #129522

Epic: CRDB-37515

Release note: None

go.mod: bump Pebble to a70d5b3c41b8 and update MemFS usage

Changes:

 * [`a70d5b3c`](https://github.com/cockroachdb/pebble/commit/a70d5b3c) vfs: redesign MemFS strict mode

Release note: none.
Epic: none.

sql: add "local-repeatable-read" logic test config

This commit adds a "local-repeatable-read" logic test configuration,
which runs all logic tests with a default transaction isolation level of
REPEATABLE READ. This provides REPEATABLE READ with significantly more
test coverage.

Epic: None
Release note: None

sql: scaffolding for db metadata update job

This commit creates a new forever running BG job called
UpdateCachedTableMetadata. This is just some scaffolding -
the job doesn't do anything right now but it will be used
to populate the system table storing cached table metadata
used by obs surfaces.

Epic: CRDB-37558
Fixes: #127891

Release note: None

kvserver,rac2: add and plumb RaftEvent

This change updates the RaftEvent structure to reflect the semantics of
a raft storage write, and plumbs it into RACv2 code in a few places.

Epic: none
Release note: none

scjob: treat descriptor not found as permanent job error

Retrying a schema change after hitting a descriptor not found error just
means the job will keep being unable to find it. We mark them as
permanent now to avoid causing infinite retries.

Note that most errors already cause the schema change job to fail, but
if the job is non-cancelable (which it might be in the PostCommit
phase), then it needs to be marked in order to not be retried.

Release note: None

sql: fix checkClusterSettingValuesAreEquivalent for final release fence versions

When the `internal` field of a clusterversion is `0`, which
it would be for all final cluster versions, the corresponding
fence version will have an internal version of `-1`.

The logic in checkClusterSettingValuesAreEquivalent did not account for
this. It was assuming that one could always decrement the internal
version by 1 in order to get the predecessor version. Now, we use
ListBetween instead, which does not need this assumption.

Release note (bug fix): Fixed a rare bug in SHOW CLUSTER SETTING that
could cause it to fail with an error like `timed out: value differs
between local setting and KV`.

roachpb,clusterversion: add delegating method for FenceVersion

Release note: None

roachpb: allow negative internal version to roundtrip parse+format

Release note: None

admission: rename ttl-low-pri to bulk-low-pri

Release note (ops change): The lowest admission control priority for the storage layer has
been renamed from 'ttl-low-pri' to 'bulk-low-pri' in a handful of metrics.

Epic: none.

crosscluster/logical: use bulk-low-pri for LDR

Release note: none.
Epic: none.

admission: elastic limit all bulk-low-pri requests

Previously bulk low pri (formerly called ttl low pri) requests were only subject
to elastic limiting if they were also read-only requests. However writing
requests such as those performed by LDR also benefit from elastic limiting.

Release note: none.
Epic: none.

opt/props: add MaxFrequency() function to props.Histogram

This commit adds a new function to props.Histogram called MaxFrequency,
which returns the maximum value of NumEq across all histogram buckets.

This function will be useful to calculate an estimate for the upper bound
of the row count for joins.

Epic: None
Release note: None

crosscluster/logical: store resolved options as values

Epic: none

Release note: none

crosscluster/logical: rename FilterRangefeed to IgnoreCDCIgnoredTTLDeletes

This clarifies how bespoke the cdc ignored ttl delete logic is.

Epic: None

Release note: none

cloud/externalconn: show pg urls

Release note: none.
Epic: none.

sql, schemachanger: disallow ADD FK if referenced table is locked

Release note (bug fix): Fixed a bug where the schema_locked table
parameter did not prevent a table from being referenced by a foreign
key.

roachtest: Update readme stage command

While running through the readme I ran into two issues related to the step
which fetches a linux binary:
1. The binary version is no longer available at the location listed (21.1.5)
2. The architecture must be specified, otherwise roachtest will not accept it

This change addresses this items.

Epic: none
Release note: None

server/license: store cluster init time in the KV system keyspace

We need to store cluster-level metadata for license policy enforcement.
This metadata must be protected from modification, so it cannot be
stored in any system table. This PR introduces a new field in the system
keyspace to store the cluster's initialization start time, which will be
used in a follow-up PR to track the grace period for clusters without a
license. The metadata needs to be stored at the cluster level to align
with the license's scope.

A new struct, License.Enforcer, is responsible for writing and caching
the new timestamp. This struct will be extended in future PRs to check
for throttling in clusters that violate our license policies.

This update will be backported to versions 23.1, 23.2, 24.1, and 24.2.

Epic: CRDB-39988
Informs: CRDB-40064
Release note: None

raft: export log mark type

Epic: none
Release note: none

crosscluster/physical: defensively check errCh in span config event stream

Select does not choose which channel to read from based on some order, but the
span config event stream must return as soon as there is an error. For this
reason, this patch rechecks the errCh after select chooses to read from the
data channel.

Informs #128865

Release note: none

replica_rac2: lift routing from admit callback args

Epic: none
Release note: none

sql: don't consider repeatable read to snapshot an upgrade

One isolation level is not stronger than th…
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 22, 2024
Informs cockroachdb#125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* cockroachdb#127349
* cockroachdb#127348

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 22, 2024
Informs cockroachdb#125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* cockroachdb#127349
* cockroachdb#127348

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 23, 2024
Informs cockroachdb#125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* cockroachdb#127349
* cockroachdb#127348

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 25, 2024
Informs cockroachdb#125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* cockroachdb#127349
* cockroachdb#127348

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 28, 2024
Informs cockroachdb#125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* cockroachdb#127349
* cockroachdb#127348

Release note: None
craig bot pushed a commit that referenced this issue Oct 28, 2024
133169: raft: assert on HardState field modifications r=nvanbenschoten a=nvanbenschoten

Informs #125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch), allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239. These are tracked in the following issues, which we may need to address ASAP:
* #127349
* #127348

The 2nd, 3rd, and 4th commits in the PR address these issues in raft by not assuming that a `MsgSnap` as always coming from the leader of `Message.Term`.

Release note: None

133473: sql,kvflowcontrol: include sum of (eval|send) tokens deducted r=sumeerbhola a=kvoli

Populate two new `kvflowinspectpb.ConnectedStream` fields:

```go
// TotalEvalDeductedTokens represents the total number of tokens deducted for
// evaluation, for this connected stream. Only populated on versions >= 24.3.
TotalEvalDeductedTokens int64
// TotalSendDeductedTokens represents the total number of tokens deducted
// before sending, for this connected stream. Only populated on versions >=
// 24.3.
TotalSendDeductedTokens int64
```

These may be useful to gander at the tokens deducted, which are
otherwise not tracked. Also, for a quick check.

Note that RACv1 does not populate these variables, as it is soon to be
deprecated and these fields don't directly relate, unlike RACv2.

---

Update `crdb_internal.kv_flow_control_handles_v2` to include:

```sql
total_eval_deducted_tokens INT NOT NULL,
total_send_deducted_tokens INT NOT NULL,
```

Corresponding to the total number of `eval` and `send` tokens deducted
by the connected stream, in RACv2. Note the `handles_v1` virtual table
remains unchanged.

The DDL is now:

```sql
CREATE TABLE crdb_internal.kv_flow_control_handles_v2 (
  range_id                   INT NOT NULL,
  tenant_id                  INT NOT NULL,
  store_id                   INT NOT NULL,
  total_tracked_tokens       INT NOT NULL,
  total_eval_deducted_tokens INT NOT NULL,
  total_send_deducted_tokens INT NOT NULL,
  INDEX(range_id)
);
```

Resolves: #132777
Release note: None

133497: build: upgrade `rules_go` and `gazelle` r=rail,tbg,stevendanna a=rickystewart

We also drop a [commit on top of our `rules_go` fork](RaduBerinde/rules_go@b2411a8) that is in opposition to #102619.

Closes #102619
Closes #133289
Closes #133464

Epic: CRDB-17171
Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Oct 28, 2024
Informs #125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* #127349
* #127348

Release note: None
blathers-crl bot pushed a commit that referenced this issue Oct 28, 2024
Informs #125352.

This commit adds setters for the HardState fields (Term, Vote, lead, leadEpoch),
allowing us to assert that these fields are only modified in expected ways.

Some bugs that this change could catch:
- term regressions
- vote changes
- lead changes in a single term
- leadEpoch regressions

The "lead changes in a single term" class of bugs is especially interesting, as
we have found two of these in this thread: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1720274557285239.
These are tracked in the following issues, which we may need to address ASAP:
* #127349
* #127348

Release note: None
@arulajmani
Copy link
Collaborator Author

We've added a bunch of assertions as we've done the leader leases work, so I'll close this out. I'm sure there will be more that we add here in the future, but this seems like the correct stopping point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. A-leader-leases Related to the introduction of leader leases C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

2 participants