merge release-23.2.12-rc to release-23.2: released CockroachDB version 23.2.12. Next version: 23.2.13 #131582

cockroach-teamcity · 2024-09-30T07:02:53Z

Release note: None
Epic: None
Release justification: non-production (release infra) change.

Introduce the `ranges.decommissioning` gauge metric, which represents the number of ranges with at least one replica on a decommissioning node. The metric is reported by the leaseholder, or if there is no valid leaseholder, the first live replica in the descriptor, similar to (under|over)-replication metrics. The metric can be used to approximately identify the distribution of decommissioning work remaining across nodes, as the leaseholder replica is responsible for triggering the replacement of decommissioning replicas for its own range. Informs: cockroachdb#130085 Release note (ops change): The `ranges.decommissioning` metric is added, representing the number of ranges which have a replica on a decommissioning node.
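A minimal Go sketch of the counting rule described above. It assumes hypothetical `Replica` and `Range` types, not CockroachDB's `kvserver` types. A range counts once if any replica sits on a decommissioning node, and the count is attributed to the leaseholder or, if there is none, to the first live replica in the descriptor.

```go
package main

// Replica and Range are hypothetical types for illustration only; they are
// not CockroachDB's kvserver types.
type Replica struct {
	NodeID int
	Live   bool
}

type Range struct {
	Replicas    []Replica
	Leaseholder int // NodeID of the valid leaseholder, or 0 if there is none
}

// reporterNode returns the node that reports metrics for the range: the
// leaseholder if there is a valid one, otherwise the first live replica in the
// descriptor. It returns 0 if no replica qualifies.
func reporterNode(r Range) int {
	if r.Leaseholder != 0 {
		return r.Leaseholder
	}
	for _, rep := range r.Replicas {
		if rep.Live {
			return rep.NodeID
		}
	}
	return 0
}

// decommissioningGauge returns, per reporting node, the number of ranges with
// at least one replica on a decommissioning node.
func decommissioningGauge(ranges []Range, decommissioning map[int]bool) map[int]int {
	gauge := make(map[int]int)
	for _, r := range ranges {
		for _, rep := range r.Replicas {
			if decommissioning[rep.NodeID] {
				if n := reporterNode(r); n != 0 {
					gauge[n]++
				}
				break // count each range at most once
			}
		}
	}
	return gauge
}
```

Summing the per-node values gives the total number of affected ranges. The per-node breakdown approximates where the remaining decommissioning work sits.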

When `kv.enqueue_in_replicate_queue_on_problem.interval` is set to a positive value, leaseholder replicas of ranges that have decommissioning replicas are enqueued into the replicate queue once every `kv.enqueue_in_replicate_queue_on_problem.interval`. When the setting is 0, no decommissioning-driven enqueueing takes place outside of the regular replica scanner. For users enabling the enqueue (non-zero value), the recommended value is at least 15 minutes, e.g.:

```
SET CLUSTER SETTING kv.enqueue_in_replicate_queue_on_problem.interval='900s'
```

Resolves: cockroachdb#130085 Informs: cockroachdb#130199 Release note: None

…2.11-rc-130117 release-23.2.11-rc: kvserver: enqueue decom ranges at an interval behind a setting

This reverts commit 1a5279e. Release note: None.

This change extends the response of the `uiconfig` endpoint, which now includes the license type and the time until the license expires. Release note: None

With this change, a new alert message is shown in DB Console when the license has expired or will expire in less than 15 days. Clusters that have no license set are not affected. Release note (ui change): DB Console shows an alert message when the license has expired or will expire in less than 15 days.

This change adds a dismissible alert to the Overview page of DB Console that informs users about upcoming license changes. The popup is only shown if the cluster does not have an active "Enterprise" license. The popup links to this page: "https://www.cockroachlabs.com/enterprise-license-update/" When the popup is dismissed, the dismissal is stored in the DB for this user, and they don't see this notification again. Resolves: CRDB-40939 Release note (ui change): DB Console will show a notification alerting customers without an Enterprise license to upcoming license changes, with a link to more information.

…comparator-revert release-23.2.11-rc: Revert "release-23.2: storage: fix comparison of suffixes"

…-rc-120475-120490-129420 release-23.2.11-rc: ui: add license change notification to db console

Previously the cidr metrics were only started for the system tenant. This was problematic for SQL tenants since the mapping wouldn't be updated. Fixes: cockroachdb#130708 Release note: None

This commit adds a cluster setting (off by default) that sets the period at which manual liveness range compactions are done. This is done in a goroutine rather than in the MVCC GC queue because: 1) This is meant to be a stopgap, as it is not needed from 24.3 onwards, so a simple change like this should achieve the goal. 2) The MVCC GC queue runs against leaseholder replicas only, which means we would need to send a compaction request to the other liveness replicas. Fixes: cockroachdb#128968 Epic: None Release note: None

Previously we didn't include the locality of the remote node when we dialed a node. This prevented us from capturing locality aware stats for the connections. Epic: CRDB-41138 Release note: none

This commit adds the node's locality information to the ContextOptions. This allows metrics to consult it to determine whether a connection is from a remote locality. Epic: CRDB-41138 Release note: None

Some of the places that call UnvalidatedDial have the locality. Passing it in when it is known lets them update the statistics more accurately. Epic: CRDB-41138 Release note: None

Extract a constant to make it easier to change the expected count. Epic: CRDB-41138 Release note: None

Previously we didn't track bytes sent and received per node. This commit adds metrics for these. Additionally, it adds a metric for the connected count from a TCP perspective, as this may differ from the healthy or unhealthy counts. Epic: CRDB-41138 Release note (ops change): Adds three new network tracking metrics. `rpc.connection.connected` is the number of gRPC TCP-level connections established to remote nodes. `rpc.client.bytes.egress` is the number of TCP bytes sent via gRPC on connections we initiated. `rpc.client.bytes.ingress` is the number of TCP bytes received via gRPC on connections we initiated.

Previously the metric was not thread-safe, which prevented it from being shared by multiple connections. The value is only updated on heartbeat messages, so adding synchronization here should not cause any performance issues. Epic: CRDB-41138 Release note: None

The Metrics object stores all the metrics for the connection and was previously passed by value. This PR passes it by pointer instead to allow more complex state within the Metrics object in future commits. Epic: CRDB-41138 Release note: None

Previously the metrics for ConnectionHealthy, ConnectionUnhealthy and ConnectionInactive were manually set to 0 or 1. This made it hard to aggregate the peer metrics by an attribute such as locality. This PR changes how those three metrics are handled so that they are only incremented or decremented rather than set to 0/1. Epic: CRDB-41138 Release note: None
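The following Go sketch shows why increment/decrement accounting aggregates cleanly. The types are hypothetical and the gauges are plain integers rather than CockroachDB metric objects, so this version is not thread-safe. Several peers share one set of per-state gauges, and each state change moves exactly one unit between gauges.

```go
package main

// ConnState is a hypothetical connection state used for illustration.
type ConnState int

const (
	Inactive ConnState = iota
	Healthy
	Unhealthy
)

// localityGauges holds one gauge per state, shared by every peer in the same
// (hypothetical) aggregation bucket, e.g. a remote locality.
type localityGauges struct {
	counts [3]int64
}

// register records a newly tracked peer in its initial state.
func (g *localityGauges) register(s ConnState) { g.counts[s]++ }

// transition moves one peer between states. Because it only increments and
// decrements, the totals stay correct when many peers share the gauges;
// setting a gauge to 0 or 1 would overwrite other peers' contributions.
func (g *localityGauges) transition(from, to ConnState) {
	if from == to {
		return
	}
	g.counts[from]--
	g.counts[to]++
}
```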

Retain metrics across dropped and re-established connections. If we deleted and unlinked the counters, recreating them later would restart the counts at zero after a reconnect. Instead, we track the counters in a map and reuse them later if the key matches. Epic: CRDB-41138 Release note: None
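A minimal Go sketch of a counter registry that survives reconnects. The key type pairing local and remote locality is a hypothetical choice, in the spirit of the later locality-aggregation commits. The registry hands back the same counter whenever the key matches, so counts keep accumulating instead of restarting at zero.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// peerKey is a hypothetical aggregation key: source and destination locality.
type peerKey struct {
	localLocality, remoteLocality string
}

// counterRegistry keeps counters alive across dropped and re-established
// connections.
type counterRegistry struct {
	mu       sync.Mutex
	counters map[peerKey]*atomic.Int64
}

// get returns the existing counter for k, or creates one on first use.
func (r *counterRegistry) get(k peerKey) *atomic.Int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.counters == nil {
		r.counters = make(map[peerKey]*atomic.Int64)
	}
	c, ok := r.counters[k]
	if !ok {
		c = new(atomic.Int64)
		r.counters[k] = c
	}
	return c
}
```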

Previously we would publish network stats broken down by every remote node. This would result in a large number of stats for large clusters. In practice we can aggregate them by remote localities. This reduces the number of stats with only a minimal loss of visibility into how the network is being used. Epic: CRDB-41138 Release note: None

This test flakes under stress/race conditions due to the use of network ports. Skipping under stress. Epic: None Release note: None

For certain tooling it is important to differentiate the locality tag of the local node from the locality tag of the remote node. Adding both the local and the destination tags allows those tools to understand the source and destination of the connections. Epic: CRDB-41138 Release note: None

This commit adds a new utility which can store and efficiently process a large number of CIDR records by mapping them to a unique name. Epic: CRDB-41142 Release note: None

This commit constructs the cidr.Lookup and adds it to the SQL server's ExecutorConfig and evalContext. Additionally, it enables the CIDR mapping and adds a new configuration parameter for it. Epic: CRDB-41142 Release note (ops change): Adds a new configuration parameter, server.cidr_mapping_url, which maps IPv4 CIDR blocks to arbitrary tag names.

Previously the writeBuffer required a specific Metric implementation as part of its parameters. This made it more complicated to change the metric type that was passed in. Epic: none Release note: None

This commit changes the SQL byte metrics to be broken down by cidr block of the source. Epic: CRDB-41142 Release note (ops change): Modifies metrics sql.bytesin and sql.bytesout to be agg metrics if child metrics are enabled.

Previously the test assumed the setting would propagate synchronously. This could fail under stress and race conditions. Epic: none Release note: None

Adds a utility to cidrLookup to create a DialContext that is tracked based on the lookup. This works for any third-party library that exposes a way to set the DialContext. Epic: none Release note: None

Unfortunately, the WithHTTPClient option overrides all other options when constructing a GCS client. As a result, it appears we cannot both set credentials options and set an HTTP client with custom configs via the primary API. Here, we construct a transport using the SDK, which allows us to attach the relevant credential options to the transport directly before making the HTTP client. The downside is that the documentation for the SDK's NewTransport says it is not intended for end-user use, so we may expect breakage in the future. Epic: none Release note: None

Previously the cidr http check did not return true when it completed. This adds the true return and additionally adds logging for other failure cases. Epic: none Release note: None

Previously if the `server.cidr_mapping_url` was set and a node restarted, there was a race condition where `SetOnChange` for the setting could be called before the `Start` was called. This could result in it blocking while attempting to submit to the channel. Fixes: cockroachdb#130589 Release note: None
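The commit does not spell out the fix. A common way to keep a settings callback from blocking before its consumer starts is a one-slot buffered channel with a non-blocking send, which coalesces repeated notifications. The Go sketch below illustrates that general technique with hypothetical names. It is not presented as the actual change.

```go
package main

// refresher coalesces refresh requests so that an on-change callback never
// blocks, even if it fires before the consuming goroutine has been started.
type refresher struct {
	ch chan struct{}
}

func newRefresher() *refresher {
	// One slot is enough: any number of pending requests means "refresh once".
	return &refresher{ch: make(chan struct{}, 1)}
}

// notify is safe to call at any time. If a refresh is already pending, the new
// request is merged into it instead of blocking.
func (r *refresher) notify() {
	select {
	case r.ch <- struct{}{}:
	default:
	}
}

// drain consumes a pending refresh, if any, and reports whether there was one.
// A real consumer would instead loop on r.ch in a goroutine started by Start.
func (r *refresher) drain() bool {
	select {
	case <-r.ch:
		return true
	default:
		return false
	}
}
```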

For changefeeds we need some additional network wrapping methods. This commit also adds testing to Wrap, WrapTLS and WrapDialer. Part of: cockroachdb#130097 Epic: none Release note: None

Adds network tracing infrastructure to changefeeds. This commit adds the metrics but does not populate them for any of the existing changefeeds. Part of: cockroachdb#130097 Epic: none Release note (ops change): This commit adds two metrics, changefeed.network.bytes_in and changefeed.network.bytes_out. These metrics track the number of bytes sent by the individual changefeeds to different sinks.

Add support for the cidr network metrics to the kafka v1 sink. Part of: cockroachdb#130097 Release note (enterprise change): Added network metrics to the kafka v1 sink.

Add network metrics to cdc webhook sinks. Part of cockroachdb#130097 Release note (enterprise change): Added network metrics to webhook sinks.

Add support for the cidr network metrics to the pubsub sinks. Part of: cockroachdb#130097 Release note (enterprise change): Added network metrics to the pubsub sinks.

Add support for the cidr network metrics to the sql sink. Part of: cockroachdb#130097 Release note (enterprise change): Added network metrics to the sql sink.

This commit adds the network metrics for the kafka sink. Epic: none Release note: None

….12-rc-130521-130528-130664 release-23.2.12-rc: all of the network metrics

…2.12 Release note: None Epic: None Release justification: non-production (release infra) change.

…ort-release-23.2.12-rc-130709 release-23.2.12-rc: util: publish cidr metrics for tenants

The `TestServerController` test server stops quickly (due to deferred stop) after executing `CREATE TENANT hello` while the creation of the tenant is ongoing in `newTenantServer`. This causes `baseCfg.CidrLookup.Start` in `newTenantServer` to fail with `ErrUnavailable` because `s.runPrelude()` in `stopper.RunAsyncTask` returns true if a server is stopping: https://github.com/cockroachdb/cockroach/blob/3bf34dc3a192d7efeee8aa97e46bf73f817b2b9b/pkg/util/stop/stopper.go#L469-L471. Fixes: cockroachdb#130757 Epic: CRDB-42208 Release note: None

…-rc-129827 release-23.2.12-rc: kvserver: compact liveness range periodically

Narrowed down scope of counter filters in order to not catch stray increment events from background queries. Resolves: cockroachdb#128045, cockroachdb#128171 Release note: None

This test flakes in cases where we run a query and expect the `sql.plan.type.force-custom` to not get incremented. This can't be guaranteed as this is the default counter and it occasionally gets bumped by background operations. There's no easy way to prevent these from happening so these cases are removed from this suite. Resolves: cockroachdb#128523, cockroachdb#128640 Epic: None Release note: None

…e-23.2.12-rc-128383-128715 release-23.2.12-rc: telemetryccl_test: fix TestTelemetry

…ort-release-23.2.12-rc-130850 release-23.2.12-rc: server, util: fix failing TestServerController

This commit adds a new changefeed testing knob, AsyncFlushSync, which can be used to introduce a synchronization point between goroutines during an async flush. It's currently only used in the cloud storage sink. Epic: none Release note: none

Adds a test that reproduces a memory leak from pgzip, the library used for fast gzip compression for changefeeds using cloud storage sinks. The leak was caused by a race condition between Flush/flushTopicVerions and the async flusher: if the Flush clears files before the async flusher closes the compression codec as part of flushing the files, and the flush returns an error, the compression codec will not be closed properly. This test uses the AsyncFlushSync testing knob to introduce synchronization points between these two goroutines to trigger the regression. Co-authored by: wenyihu6 Epic: none Release note: none

When using the cloud storage sink with fast gzip and async flush enabled, changefeeds could leak memory from the pgzip library if a write error to the sink occurred. This was due to a race condition when flushing: the goroutine initiating the flush could clear the files before the async flusher had cleaned up the compression codec and received the error from the sink. This fix clears the files only after waiting for the async flusher to finish flushing them, so that if an error occurs the files can be closed when the sink is closed. Co-authored by: wenyihu6 Epic: none Fixes: cockroachdb#129947 Release note (bug fix): Fixes a potential memory leak in changefeeds using a cloud storage sink. The memory leak could occur if both changefeed.fast_gzip.enabled and changefeed.cloudstorage.async_flush.enabled are true and the changefeed received an error while attempting to write to the cloud storage sink.

…12-rc-130204 release-23.2.12-rc: changefeedccl: fix memory leak in cloud storage sink with fast gzip

…ort-release-23.2.12-rc-130789 release-23.2.12-rc: released CockroachDB version 23.2.11. Next version: 23.2.12

Previously the code would hit an index-out-of-bounds error if the CIDR mapping file had a prefix length greater than 32 bits. This could only happen with IPv6 addresses. Note that if there are any invalid mappings, the code logs the error and does not process any of the file. The code already handled mapping lookups for IPv6, but these changes make that more explicit. Epic: none Informs: cockroachdb#130814 Release note: None

We don't use the WrapTLS method, so it is better to remove it and bring it back if we need it in the future. Epic: none Release note: None

…ort-release-23.2.12-rc-131221 release-23.2.12-rc: util: don't panic on IPv6 entries in cidr mapping

…n 23.2.12. Next version: 23.2.13 Release note: None Epic: None Release justification: non-production (release infra) change.

cockroach-teamcity · 2024-09-30T07:03:02Z


kvoli and others added 30 commits September 10, 2024 09:10

Merge pull request cockroachdb#130413 from kvoli/backport-release-23.…

c70f434

…2.11-rc-130117 release-23.2.11-rc: kvserver: enqueue decom ranges at an interval behind a setting

Revert "release-24.2: storage: fix comparison of suffixes"

cf9752b

This reverts commit 1a5279e. Release note: None.

server: include license info into uiconfig endpoint response

cb1bb64

This change extends the response of the `uiconfig` endpoint, which now includes the license type and the time until the license expires. Release note: None

Merge pull request cockroachdb#130456 from nicktrav/nickt.23.2.11-rc-…

701e687

…comparator-revert release-23.2.11-rc: Revert "release-23.2: storage: fix comparison of suffixes"

Merge pull request cockroachdb#130509 from dhartunian/backport23.2.11…

174a895

…-rc-120475-120490-129420 release-23.2.11-rc: ui: add license change notification to db console

util: publish cidr metrics for tenants

b0f14af

Previously the cidr metrics were only started for the system tenant. This was problematic for SQL tenants since the mapping wouldn't be updated. Fixes: cockroachdb#130708 Release note: None

rpc: add Locality to dialing

e557d20

Previously we didn't include the locality of the remote node when we dialed a node. This prevented us from capturing locality aware stats for the connections. Epic: CRDB-41138 Release note: none

rpc: add Locality to the ContextOptions

387e6b2

This commit adds the node's locality information to the ContextOptions. This allows metrics to consult it to determine whether a connection is from a remote locality. Epic: CRDB-41138 Release note: None

rpc: add Locality to UnvalidatedDial

e1c0b12

Some of the places that call UnvalidatedDial have the locality. Passing it in when it is known lets them update the statistics more accurately. Epic: CRDB-41138 Release note: None

rpc: extract constant for expected count

fe27e18

Extract a constant to make it easier to change the expected count. Epic: CRDB-41138 Release note: None

rpc: pass Metrics by pointer

bbb47f9

The Metrics object stores all the metrics for the connection and was previously passed by value. This PR passes it by pointer instead to allow more complex state within the Metrics object in future commits. Epic: CRDB-41138 Release note: None

rpc: skip TestTenantGRPCServices under stress

71ddb92

This test flakes under stress/race conditions due to the use of network ports. Skipping under stress. Epic: None Release note: None

util: add utility to process CIDR records

d12572b

This commit adds a new utility which can store and efficiently process a large number of CIDR records by mapping them to a unique name. Epic: CRDB-41142 Release note: None

sql: pass an Inc function to writeBuffer

2614dd0

Previously the writeBuffer required a specific Metric implementation as part of its parameters. This made it more complicated to change the metric type that was passed in. Epic: none Release note: None

sql: restructure sql byte metrics to be add metrics

30ebd93

This commit changes the SQL byte metrics to be broken down by cidr block of the source. Epic: CRDB-41142 Release note (ops change): Modifies metrics sql.bytesin and sql.bytesout to be agg metrics if child metrics are enabled.

util: deflake cidr.TestRefresh

f204626

Previously the test assumed the setting would propagate synchronously. This could fail under stress and race conditions. Epic: none Release note: None

util: add connection metric tracking utility

647cc2d

Adds a utility to cidrLookup to create a DialContext that is tracked based on the lookup. This works for any third-party library that exposes a way to set the DialContext. Epic: none Release note: None

andrewbaptist and others added 27 commits September 13, 2024 16:47

util: fix return values for cidr http

da28b99

Previously the cidr http check did not return true when it completed. This adds the true return and additionally adds logging for other failure cases. Epic: none Release note: None

util: add additional wrapping methods to cidr

75b8ba8

For changefeeds we need some additional network wrapping methods. This commit also adds testing to Wrap, WrapTLS and WrapDialer. Part of: cockroachdb#130097 Epic: none Release note: None

changefeedccl: add network metrics to kafka v1 sink

de43d36

Add support for the cidr network metrics to the kafka v1 sink. Part of: cockroachdb#130097 Release note (enterprise change): Added network metrics to the kafka v1 sink.

changefeedccl: add network metrics to webhook sinks

67f8be6

Add network metrics to cdc webhook sinks. Part of cockroachdb#130097 Release note (enterprise change): Added network metrics to webhook sinks.

changefeedccl: add network metrics for the pubsub sinks

eb2024e

Add support for the cidr network metrics to the pubsub sinks. Part of: cockroachdb#130097 Release note (enterprise change): Added network metrics to the pubsub sinks.

changefeedccl: add network metrics to sql sink

2b44109

Add support for the cidr network metrics to the sql sink. Part of: cockroachdb#130097 Release note (enterprise change): Added network metrics to the sql sink.

changefeedccl: add NetMetrics for kafka_v2

870a43c

This commit adds the network metrics for the kafka sink. Epic: none Release note: None

Merge pull request cockroachdb#130712 from andrewbaptist/backport23.2…

9612ab9

….12-rc-130521-130528-130664 release-23.2.12-rc: all of the network metrics

release-23.2: released CockroachDB version 23.2.11. Next version: 23.…

1ecf33f

…2.12 Release note: None Epic: None Release justification: non-production (release infra) change.

Merge pull request cockroachdb#130753 from cockroachdb/blathers/backp…

bf1ecaa

…ort-release-23.2.12-rc-130709 release-23.2.12-rc: util: publish cidr metrics for tenants

Merge pull request cockroachdb#130711 from iskettaneh/backport23.2.12…

42a7a91

…-rc-129827 release-23.2.12-rc: kvserver: compact liveness range periodically

telemetry: deflake generic query plan telemetry test

d14b328

Narrowed down scope of counter filters in order to not catch stray increment events from background queries. Resolves: cockroachdb#128045, cockroachdb#128171 Release note: None

Merge pull request cockroachdb#130877 from kyle-a-wong/backportreleas…

d186c29

…e-23.2.12-rc-128383-128715 release-23.2.12-rc: telemetryccl_test: fix TestTelemetry

Merge pull request cockroachdb#130883 from cockroachdb/blathers/backp…

c2e8070

…ort-release-23.2.12-rc-130850 release-23.2.12-rc: server, util: fix failing TestServerController

Merge pull request cockroachdb#130624 from rharding6373/backport23.2.…

776f2ce

…12-rc-130204 release-23.2.12-rc: changefeedccl: fix memory leak in cloud storage sink with fast gzip

Merge pull request cockroachdb#131188 from cockroachdb/blathers/backp…

8318cbd

…ort-release-23.2.12-rc-130789 release-23.2.12-rc: released CockroachDB version 23.2.11. Next version: 23.2.12

util: remove unused cidr method

7af2609

We don't use the WrapTLS method, so it is better to remove it and bring it back if we need it in the future. Epic: none Release note: None

Merge pull request cockroachdb#131237 from cockroachdb/blathers/backp…

c3ddfa7

…ort-release-23.2.12-rc-131221 release-23.2.12-rc: util: don't panic on IPv6 entries in cidr mapping

merge release-23.2.12-rc to release-23.2: released CockroachDB versio…

3cf0d11

…n 23.2.12. Next version: 23.2.13 Release note: None Epic: None Release justification: non-production (release infra) change.

vidit-bhat approved these changes Sep 30, 2024


vidit-bhat merged commit 830c239 into cockroachdb:release-23.2 Sep 30, 2024
5 of 6 checks passed
