diff --git a/_config_base.yml b/_config_base.yml index f3277939214..cbc1cbb8121 100644 --- a/_config_base.yml +++ b/_config_base.yml @@ -138,12 +138,12 @@ release_info: start_time: 2022-10-13 17:45:03.909127 +0000 UTC version: v21.2.17 v22.1: - build_time: 2023-04-24 00:00:00 (go1.19) + build_time: 2023-05-12 00:00:00 (go1.19) crdb_branch_name: release-22.1 docker_image: cockroachdb/cockroach - name: v22.1.19 - start_time: 2023-04-24 14:56:42.617860 +0000 UTC - version: v22.1.19 + name: v22.1.20 + start_time: 2023-05-10 12:58:20.269520 +0000 UTC + version: v22.1.20 v22.2: build_time: 2023-05-08 00:00:00 (go1.19) crdb_branch_name: release-22.2 diff --git a/_data/releases.yml b/_data/releases.yml index 88098cbfa48..cfb2ecec35d 100644 --- a/_data/releases.yml +++ b/_data/releases.yml @@ -4333,4 +4333,25 @@ docker_image: cockroachdb/cockroach docker_arm: true source: true - previous_release: v23.1.0-rc.2 \ No newline at end of file + previous_release: v23.1.0-rc.2 + +- release_name: v22.1.20 + major_version: v22.1 + release_date: '2023-05-12' + release_type: Production + go_version: go1.19 + sha: c091e9bdfdff6fd6888a2c514b78e57abbb6119d + has_sql_only: true + has_sha256sum: true + mac: + mac_arm: false + windows: true + linux: + linux_arm: false + linux_intel_fips: false + linux_arm_fips: false + docker: + docker_image: cockroachdb/cockroach + docker_arm: false + source: true + previous_release: v22.1.19 diff --git a/_includes/releases/cloud/2023-05-10.md b/_includes/releases/cloud/2023-05-10.md new file mode 100644 index 00000000000..e323354223c --- /dev/null +++ b/_includes/releases/cloud/2023-05-10.md @@ -0,0 +1,5 @@ +## May 10, 2023 + +

<h3 id="2023-05-10-security-updates">Security updates</h3>

+ +- [Egress Perimeter Controls](/docs/cockroachcloud/egress-perimeter-controls.html), which allow you to restrict egress from a {{ site.data.products.dedicated }} cluster to a list of specified external destinations, are now generally available for {{ site.data.products.dedicated }} advanced clusters. diff --git a/_includes/releases/v22.1/v22.1.20.md b/_includes/releases/v22.1/v22.1.20.md new file mode 100644 index 00000000000..da520b16a4c --- /dev/null +++ b/_includes/releases/v22.1/v22.1.20.md @@ -0,0 +1,34 @@ +## v22.1.20 + +Release Date: May 12, 2023 + +{% include releases/release-downloads-docker-image.md release=include.release %} + +

<h3 id="v22-1-20-bug-fixes">Bug fixes</h3>

+ +- Fixed a rare bug where [replica rebalancing](../v22.1/architecture/replication-layer.html) during write-heavy workloads could cause keys to be deleted unexpectedly from a [local store](../v22.1/cockroach-start.html#flags-store). [#102190][#102190] +- Fixed a bug introduced in v22.1.19, v22.2.8, and pre-release versions of v23.1 that could cause queries to return spurious insufficient [privilege](../v22.1/security-reference/authorization.html#privileges) errors. For the bug to occur, two databases would need to have duplicate tables, each with a [foreign key](../v22.1/foreign-key.html) reference to another table. The error would then occur if the same SQL string was executed against both databases concurrently by users who had privileges over only one of the tables. [#102653][#102653] +- Fixed a bug where a [backup](../v22.1/backup-and-restore-overview.html) with a key's [revision history](../v22.1/take-backups-with-revision-history-and-restore-from-a-point-in-time.html) split across multiple [SST files](../v22.1/architecture/storage-layer.html#ssts) might not restore the correct revision of the key. [#102372][#102372] +- Fixed a bug, present since v21.1, that allowed values to be inserted into an [`ARRAY`](../v22.1/array.html)-type column that did not conform to the inner type of the array. For example, it was possible to insert `ARRAY['foo']` into a column of type `CHAR(1)[]`, which could cause incorrect results when querying the table. Such an [`INSERT`](../v22.1/insert.html) now returns an error, as expected. [#102811][#102811] +- Fixed a bug where [backup and restore](../v22.1/backup-and-restore-overview.html) would panic if the target was a synthetic public [schema](../v22.1/schema-design-overview.html), such as `system.public`. [#102783][#102783] +- Fixed an issue, present since v20.2.0, where running [`SHOW HISTOGRAM`](../v22.1/show-columns.html) to view the histogram for an [`ENUM`](../v22.1/enum.html)-type column would panic and crash the `cockroach` process. [#102829][#102829] + +
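For example, a hypothetical statement of the kind that previously succeeded and now fails (the table and column names below are illustrative, not taken from the release note):

~~~ sql
CREATE TABLE t (a CHAR(1)[]);

-- 'foo' does not fit the inner CHAR(1) type, so this statement now returns an
-- error instead of storing a value that could later produce incorrect results.
INSERT INTO t VALUES (ARRAY['foo']);
~~~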

<h3 id="v22-1-20-sql-language-changes">SQL language changes</h3>

+ +- Added two views to the [`crdb_internal`](../v22.1/crdb-internal.html) catalog: `crdb_internal.statement_statistics_persisted`, which surfaces data in the persisted `system.statement_statistics` table, and `crdb_internal.transaction_statistics_persisted`, which surfaces the `system.transaction_statistics` table. [#99272][#99272] + +
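As an illustration, a minimal query against one of the new views (a sketch only; the columns are not listed in this note, so `*` is used):

~~~ sql
SELECT *
FROM crdb_internal.statement_statistics_persisted
LIMIT 10;
~~~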
+ +

<h3 id="v22-1-20-contributors">Contributors</h3>

+ +This release includes 13 merged PRs by 14 authors. + +
+ +[#102190]: https://github.com/cockroachdb/cockroach/pull/102190 +[#102372]: https://github.com/cockroachdb/cockroach/pull/102372 +[#102653]: https://github.com/cockroachdb/cockroach/pull/102653 +[#102783]: https://github.com/cockroachdb/cockroach/pull/102783 +[#102811]: https://github.com/cockroachdb/cockroach/pull/102811 +[#102829]: https://github.com/cockroachdb/cockroach/pull/102829 +[#99272]: https://github.com/cockroachdb/cockroach/pull/99272 diff --git a/_includes/v22.2/misc/note-egress-perimeter-cdc-backup.md b/_includes/v22.2/misc/note-egress-perimeter-cdc-backup.md index d573b88dfae..1246632b696 100644 --- a/_includes/v22.2/misc/note-egress-perimeter-cdc-backup.md +++ b/_includes/v22.2/misc/note-egress-perimeter-cdc-backup.md @@ -1,3 +1,3 @@ {{site.data.alerts.callout_info}} -We recommend enabling Egress Perimeter Controls on {{ site.data.products.dedicated }} clusters to mitigate the risk of data exfiltration when accessing external resources, such as cloud storage for change data capture or backup and restore operations. See [Egress Perimeter Controls for CockroachDB Dedicated (Preview)](../cockroachcloud/egress-perimeter-controls.html) for detail and setup instructions. -{{site.data.alerts.end}} \ No newline at end of file +Cockroach Labs recommends enabling Egress Perimeter Controls on {{ site.data.products.dedicated }} clusters to mitigate the risk of data exfiltration when accessing external resources, such as cloud storage for change data capture or backup and restore operations. See [Egress Perimeter Controls](../cockroachcloud/egress-perimeter-controls.html) for detail and setup instructions. +{{site.data.alerts.end}} diff --git a/_includes/v23.1/cdc/metrics-labels.md b/_includes/v23.1/cdc/metrics-labels.md index 71ad95c90ee..5c26db8eb32 100644 --- a/_includes/v23.1/cdc/metrics-labels.md +++ b/_includes/v23.1/cdc/metrics-labels.md @@ -1,9 +1,8 @@ -To measure metrics per changefeed, define a "metrics label" to which one or multiple changefeed(s) will increment each [changefeed metric](monitor-and-debug-changefeeds.html#metrics). Metrics label information is sent with time-series metrics to `http://{host}:{http-port}/_status/vars`, viewable via the [Prometheus endpoint](monitoring-and-alerting.html#prometheus-endpoint). An aggregated metric of all changefeeds is also measured. +To measure metrics per changefeed, you can define a "metrics label" for one or multiple changefeed(s). The changefeed(s) will increment each [changefeed metric](monitor-and-debug-changefeeds.html#metrics). Metrics label information is sent with time-series metrics to `http://{host}:{http-port}/_status/vars`, viewable via the [Prometheus endpoint](monitoring-and-alerting.html#prometheus-endpoint). An aggregated metric of all changefeeds is also measured. It is necessary to consider the following when applying metrics labels to changefeeds: -- Metrics labels are **not** available in {{ site.data.products.db }}. -- The `COCKROACH_EXPERIMENTAL_ENABLE_PER_CHANGEFEED_METRICS` [environment variable](cockroach-commands.html#environment-variables) must be specified to use this feature. +- Metrics labels are **not** available in {{ site.data.products.serverless }}. - The `server.child_metrics.enabled` [cluster setting](cluster-settings.html) must be set to `true` before using the `metrics_label` option. - Metrics label information is sent to the `_status/vars` endpoint, but will **not** show up in [`debug.zip`](cockroach-debug-zip.html) or the [DB Console](ui-overview.html). 
- Introducing labels to isolate a changefeed's metrics can increase cardinality significantly. There is a limit of 1024 unique labels in place to prevent cardinality explosion. That is, when labels are applied to high-cardinality data (data with a higher number of unique values), each changefeed with a label then results in more metrics data to multiply together, which will grow over time. This will have an impact on performance as the metric-series data per changefeed quickly populates against its label. diff --git a/_includes/v23.1/client-transaction-retry.md b/_includes/v23.1/client-transaction-retry.md index 2cae1347a18..cd12adf566f 100644 --- a/_includes/v23.1/client-transaction-retry.md +++ b/_includes/v23.1/client-transaction-retry.md @@ -1,3 +1,3 @@ {{site.data.alerts.callout_info}} -With the default `SERIALIZABLE` [isolation level](transactions.html#isolation-levels), CockroachDB may require the client to [retry a transaction](transactions.html#transaction-retries) in case of read/write [contention]({{ link_prefix }}performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention). CockroachDB provides a [generic retry function](transactions.html#client-side-intervention) that runs inside a transaction and retries it as needed. The code sample below shows how it is used. +With the default `SERIALIZABLE` [isolation level](transactions.html#isolation-levels), CockroachDB may require the client to [retry a transaction](transactions.html#transaction-retries) in case of read/write [contention]({{ link_prefix }}performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention). CockroachDB provides a [generic retry function](transaction-retry-error-reference.html#client-side-retry-handling) that runs inside a transaction and retries it as needed. The code sample below shows how it is used. {{site.data.alerts.end}} diff --git a/_includes/v23.1/metric-names.md b/_includes/v23.1/metric-names.md index 23e9d8c2893..48aefb2df43 100644 --- a/_includes/v23.1/metric-names.md +++ b/_includes/v23.1/metric-names.md @@ -8,8 +8,8 @@ Name | Description `capacity.reserved` | Capacity reserved for snapshots `capacity.used` | Used storage capacity `capacity` | Total storage capacity -`changefeed.failures` | Total number of changefeed jobs which have failed -`changefeed.running` | Number of currently running changefeeds, including sinkless +`changefeed.failures` | Total number of [changefeed jobs](show-jobs.html#show-changefeed-jobs) which have failed. +`changefeed.running` | Number of currently running changefeeds, including sinkless. `clock-offset.meannanos` | Mean clock offset with other nodes in nanoseconds `clock-offset.stddevnanos` | Std dev clock offset with other nodes in nanoseconds `cluster.preserve-downgrade-option.last-updated` | Unix timestamp of last updated time for cluster.preserve_downgrade_option @@ -39,6 +39,10 @@ Name | Description `intentage` | Cumulative age of intents in seconds `intentbytes` | Number of bytes in intent KV pairs `intentcount` | Count of intent keys +`jobs.changefeed.expired_pts_records` | Number of expired [protected timestamp](architecture/storage-layer.html#protected-timestamps) records owned by [changefeed jobs](show-jobs.html#show-changefeed-jobs). +`jobs.{job_type}.currently_paused` | Number of `{job_type}` [jobs](show-jobs.html) currently considered paused. See the [`/_status/vars`](monitoring-and-alerting.html#prometheus-endpoint) endpoint for all job types. 
+`jobs.{job_type}.protected_age_sec` | The age of the oldest [protected timestamp](architecture/storage-layer.html#protected-timestamps) record protecting `{job_type}` [jobs](show-jobs.html). See the [`/_status/vars`](monitoring-and-alerting.html#prometheus-endpoint) endpoint for all job types. +`jobs.{job_type}.protected_record_count` | Number of [protected timestamp](architecture/storage-layer.html#protected-timestamps) records held by `{job_type}` [jobs](show-jobs.html). See the [`/_status/vars`](monitoring-and-alerting.html#prometheus-endpoint) endpoint for all job types. `jobs.row_level_ttl.num_active_spans` | Number of active spans the TTL job is deleting from `jobs.row_level_ttl.span_total_duration` | Duration for processing a span during row level TTL `keybytes` | Number of bytes taken up by keys @@ -222,6 +226,8 @@ Name | Description `round-trip-latency` | Distribution of round-trip latencies with other nodes in nanoseconds `security.certificate.expiration.ca` | Expiration timestamp in seconds since Unix epoch for the CA certificate. 0 means no certificate or error. `security.certificate.expiration.node` | Expiration timestamp in seconds since Unix epoch for the node certificate. 0 means no certificate or error. +`schedules.BACKUP.protected_age_sec` | The age of the oldest [protected timestamp](architecture/storage-layer.html#protected-timestamps) record protected by `BACKUP` schedules. +`schedules.BACKUP.protected_record_count` | Number of [protected timestamp](architecture/storage-layer.html#protected-timestamps) records held by `BACKUP` schedules. `sql.bytesin` | Number of sql bytes received `sql.bytesout` | Number of sql bytes sent `sql.conns` | Number of active sql connections diff --git a/_includes/v23.1/misc/client-side-intervention-example.md b/_includes/v23.1/misc/client-side-intervention-example.md deleted file mode 100644 index d0bbfc33695..00000000000 --- a/_includes/v23.1/misc/client-side-intervention-example.md +++ /dev/null @@ -1,28 +0,0 @@ -The Python-like pseudocode below shows how to implement an application-level retry loop; it does not require your driver or ORM to implement [advanced retry handling logic](advanced-client-side-transaction-retries.html), so it can be used from any programming language or environment. In particular, your retry loop must: - -- Raise an error if the `max_retries` limit is reached -- Retry on `40001` error codes -- [`COMMIT`](commit-transaction.html) at the end of the `try` block -- Implement [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff) logic as shown below for best performance - -~~~ python -while true: - n++ - if n == max_retries: - throw Error("did not succeed within N retries") - try: - # add logic here to run all your statements - conn.exec('COMMIT') - break - catch error: - if error.code != "40001": - throw error - else: - # This is a retry error, so we roll back the current transaction - # and sleep for a bit before retrying. The sleep time increases - # for each failed transaction. 
Adapted from - # https://colintemple.com/2017/03/java-exponential-backoff/ - conn.exec('ROLLBACK'); - sleep_ms = int(((2**n) * 100) + rand( 100 - 1 ) + 1) - sleep(sleep_ms) # Assumes your sleep() takes milliseconds -~~~ diff --git a/_includes/v23.1/misc/database-terms.md b/_includes/v23.1/misc/database-terms.md index 72e6d290205..d890285b103 100644 --- a/_includes/v23.1/misc/database-terms.md +++ b/_includes/v23.1/misc/database-terms.md @@ -4,7 +4,7 @@ The requirement that a transaction must change affected data only in allowed ways. CockroachDB uses "consistency" in both the sense of [ACID semantics](https://en.wikipedia.org/wiki/ACID) and the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem), albeit less formally than either definition. ### Isolation -The degree to which a transaction may be affected by other transactions running at the same time. CockroachDB provides the [`SERIALIZABLE`](https://en.wikipedia.org/wiki/Serializability) isolation level, which is the highest possible and guarantees that every committed transaction has the same result as if each transaction were run one at a time. +The degree to which a transaction may be affected by other transactions running at the same time. CockroachDB provides the [`SERIALIZABLE`](https://en.wikipedia.org/wiki/Serializability) isolation level, which is the highest possible and guarantees that every committed transaction has the same result as if each transaction were run one at a time. ### Consensus The process of reaching agreement on whether a transaction is committed or aborted. CockroachDB uses the [Raft consensus protocol](#architecture-raft). In CockroachDB, when a range receives a write, a quorum of nodes containing replicas of the range acknowledge the write. This means your data is safely stored and a majority of nodes agree on the database's current state, even if some of the nodes are offline. @@ -17,8 +17,13 @@ The process of creating and distributing copies of data, as well as ensuring tha ### Transaction A set of operations performed on a database that satisfy the requirements of [ACID semantics](https://en.wikipedia.org/wiki/ACID). This is a crucial feature for a consistent system to ensure developers can trust the data in their database. For more information about how transactions work in CockroachDB, see [Transaction Layer](transaction-layer.html). -### Contention - A state of conflict that occurs when a [transaction](../transactions.html) is unable to complete due to another concurrent or recent transaction attempting to write to the same data. When CockroachDB experiences transaction contention, it will [automatically attempt to retry the failed transaction](../transactions.html#automatic-retries) without involving the client (i.e., silently). If the automatic retry is not possible or fails, a [transaction retry error](../transaction-retry-error-reference.html) is emitted to the client. The client application can be configured to [retry the transaction](../transaction-retry-error-reference.html#client-side-retry-handling) after receiving such an error, and to [minimize transaction retry errors](../transaction-retry-error-reference.html#minimize-transaction-retry-errors) in the first place where possible. +### Transaction contention + A [state of conflict](../performance-best-practices-overview.html#transaction-contention) that occurs when: + +- A [transaction](../transactions.html) is unable to complete due to another concurrent or recent transaction attempting to write to the same data. 
This is also called *lock contention*. +- A transaction is [automatically retried](../transactions.html#automatic-retries) because it could not be placed into a [serializable ordering](../demo-serializable.html) among all of the currently executing transactions. If the automatic retry is not possible or fails, a [*transaction retry error*](../transaction-retry-error-reference.html) is emitted to the client, requiring the client application to [retry the transaction](../transaction-retry-error-reference.html#client-side-retry-handling). + +Steps should be taken to [reduce transaction contention](../performance-best-practices-overview.html#reduce-transaction-contention) in the first place. ### Multi-active availability A consensus-based notion of high availability that lets each node in the cluster handle reads and writes for a subset of the stored data (on a per-range basis). This is in contrast to _active-passive replication_, in which the active node receives 100% of request traffic, and _active-active_ replication, in which all nodes accept requests but typically cannot guarantee that reads are both up-to-date and fast. diff --git a/_includes/v23.1/misc/note-egress-perimeter-cdc-backup.md b/_includes/v23.1/misc/note-egress-perimeter-cdc-backup.md index d573b88dfae..1246632b696 100644 --- a/_includes/v23.1/misc/note-egress-perimeter-cdc-backup.md +++ b/_includes/v23.1/misc/note-egress-perimeter-cdc-backup.md @@ -1,3 +1,3 @@ {{site.data.alerts.callout_info}} -We recommend enabling Egress Perimeter Controls on {{ site.data.products.dedicated }} clusters to mitigate the risk of data exfiltration when accessing external resources, such as cloud storage for change data capture or backup and restore operations. See [Egress Perimeter Controls for CockroachDB Dedicated (Preview)](../cockroachcloud/egress-perimeter-controls.html) for detail and setup instructions. -{{site.data.alerts.end}} \ No newline at end of file +Cockroach Labs recommends enabling Egress Perimeter Controls on {{ site.data.products.dedicated }} clusters to mitigate the risk of data exfiltration when accessing external resources, such as cloud storage for change data capture or backup and restore operations. See [Egress Perimeter Controls](../cockroachcloud/egress-perimeter-controls.html) for detail and setup instructions. +{{site.data.alerts.end}} diff --git a/_includes/v23.1/misc/session-vars.md b/_includes/v23.1/misc/session-vars.md index 714abcbb0f2..c85eb63b287 100644 --- a/_includes/v23.1/misc/session-vars.md +++ b/_includes/v23.1/misc/session-vars.md @@ -9,6 +9,7 @@ | `database` | The [current database](sql-name-resolution.html#current-database). | Database in connection string, or empty if not specified. | Yes | Yes | | `datestyle` | The input string format for [`DATE`](date.html) and [`TIMESTAMP`](timestamp.html) values. Accepted values include `ISO,MDY`, `ISO,DMY`, and `ISO,YMD`. | The value set by the `sql.defaults.datestyle` [cluster setting](cluster-settings.html) (`ISO,MDY`, by default). | Yes | Yes | | `default_int_size` | The size, in bytes, of an [`INT`](int.html) type. | `8` | Yes | Yes | +| `default_text_search_config` | New in v23.1: The dictionary used to normalize tokens and eliminate stop words when calling a [full-text search function](functions-and-operators.html#full-text-search-functions) without a configuration parameter. See [Full-Text Search](full-text-search.html). | `english` | Yes | Yes | | `default_transaction_isolation` | All transactions execute with `SERIALIZABLE` isolation. 
See [Transactions: Isolation levels](transactions.html#isolation-levels). | `SERIALIZABLE` | No | Yes | | `default_transaction_priority` | The default transaction priority for the current session. The supported options are `low`, `normal`, and `high`. | `normal` | Yes | Yes | | `default_transaction_quality_of_service` | The default transaction quality of service for the current session. The supported options are `regular`, `critical`, and `background`. See [Set quality of service level](admission-control.html#set-quality-of-service-level-for-a-session). | `regular` | Yes | Yes | @@ -22,6 +23,7 @@ | `enable_insert_fast_path` | Indicates whether CockroachDB will use a specialized execution operator for inserting into a table. We recommend leaving this setting `on`. | `on` | Yes | Yes | | `enable_zigzag_join` | Indicates whether the [cost-based optimizer](cost-based-optimizer.html) will plan certain queries using a zig-zag merge join algorithm, which searches for the desired intersection by jumping back and forth between the indexes based on the fact that after constraining indexes, they share an ordering. | `on` | Yes | Yes | | `enforce_home_region` | If set to `on`, queries return an error and in some cases a suggested resolution if they cannot run entirely in their home region. This can occur if a query has no home region (for example, if it reads from different home regions in a [regional by row table](multiregion-overview.html#regional-by-row-tables)) or a query's home region differs from the [gateway](architecture/life-of-a-distributed-transaction.html#gateway) region. Note that only tables with `ZONE` [survivability](when-to-use-zone-vs-region-survival-goals.html) can be scanned without error when this is enabled. For more information about home regions, see [Table localities](multiregion-overview.html#table-localities).

This feature is in preview. It is subject to change. | `off` | Yes | Yes | +| `enforce_home_region_follower_reads_enabled` | If `on` while the [`enforce_home_region`](cost-based-optimizer.html#control-whether-queries-are-limited-to-a-single-region) setting is `on`, allows `enforce_home_region` to perform `AS OF SYSTEM TIME` [follower reads](follower-reads.html) to detect and report a query's [home region](multiregion-overview.html#table-localities), if any.

This feature is in preview. It is subject to change. | `off` | Yes | Yes | | `extra_float_digits` | The number of digits displayed for floating-point values. Only values between `-15` and `3` are supported. | `0` | Yes | Yes | | `force_savepoint_restart` | When set to `true`, allows the [`SAVEPOINT`](savepoint.html) statement to accept any name for a savepoint. | `off` | Yes | Yes | | `foreign_key_cascades_limit` | Limits the number of [cascading operations](foreign-key.html#use-a-foreign-key-constraint-with-cascade) that run as part of a single query. | `10000` | Yes | Yes | diff --git a/_includes/v23.1/misc/tooling.md b/_includes/v23.1/misc/tooling.md index 4f146670535..bc8034ef5c7 100644 --- a/_includes/v23.1/misc/tooling.md +++ b/_includes/v23.1/misc/tooling.md @@ -7,7 +7,7 @@ Cockroach Labs has partnered with open-source projects, vendors, and individuals - **Partner supported** indicates that Cockroach Labs has a partnership with a third-party vendor that provides support for the CockroachDB integration with their tool. {{site.data.alerts.callout_info}} -Unless explicitly stated, support for a [driver](#drivers) or [data access framework](#data-access-frameworks-e-g-orms) does not include [automatic, client-side transaction retry handling](transactions.html#client-side-intervention). For client-side transaction retry handling samples, see [Example Apps](example-apps.html). +Unless explicitly stated, support for a [driver](#drivers) or [data access framework](#data-access-frameworks-e-g-orms) does not include [automatic, client-side transaction retry handling](transaction-retry-error-reference.html#client-side-retry-handling). For client-side transaction retry handling samples, see [Example Apps](example-apps.html). {{site.data.alerts.end}} If you encounter problems using CockroachDB with any of the tools listed on this page, please [open an issue](https://github.com/cockroachdb/cockroach/issues/new) with details to help us make progress toward better support. diff --git a/_includes/v23.1/performance/contention-indicators.md b/_includes/v23.1/performance/contention-indicators.md deleted file mode 100644 index 7797130a4ab..00000000000 --- a/_includes/v23.1/performance/contention-indicators.md +++ /dev/null @@ -1,5 +0,0 @@ -* In the [**Transaction Executions** view](ui-insights-page.html) on the **Insights** page, transaction executions display the **High Contention** insight. -* Your application is experiencing degraded performance with transaction errors like `SQLSTATE: 40001`, `RETRY_WRITE_TOO_OLD`, and `RETRY_SERIALIZABLE`. See [Transaction Retry Error Reference](transaction-retry-error-reference.html). -* The [SQL Statement Contention graph](ui-sql-dashboard.html#sql-statement-contention) is showing spikes over time. -SQL Statement Contention graph in DB Console -* The [Transaction Restarts graph](ui-sql-dashboard.html) is showing spikes in retries over time. diff --git a/_includes/v23.1/performance/increase-server-side-retries.md b/_includes/v23.1/performance/increase-server-side-retries.md new file mode 100644 index 00000000000..de70c95d582 --- /dev/null +++ b/_includes/v23.1/performance/increase-server-side-retries.md @@ -0,0 +1,3 @@ +- [Send statements in transactions as a single batch](transactions.html#batched-statements). 
Batching allows CockroachDB to [automatically retry](transactions.html#automatic-retries) a transaction when [previous reads are invalidated](architecture/transaction-layer.html#read-refreshing) at a [pushed timestamp](architecture/transaction-layer.html#timestamp-cache). When a multi-statement transaction is not batched, and takes more than a single round trip, CockroachDB cannot automatically retry the transaction. For an example showing how to break up large transactions in an application, see [Break up large transactions into smaller units of work](build-a-python-app-with-cockroachdb-sqlalchemy.html#break-up-large-transactions-into-smaller-units-of-work). + +- Limit the size of the result sets of your transactions to under 16KB, so that CockroachDB is more likely to [automatically retry](transactions.html#automatic-retries) when [previous reads are invalidated](architecture/transaction-layer.html#read-refreshing) at a [pushed timestamp](architecture/transaction-layer.html#timestamp-cache). When a transaction returns a result set over 16KB, even if that transaction has been sent as a single batch, CockroachDB cannot automatically retry the transaction. You can change the results buffer size for all new sessions using the `sql.defaults.results_buffer.size` [cluster setting](cluster-settings.html), or for a specific session using the `results_buffer_size` [session variable](set-vars.html). \ No newline at end of file diff --git a/_includes/v23.1/performance/reduce-contention.md b/_includes/v23.1/performance/reduce-contention.md new file mode 100644 index 00000000000..391f036dbaa --- /dev/null +++ b/_includes/v23.1/performance/reduce-contention.md @@ -0,0 +1,13 @@ +- Limit the number of affected rows by following [optimizing queries](apply-statement-performance-rules.html) (e.g., avoiding full scans, creating secondary indexes, etc.). Not only will transactions run faster, lock fewer rows, and hold locks for a shorter duration, but the chances of [read invalidation](architecture/transaction-layer.html#read-refreshing) when the transaction's [timestamp is pushed](architecture/transaction-layer.html#timestamp-cache), due to a conflicting write, are decreased because of a smaller read set (i.e., a smaller number of rows read). + +- Break down larger transactions (e.g., [bulk deletes](bulk-delete-data.html)) into smaller ones to have transactions hold locks for a shorter duration. For example, use [common table expressions](common-table-expressions.html) to group multiple clauses together in a single SQL statement. This will also decrease the likelihood of [pushed timestamps](architecture/transaction-layer.html#timestamp-cache). For instance, as the size of writes (number of rows written) decreases, the chances of the transaction's timestamp getting bumped by concurrent reads decreases. + +- Use [`SELECT FOR UPDATE`](select-for-update.html) to aggressively lock rows that will later be updated in the transaction. Updates must operate on the most recent version of a row, so a concurrent write to the row will cause a retry error ([`RETRY_WRITE_TOO_OLD`](transaction-retry-error-reference.html#retry_write_too_old)). Locking early in the transaction forces concurrent writers to block until the transaction is finished, which prevents the retry error. Note that this locks the rows for the duration of the transaction; whether this is tenable will depend on your workload. 
For more information, see [When and why to use `SELECT FOR UPDATE` in CockroachDB](https://www.cockroachlabs.com/blog/when-and-why-to-use-select-for-update-in-cockroachdb/). + +- Use historical reads ([`SELECT ... AS OF SYSTEM TIME`](as-of-system-time.html)), preferably [bounded staleness reads](follower-reads.html#when-to-use-bounded-staleness-reads) or [exact staleness with follower reads](follower-reads.html#run-queries-that-use-exact-staleness-follower-reads) when possible to reduce conflicts with other writes. This reduces the likelihood of [`RETRY_SERIALIZABLE`](transaction-retry-error-reference.html#retry_serializable) errors as fewer writes will happen at the historical timestamp. More specifically, writes' timestamps are less likely to be pushed by historical reads as they would [when the read has a higher priority level](architecture/transaction-layer.html#transaction-conflicts). + +- When replacing values in a row, use [`UPSERT`](upsert.html) and specify values for all columns in the inserted rows. This will usually have the best performance under contention, compared to combinations of [`SELECT`](select-clause.html), [`INSERT`](insert.html), and [`UPDATE`](update.html). + +- If applicable to your workload, assign [column families](column-families.html#default-behavior) and separate columns that are frequently read and written into separate columns. Transactions will operate on disjoint column families and reduce the likelihood of conflicts. + +- As a last resort, consider adjusting the [closed timestamp interval](architecture/transaction-layer.html#closed-timestamps) using the `kv.closed_timestamp.target_duration` [cluster setting](cluster-settings.html) to reduce the likelihood of long-running write transactions having their [timestamps pushed](architecture/transaction-layer.html#timestamp-cache). This setting should be carefully adjusted if **no other mitigations are available** because there can be downstream implications (e.g., historical reads, change data capture feeds, statistics collection, handling zone configurations, etc.). For example, a transaction _A_ is forced to refresh (i.e., change its timestamp) due to hitting the maximum [_closed timestamp_](architecture/transaction-layer.html#closed-timestamps) interval (closed timestamps enable [Follower Reads](follower-reads.html#how-stale-follower-reads-work) and [Change Data Capture (CDC)](change-data-capture-overview.html)). This can happen when transaction _A_ is a long-running transaction, and there is a write by another transaction to data that _A_ has already read. For more information, see the reference entry for [`RETRY_SERIALIZABLE`](transaction-retry-error-reference.html#retry_serializable). \ No newline at end of file diff --git a/_includes/v23.1/performance/reduce-hot-spots.md b/_includes/v23.1/performance/reduce-hot-spots.md new file mode 100644 index 00000000000..2873855d277 --- /dev/null +++ b/_includes/v23.1/performance/reduce-hot-spots.md @@ -0,0 +1,31 @@ +- Use index keys with a random distribution of values, so that transactions over different rows are more likely to operate on separate data ranges. See the [SQL FAQs](sql-faqs.html#how-do-i-auto-generate-unique-row-ids-in-cockroachdb) on row IDs for suggestions. + +- Place parts of the records that are modified by different transactions in different tables. That is, increase [normalization](https://en.wikipedia.org/wiki/Database_normalization). However, there are benefits and drawbacks to increasing normalization. 
+ + - Benefits: + + - Allows separate transactions to modify related underlying data without causing [contention](#transaction-contention). + - Can improve performance for read-heavy workloads. + + - Drawbacks: + + - More complex data model. + - Increases the chance of data inconsistency. + - Increases data redundancy. + - Can degrade performance for write-heavy workloads. + +- If the application strictly requires operating on very few different index keys, consider using [`ALTER ... SPLIT AT`](alter-table.html#split-at) so that each index key can be served by a separate group of nodes in the cluster. + +- If you are working with a table that **must** be indexed on sequential keys, consider using [hash-sharded indexes](hash-sharded-indexes.html). For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see the blog post [Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads](https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/). As part of this, we recommend doing thorough performance testing with and without hash-sharded indexes to see which works best for your application. + +- To avoid read hot spots: + + - Increase data distribution, which will allow for more ranges. The hot spot exists because the data being accessed is all co-located in one range. + - Increase [load balancing](recommended-production-settings.html#load-balancing) across more nodes in the same range. Most transactional reads must go to the leaseholder in CockroachDB, which means that opportunities for load balancing over replicas are minimal. + + However, the following features do not permit load balancing over replicas: + + - [Global tables](global-tables.html). + - [Follower reads](follower-reads.html) (both the bounded staleness and the exact staleness kinds). + + In these cases, more replicas will help, up to the number of nodes in the cluster. \ No newline at end of file diff --git a/_includes/v23.1/performance/sql-trace-txn-enable-threshold.md b/_includes/v23.1/performance/sql-trace-txn-enable-threshold.md new file mode 100644 index 00000000000..723f9075d85 --- /dev/null +++ b/_includes/v23.1/performance/sql-trace-txn-enable-threshold.md @@ -0,0 +1 @@ +The default tracing behavior captures a small percent of transactions, so not all contention events will be recorded. When investigating transaction contention, you can set the `sql.trace.txn.enable_threshold` [cluster setting](cluster-settings.html#setting-sql-trace-txn-enable-threshold) to always capture contention events. \ No newline at end of file diff --git a/_includes/v23.1/performance/statement-contention.md b/_includes/v23.1/performance/statement-contention.md deleted file mode 100644 index 58378b561a2..00000000000 --- a/_includes/v23.1/performance/statement-contention.md +++ /dev/null @@ -1,16 +0,0 @@ -Find the transactions and statements within the transactions that are experiencing [contention]({{ link_prefix }}performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention). CockroachDB has several tools to help you track down such transactions and statements: - -* In the DB Console: - - Visit the [**Transaction Executions** view](ui-insights-page.html) on the **Insights** page and look for transaction executions with the **High Contention** insight. 
- - Visit the [**Transactions**](ui-transactions-page.html) and [**Statements**](ui-statements-page.html) pages and sort transactions and statements by **Contention Time**. -* Query the following tables: - - - [`crdb_internal.cluster_contended_indexes`](crdb-internal.html#cluster_contended_indexes) and [`crdb_internal.cluster_contended_tables`](crdb-internal.html#cluster_contended_tables) tables for your database to find the indexes and tables that are experiencing contention. - - [`crdb_internal.cluster_locks`](crdb-internal.html#cluster_locks) to find out which transactions are holding locks on which objects. - - [`crdb_internal.cluster_contention_events`](crdb-internal.html#view-the-tables-indexes-with-the-most-time-under-contention) to view the tables/indexes with the most time under contention. - -After you identify the transactions or statements that are causing contention, follow the steps in the next section [to avoid contention](performance-best-practices-overview.html#avoid-transaction-contention). - -{{site.data.alerts.callout_info}} -If you experience a hanging or stuck query that is not showing up in the list of contended transactions and statements on the [Transactions](ui-transactions-page.html) or [Statements](ui-statements-page.html) pages in the DB Console, the process described above will not work. You will need to follow the process described in [Hanging or stuck queries](query-behavior-troubleshooting.html#hanging-or-stuck-queries) instead. -{{site.data.alerts.end}} diff --git a/_includes/v23.1/performance/transaction-retry-error-actions.md b/_includes/v23.1/performance/transaction-retry-error-actions.md new file mode 100644 index 00000000000..206f7751faf --- /dev/null +++ b/_includes/v23.1/performance/transaction-retry-error-actions.md @@ -0,0 +1,5 @@ +In most cases, the correct actions to take when encountering transaction retry errors are: + +1. Update your application to support [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling) when transaction retry errors are encountered. Follow the guidance for the [specific error type](transaction-retry-error-reference.html#transaction-retry-error-reference). + +1. Take steps to [minimize transaction retry errors](transaction-retry-error-reference.html#minimize-transaction-retry-errors) in the first place. This means reducing transaction contention overall, and increasing the likelihood that CockroachDB can [automatically retry](transactions.html#automatic-retries) a failed transaction. \ No newline at end of file diff --git a/_includes/v23.1/prod-deployment/decommission-pre-flight-checks.md b/_includes/v23.1/prod-deployment/decommission-pre-flight-checks.md new file mode 100644 index 00000000000..b0c418a1b66 --- /dev/null +++ b/_includes/v23.1/prod-deployment/decommission-pre-flight-checks.md @@ -0,0 +1,18 @@ +By default, CockroachDB will perform a set of "decommissioning pre-flight checks". That is, decommission pre-checks look over the ranges with replicas on the to-be-decommissioned node, and check that each replica can be moved to some other node in the cluster. If errors are detected that would result in the inability to complete node decommissioning, they will be printed to `STDERR` and the command will exit *without attempting to perform node decommissioning*. For example, ranges that require a certain number of voting replicas in a region but do not have any available nodes in the region not already containing a replica will block the decommissioning process. 
+ +The error format is shown below: + +~~~ +ranges blocking decommission detected +n1 has 44 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster" +n2 has 27 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster" + +ERROR: Cannot decommission nodes. +Failed running "node decommission" +~~~ + +These checks can be skipped by [passing the flag `--checks=skip` to `cockroach node decommission`](cockroach-node.html#decommission-checks). + +{{site.data.alerts.callout_info}} +The amount of remaining disk space on other nodes in the cluster is not yet considered as part of the decommissioning pre-flight checks. For more information, see [cockroachdb/cockroach#71757](https://github.com/cockroachdb/cockroach/issues/71757) +{{site.data.alerts.end}} diff --git a/_includes/v23.1/sidebar-data/develop.json b/_includes/v23.1/sidebar-data/develop.json index dd4767f9293..3ad26f95245 100644 --- a/_includes/v23.1/sidebar-data/develop.json +++ b/_includes/v23.1/sidebar-data/develop.json @@ -120,6 +120,12 @@ "/${VERSION}/inverted-indexes.html" ] }, + { + "title": "Index Full Text", + "urls": [ + "/${VERSION}/full-text-search.html" + ] + }, { "title": "Index Trigrams", "urls": [ diff --git a/_includes/v23.1/sidebar-data/manage.json b/_includes/v23.1/sidebar-data/manage.json index fb2cb034679..30a646ca15d 100644 --- a/_includes/v23.1/sidebar-data/manage.json +++ b/_includes/v23.1/sidebar-data/manage.json @@ -161,15 +161,15 @@ "urls": [ "/${VERSION}/backup-validation.html" ] - }, - { - "title": "Backup and Restore Monitoring", - "urls": [ - "/${VERSION}/backup-and-restore-monitoring.html" - ] } ] }, + { + "title": "Backup and Restore Monitoring", + "urls": [ + "/${VERSION}/backup-and-restore-monitoring.html" + ] + }, { "title": "Restoring Backups Across Versions", "urls": [ diff --git a/_includes/v23.1/sidebar-data/reference.json b/_includes/v23.1/sidebar-data/reference.json index bc267d5fc08..54a504b564f 100644 --- a/_includes/v23.1/sidebar-data/reference.json +++ b/_includes/v23.1/sidebar-data/reference.json @@ -1007,6 +1007,18 @@ "/${VERSION}/timestamp.html" ] }, + { + "title": "TSQUERY", + "urls": [ + "/${VERSION}/tsquery.html" + ] + }, + { + "title": "TSVECTOR", + "urls": [ + "/${VERSION}/tsvector.html" + ] + }, { "title": "UUID", "urls": [ diff --git a/_includes/v23.1/sql/crdb-internal-is-not-supported-for-production-use.md b/_includes/v23.1/sql/crdb-internal-is-not-supported-for-production-use.md new file mode 100644 index 00000000000..59f0764e51a --- /dev/null +++ b/_includes/v23.1/sql/crdb-internal-is-not-supported-for-production-use.md @@ -0,0 +1 @@ +Many of the tables in the `crdb_internal` system catalog are **not supported for external use in production**. This output is provided **as a debugging aid only**. The output of particular `crdb_internal` facilities may change from patch release to patch release without advance warning. For more information, see [the `crdb_internal` documentation](crdb-internal.html). 
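For example, the kind of ad-hoc debugging query this caution applies to (the `cluster_contention_events` table is named in the contention troubleshooting steps elsewhere in this change; treat the query as illustrative only):

~~~ sql
-- Debugging aid only; the available columns and output may change between patch releases.
SELECT * FROM crdb_internal.cluster_contention_events LIMIT 10;
~~~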
diff --git a/_includes/v23.1/sql/select-for-update-overview.md b/_includes/v23.1/sql/select-for-update-overview.md index ed5a90ac36a..cf545a03721 100644 --- a/_includes/v23.1/sql/select-for-update-overview.md +++ b/_includes/v23.1/sql/select-for-update-overview.md @@ -6,7 +6,7 @@ Because this queueing happens during the read operation, the [thrashing](https:/ As a result, using `SELECT FOR UPDATE` leads to increased throughput and decreased tail latency for contended operations. -Note that using `SELECT FOR UPDATE` does not completely eliminate the chance of [serialization errors](transaction-retry-error-reference.html), which use the `SQLSTATE` error code `40001`, and emit error messages with the string `restart transaction`. These errors can also arise due to [time uncertainty](architecture/transaction-layer.html#transaction-conflicts). To eliminate the need for application-level retry logic, in addition to `SELECT FOR UPDATE` your application also needs to use a [driver that implements automatic retry handling](transactions.html#client-side-intervention). +Note that using `SELECT FOR UPDATE` does not completely eliminate the chance of [serialization errors](transaction-retry-error-reference.html), which use the `SQLSTATE` error code `40001`, and emit error messages with the string `restart transaction`. These errors can also arise due to [time uncertainty](architecture/transaction-layer.html#transaction-conflicts). To eliminate the need for application-level retry logic, in addition to `SELECT FOR UPDATE` your application also needs to use a [driver that implements automatic retry handling](transaction-retry-error-reference.html#client-side-retry-handling). CockroachDB does not support the `FOR SHARE` or `FOR KEY SHARE` [locking strengths](select-for-update.html#locking-strengths). diff --git a/_includes/v23.1/sql/show-ranges-output-deprecation-notice.md b/_includes/v23.1/sql/show-ranges-output-deprecation-notice.md new file mode 100644 index 00000000000..55696660dfb --- /dev/null +++ b/_includes/v23.1/sql/show-ranges-output-deprecation-notice.md @@ -0,0 +1,16 @@ +The statement syntax and output documented on this page use the updated `SHOW RANGES` that **will become the default in CockroachDB v23.2**. To enable this syntax and output, set the [cluster setting `sql.show_ranges_deprecated_behavior.enabled`](cluster-settings.html#setting-sql-show-ranges-deprecated-behavior-enabled) to `false`: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET CLUSTER SETTING sql.show_ranges_deprecated_behavior.enabled = false; +~~~ + +The pre-v23.1 output of `SHOW RANGES` is deprecated in v23.1 **and will be removed in v23.2**. To view the documentation for the deprecated version of the `SHOW RANGES` statement, see [`SHOW RANGES` (v22.2)](../v22.2/show-ranges.html). + +When you use the deprecated version of the `SHOW RANGES` statement, the following message will appear, reminding you to update [the cluster setting](cluster-settings.html#setting-sql-show-ranges-deprecated-behavior-enabled): + +~~~ +NOTICE: attention! the pre-23.1 behavior of SHOW RANGES and crdb_internal.ranges{,_no_leases} is deprecated! +HINT: Consider enabling the new functionality by setting 'sql.show_ranges_deprecated_behavior.enabled' to 'false'. +The new SHOW RANGES statement has more options. Refer to the online documentation or execute 'SHOW RANGES ??' for details. 
+~~~ diff --git a/_includes/v23.1/sql/unsupported-postgres-features.md b/_includes/v23.1/sql/unsupported-postgres-features.md index 3838cd2e5ee..59fb0f240b1 100644 --- a/_includes/v23.1/sql/unsupported-postgres-features.md +++ b/_includes/v23.1/sql/unsupported-postgres-features.md @@ -2,8 +2,6 @@ - CockroachDB has support for [user-defined functions](user-defined-functions.html). - Triggers. - Events. -- `FULLTEXT` functions and indexes. - - Depending on your use case, you may be able to get by using [trigram indexes](trigram-indexes.html) to do fuzzy string matching and pattern matching. - Drop primary key. {{site.data.alerts.callout_info}} diff --git a/_includes/v23.1/sql/use-case-trigram-indexes.md b/_includes/v23.1/sql/use-case-trigram-indexes.md new file mode 100644 index 00000000000..b3c55364634 --- /dev/null +++ b/_includes/v23.1/sql/use-case-trigram-indexes.md @@ -0,0 +1 @@ +Depending on your use case, you may prefer to use [trigram indexes](trigram-indexes.html) to do fuzzy string matching and pattern matching. For more information about use cases for trigram indexes that could make having full-text search unnecessary, see the 2022 blog post [Use cases for trigram indexes (when not to use Full Text Search)](https://www.cockroachlabs.com/blog/use-cases-trigram-indexes/). \ No newline at end of file diff --git a/_includes/v23.1/ui/active-transaction-executions.md b/_includes/v23.1/ui/active-transaction-executions.md index 6eab121de0f..e6a55112c47 100644 --- a/_includes/v23.1/ui/active-transaction-executions.md +++ b/_includes/v23.1/ui/active-transaction-executions.md @@ -33,7 +33,7 @@ The transaction execution details page provides the following details on the tra - **Most Recent Statement Execution ID**: Link to the ID of the most recently [executed statement](ui-statements-page.html#active-executions-table) in the transaction. - **Session ID**: Link to the ID of the [session](ui-sessions-page.html) in which the transaction is running. -If a transaction execution is waiting, the transaction execution details are followed by Contention Insights and details of the transaction execution on which the blocked transaction execution is waiting. For more information about contention, see [Understanding and avoiding transaction contention]({{ link_prefix }}performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention). +If a transaction execution is waiting, the transaction execution details are followed by Contention Insights and details of the transaction execution on which the blocked transaction execution is waiting. For more information about contention, see [Transaction contention]({{ link_prefix }}performance-best-practices-overview.html#transaction-contention). Movr rides transactions diff --git a/_includes/v23.1/ui/insights.md b/_includes/v23.1/ui/insights.md index fe752f4b848..ad84fe59643 100644 --- a/_includes/v23.1/ui/insights.md +++ b/_includes/v23.1/ui/insights.md @@ -21,7 +21,7 @@ The rows in this page are populated from the [`crdb_internal.transaction_content - The default tracing behavior captures a small percent of transactions so not all contention events will be recorded. When investigating [transaction contention]({{ link_prefix }}performance-best-practices-overview.html#transaction-contention), you can set the [`sql.trace.txn.enable_threshold` cluster setting]({{ link_prefix }}cluster-settings.html#setting-sql-trace-txn-enable-threshold) to always capture contention events. 
{{site.data.alerts.end}} -Transaction executions with the **High Contention** insight are transactions that experienced [contention]({{ link_prefix }}transactions.html#transaction-contention). +Transaction executions with the **High Contention** insight are transactions that experienced [contention]({{ link_prefix }}performance-best-practices-overview.html#transaction-contention). {% if page.cloud != true -%} The following screenshot shows the execution of a transaction flagged with **High Contention**: @@ -84,8 +84,7 @@ To display this view, click **Insights** in the left-hand navigation of the Clou The rows in this page are populated from the [`crdb_internal.cluster_execution_insights`]({{ link_prefix }}crdb-internal.html) table. - The results displayed on the **Statement Executions** view will be available as long as the number of rows in each node is less than the [`sql.insights.execution_insights_capacity` cluster setting]({{ link_prefix }}cluster-settings.html#setting-sql-insights-execution-insights-capacity). -- The default tracing behavior enables captures a small percent of transactions so not all [contention]({{ link_prefix }}performance-best-practices-overview.html#transaction-contention) events will be recorded. When investigating query latency, you can set the [`sql.trace.txn.enable_threshold` cluster setting]({{ link_prefix }}cluster-settings.html#setting-sql-trace-txn-enable-threshold) to always capture contention events. - +- {% include {{ page.version.version }}/performance/sql-trace-txn-enable-threshold.md %} {{site.data.alerts.end}} {% if page.cloud != true -%} diff --git a/_includes/v23.1/ui/sessions.md b/_includes/v23.1/ui/sessions.md index 80577b00644..402e42d0ada 100644 --- a/_includes/v23.1/ui/sessions.md +++ b/_includes/v23.1/ui/sessions.md @@ -35,7 +35,7 @@ Actions | Options to cancel the active statement and cancel the session. These r To view details of a session, click a **Session Start Time (UTC)** to display session details. -## Session details +## Session Details If a session is idle, the **Transaction** and **Most Recent Statement** panels will display **No Active [Transaction | Statement]**. diff --git a/advisories/a102375.md b/advisories/a102375.md new file mode 100644 index 00000000000..96013618531 --- /dev/null +++ b/advisories/a102375.md @@ -0,0 +1,49 @@ +--- +title: Technical Advisory 102375 +advisory: A-102375 +summary: Some customers may experience spurious privilege errors when trying to run queries due to a bug in the query cache. +toc: true +affected_versions: v22.1.19 and v22.2.8 +advisory_date: 2023-05-11 +docs_area: releases +--- + +Publication date: {{ page.advisory_date | date: "%B %e, %Y" }} + +## Description + +In CockroachDB versions v22.1.19 and v22.2.8, some customers may experience spurious [privilege](../v22.2/security-reference/authorization.html#privileges) errors when trying to run queries due to a bug in the query cache. This can happen if two or more databases exist on the same cluster with tables that have the same name and at least one [foreign key reference](../v22.2/foreign-key.html). If identical queries are used to query the tables in the two different databases by users with different permissions, they may experience errors due to insufficient privileges. + +## Statement + +This is resolved in CockroachDB by PR [#102405](https://github.com/cockroachdb/cockroach/issues/102405) which ensures that privilege checks happen after staleness checks when attempting to use the query cache. 
+ +The fix has been applied to the maintenance release of CockroachDB [v22.2.9](../releases/v22.2.html#v22-2-9). + +This fix will be applied to the maintenance release of CockroachDB v22.1.20. + +This public issue is tracked by [#102375](https://github.com/cockroachdb/cockroach/issues/102375). + +## Mitigation + +Users of CockroachDB v22.1.19 and v22.2.8 who experience spurious [privilege](../v22.2/security-reference/authorization.html#privileges) errors with the query cache enabled are encouraged to upgrade to v22.1.20, v22.2.9, or a later version. + +If an upgrade is not possible, the issue can be avoided by updating the SQL queries to qualify table names with the database name so there is no collision in the query cache. For example, `SELECT * FROM table_name` can be rewritten using [partially qualified](../v22.2/sql-name-resolution.html#lookup-with-partially-qualified-names) or [fully qualified](../v22.2/sql-name-resolution.html#lookup-with-fully-qualified-names) names as follows: + +- `SELECT * FROM database_name.table_name` +- `SELECT * FROM database_name.schema_name.table_name` + +Another option, if an upgrade is not possible, is to disable the query cache with the following command: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET CLUSTER SETTING sql.query_cache.enabled = false; +~~~ + +Disabling the query cache may degrade the performance of the cluster, however. + +## Impact + +Some customers running identical queries with different roles to access tables with the same name in different databases could experience spurious [privilege](../v22.2/security-reference/authorization.html#privileges) errors on CockroachDB v22.1.19 and v22.2.8 with the query cache enabled. + +Please reach out to the [support team](https://support.cockroachlabs.com) if more information or assistance is needed. diff --git a/cockroachcloud/egress-perimeter-controls.md b/cockroachcloud/egress-perimeter-controls.md index 43e312d58c3..25583488925 100644 --- a/cockroachcloud/egress-perimeter-controls.md +++ b/cockroachcloud/egress-perimeter-controls.md @@ -7,10 +7,6 @@ docs_area: security cloud: true --- -{{site.data.alerts.callout_info}} -{% include_cached feature-phases/limited-access.md %} -{{site.data.alerts.end}} - This page describes how Egress Perimeter Controls can enhance the security of {{ site.data.products.dedicated }} clusters, and gives an overview of how to manage a cluster's egress rules. 
## Why use Egress Perimeter Controls diff --git a/jekyll-algolia-dev/lib/jekyll/algolia/indexer.rb b/jekyll-algolia-dev/lib/jekyll/algolia/indexer.rb index 500b4711a1b..b22c608e7cd 100644 --- a/jekyll-algolia-dev/lib/jekyll/algolia/indexer.rb +++ b/jekyll-algolia-dev/lib/jekyll/algolia/indexer.rb @@ -339,6 +339,12 @@ def self.update_synonyms synonyms: ['schema conversion tool', 'sct'] }, false) + index.save_synonym('full text search', { + objectID: 'full text search', + type: 'synonym', + synonyms: ['full text search', 'fts'] + }, false) + return end diff --git a/v22.2/cockroachdb-feature-availability.md b/v22.2/cockroachdb-feature-availability.md index f4e1f2a6c13..b3bbaf00450 100644 --- a/v22.2/cockroachdb-feature-availability.md +++ b/v22.2/cockroachdb-feature-availability.md @@ -14,7 +14,7 @@ This page outlines _feature availability_, which is separate from Cockroach Labs ## Feature availability phases -Phase | Definition | Accessibility +Phase | Definition | Accessibility ----------------------------------------------+------------+------------- Private preview | Feature is not production-ready and will not be publicly documented. | Invite-only [Limited access](#features-in-limited-access) | Feature is production-ready but not available widely because of known limitations and/or because capabilities may change or be added based on feedback. | Opt-in
Contact your Cockroach Labs account team. @@ -31,18 +31,6 @@ General availability (GA) | Feature is production-ready and {{ site.data.products.dedicated }} users can use the [Cloud API](../cockroachcloud/cloud-api.html) to configure [log export](../cockroachcloud/export-logs.html) to [AWS CloudWatch](https://aws.amazon.com/cloudwatch/) or [GCP Cloud Logging](https://cloud.google.com/logging). Once the export is configured, logs will flow from all nodes in all regions of your {{ site.data.products.dedicated }} cluster to your chosen cloud log sink. You can configure log export to redact sensitive log entries, limit log output by severity, and send log entries to specific log group targets by log channel, among others. -### Customer-Managed Encryption Keys (CMEK) on {{ site.data.products.dedicated }} - -[Customer-Managed Encryption Keys (CMEK)](../cockroachcloud/cmek.html) allow you to protect data at rest in a {{ site.data.products.dedicated }} cluster using a cryptographic key that is entirely within your control, hosted in a supported key-management system (KMS) platform. - -### Egress perimeter controls for {{ site.data.products.dedicated }} - -[Egress Perimeter Controls](../cockroachcloud/egress-perimeter-controls.html) can enhance the security of {{ site.data.products.dedicated }} clusters by enabling cluster administrators to restrict egress to a list of specified external destinations. This adds a strong layer of protection against malicious or accidental data exfiltration. - -### Private {{ site.data.products.dedicated }} clusters - -Limiting access to a CockroachDB cluster's nodes over the public internet is an important security practice and is also a compliance requirement for many organizations. [{{ site.data.products.dedicated }} private clusters](../cockroachcloud/private-clusters.html) allow organizations to meet this objective. A private {{ site.data.products.dedicated }} cluster's nodes have no public IP addresses, and egress traffic moves over private subnets and through a highly-available NAT gateway that is unique to the cluster. - ### Export Cloud Organization audit logs (Cloud API) {{ site.data.products.db }} captures audit logs when many types of events occur, such as when a cluster is created or when a user is added to or removed from an organization. Any user in an organization with an admin-level service account can [export these audit logs](../cockroachcloud/cloud-org-audit-logs.html) using the [`auditlogevents` endpoint](../cockroachcloud/cloud-api.html#cloud-audit-logs) of the [Cloud API](../cockroachcloud/cloud-api.html). @@ -144,7 +132,7 @@ CockroachDB supports [altering the column types](alter-table.html#alter-column-d [Temporary tables](temporary-tables.html), [temporary views](views.html#temporary-views), and [temporary sequences](create-sequence.html#temporary-sequences) are in preview in CockroachDB. If you create too many temporary objects in a session, the performance of DDL operations will degrade. Performance limitations could persist long after creating the temporary objects. For more details, see [cockroachdb/cockroach#46260](https://github.com/cockroachdb/cockroach/issues/46260). -To enable temporary objects, set the `experimental_enable_temp_tables` [session variable](show-vars.html) to `on`. +To enable temporary objects, set the `experimental_enable_temp_tables` [session variable](show-vars.html) to `on`. 
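+For example, a minimal sketch of enabling and using a temporary object in a session (the `scratch` table is a placeholder name):
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SET experimental_enable_temp_tables = 'on';
+CREATE TEMP TABLE scratch (id INT PRIMARY KEY, note STRING);
+~~~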
### Password authentication without TLS @@ -187,7 +175,7 @@ Use a [webhook sink](changefeed-sinks.html#webhook-sink) to deliver changefeed m ### Change data capture transformations -[Change data capture transformations](cdc-transformations.html) allow you to define the change data emitted to your sink when you create a changefeed. The expression syntax provides a way to select columns and apply filters to further restrict or transform the data in your [changefeed messages](changefeed-messages.html). +[Change data capture transformations](cdc-transformations.html) allow you to define the change data emitted to your sink when you create a changefeed. The expression syntax provides a way to select columns and apply filters to further restrict or transform the data in your [changefeed messages](changefeed-messages.html). ### External connections diff --git a/v23.1/alter-range.md b/v23.1/alter-range.md index b64d6c23654..5f770cb3f07 100644 --- a/v23.1/alter-range.md +++ b/v23.1/alter-range.md @@ -115,7 +115,7 @@ SELECT store_id FROM crdb_internal.kv_store_status; #### Find range ID and leaseholder information -To use `ALTER RANGE ... RELOCATE`, you need to know how to find the range ID, leaseholder, and other information for a [table](show-ranges.html#show-ranges-for-a-table-primary-index), [index](show-ranges.html#show-ranges-for-an-index), or [database](show-ranges.html#show-ranges-for-a-database). You can find this information using the [`SHOW RANGES`](show-ranges.html) statement. +To use `ALTER RANGE ... RELOCATE`, you need to know how to find the range ID, leaseholder, and other information for a [table](show-ranges.html#show-ranges-for-a-table), [index](show-ranges.html#show-ranges-for-an-index), or [database](show-ranges.html#show-ranges-for-a-database). You can find this information using the [`SHOW RANGES`](show-ranges.html) statement. For example, to get all range IDs, leaseholder store IDs, and leaseholder localities for the [`movr.users`](movr.html) table, use the following query: diff --git a/v23.1/architecture/storage-layer.md b/v23.1/architecture/storage-layer.md index a4510039a66..5768dcf5104 100644 --- a/v23.1/architecture/storage-layer.md +++ b/v23.1/architecture/storage-layer.md @@ -154,8 +154,8 @@ CockroachDB regularly garbage collects MVCC values to reduce the size of data st Garbage collection can only run on MVCC values which are not covered by a *protected timestamp*. The protected timestamp subsystem exists to ensure the safety of operations that rely on historical data, such as: -- [Backups](../backup.html) -- [Changefeeds](../change-data-capture-overview.html) +- [Backups](../create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) +- [Changefeeds](../changefeed-messages.html#garbage-collection-and-changefeeds) Protected timestamps ensure the safety of historical data while also enabling shorter [GC TTLs](../configure-replication-zones.html#gc-ttlseconds). A shorter GC TTL means that fewer previous MVCC values are kept around. This can help lower query execution costs for workloads which update rows frequently throughout the day, since [the SQL layer](sql-layer.html) has to scan over previous MVCC values to find the current value of a row. @@ -165,6 +165,8 @@ Protected timestamps work by creating *protection records*, which are stored in Upon successful creation of a protection record, the MVCC values for the specified data at timestamps less than or equal to the protected timestamp will not be garbage collected. 
When the job that created the protection record finishes its work, it removes the record, allowing the garbage collector to run on the formerly protected values. +For further detail on protected timestamps, see the Cockroach Labs Blog [Protected Timestamps: For a future with less garbage](https://www.cockroachlabs.com/blog/protected-timestamps-for-less-garbage/). + ## Interactions with other layers ### Storage and replication layers diff --git a/v23.1/architecture/transaction-layer.md b/v23.1/architecture/transaction-layer.md index c997cdda27d..ae6a3885596 100644 --- a/v23.1/architecture/transaction-layer.md +++ b/v23.1/architecture/transaction-layer.md @@ -105,7 +105,7 @@ Whenever a write occurs, its timestamp is checked against the timestamp cache. I ### Closed timestamps -Each CockroachDB range tracks a property called its _closed timestamp_, which means that no new writes can ever be introduced at or below that timestamp. The closed timestamp is advanced continuously on the leaseholder, and lags the current time by some target interval. As the closed timestamp is advanced, notifications are sent to each follower. If a range receives a write at a timestamp less than or equal to its closed timestamp, the write is forced to change its timestamp, which might result in a transaction retry error (see [read refreshing](#read-refreshing)). +Each CockroachDB range tracks a property called its _closed timestamp_, which means that no new writes can ever be introduced at or below that timestamp. The closed timestamp is advanced continuously on the leaseholder, and lags the current time by some target interval. As the closed timestamp is advanced, notifications are sent to each follower. If a range receives a write at a timestamp less than or equal to its closed timestamp, the write is forced to change its timestamp, which might result in a [transaction retry error](../transaction-retry-error-reference.html) (see [read refreshing](#read-refreshing)). In other words, a closed timestamp is a promise by the range's [leaseholder](replication-layer.html#leases) to its follower replicas that it will not accept writes below that timestamp. Generally speaking, the leaseholder continuously closes timestamps a few seconds in the past. @@ -190,7 +190,7 @@ For more details about how the concurrency manager works with the latch manager #### Concurrency manager - The concurrency manager is a structure that sequences incoming requests and provides isolation between the transactions that issued those requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. This isolation is guaranteed for the lifetime of the request but terminates once the request completes. +The concurrency manager is a structure that sequences incoming requests and provides isolation between the transactions that issued those requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. 
This isolation is guaranteed for the lifetime of the request but terminates once the request completes. Each request in a transaction should be isolated from other requests, both during the request's lifetime and after the request has completed (assuming it acquired locks), but within the surrounding transaction's lifetime. @@ -263,7 +263,7 @@ To make this simpler to understand, we'll call the first transaction `TxnA` and CockroachDB proceeds through the following steps: -1. If the transaction has an explicit priority set (i.e., `HIGH` or `LOW`), the transaction with the lower priority is aborted (in the write/write case) or has its timestamp pushed (in the write/read case). +1. If the transaction has an explicit priority set (i.e., `HIGH` or `LOW`), the transaction with the lower priority is aborted (in the write/write case) or has its timestamp [pushed](#timestamp-cache) (in the write/read case). 1. If the encountered transaction is expired, it's `ABORTED` and conflict resolution succeeds. We consider a write intent expired if: - It doesn't have a transaction record and its timestamp is outside of the transaction liveness threshold. @@ -297,8 +297,9 @@ If there is a deadlock between transactions (i.e., they're each blocked by each ### Read refreshing -Whenever a transaction's timestamp has been pushed, additional checks are required before allowing it to commit at the pushed timestamp: any values which the transaction previously read must be checked to verify that no writes have subsequently occurred between the original transaction timestamp and the pushed transaction timestamp. This check prevents serializability violation. The check is done by keeping track of all the reads using a dedicated `RefreshRequest`. If this succeeds, the transaction is allowed to commit (transactions perform this check at commit time if they've been pushed by a different transaction or by the [timestamp cache](#timestamp-cache), or they perform the check whenever they encounter a [`ReadWithinUncertaintyIntervalError`](../transaction-retry-error-reference.html#readwithinuncertaintyinterval) immediately, before continuing). -If the refreshing is unsuccessful, then the transaction must be retried at the pushed timestamp. +Whenever a transaction's timestamp has been pushed, additional checks are required before allowing it to commit at the pushed timestamp: any values which the transaction previously read must be checked to verify that no writes have subsequently occurred between the original transaction timestamp and the pushed transaction timestamp. This check prevents serializability violation. + +The check is done by keeping track of all the reads using a dedicated `RefreshRequest`. If this succeeds, the transaction is allowed to commit (transactions perform this check at commit time if they've been pushed by a different transaction or by the [timestamp cache](#timestamp-cache), or they perform the check whenever they encounter a [`ReadWithinUncertaintyIntervalError`](../transaction-retry-error-reference.html#readwithinuncertaintyintervalerror) immediately, before continuing). If the refreshing is unsuccessful (also known as *read invalidation*), then the transaction must be retried at the pushed timestamp. 
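+As an illustration only (the `accounts` table and values are placeholders, and this sketch shows client-driven behavior rather than the read refreshing mechanism itself), a client that receives such a retry error can use the `cockroach_restart` savepoint to retry:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+BEGIN;
+SAVEPOINT cockroach_restart;
+-- Application statements, for example:
+UPDATE accounts SET balance = balance - 10 WHERE id = 1;
+-- If any statement (or the final RELEASE) returns a 40001 "restart transaction"
+-- error, the client issues ROLLBACK TO SAVEPOINT cockroach_restart and
+-- reissues its statements before attempting to commit again.
+RELEASE SAVEPOINT cockroach_restart;
+COMMIT;
+~~~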
### Transaction pipelining @@ -398,7 +399,7 @@ Additionally, when other transactions encounter a transaction in `STAGING` state ## Non-blocking transactions - CockroachDB supports low-latency, global reads of read-mostly data in [multi-region clusters](../multiregion-overview.html) using _non-blocking transactions_: an extension of the [standard read-write transaction protocol](#overview) that allows a writing transaction to perform [locking](#concurrency-control) in a manner such that contending reads by other transactions can avoid waiting on its locks. +CockroachDB supports low-latency, global reads of read-mostly data in [multi-region clusters](../multiregion-overview.html) using _non-blocking transactions_: an extension of the [standard read-write transaction protocol](#overview) that allows a writing transaction to perform [locking](#concurrency-control) in a manner such that contending reads by other transactions can avoid waiting on its locks. The non-blocking transaction protocol and replication scheme differ from standard read-write transactions as follows: diff --git a/v23.1/backup-and-restore-monitoring.md b/v23.1/backup-and-restore-monitoring.md index 2cd15f4ceb2..7cb18a26bc5 100644 --- a/v23.1/backup-and-restore-monitoring.md +++ b/v23.1/backup-and-restore-monitoring.md @@ -32,29 +32,37 @@ See the [Monitor CockroachDB with Prometheus](monitor-cockroachdb-with-prometheu We recommend the following guidelines: -- Use the `schedules_backup_last_completed_time` metric to monitor the specific backup job or jobs you would use to recover from a disaster. -- Configure alerting on the `schedules_backup_last_completed_time` metric to watch for cases where the timestamp has not moved forward as expected. +- Use the `schedules.BACKUP.last_completed_time` metric to monitor the specific backup job or jobs you would use to recover from a disaster. +- Configure alerting on the `schedules.BACKUP.last_completed_time` metric to watch for cases where the timestamp has not moved forward as expected. Metric | Description -------+------------- -`schedules_backup_succeeded` | The number of scheduled backup jobs that have succeeded. -`schedules_backup_started` | The number of scheduled backup jobs that have started. -`schedules_backup_last_completed_time` | The Unix timestamp of the most recently completed scheduled backup specified as maintaining this metric. **Note:** This metric only updates if the schedule was created with the [`updates_cluster_last_backup_time_metric` option](create-schedule-for-backup.html#schedule-options). -`schedules_backup_failed` | The number of scheduled backup jobs that have failed. **Note:** A stuck scheduled job will not increment this metric. -`schedules_round_reschedule_wait` | The number of schedules that were rescheduled due to a currently running job. A value greater than 0 indicates that a previous backup was still running when a new scheduled backup was supposed to start. This corresponds to the [`on_previous_running=wait`](create-schedule-for-backup.html#on-previous-running-option) schedule option. -`schedules_round_reschedule_skip` | The number of schedules that were skipped due to a currently running job. A value greater than 0 indicates that a previous backup was still running when a new scheduled backup was supposed to start. This corresponds to the [`on_previous_running=skip`](create-schedule-for-backup.html#on-previous-running-option) schedule option. 
-`jobs_backup_currently_running` | The number of backup jobs currently running in `Resume` or `OnFailOrCancel` state. -`jobs_backup_fail_or_cancel_retry_error` | The number of backup jobs that failed with a retryable error on their failure or cancelation process. -`jobs_backup_fail_or_cancel_completed` | The number of backup jobs that successfully completed their failure or cancelation process. -`jobs_backup_fail_or_cancel_failed` | The number of backup jobs that failed with a non-retryable error on their failure or cancelation process. -`jobs_backup_resume_failed` | The number of backup jobs that failed with a non-retryable error. -`jobs_backup_resume_retry_error` | The number of backup jobs that failed with a retryable error. -`jobs_restore_resume_retry_error` | The number of restore jobs that failed with a retryable error. -`jobs_restore_resume_completed` | The number of restore jobs that successfully resumed to completion. -`jobs_restore_resume_failed` | The number of restore jobs that failed with a non-retryable error. -`jobs_restore_fail_or_cancel_failed` | The number of restore jobs that failed with a non-retriable error on their failure or cancelation process. -`jobs_restore_fail_or_cancel_retry_error` | The number of restore jobs that failed with a retryable error on their failure or cancelation process. -`jobs_restore_currently_running` | The number of restore jobs currently running in `Resume` or `OnFailOrCancel` state. +`schedules.BACKUP.failed` | The number of scheduled backup jobs that have failed. **Note:** A stuck scheduled job will not increment this metric. +`schedules.BACKUP.last_completed_time` | The Unix timestamp of the most recently completed scheduled backup specified as maintaining this metric. **Note:** This metric only updates if the schedule was created with the [`updates_cluster_last_backup_time_metric` option](create-schedule-for-backup.html#schedule-options). +New in v23.1: `schedules.BACKUP.protected_age_sec` | The age of the oldest [protected timestamp record](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) protected by backup schedules. +New in v23.1: `schedules.BACKUP.protected_record_count` | The number of [protected timestamp records](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) held by backup schedules. +`schedules.BACKUP.started` | The number of scheduled backup jobs that have started. +`schedules.BACKUP.succeeded` | The number of scheduled backup jobs that have succeeded. +`schedules.round.reschedule_skip` | The number of schedules that were skipped due to a currently running job. A value greater than 0 indicates that a previous backup was still running when a new scheduled backup was supposed to start. This corresponds to the [`on_previous_running=skip`](create-schedule-for-backup.html#on-previous-running-option) schedule option. +`schedules.round.reschedule_wait` | The number of schedules that were rescheduled due to a currently running job. A value greater than 0 indicates that a previous backup was still running when a new scheduled backup was supposed to start. This corresponds to the [`on_previous_running=wait`](create-schedule-for-backup.html#on-previous-running-option) schedule option. +New in v23.1: `jobs.backup.currently_paused` | The number of backup jobs currently considered [paused](pause-job.html). +`jobs.backup.currently_running` | The number of backup jobs currently running in `Resume` or `OnFailOrCancel` state. 
+`jobs.backup.fail_or_cancel_retry_error` | The number of backup jobs that failed with a retryable error on their failure or cancelation process. +`jobs.backup.fail_or_cancel_completed` | The number of backup jobs that successfully completed their failure or cancelation process. +`jobs.backup.fail_or_cancel_failed` | The number of backup jobs that failed with a non-retryable error on their failure or cancelation process. +New in v23.1: `jobs.backup.protected_age_sec` | The age of the oldest [protected timestamp record](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) protected by backup jobs. +New in v23.1: `jobs.backup.protected_record_count` | The number of [protected timestamp records](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) held by backup jobs. +`jobs.backup.resume_failed` | The number of backup jobs that failed with a non-retryable error. +`jobs.backup.resume_retry_error` | The number of backup jobs that failed with a retryable error. +New in v23.1: `jobs.restore.currently_paused` | The number of restore jobs currently considered [paused](pause-job.html). +`jobs.restore.currently_running` | The number of restore jobs currently running in `Resume` or `OnFailOrCancel` state. +`jobs.restore.fail_or_cancel_failed` | The number of restore jobs that failed with a non-retriable error on their failure or cancelation process. +`jobs.restore.fail_or_cancel_retry_error` | The number of restore jobs that failed with a retryable error on their failure or cancelation process. +New in v23.1: `jobs.restore.protected_age_sec` | The age of the oldest [protected timestamp record](architecture/storage-layer.html#protected-timestamps) protected by restore jobs. +New in v23.1: `jobs.restore.protected_record_count` | The number of [protected timestamp records](architecture/storage-layer.html#protected-timestamps) held by restore jobs. +`jobs.restore.resume_completed` | The number of restore jobs that successfully resumed to completion. +`jobs.restore.resume_failed` | The number of restore jobs that failed with a non-retryable error. +`jobs.restore.resume_retry_error` | The number of restore jobs that failed with a retryable error. ## Datadog integration @@ -67,10 +75,10 @@ To use the Datadog integration with your **{{ site.data.products.dedicated }}** Metric | Description -------+------------- -`schedules_backup_succeeded` | The number of scheduled backup jobs that have succeeded. -`schedules_backup_started` | The number of scheduled backup jobs that have started. -`schedules_backup_last_completed_time` | The Unix timestamp of the most recently completed backup by a schedule specified as maintaining this metric. -`schedules_backup_failed` | The number of scheduled backup jobs that have failed. +`schedules.BACKUP.succeeded` | The number of scheduled backup jobs that have succeeded. +`schedules.BACKUP.started` | The number of scheduled backup jobs that have started. +`schedules.BACKUP.last_completed_time` | The Unix timestamp of the most recently completed backup by a schedule specified as maintaining this metric. +`schedules.BACKUP.failed` | The number of scheduled backup jobs that have failed. 
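+For reference, `schedules.BACKUP.last_completed_time` is only maintained by schedules that opt in at creation time. The following sketch assumes the option is set through `WITH SCHEDULE OPTIONS`; the label, storage URI, recurrence, and boolean-style value are placeholders:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+CREATE SCHEDULE daily_backup
+  FOR BACKUP INTO 'external://backup_storage'
+  RECURRING '@daily'
+  WITH SCHEDULE OPTIONS updates_cluster_last_backup_time_metric = 1;
+~~~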
## See also diff --git a/v23.1/build-a-go-app-with-cockroachdb-upperdb.md b/v23.1/build-a-go-app-with-cockroachdb-upperdb.md index da67a01d9b5..8dc1272ee01 100644 --- a/v23.1/build-a-go-app-with-cockroachdb-upperdb.md +++ b/v23.1/build-a-go-app-with-cockroachdb-upperdb.md @@ -48,7 +48,7 @@ The sample code shown below uses upper/db to map Go-specific objects to SQL oper {% include {{ page.version.version }}/app/upperdb-basic-sample/main.go %} ~~~ -Note that the sample code also includes a function that simulates a transaction error (`crdbForceRetry()`). Upper/db's CockroachDB adapter [automatically retries transactions](transactions.html#client-side-intervention) when transaction errors are thrown. As a result, this function forces a transaction retry. +Note that the sample code also includes a function that simulates a transaction error (`crdbForceRetry()`). Upper/db's CockroachDB adapter [automatically retries transactions](transaction-retry-error-reference.html#client-side-retry-handling) when transaction errors are thrown. As a result, this function forces a transaction retry. To run the code, copy the sample above, or download it directly. @@ -85,7 +85,7 @@ The sample code shown below uses upper/db to map Go-specific objects to SQL oper {% include {{ page.version.version }}/app/insecure/upperdb-basic-sample/main.go %} ~~~ -Note that the sample code also includes a function that simulates a transaction error (`crdbForceRetry()`). Upper/db's CockroachDB adapter [automatically retries transactions](transactions.html#client-side-intervention) when transaction errors are thrown. As a result, this function forces a transaction retry. +Note that the sample code also includes a function that simulates a transaction error (`crdbForceRetry()`). Upper/db's CockroachDB adapter [automatically retries transactions](transaction-retry-error-reference.html#client-side-retry-handling) when transaction errors are thrown. As a result, this function forces a transaction retry. Copy the code or download it directly. diff --git a/v23.1/bulk-delete-data.md b/v23.1/bulk-delete-data.md index 815407e9f6e..e6260660fbd 100644 --- a/v23.1/bulk-delete-data.md +++ b/v23.1/bulk-delete-data.md @@ -84,7 +84,7 @@ If you cannot index the column that identifies the unwanted rows, we recommend d 1. Execute a [`SELECT` query](selection-queries.html) that returns the primary key values for the rows that you want to delete. When writing the `SELECT` query: - Use a `WHERE` clause that filters on the column identifying the rows. - - Add an [`AS OF SYSTEM TIME` clause](as-of-system-time.html) to the end of the selection subquery, or run the selection query in a separate, read-only transaction with [`SET TRANSACTION AS OF SYSTEM TIME`](as-of-system-time.html#use-as-of-system-time-in-transactions). This helps to reduce [transaction contention](transactions.html#transaction-contention). + - Add an [`AS OF SYSTEM TIME` clause](as-of-system-time.html) to the end of the selection subquery, or run the selection query in a separate, read-only transaction with [`SET TRANSACTION AS OF SYSTEM TIME`](as-of-system-time.html#use-as-of-system-time-in-transactions). This helps to reduce [transaction contention](performance-best-practices-overview.html#transaction-contention). - Use a [`LIMIT`](limit-offset.html) clause to limit the number of rows queried to a subset of the rows that you want to delete. 
To determine the optimal `SELECT` batch size, try out different sizes (10,000 rows, 100,000 rows, 1,000,000 rows, etc.), and monitor the change in performance. Note that this `SELECT` batch size can be much larger than the batch size of rows that are deleted in the subsequent `DELETE` query. - To ensure that rows are efficiently scanned in the subsequent `DELETE` query, include an [`ORDER BY`](order-by.html) clause on the primary key. diff --git a/v23.1/bulk-update-data.md b/v23.1/bulk-update-data.md index 7e015b8510f..74abff27933 100644 --- a/v23.1/bulk-update-data.md +++ b/v23.1/bulk-update-data.md @@ -34,7 +34,7 @@ Before reading this page, do the following: - Use a `WHERE` clause to filter on columns that identify the rows that you want to update. This clause should also filter out the rows that have been updated by previous iterations of the nested `UPDATE` loop: - For optimal performance, the first condition of the filter should evaluate the last primary key value returned by the last `UPDATE` query that was executed. This narrows each `SELECT` query's scan to the fewest rows possible, and preserves the performance of the row updates over time. - Another condition of the filter should evaluate column values persisted to the database that signal whether or not a row has been updated. This prevents rows from being updated more than once, in the event that the application or script crashes and needs to be restarted. If there is no way to distinguish between an updated row and a row that has not yet been updated, you might need to [add a new column to the table](alter-table.html#add-column) (e.g., `ALTER TABLE ... ADD COLUMN updated BOOL;`). - - Add an [`AS OF SYSTEM TIME` clause](as-of-system-time.html) to the end of the selection subquery, or run the selection query in a separate, read-only transaction with [`SET TRANSACTION AS OF SYSTEM TIME`](as-of-system-time.html#use-as-of-system-time-in-transactions). This helps to reduce [transaction contention](transactions.html#transaction-contention). + - Add an [`AS OF SYSTEM TIME` clause](as-of-system-time.html) to the end of the selection subquery, or run the selection query in a separate, read-only transaction with [`SET TRANSACTION AS OF SYSTEM TIME`](as-of-system-time.html#use-as-of-system-time-in-transactions). This helps to reduce [transaction contention](performance-best-practices-overview.html#transaction-contention). - Use a [`LIMIT`](limit-offset.html) clause to limit the number of rows queried to a subset of the rows that you want to update. To determine the optimal `SELECT` batch size, try out different sizes (10,000 rows, 20,000 rows, etc.), and monitor the change in performance. Note that this `SELECT` batch size can be much larger than the batch size of rows that are updated in the subsequent `UPDATE` query. - To ensure that rows are efficiently scanned in the subsequent `UPDATE` query, include an [`ORDER BY`](order-by.html) clause on the primary key. diff --git a/v23.1/cdc-queries.md b/v23.1/cdc-queries.md index de251a2c02a..e636b4df3ee 100644 --- a/v23.1/cdc-queries.md +++ b/v23.1/cdc-queries.md @@ -167,6 +167,10 @@ CREATE CHANGEFEED INTO sink AS SELECT * FROM table WHERE crdb_region = 'europe-w For more detail on targeting `REGIONAL BY ROW` tables with changefeeds, see [Changefeeds in Multi-Region Deployments](changefeeds-in-multi-region-deployments.html). 
+{{site.data.alerts.callout_success}} +If you are running changefeeds from a [multi-region](multiregion-overview.html) cluster, you may want to define which nodes take part in running the changefeed job. You can use the [`execution_locality` option](changefeeds-in-multi-region-deployments.html#run-a-changefeed-job-by-locality) with key-value pairs to specify the [locality designations](cockroach-start.html#locality) nodes must meet. +{{site.data.alerts.end}} + ### Stabilize the changefeed message schema As changefeed messages emit from the database, message formats can vary as tables experience [schema changes](changefeed-messages.html#schema-changes). You can select columns with [typecasting](data-types.html#data-type-conversions-and-casts) to prevent message fields from changing during a changefeed's lifecycle: diff --git a/v23.1/change-data-capture-overview.md b/v23.1/change-data-capture-overview.md index 7865bb790e2..cfa0068af78 100644 --- a/v23.1/change-data-capture-overview.md +++ b/v23.1/change-data-capture-overview.md @@ -18,8 +18,8 @@ The main feature of CDC is the changefeed, which targets an allowlist of tables, --------------------------------------------------|-----------------------------------------------------------------| | Useful for prototyping or quick testing. | Recommended for production use. | | Available in all products. | Available in {{ site.data.products.dedicated }} or with an [{{ site.data.products.enterprise }} license](enterprise-licensing.html) in {{ site.data.products.core }} or {{ site.data.products.serverless }}. | -| Streams indefinitely to the SQL client until underlying SQL connection is closed. | Maintains connection to configured sink ([Kafka](changefeed-sinks.html#kafka), [Google Cloud Pub/Sub](changefeed-sinks.html#google-cloud-pub-sub), [Amazon S3](changefeed-sinks.html#amazon-s3), [Google Cloud Storage](changefeed-sinks.html#google-cloud-storage), [Azure Storage](changefeed-sinks.html#azure-blob-storage), [HTTP](changefeed-sinks.html#http), [Webhook](changefeed-sinks.html#webhook-sink)). | -| Create with [`EXPERIMENTAL CHANGEFEED FOR`](changefeed-for.html). | Create with [`CREATE CHANGEFEED`](create-changefeed.html).
Use `CREATE CHANGEFEED` with [CDC queries](cdc-queries.html) to define the emitted change data.
Create a scheduled changefeed with [`CREATE SCHEDULE FOR CHANGEFEED`](create-schedule-for-changefeed.html).
[Export data](export-data-with-changefeeds.html) with changefeeds. | +| Streams indefinitely until underlying SQL connection is closed. | Maintains connection to configured sink ([Kafka](changefeed-sinks.html#kafka), [Google Cloud Pub/Sub](changefeed-sinks.html#google-cloud-pub-sub), [Amazon S3](changefeed-sinks.html#amazon-s3), [Google Cloud Storage](changefeed-sinks.html#google-cloud-storage), [Azure Storage](changefeed-sinks.html#azure-blob-storage), [HTTP](changefeed-sinks.html#http), [Webhook](changefeed-sinks.html#webhook-sink)). | +| Create with [`EXPERIMENTAL CHANGEFEED FOR`](changefeed-for.html). | Create with [`CREATE CHANGEFEED`](create-changefeed.html).
Use `CREATE CHANGEFEED` with [CDC queries](cdc-queries.html) to define the emitted change data.
New in v23.1: Create a scheduled changefeed with [`CREATE SCHEDULE FOR CHANGEFEED`](create-schedule-for-changefeed.html).
New in v23.1: Use [`execution_locality`](changefeeds-in-multi-region-deployments.html#run-a-changefeed-job-by-locality) to determine node locality for changefeed job execution. | | Watches one or multiple tables in a comma-separated list. Emits every change to a "watched" row as a record. | Watches one or multiple tables in a comma-separated list. Emits every change to a "watched" row as a record in a configurable format (JSON, CSV, Avro) to a [configurable sink](changefeed-sinks.html) (e.g., [Kafka](https://kafka.apache.org/)). | | [`CREATE`](create-and-configure-changefeeds.html?filters=core) changefeed and cancel by closing the connection. | Manage changefeed with [`CREATE`](create-and-configure-changefeeds.html#create), [`PAUSE`](create-and-configure-changefeeds.html#pause), [`RESUME`](create-and-configure-changefeeds.html#resume), [`ALTER`](alter-changefeed.html), and [`CANCEL`](create-and-configure-changefeeds.html#cancel), as well as [monitor](monitor-and-debug-changefeeds.html#monitor-a-changefeed) and [debug](monitor-and-debug-changefeeds.html#debug-a-changefeed). | @@ -37,6 +37,8 @@ With [`resolved`](create-changefeed.html#resolved-option) specified when a chang As rows are updated, added, and deleted in the targeted table(s), the node sends the row changes through the [rangefeed mechanism](create-and-configure-changefeeds.html#enable-rangefeeds) to the changefeed encoder, which encodes these changes into the [final message format](changefeed-messages.html#responses). The message is emitted from the encoder to the sink—it can emit to any endpoint in the sink. In the diagram example, this means that the messages can emit to any Kafka Broker. +If you are running changefeeds from a [multi-region](multiregion-overview.html) cluster, you may want to define which nodes take part in running the changefeed job. You can use the [`execution_locality` option](changefeeds-in-multi-region-deployments.html#run-a-changefeed-job-by-locality) with key-value pairs to specify the locality requirements nodes must meet. See [Job coordination using the execution locality option](changefeeds-in-multi-region-deployments.html#job-coordination-using-the-execution-locality-option) for detail on how a changefeed job works with this option. + See the following for more detail on changefeed setup and use: - [Enable rangefeeds](create-and-configure-changefeeds.html#enable-rangefeeds) diff --git a/v23.1/changefeed-messages.md b/v23.1/changefeed-messages.md index 87980bf4e91..8b80a8f9a39 100644 --- a/v23.1/changefeed-messages.md +++ b/v23.1/changefeed-messages.md @@ -180,7 +180,17 @@ Protected timestamps will protect changefeed data from garbage collection in the - The downstream [changefeed sink](changefeed-sinks.html) is unavailable. Protected timestamps will protect changes until you either [cancel](cancel-job.html) the changefeed or the sink becomes available once again. - You [pause](pause-job.html) a changefeed with the [`protect_data_from_gc_on_pause`](create-changefeed.html#protect-pause) option enabled. Protected timestamps will protect changes until you [resume](resume-job.html) the changefeed. -However, if the changefeed lags too far behind, the protected changes could cause data storage issues. To release the protected timestamps and allow garbage collection to resume, you can cancel the changefeed or [resume](resume-job.html) in the case of a paused changefeed. +However, if the changefeed lags too far behind, the protected changes could lead to an accumulation of garbage. 
This could result in increased disk usage and degraded performance for some workloads. To release the protected timestamps and allow garbage collection to resume, you can: + +- [Cancel](cancel-job.html) the changefeed job. +- [Resume](resume-job.html) a paused changefeed job. +- {% include_cached new-in.html version="v23.1" %} Set the [`gc_protect_expires_after`](create-changefeed.html#gc-protect-expire) option, which will automatically expire the protected timestamp records that are older than your defined duration and cancel the changefeed job. + + For example, if the following changefeed is paused or runs into an error and then pauses, protected timestamps will protect changes for up to 24 hours. After this point, if the changefeed does not resume, the protected timestamp records will expire and the changefeed job will be canceled. This releases the protected timestamp records and allows garbage collection to resume: + + ~~~sql + CREATE CHANGEFEED FOR TABLE db.table INTO 'external://sink' WITH on_error='pause', protect_data_from_gc_on_pause, gc_protect_expires_after='24h'; + ~~~ We recommend [monitoring](monitor-and-debug-changefeeds.html) storage and the number of running changefeeds. If a changefeed is not advancing and is [retrying](monitor-and-debug-changefeeds.html#changefeed-retry-errors), it will (without limit) accumulate garbage while it retries to run. diff --git a/v23.1/changefeeds-in-multi-region-deployments.md index cfbb820e9e9..404d55082e9 100644 --- a/v23.1/changefeeds-in-multi-region-deployments.md +++ b/v23.1/changefeeds-in-multi-region-deployments.md @@ -5,7 +5,57 @@ toc: true docs_area: stream_data --- - Changefeeds are supported on [regional by row tables](multiregion-overview.html#regional-by-row-tables). When working with changefeeds on regional by row tables, it is necessary to consider the following: +This page describes features that you can use for changefeeds running on multi-region deployments. + +- {% include_cached new-in.html version="v23.1" %} [Run a changefeed job by locality](#run-a-changefeed-job-by-locality). +- [Run changefeeds on regional by row tables](#run-changefeeds-on-regional-by-row-tables). + +## Run a changefeed job by locality + +{% include_cached new-in.html version="v23.1" %} Use the `execution_locality` option to set locality filter requirements that a node must meet to take part in executing a [changefeed](create-changefeed.html) job. This will pin the [coordination of the changefeed job](change-data-capture-overview.html#how-does-an-enterprise-changefeed-work) and the nodes that process the [changefeed messages](changefeed-messages.html) to the defined locality. + +Defining an execution locality for a changefeed job could be useful in the following cases: + +- Your [changefeed sink](changefeed-sinks.html) is only available in one region. There is no network connectivity between regions and you need to send all changefeed messages through the node(s) in the sink's region. +- Your cluster runs on a [hybrid topology](topology-patterns.html#multi-region) and you need to send changefeed messages within the same environment. +- Your cluster is [multi-region](multiregion-overview.html) and you need the nodes that are physically closest to the sink to emit changefeed messages. This can avoid cross-regional traffic and reduce expense. +- Your cluster is running through VPC peering connections and you need all the data sent through a particular locality. 
+ +### Syntax + +To specify the locality requirements for the coordinating node, include `execution_locality` with key-value pairs that represent the [locality designations](cockroach-start.html#locality) assigned to the cluster at startup. + +{% include_cached copy-clipboard.html %} +~~~sql +CREATE CHANGEFEED FOR TABLE movr.vehicles INTO 'external://cdc' WITH execution_locality='region=us-east-2,cloud=aws'; +~~~ + +When you run a changefeed with `execution_locality`, consider the following: + +- The changefeed job will fail if no nodes match the locality filter. +- [Selection of the coordinating node](#job-coordination-using-the-execution-locality-option) that matches the locality filter may noticeably increase the startup latency of the changefeed job. +- Pinning a changefeed job to a locality does not guarantee that the job will **not** read from another locality if there are no replicas in the defined locality. + +{{site.data.alerts.callout_success}} +To define and filter the change data included in changefeed messages emitted to the sink, see [Change Data Capture Queries](cdc-queries.html). +{{site.data.alerts.end}} + +### Job coordination using the execution locality option + +When you start or [resume](resume-job.html) a changefeed with `execution_locality`, it is necessary to determine the coordinating node for the job. If a node that does not match the locality filter is the first node to claim the job, it will find a node that does match the filter and transfer the execution to it. This can result in a short delay in starting or resuming a changefeed job that has execution locality requirements. When there is no node matching the specified locality, CockroachDB will return an error. + +Once the coordinating node is determined, nodes that match the locality requirements will take part in emitting changefeed messages to the sink, as follows: + +- If the [leaseholder](architecture/reads-and-writes-overview.html#architecture-leaseholder) for the change data matches the filter, it will emit the changefeed messages. +- If the leaseholder does not match the locality filter, a node matching the locality filter will be selected, with a preference for nodes whose localities are most similar to the leaseholder's. + +When a node matching the locality filter takes part in the changefeed job, that node will read from the closest [replica](architecture/reads-and-writes-overview.html#architecture-replica). If the node is the leaseholder, or is itself a replica, it can read from itself. In the scenario where no replicas are available in the region of the assigned node, it may then read from a replica in a different region. As a result, you may want to consider [placing replicas](configure-replication-zones.html), including potentially [non-voting replicas](architecture/replication-layer.html#non-voting-replicas) that will have less impact on read latency, in the locality or region that you plan on pinning for changefeed job execution. + +For an overview of how a changefeed job works, see the [How does an Enterprise changefeed work?](change-data-capture-overview.html#how-does-an-enterprise-changefeed-work) section. + +## Run changefeeds on regional by row tables + +Changefeeds are supported on [regional by row tables](multiregion-overview.html#regional-by-row-tables). 
When working with changefeeds on regional by row tables, it is necessary to consider the following: - Setting a table's locality to [`REGIONAL BY ROW`](alter-table.html#regional-by-row) is equivalent to a [schema change](online-schema-changes.html) as the [`crdb_region` column](alter-table.html#crdb_region) becomes a hidden column for each of the rows in the table and is part of the [primary key](primary-key.html). Therefore, when existing tables targeted by changefeeds are made regional by row, it will trigger a backfill of the table through the changefeed. (See [Schema changes with a column backfill](changefeed-messages.html#schema-changes-with-column-backfill) for more details on the effects of schema changes on changefeeds.) diff --git a/v23.1/cockroach-node.md b/v23.1/cockroach-node.md index 38d7037ac45..63a17bb521f 100644 --- a/v23.1/cockroach-node.md +++ b/v23.1/cockroach-node.md @@ -16,7 +16,7 @@ Subcommand | Usage -----------|------ `ls` | List the ID of each node in the cluster, excluding those that have been decommissioned and are offline. `status` | View the status of one or all nodes, excluding nodes that have been decommissioned and taken offline. Depending on flags used, this can include details about range/replicas, disk usage, and decommissioning progress. -`decommission` | Decommission nodes for removal from the cluster. For details, see [Node Shutdown](node-shutdown.html?filters=decommission). +`decommission` | Decommission nodes for removal from the cluster. For more information, see [Decommission nodes](#decommission-nodes). `recommission` | Recommission nodes that are decommissioning. If the decommissioning node has already reached the [draining stage](node-shutdown.html?filters=decommission#draining), you may need to restart the node after it is recommissioned. For details, see [Node Shutdown](node-shutdown.html#recommission-nodes). `drain` | Drain nodes in preparation for process termination. Draining always occurs when sending a termination signal or decommissioning a node. The `drain` subcommand is used to drain nodes without also decommissioning or shutting them down. For details, see [Node Shutdown](node-shutdown.html). @@ -117,10 +117,12 @@ Flag | Description `--stats` | Show node disk usage details. `--timeout` | Set the duration of time that the subcommand is allowed to run before it returns an error and prints partial information. The timeout is specified with a suffix of `s` for seconds, `m` for minutes, and `h` for hours. If this flag is not set, the subcommand may hang. -The `node decommission` subcommand also supports the following general flags: +The `node decommission` subcommand also supports the following general flags. For more information, see `cockroach node decommission --help`. Flag | Description -----|------------ +`--checks` | Whether to perform a set of "decommissioning pre-flight checks". Possible values: `enabled`, `strict`, or `skip`. If `enabled`, CockroachDB will check if a node can successfully complete decommissioning given the current state of the cluster. If errors are detected that would result in the inability to complete node decommissioning, they will be printed to `STDERR` and the command will exit *without attempting to perform node decommissioning*. For more information, see [Remove nodes](node-shutdown.html?filters=decommission#remove-nodes).

**Default:** `enabled` +`--dry-run` | Performs the same decommissioning checks as the `--checks` flag, but without attempting to decommission the node. When `cockroach node decommission {nodeID} --dry-run` is executed, it runs the checks, prints the status of those checks, and exits. `--wait` | When to return to the client. Possible values: `all`, `none`.

If `all`, the command returns to the client only after all replicas on all specified nodes have been transferred to other nodes. If any specified nodes are offline, the command will not return to the client until those nodes are back online.

If `none`, the command does not wait for the decommissioning process to complete; it returns to the client after starting the decommissioning process on all specified nodes that are online. Any specified nodes that are offline will automatically be marked as decommissioning; if they come back online, the cluster will recognize this status and will not rebalance data to the nodes.

**Default:** `all` `--self` | **Deprecated.** Instead, specify a node ID explicitly in addition to the `--host` flag. diff --git a/v23.1/cockroachdb-feature-availability.md b/v23.1/cockroachdb-feature-availability.md index e3485eb38b5..d6c31e57710 100644 --- a/v23.1/cockroachdb-feature-availability.md +++ b/v23.1/cockroachdb-feature-availability.md @@ -14,7 +14,7 @@ This page outlines _feature availability_, which is separate from Cockroach Labs ## Feature availability phases -Phase | Definition | Accessibility +Phase | Definition | Accessibility ----------------------------------------------+------------+------------- Private preview | Feature is not production-ready and will not be publicly documented. | Invite-only [Limited access](#features-in-limited-access) | Feature is production-ready but not available widely because of known limitations and/or because capabilities may change or be added based on feedback. | Opt-in
Contact your Cockroach Labs account team. @@ -31,10 +31,6 @@ General availability (GA) | Feature is production-ready and {{ site.data.products.dedicated }} users can use the [Cloud API](../cockroachcloud/cloud-api.html) to configure [log export](../cockroachcloud/export-logs.html) to [AWS CloudWatch](https://aws.amazon.com/cloudwatch/) or [GCP Cloud Logging](https://cloud.google.com/logging). Once the export is configured, logs will flow from all nodes in all regions of your {{ site.data.products.dedicated }} cluster to your chosen cloud log sink. You can configure log export to redact sensitive log entries, limit log output by severity, and send log entries to specific log group targets by log channel, among others. -### Egress perimeter controls for {{ site.data.products.dedicated }} - -[Egress Perimeter Controls](../cockroachcloud/egress-perimeter-controls.html) can enhance the security of {{ site.data.products.dedicated }} clusters by enabling cluster administrators to restrict egress to a list of specified external destinations. This adds a strong layer of protection against malicious or accidental data exfiltration. - ### Export Cloud Organization audit logs (Cloud API) {{ site.data.products.db }} captures audit logs when many types of events occur, such as when a cluster is created or when a user is added to or removed from an organization. Any user in an organization with an admin-level service account can [export these audit logs](../cockroachcloud/cloud-org-audit-logs.html) using the [`auditlogevents` endpoint](../cockroachcloud/cloud-api.html#cloud-audit-logs) of the [Cloud API](../cockroachcloud/cloud-api.html). @@ -136,7 +132,7 @@ CockroachDB supports [altering the column types](alter-table.html#alter-column-d [Temporary tables](temporary-tables.html), [temporary views](views.html#temporary-views), and [temporary sequences](create-sequence.html#temporary-sequences) are in preview in CockroachDB. If you create too many temporary objects in a session, the performance of DDL operations will degrade. Performance limitations could persist long after creating the temporary objects. For more details, see [cockroachdb/cockroach#46260](https://github.com/cockroachdb/cockroach/issues/46260). -To enable temporary objects, set the `experimental_enable_temp_tables` [session variable](show-vars.html) to `on`. +To enable temporary objects, set the `experimental_enable_temp_tables` [session variable](show-vars.html) to `on`. ### Password authentication without TLS diff --git a/v23.1/common-errors.md b/v23.1/common-errors.md index cbbd8b875aa..b22d6d1415f 100644 --- a/v23.1/common-errors.md +++ b/v23.1/common-errors.md @@ -49,9 +49,9 @@ To resolve this issue, use the [`cockroach cert create-client`](cockroach-cert.h ## restart transaction -Messages with the error code `40001` and the string `restart transaction` indicate that a transaction failed because it conflicted with another concurrent or recent transaction accessing the same data. The transaction needs to be retried by the client. For more information about how to implement client-side retries, see [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling). +Messages with the error code `40001` and the string `restart transaction` are known as [*transaction retry errors*](transaction-retry-error-reference.html). 
These indicate that a transaction failed due to [contention](performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention) with another concurrent or recent transaction attempting to write to the same data. The transaction needs to be retried by the client. -For more information about the different types of transaction retry errors such as "retry write too old", "read within uncertainty interval", etc., see the [Transaction Retry Error Reference](transaction-retry-error-reference.html#transaction-retry-error-reference). +{% include {{ page.version.version }}/performance/transaction-retry-error-actions.md %} ## node belongs to cluster \ but is attempting to connect to a gossip network for cluster \ diff --git a/v23.1/computed-columns.md b/v23.1/computed-columns.md index 7279544ebaa..22a3fee3150 100644 --- a/v23.1/computed-columns.md +++ b/v23.1/computed-columns.md @@ -64,7 +64,7 @@ Parameter | Description `STORED` | _(Required for stored computed columns)_ The computed column is stored alongside other columns. `VIRTUAL`| _(Required for virtual columns)_ The computed column is virtual, meaning the column data is not stored in the table's primary index. -For compatibility with PostgreSQL, CockroachDB also supports creating store computed columns with the syntax `column_name GENERATED ALWAYS AS () STORED`. +For compatibility with PostgreSQL, CockroachDB also supports creating stored computed columns with the syntax `column_name GENERATED ALWAYS AS () STORED`. ## Examples diff --git a/v23.1/crdb-internal.md b/v23.1/crdb-internal.md index 24087f4e8c5..d2cb8a69d9c 100644 --- a/v23.1/crdb-internal.md +++ b/v23.1/crdb-internal.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.sql --- -The `crdb_internal` [system catalog](system-catalogs.html) is a schema that contains information about internal objects, processes, and metrics related to a specific database. `crdb_internal` tables are read-only. +The `crdb_internal` [system catalog](system-catalogs.html) is a [schema](schema-design-overview.html#schemas) that contains information about internal objects, processes, and metrics related to a specific database. `crdb_internal` tables are read-only. @@ -74,8 +74,8 @@ Table name | Description| Use in production `node_txn_stats` | Contains transaction statistics for nodes in your cluster.| ✗ `partitions` | Contains information about [partitions](partitioning.html) in your cluster.| ✗ `predefined_comments` | Contains predefined comments about your cluster.| ✗ -`ranges` | Contains information about ranges in your cluster.| ✗ -`ranges_no_leases` | Contains information about ranges in your cluster, without leases.| ✗ +`ranges` | Contains information about [ranges](architecture/overview.html#architecture-range) in your cluster.| ✗ +`ranges_no_leases` | Contains information about [ranges](architecture/overview.html#architecture-range) in your cluster, without [leases](architecture/replication-layer.html#leases).| ✗ `regions` | Contains information about [cluster regions](multiregion-overview.html#cluster-regions).| ✗ `schema_changes` | Contains information about schema changes in your cluster.| ✗ `session_trace` | Contains session trace information for your cluster.| ✗ @@ -273,7 +273,7 @@ SELECT * FROM crdb_internal.cluster_contention_events; To view the [tables](create-table.html) and [indexes](indexes.html) with the most cumulative time under [contention](performance-best-practices-overview.html#transaction-contention) since the last server restart, run the query below. 
{{site.data.alerts.callout_info}} -The default tracing behavior captures a small percent of transactions so not all contention events will be recorded. When investigating transaction contention, you can set the `sql.trace.txn.enable_threshold` [cluster setting](cluster-settings.html#setting-sql-trace-txn-enable-threshold) to always capture contention events. +{% include {{ page.version.version }}/performance/sql-trace-txn-enable-threshold.md %} {{site.data.alerts.end}} {% include_cached copy-clipboard.html %} @@ -1161,7 +1161,7 @@ Column | Type | Description `contention_duration` | `INTERVAL NOT NULL` | The interval of time the waiting transaction spent waiting for the blocking transaction. `contending_key` | `BYTES NOT NULL` | The key on which the transactions contended. -#### Example +#### Transaction contention - example The following example shows how to join the `transaction_contention_events` table with `transaction_statistics` and `statement_statistics` tables to extract blocking and waiting transaction information. diff --git a/v23.1/create-changefeed.md b/v23.1/create-changefeed.md index f8f8a4d2115..2bf0f19cd80 100644 --- a/v23.1/create-changefeed.md +++ b/v23.1/create-changefeed.md @@ -156,17 +156,19 @@ Option | Value | Description `diff` | N/A | Publish a `before` field with each message, which includes the value of the row before the update was applied. `end_time` | [Timestamp](as-of-system-time.html#parameters) | Indicate the timestamp up to which the changefeed will emit all events and then complete with a `successful` status. Provide a future timestamp to `end_time` in number of nanoseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time). For example, `end_time="1655402400000000000"`. You cannot use `end_time` and [`initial_scan = 'only'`](#initial-scan) simultaneously. `envelope` | `key_only` / `row`* / `wrapped` | `key_only` emits only the key and no value, which is faster if you only want to know when the key changes.

`row` emits the row without any additional metadata fields in the message. *You can only use `row` with Kafka sinks or sinkless changefeeds. `row` does not support [`avro` format](#format).

`wrapped` emits the full message including any metadata fields. See [Responses](changefeed-messages.html#responses) for more detail on message format.

Default: `envelope=wrapped` +New in v23.1: `execution_locality` | Key-value pairs | Restricts the execution of a changefeed to nodes that match the defined locality filter requirements, e.g., `WITH execution_locality = 'region=us-west-1a,cloud=aws'`.

See [Run a changefeed job by locality](changefeeds-in-multi-region-deployments.html#run-a-changefeed-job-by-locality) for usage and reference detail. `format` | `json` / `avro` / `csv`* | Format of the emitted record. For mappings of CockroachDB types to Avro types, [see the table](changefeed-messages.html#avro-types) and detail on [Avro limitations](changefeed-messages.html#avro-limitations).

*`format=csv` works only in combination with [`initial_scan = 'only'`](#initial-scan). You cannot combine `format=csv` with the [`diff`](#diff-opt) or [`resolved`](#resolved-option) options. Changefeeds use the same CSV format as the [`EXPORT`](export.html) statement. See [Export data with changefeeds](export-data-with-changefeeds.html) for details using these options to create a changefeed as an alternative to `EXPORT`.

Default: `format=json`. `full_table_name` | N/A | Use fully qualified table name in topics, subjects, schemas, and record output instead of the default table name. This can prevent unintended behavior when the same table name is present in multiple databases.

**Note:** This option cannot modify existing table names used as topics, subjects, etc., as part of an [`ALTER CHANGEFEED`](alter-changefeed.html) statement. To modify a topic, subject, etc., to use a fully qualified table name, create a new changefeed with this option.

Example: `CREATE CHANGEFEED FOR foo... WITH full_table_name` will create the topic name `defaultdb.public.foo` instead of `foo`. +New in v23.1: `gc_protect_expires_after` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Automatically expires protected timestamp records that are older than the defined duration. If a changefeed job remains paused, `gc_protect_expires_after` will cause the underlying protected timestamp record to expire and cancel the changefeed job to prevent accumulation of protected data. Use with [`protect_data_from_gc_on_pause`](#protect-pause) to limit the amount of time a changefeed job will remain paused while protecting change data.<br><br>

See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for more detail on protecting changefeed data. `initial_scan` | `yes`/`no`/`only` | Control whether or not an initial scan will occur at the start time of a changefeed. Only one `initial_scan` option (`yes`, `no`, or `only`) can be used. If none of these are set, an initial scan will occur if there is no [`cursor`](#cursor-option), and will not occur if there is one. This preserves the behavior from previous releases. With `initial_scan = 'only'` set, the changefeed job will end with a successful status (`succeeded`) after the initial scan completes. You cannot specify `yes`, `no`, `only` simultaneously.

If used in conjunction with `cursor`, an initial scan will be performed at the cursor timestamp. If no `cursor` is specified, the initial scan is performed at `now()`.

Although the [`initial_scan` / `no_initial_scan`](../v21.2/create-changefeed.html#initial-scan) syntax from previous versions is still supported, you cannot combine the previous and current syntax.

Default: `initial_scan = 'yes'` `kafka_sink_config` | [`STRING`](string.html) | Set fields to configure the required level of message acknowledgement from the Kafka server, the version of the server, and batching parameters for Kafka sinks. Set the message file compression type. See [Kafka sink configuration](changefeed-sinks.html#kafka-sink-configuration) for more detail on configuring all the available fields for this option.

Example: `CREATE CHANGEFEED FOR table INTO 'kafka://localhost:9092' WITH kafka_sink_config='{"Flush": {"MaxMessages": 1, "Frequency": "1s"}, "RequiredAcks": "ONE"}'` New in v23.1: `key_column` | `'column'` | Overrides the key used in [message metadata](changefeed-messages.html). This changes the key hashed to determine downstream partitions. In sinks that support partitioning by message, CockroachDB uses the [32-bit FNV-1a](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) hashing algorithm to determine which partition to send to.

**Note:** `key_column` does not preserve ordering of messages from CockroachDB to the downstream sink; therefore, you must also include the [`unordered`](#unordered) option in your changefeed creation statement. It does not affect per-key [ordering guarantees](changefeed-messages.html#ordering-guarantees) or the output of [`key_in_value`](#key-in-value).<br><br>

See the [Define a key to determine the changefeed sink partition](#define-a-key-to-determine-the-changefeed-sink-partition) example. `key_in_value` | N/A | Make the [primary key](primary-key.html) of a deleted row recoverable in sinks where each message has a value but not a key (most have a key and value in each message). `key_in_value` is automatically used for [cloud storage sinks](changefeed-sinks.html#cloud-storage-sink), [webhook sinks](changefeed-sinks.html#webhook-sink), and [GC Pub/Sub sinks](changefeed-sinks.html#google-cloud-pub-sub). {% include {{ page.version.version }}/cdc/cloud-storage-external-connection.md %} -`metrics_label` | [`STRING`](string.html) | This is an **experimental** feature. Define a metrics label to which the metrics for one or multiple changefeeds increment. All changefeeds also have their metrics aggregated.

The maximum length of a label is 128 bytes. There is a limit of 1024 unique labels.

`WITH metrics_label=label_name`

For more detail on usage and considerations, see [Using changefeed metrics labels](monitor-and-debug-changefeeds.html#using-changefeed-metrics-labels). +`metrics_label` | [`STRING`](string.html) | Define a metrics label to which the metrics for one or multiple changefeeds increment. All changefeeds also have their metrics aggregated.

The maximum length of a label is 128 bytes. There is a limit of 1024 unique labels.

`WITH metrics_label=label_name`

For more detail on usage and considerations, see [Using changefeed metrics labels](monitor-and-debug-changefeeds.html#using-changefeed-metrics-labels). `min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often nodes flush their progress to the [coordinating changefeed node](change-data-capture-overview.html#how-does-an-enterprise-changefeed-work). Changefeeds will wait for at least the specified duration before a flush to the sink. This can help you control the flush frequency of higher latency sinks to achieve better throughput. If this is set to `0s`, a node will flush as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that changefeed will need to catch up. That is, it could emit duplicate messages during this time.

**Note:** [`resolved`](#resolved-option) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). Since `min_checkpoint_frequency` defaults to `30s`, you **must** configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency if you require `resolved` messages more frequently than `30s`.

**Default:** `30s` `mvcc_timestamp` | N/A | Include the [MVCC](architecture/storage-layer.html#mvcc) timestamp for each emitted row in a changefeed. With the `mvcc_timestamp` option, each emitted row will always contain its MVCC timestamp, even during the changefeed's initial backfill. `on_error` | `pause` / `fail` | Use `on_error=pause` to pause the changefeed when encountering **non**-retryable errors. `on_error=pause` will pause the changefeed instead of sending it into a terminal failure state. **Note:** Retryable errors will continue to be retried with this option specified.

Use with [`protect_data_from_gc_on_pause`](#protect-pause) to protect changes from [garbage collection](configure-replication-zones.html#gc-ttlseconds).

Default: `on_error=fail` -`protect_data_from_gc_on_pause` | N/A | When a [changefeed is paused](pause-job.html), ensure that the data needed to [resume the changefeed](resume-job.html) is not garbage collected. If `protect_data_from_gc_on_pause` is **unset**, pausing the changefeed will release the existing protected timestamp records. It is also important to note that pausing and adding `protect_data_from_gc_on_pause` to a changefeed will not protect data if the [garbage collection](configure-replication-zones.html#gc-ttlseconds) window has already passed.

Use with [`on-error=pause`](#on-error) to protect changes from garbage collection when encountering non-retryable errors.

See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for more detail on protecting changefeed data.

**Note:** If you use this option, changefeeds that are left paused for long periods of time can prevent garbage collection. +`protect_data_from_gc_on_pause` | N/A | When a [changefeed is paused](pause-job.html), ensure that the data needed to [resume the changefeed](resume-job.html) is not garbage collected. If `protect_data_from_gc_on_pause` is **unset**, pausing the changefeed will release the existing protected timestamp records. It is also important to note that pausing and adding `protect_data_from_gc_on_pause` to a changefeed will not protect data if the [garbage collection](configure-replication-zones.html#gc-ttlseconds) window has already passed.

Use with [`on-error=pause`](#on-error) to protect changes from garbage collection when encountering non-retryable errors.

See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for more detail on protecting changefeed data.

**Note:** If you use this option, changefeeds that are left paused for long periods of time can prevent garbage collection. Use with the [`gc_protect_expires_after`](#gc-protect-expire) option to limit how long data is protected and how long a changefeed will remain paused. `resolved` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Emits [resolved timestamp](changefeed-messages.html#resolved-def) events per changefeed in a format dependent on the connected sink. Resolved timestamp events do not emit until all ranges in the changefeed have progressed to a specific point in time.<br><br>

Set an optional minimal duration between emitting resolved timestamps. Example: `resolved='10s'`. This option will **only** emit a resolved timestamp event if the timestamp has advanced and at least the optional duration has elapsed. If unspecified, all resolved timestamps are emitted as the high-water mark advances.

**Note:** If you require `resolved` message frequency under `30s`, then you **must** set the [`min_checkpoint_frequency`](#min-checkpoint-frequency) option to at least the desired `resolved` frequency. This is because `resolved` messages will not be emitted more frequently than `min_checkpoint_frequency`, but may be emitted less frequently. `schema_change_events` | `default` / `column_changes` | The type of schema change event that triggers the behavior specified by the `schema_change_policy` option:
  • `default`: Include all [`ADD COLUMN`](alter-table.html#add-column) events for columns that have a non-`NULL` [`DEFAULT` value](default-value.html) or are [computed](computed-columns.html), and all [`DROP COLUMN`](alter-table.html#drop-column) events.
  • `column_changes`: Include all schema change events that add or remove any column.

Default: `schema_change_events=default` `schema_change_policy` | `backfill` / `nobackfill` / `stop` | The behavior to take when an event specified by the `schema_change_events` option occurs:
  • `backfill`: When [schema changes with column backfill](changefeed-messages.html#schema-changes-with-column-backfill) are finished, output all watched rows using the new schema.
  • `nobackfill`: For [schema changes with column backfill](changefeed-messages.html#schema-changes-with-column-backfill), perform no logical backfills. The changefeed will still emit any duplicate records for the table being altered, but will not emit the new schema records.
  • `stop`: For [schema changes with column backfill](changefeed-messages.html#schema-changes-with-column-backfill), wait for all data preceding the schema change to be resolved before exiting with an error indicating the timestamp at which the schema change occurred. An `error: schema change occurred at <timestamp>` will display in the `cockroach.log` file.<br><br>

Default: `schema_change_policy=backfill` diff --git a/v23.1/create-schedule-for-backup.md b/v23.1/create-schedule-for-backup.md index efeb81642ab..a5efdece8ea 100644 --- a/v23.1/create-schedule-for-backup.md +++ b/v23.1/create-schedule-for-backup.md @@ -94,7 +94,10 @@ The data being backed up will not be eligible for garbage collection until a suc You can also use the `exclude_data_from_backup` option with a scheduled backup as a way to prevent protected timestamps from prolonging garbage collection on a table. See the example [Exclude a table's data from backups](take-full-and-incremental-backups.html#exclude-a-tables-data-from-backups) for usage information. -We recommend monitoring for your backup schedule to alert for failed backups. See [Set up monitoring for the backup schedule](manage-a-backup-schedule.html#set-up-monitoring-for-the-backup-schedule) for more detail. +We recommend monitoring for your backup schedule to alert for failed backups: + +- See the [Backup and Restore Monitoring](backup-and-restore-monitoring.html) page for a general overview and list of metrics available for backup, scheduled backup, and restore jobs. +- See [Set up monitoring for the backup schedule](manage-a-backup-schedule.html#set-up-monitoring-for-the-backup-schedule) for metrics and monitoring backup schedules specifically. ## View and control backup schedules diff --git a/v23.1/data-types.md b/v23.1/data-types.md index dad01cb4c20..516af5987bd 100644 --- a/v23.1/data-types.md +++ b/v23.1/data-types.md @@ -30,6 +30,8 @@ Type | Description | Example [`STRING`](string.html) | A string of Unicode characters. | `'a1b2c3'` [`TIME`
`TIMETZ`](time.html) | `TIME` stores a time of day in UTC.
`TIMETZ` converts `TIME` values with a specified time zone offset from UTC. | `TIME '01:23:45.123456'`
`TIMETZ '01:23:45.123456-5:00'` [`TIMESTAMP`
`TIMESTAMPTZ`](timestamp.html) | `TIMESTAMP` stores a date and time pairing in UTC.
`TIMESTAMPTZ` converts `TIMESTAMP` values with a specified time zone offset from UTC. | `TIMESTAMP '2016-01-25 10:10:10'`
`TIMESTAMPTZ '2016-01-25 10:10:10-05:00'` +[`TSQUERY`](tsquery.html) | New in v23.1: A list of lexemes and operators used in [full-text search](full-text-search.html). | `'list' & 'lexem' & 'oper' & 'use' & 'full' & 'text' & 'search'` +[`TSVECTOR`](tsvector.html) | New in v23.1: A list of lexemes with optional integer positions and weights used in [full-text search](full-text-search.html). | `'full':13 'integ':7 'lexem':4 'list':2 'option':6 'posit':8 'search':15 'text':14 'use':11 'weight':10` [`UUID`](uuid.html) | A 128-bit hexadecimal value. | `7f9c24e8-3b12-4fef-91e0-56a2d5a246ec` ## Data type conversions and casts diff --git a/v23.1/error-handling-and-troubleshooting.md b/v23.1/error-handling-and-troubleshooting.md index 27bd42b58c8..7c7dddf22e0 100644 --- a/v23.1/error-handling-and-troubleshooting.md +++ b/v23.1/error-handling-and-troubleshooting.md @@ -23,21 +23,9 @@ Take a look at [Troubleshoot SQL Behavior](query-behavior-troubleshooting.html). ## Transaction retry errors -Messages with [the PostgreSQL error code `40001` and the string `restart transaction`](common-errors.html#restart-transaction) indicate that a transaction failed because it [conflicted with another concurrent or recent transaction accessing the same data](performance-best-practices-overview.html#transaction-contention). The transaction needs to be retried by the client. +Messages with the error code `40001` and the string `restart transaction` are known as [*transaction retry errors*](transaction-retry-error-reference.html). These indicate that a transaction failed due to [contention](performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention) with another concurrent or recent transaction attempting to write to the same data. The transaction needs to be retried by the client. -If your language's client driver or ORM implements transaction retry logic internally (e.g., if you are using Python and [SQLAlchemy with the CockroachDB dialect](build-a-python-app-with-cockroachdb-sqlalchemy.html)), then you do not need to handle this logic from your application. - -If your driver or ORM does not implement this logic, then you will need to implement a retry loop in your application. - -{% include {{page.version.version}}/misc/client-side-intervention-example.md %} - -{{site.data.alerts.callout_info}} -If a consistently high percentage of your transactions are resulting in [transaction retry errors with the error code `40001` and the string `restart transaction`](common-errors.html#restart-transaction), then you may need to evaluate your [schema design](schema-design-overview.html) and data access patterns to find and remove sources of contention. For more information about contention, see [Transaction Contention](performance-best-practices-overview.html#transaction-contention). - -For more information about what is causing a specific transaction retry error code, see the [Transaction Retry Error Reference](transaction-retry-error-reference.html#transaction-retry-error-reference). -{{site.data.alerts.end}} - -For more information about transaction retry errors, see [Client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling). 
+{% include {{ page.version.version }}/performance/transaction-retry-error-actions.md %} ## Unsupported SQL features diff --git a/v23.1/follower-reads.md b/v23.1/follower-reads.md index 853454e3285..5f30ad37288 100644 --- a/v23.1/follower-reads.md +++ b/v23.1/follower-reads.md @@ -98,7 +98,7 @@ A _bounded staleness read_ is a historical read that uses a dynamic, system-dete Use bounded staleness follower reads when you: -- Need minimally stale reads from the nearest replica without blocking on [conflicting transactions](transactions.html#transaction-contention). This is possible because the historical timestamp is chosen dynamically and the least stale timestamp that can be served locally without blocking is used. +- Need minimally stale reads from the nearest replica without blocking on [conflicting transactions](performance-best-practices-overview.html#transaction-contention). This is possible because the historical timestamp is chosen dynamically and the least stale timestamp that can be served locally without blocking is used. - Can confine the read to a single statement that meets the [bounded staleness limitations](#bounded-staleness-read-limitations). - Need higher availability than is provided by [exact staleness reads](#exact-staleness-reads). Specifically, what we mean by availability in this context is: - The ability to serve a read with low latency from a local replica rather than a leaseholder. diff --git a/v23.1/full-text-search.md b/v23.1/full-text-search.md new file mode 100644 index 00000000000..0bfff629dd3 --- /dev/null +++ b/v23.1/full-text-search.md @@ -0,0 +1,472 @@ +--- +title: Full-Text Search +summary: Full-text searches using TSVECTOR and TSQUERY enable natural-language searches on documents with ranked results. +toc: true +docs_area: develop +--- + +{% include_cached new-in.html version="v23.1" %} A full-text search is used to perform natural-language searches on documents such as articles, websites, or other written formats. + +This page describes how to perform full-text searches using the provided [built-in functions](functions-and-operators.html#full-text-search-functions). + +{{site.data.alerts.callout_info}} +Some PostgreSQL syntax and features are unsupported. For details, see [Unsupported features](#unsupported-features). +{{site.data.alerts.end}} + +## How does full-text search work? + +In the PostgreSQL terminology, a *document* is a natural-language text [converted to a data type](#process-a-document) that is searchable using [specially formatted queries](#form-a-query). A document is typically stored within a single database row or concatenated from multiple fields. + +A full-text search has the following advantages over pattern matching with `LIKE` and `ILIKE`: + +- A full-text search can specify a [text search configuration](#text-search-configuration) that enables language-specific searches. +- The results of a full-text search can be [ranked](#rank-search-results). +- A full-text search can be accelerated using a [full-text index](#full-text-indexes). +- `LIKE` and `ILIKE` are only fast for prefix searches or when indexed with a [trigram index](trigram-indexes.html). + +{{site.data.alerts.callout_success}} +{% include {{ page.version.version }}/sql/use-case-trigram-indexes.md %} +{{site.data.alerts.end}} + +### Process a document + +To make a document searchable, convert it to the [`TSVECTOR`](tsvector.html) data type. A `TSVECTOR` value consists of individual *lexemes*, which are normalized strings used for text matching. 
Each lexeme also includes a list of integer positions that indicate where the lexeme existed in the original document. + +The `to_tsvector()` [built-in function](functions-and-operators.html#full-text-search-functions) converts a string input into a `TSVECTOR` value: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('How do trees get on the internet?'); +~~~ + +~~~ + to_tsvector +--------------------------------- + 'get':4 'internet':7 'tree':3 +~~~ + +This `TSVECTOR` consists of the lexemes `get`, `internet`, and `tree`. Normalization removes the following from the input: + +- Derivatives of words, which are reduced using a [stemming](https://en.wikipedia.org/wiki/Stemming) algorithm. In this example, "trees" is normalized to `tree`. +- *Stop words*. These are words that are considered not useful for indexing and searching, based on the [text search configuration](#text-search-configuration). This example does not specify a configuration, and `english` is used by default. "How", "do", "on", and "the" are identified as stop words. +- Punctuation and capitalization. + +In the preceding output, the integers indicate that `get` is in the fourth position, `internet` is in the seventh position, and `tree` is in the third position in the input. + +### Form a query + +A full-text search attempts to match a *query* to a document. A full-text search query has the [`TSQUERY`](tsquery.html) data type. Like `TSVECTOR`, a `TSQUERY` value consists of individual *lexemes*, which are normalized strings used for text matching. Lexemes in a `TSQUERY` are separated with any combination of `&` (AND), `|` (OR), `<->` (FOLLOWED BY), or `!` (NOT) operators. + +- The `to_tsquery()` [built-in function](functions-and-operators.html#full-text-search-functions) normalizes a `TSQUERY` input. The input must also be formatted as a `TSQUERY`, or the statement will error. + + {% include_cached copy-clipboard.html %} + ~~~ sql + SELECT to_tsquery('How & do & trees & get & on & the & internet?'); + ~~~ + + ~~~ + to_tsquery + ------------------------------- + 'tree' & 'get' & 'internet' + ~~~ + +- The `plainto_tsquery()` [built-in function](functions-and-operators.html#full-text-search-functions) converts a string input into a `TSQUERY` value, and separates the lexemes with `&` (AND): + + {% include_cached copy-clipboard.html %} + ~~~ sql + SELECT plainto_tsquery('How do trees get on the internet?'); + ~~~ + + ~~~ + plainto_tsquery + ------------------------------- + 'tree' & 'get' & 'internet' + ~~~ + +- The `phraseto_tsquery()` [built-in function](functions-and-operators.html#full-text-search-functions) converts a string input into a `TSQUERY` value, and separates the lexemes with `<->` (FOLLOWED BY): + + ~~~ sql + SELECT phraseto_tsquery('How do trees get on the internet?'); + ~~~ + + ~~~ + phraseto_tsquery + ----------------------------------- + 'tree' <-> 'get' <3> 'internet' + ~~~ + + In the preceding output, `<->` (equivalent to `<1>`) indicates that `get` must follow `tree` in a matching `TSVECTOR`. `<3>` further indicates that `get` and `internet` must be separated by **two** lexemes in a matching `TSVECTOR`. This resulted from converting the stop words "on" and "the" in the input. + + To match this query, a document must therefore contain phrases such as "get tree" and "get {word} {word} internet". + +Queries and documents are matched using the [`@@` comparison operator](#comparisons). For usage examples, see [Match queries to documents](#match-queries-to-documents). 
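+As a quick illustration (a minimal sketch, assuming the default `english` configuration), the following statement matches a query that requires both `tree` and `internet` and excludes `phone`:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SELECT to_tsvector('How do trees get on the internet?') @@ to_tsquery('trees & internet & !phone');
+~~~
+
+The match should succeed, because the normalized document contains the `tree` and `internet` lexemes and does not contain `phone`.
+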
+ +### Rank search results + +You can rank the results of a full-text search. + +The `ts_rank()` [built-in function](functions-and-operators.html#full-text-search-functions) outputs a search rank based on the frequency of matching lexemes. In the following example, two lexemes match: + +~~~ sql +SELECT ts_rank(to_tsvector('How do trees get on the internet?'), plainto_tsquery('how to get internet')); +~~~ + +{% include_cached copy-clipboard.html %} +~~~ + ts_rank +-------------- + 0.09735848 +~~~ + +In this example, three lexemes match, resulting in a higher rank: + +~~~ sql +SELECT ts_rank(to_tsvector('How do trees get on the internet?'), plainto_tsquery('wow, do trees get internet?')); +~~~ + +{% include_cached copy-clipboard.html %} +~~~ + ts_rank +-------------- + 0.26426345 +~~~ + +{{site.data.alerts.callout_info}} +Because a rank must be calculated for each matching document, ranking a full-text search can incur a performance overhead if there are many matching documents. +{{site.data.alerts.end}} + +For more information about using `ts_rank()`, see the [PostgreSQL documentation](https://www.postgresql.org/docs/15/textsearch-controls.html#TEXTSEARCH-RANKING). + +## Comparisons + +Full-text searches support the following comparison operator: + +- **matching**: [`@@`](functions-and-operators.html#operators). This operator is set between a `TSQUERY` and `TSVECTOR`, and returns `true` if the lexemes match. The `TSQUERY` and `TSVECTOR` can be specified in any order. + +For usage examples, see [Match queries to documents](#match-queries-to-documents). + +## Full-text indexes + +{{site.data.alerts.callout_info}} +You can perform full-text searches without a full-text index. However, an index will drastically improve search performance when searching a large number of documents. +{{site.data.alerts.end}} + +To create a full-text index, use the [`CREATE INDEX`](create-index.html) syntax that defines an [inverted index](inverted-indexes.html), specifying a `TSVECTOR` column. + +- Using the PostgreSQL-compatible syntax: + + ~~~ sql + CREATE INDEX {optional name} ON {table} USING GIN ({column}); + ~~~ + + {{site.data.alerts.callout_info}} + GIN and GiST indexes are implemented identically on CockroachDB. `GIN` and `GIST` are therefore synonymous when defining a full-text index. + {{site.data.alerts.end}} + +- Using `CREATE INVERTED INDEX`: + + ~~~ sql + CREATE INVERTED INDEX {optional name} ON {table} ({column}); + ~~~ + +For more ways to define full-text indexes, see [Create a full-text index with an expression](#create-a-full-text-index-with-an-expression) and [Create a full-text index with a stored computed column](#create-a-full-text-index-with-a-stored-computed-column). + +## Text search configuration + +A *text search configuration* determines how inputs are parsed into `TSVECTOR` and `TSQUERY` values. This includes a dictionary that is used to identify derivatives of words, as well as stop words to exclude when normalizing [documents](#process-a-document) and [queries](#form-a-query). + +The supported dictionaries are English, Danish, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish, and Turkish. An additional `simple` dictionary does not perform stemming or stopwording when normalizing [documents](#process-a-document) or [queries](#form-a-query). 
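+For example, here is a minimal sketch of the difference: passing the `simple` dictionary to `to_tsvector()` keeps stop words and unstemmed word forms that the default `english` configuration would normalize away:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SELECT to_tsvector('simple', 'How do trees get on the internet?');
+~~~
+
+Each word in the input is retained as its own lexeme, lowercased but otherwise unchanged, rather than being stemmed or dropped as a stop word.
+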
+ +You can specify a text search configuration as the first parameter when calling any of the [built-in functions](functions-and-operators.html#full-text-search-functions) to [process a document](#process-a-document) or [form a query](#form-a-query). For example: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('swedish', 'Hur får träd tillgång till internet?'); +~~~ + +~~~ + to_tsvector +---------------------------------------------- + 'får':2 'internet':6 'tillgång':4 'träd':3 +~~~ + +If you do not specify a configuration when calling the function, the value of the [`default_text_search_config`](set-vars.html#default-text-search-config) session variable is used. This defaults to `english` and can be changed as follows: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET default_text_search_config = swedish; +~~~ + +For more information about text search configurations, see the [PostgreSQL documentation](https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-INTRO-CONFIGURATIONS). + +{{site.data.alerts.callout_info}} +At this time, only the dictionary can be specified in a text search configuration. See [Unsupported features](#unsupported-features). +{{site.data.alerts.end}} + +## Examples + +### Match queries to documents + +Use the `@@` operator to match a query (`TSQUERY`) to a searchable document (`TSVECTOR`). In the following example, because the `TSQUERY` comprises the lexemes `get` and `internet`, which are both contained in the `TSVECTOR`, the output will be `true`: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('How do trees get on the internet?') @@ to_tsquery('How & to & get & internet?'); +~~~ + +~~~ + ?column? +------------ + t +~~~ + +Use the `plainto_tsquery()` [built-in function](functions-and-operators.html#full-text-search-functions) to match text to a searchable document. This search is equivalent to the preceding example: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('How do trees get on the internet?') @@ plainto_tsquery('How to get internet?'); +~~~ + +~~~ + ?column? +------------ + t +~~~ + +Use the `phraseto_tsquery()` [built-in function](functions-and-operators.html#full-text-search-functions) to match text phrases to a searchable document. Because `phraseto_tsquery()` separates the lexemes `get` and `internet` with the `<->` (FOLLOWED BY) operator, and the document does not contain a phrase like "get internet", the output will be `false`: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('How do trees get on the internet?') @@ phraseto_tsquery('How to get internet?'); +~~~ + +~~~ + ?column? +------------ + f +~~~ + +For an example of how text matching is used on a table, see [Perform a full-text search with ranked results](#perform-a-full-text-search-with-ranked-results). + +### Create a full-text index with an expression + +You can create an [expression index](expression-indexes.html) on a `STRING` column, using [`to_tsvector()`](#process-a-document) to convert the value to `TSVECTOR`. 
+ +Given the table: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE TABLE t (a STRING); +~~~ + +Create an expression index that converts column `a` to `TSVECTOR`: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE INDEX ON t USING GIN (to_tsvector('english', a)); +~~~ + +{{site.data.alerts.callout_info}} +When using a [full-text search function](functions-and-operators.html#full-text-search-functions) in an expression index, you **must** specify a [text search configuration](#text-search-configuration). In the preceding example, the `english` configuration is specified. +{{site.data.alerts.end}} + +### Create a full-text index with a stored computed column + +You can create a full-text index on a [stored computed column](computed-columns.html) that has a `TSVECTOR` data type. + +Given the table: + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE TABLE t (a STRING); +~~~ + +Add a new `TSVECTOR` column that is computed from `a` using [`to_tsvector()`](#process-a-document): + +{% include_cached copy-clipboard.html %} +~~~ sql +ALTER TABLE t ADD COLUMN b TSVECTOR + AS (to_tsvector('english', a)) STORED; +~~~ + +{{site.data.alerts.callout_info}} +When using a [full-text search function](functions-and-operators.html#full-text-search-functions) in a stored generated column, you **must** specify a [text search configuration](#text-search-configuration). In the preceding example, the `english` configuration is specified. +{{site.data.alerts.end}} + +View the columns on the table: + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW COLUMNS FROM t; +~~~ + +~~~ + column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden +--------------+-----------+-------------+----------------+---------------------------+---------------------+------------ + a | STRING | t | NULL | | {t_pkey} | f + rowid | INT8 | f | unique_rowid() | | {t_expr_idx,t_pkey} | t + b | TSVECTOR | t | NULL | to_tsvector('english', a) | {t_pkey} | f +(3 rows) +~~~ + +Create an inverted index on the `TSVECTOR` column: + +~~~ sql +CREATE INDEX ON t USING GIN (b); +~~~ + +### Perform a full-text search with ranked results + +1. Create a table with `STRING` columns: + + {% include_cached copy-clipboard.html %} + ~~~ sql + CREATE TABLE dadjokes (opener STRING, response STRING); + ~~~ + +1. Populate the table with sample values. These are the documents that you will search: + + {% include_cached copy-clipboard.html %} + ~~~ sql + INSERT INTO dadjokes (opener, response) VALUES + ('How do trees get on the internet?', 'They log on.'), + ('What do you call a pony with a sore throat?', 'A little horse.'), + ('What would a bathroom for fancy cats be called?', 'The glitter box.'), + ('Why did the scarecrow win an award?', 'It was outstanding in its field.'), + ('What kind of tree fits in your hand?', 'A palm tree.'), + ('What was a better invention than the first telephone?', 'The second one.'), + ('Where do you learn to make banana splits?', 'At sundae school.'), + ('How did the hipster burn the roof of his mouth?', 'He ate the pizza before it was cool.'), + ('What did one wall say to the other wall?', 'Meet you at the corner.'), + ('When does a joke become a dad joke?', 'When it becomes apparent.'); + ~~~ + +1. You can use `LIKE` or `ILIKE` to search for text. 
However, the results will be unranked: + + {% include_cached copy-clipboard.html %} + ~~~ sql + SELECT opener, response + FROM dadjokes + WHERE opener LIKE '%tree%' OR response LIKE '%tree%'; + ~~~ + + ~~~ + opener | response + ---------------------------------------+--------------- + How do trees get on the internet? | They log on. + What kind of tree fits in your hand? | A palm tree. + (2 rows) + ~~~ + +1. Create a full-text index on the concatenation of both table columns, specifying a [text search configuration](#text-search-configuration) (in this case, `english`), as is mandatory when [defining an expression index](#create-a-full-text-index-with-an-expression): + + ~~~ sql + CREATE INDEX ON dadjokes USING GIN (to_tsvector('english', opener || response)); + ~~~ + + {{site.data.alerts.callout_info}} + Because inverted joins on `TSVECTOR` values are not yet supported, this index won't be used to accelerate the SQL queries in this example. See [Unsupported features](#unsupported-features). + {{site.data.alerts.end}} + +1. Search the table for a query (in this case, `tree`), and rank the results. + + In the following statement, [`to_tsvector()`](#process-a-document) makes the table values searchable, [`to_tsquery()`](#form-a-query) forms the query, and [`ts_rank()`](#rank-search-results) calculates the search rankings: + + ~~~ sql + SELECT opener, response, ts_rank(joke, query) AS rank + FROM dadjokes, to_tsvector(opener || response) joke, to_tsquery('tree') query + WHERE query @@ joke + ORDER BY rank DESC + LIMIT 10; + ~~~ + + ~~~ + opener | response | rank + ---------------------------------------+--------------+-------------- + What kind of tree fits in your hand? | A palm tree. | 0.075990885 + How do trees get on the internet? | They log on. | 0.06079271 + (2 rows) + ~~~ + + The frequency of the `tree` lexeme in each row determines the difference in the rankings. + +1. Search the table for the query `calling`, and rank the results: + + ~~~ sql + SELECT opener, response, ts_rank(joke, query) AS rank + FROM dadjokes, to_tsvector(opener || response) joke, to_tsquery('calling') query + WHERE query @@ joke + ORDER BY rank DESC + LIMIT 10; + ~~~ + + ~~~ + opener | response | rank + --------------------------------------------------+------------------+------------- + What would a bathroom for fancy cats be called? | The glitter box. | 0.06079271 + What do you call a pony with a sore throat? | A little horse. | 0.06079271 + (2 rows) + ~~~ + + Unlike pattern matching with `LIKE` and `ILIKE`, a full-text search for `calling` produced matches. This is because [`to_tsvector()`](#process-a-document) and [`to_tsquery()`](#form-a-query) each normalized derivatives of the word "call" in their respective inputs to the lexeme `call`, using the default `english` [text search configuration](#text-search-configuration). + +1. Use [`plainto_tsquery()`](#form-a-query) to convert text input to a search query: + + ~~~ sql + SELECT opener, response, ts_rank(joke, query) AS rank + FROM dadjokes, to_tsvector(opener || response) joke, plainto_tsquery('no more joking, dad') query + WHERE query @@ joke + ORDER BY rank DESC + LIMIT 10; + ~~~ + + ~~~ + opener | response | rank + --------------------------------------+---------------------------+------------- + When does a joke become a dad joke? | When it becomes apparent. | 0.18681315 + (1 row) + ~~~ + +1. 
Alternatively, use [`phraseto_tsquery()`](#form-a-query) to search for matching text phrases (in this example, "joke dad"): + + ~~~ sql + SELECT opener, response, ts_rank(joke, query) AS rank + FROM dadjokes, to_tsvector(opener || response) joke, phraseto_tsquery('no more joking, dad') query + WHERE query @@ joke + ORDER BY rank DESC + LIMIT 10; + ~~~ + + ~~~ + opener | response | rank + ---------+----------+------- + (0 rows) + ~~~ + +## Unsupported features + +Some PostgreSQL syntax and features are unsupported. These include, but are not limited to: + +- Aspects of [text search configurations](#text-search-configuration) other than the specified dictionary. +- `websearch_to_tsquery()` built-in function. +- `tsquery_phrase()` built-in function. +- `ts_rank_cd()` built-in function. +- `setweight()` built-in function. +- Inverted joins on `TSVECTOR` values. +- `tsvector || tsvector` comparisons. +- `tsquery || tsquery` comparisons. +- `tsquery && tsquery` comparisons. +- `tsquery <-> tsquery` comparisons. +- `!! tsquery` comparisons. +- `tsquery @> tsquery` and `tsquery <@ tsquery` comparisons. + +For full details, see the [tracking issue](https://github.com/cockroachdb/cockroach/issues/41288). + +## See also + +- PostgreSQL documentation on [Full Text Search](https://www.postgresql.org/docs/current/textsearch.html) +- [`TSVECTOR`](tsvector.html) +- [`TSQUERY`](tsquery.html) +- [Inverted indexes](inverted-indexes.html) +- [Indexes](indexes.html) +- [SQL Statements](sql-statements.html) \ No newline at end of file diff --git a/v23.1/inverted-indexes.md b/v23.1/inverted-indexes.md index ec583035801..ccf8b5a4d8f 100644 --- a/v23.1/inverted-indexes.md +++ b/v23.1/inverted-indexes.md @@ -10,10 +10,11 @@ Generalized inverted indexes, or GIN indexes, store mappings from values within CockroachDB stores the contents of the following data types in GIN indexes: -- [JSONB](jsonb.html) -- [Arrays](array.html) +- [`JSONB`](jsonb.html) +- [`ARRAY`](array.html) - [Spatial data (`GEOMETRY` and `GEOGRAPHY` types)](spatial-indexes.html) -- [Strings (using trigram indexes)](trigram-indexes.html) +- [`TSVECTOR` (for full-text search)](tsvector.html) +- [`STRING` (using trigram indexes)](trigram-indexes.html) {{site.data.alerts.callout_success}}For a hands-on demonstration of using GIN indexes to improve query performance on a JSONB column, see the JSON tutorial.{{site.data.alerts.end}} @@ -60,7 +61,7 @@ This lets you search based on subcomponents. ### Creation -You can use GIN indexes to improve the performance of queries using `JSONB` or `ARRAY` columns. You can create them: +You can use GIN indexes to improve the performance of queries using [`JSONB`](jsonb.html), [`ARRAY`](array.html), [`TSVECTOR`](tsvector.html) columns (for [full-text searches](full-text-search.html)), or [`STRING`](string.html) (for [fuzzy searches using trigrams](trigram-indexes.html)). You can create them: - Using the PostgreSQL-compatible syntax [`CREATE INDEX ... 
USING GIN`](create-index.html): @@ -68,18 +69,28 @@ You can use GIN indexes to improve the performance of queries using `JSONB` or ` CREATE INDEX {optional name} ON {table} USING GIN ({column}); ~~~ - You can also specify the `jsonb_ops` or `array_ops` opclass (for `JSONB` and `ARRAY` columns, respectively) using the syntax: + Also specify an opclass when [creating a trigram index](trigram-indexes.html#creation): ~~~ sql CREATE INDEX {optional name} ON {table} USING GIN ({column} {opclass}); ~~~ + {{site.data.alerts.callout_success}} + You can also use the preceding syntax to specify the `jsonb_ops` or `array_ops` opclass (for `JSONB` and `ARRAY` columns, respectively). + {{site.data.alerts.end}} + - While creating the table, using the syntax [`CREATE INVERTED INDEX`](create-table.html#create-a-table-with-secondary-and-gin-indexes): ~~~ sql CREATE INVERTED INDEX {optional name} ON {table} ({column}); ~~~ + Also specify an opclass when [creating a trigram index](trigram-indexes.html#creation): + + ~~~ sql + CREATE INVERTED INDEX {optional name} ON {table} ({column} {opclass}); + ~~~ + ### Selection If a query contains a filter against an indexed `JSONB` or `ARRAY` column that uses any of the [supported operators](#comparisons), the GIN index is added to the set of index candidates. @@ -193,7 +204,7 @@ CREATE TABLE users ( ## Examples -### Create a table with GIN index on a JSONB column +### Create a table with GIN index on a `JSONB` column In this example, let's create a table with a `JSONB` column and a GIN index: @@ -269,7 +280,7 @@ Now, run a query that filters on the `JSONB` column: (2 rows) ~~~ -### Add a GIN index to a table with an array column +### Add a GIN index to a table with an `ARRAY` column In this example, let's create a table with an `ARRAY` column first, and add the GIN index later: @@ -335,7 +346,7 @@ Now, let’s add a GIN index to the table and run a query that filters on the `A (2 rows) ~~~ -### Create a table with a partial GIN index on a JSONB column +### Create a table with a partial GIN index on a `JSONB` column In the same `users` table from [Create a table with GIN index on a JSONB column](#create-a-table-with-gin-index-on-a-jsonb-column), create a partial GIN index for online users. @@ -369,10 +380,14 @@ SELECT * FROM users@idx_online_users WHERE user_profile->'online' = 'true' AND u (1 row) ~~~ -### Create a trigram index on a STRING column +### Create a trigram index on a `STRING` column For an example showing how to create a trigram index on a [`STRING`](string.html) column, see [Trigram Indexes](trigram-indexes.html#examples). +### Create a full-text index on a `TSVECTOR` column + +For an example showing how to create a full-text index on a [`TSVECTOR`](tsvector.html) column, see [Full-Text Search](full-text-search.html#examples). + ### Inverted join examples {% include {{ page.version.version }}/sql/inverted-joins.md %} diff --git a/v23.1/jsonb.md b/v23.1/jsonb.md index 80aeb78ba98..80a38d131d3 100644 --- a/v23.1/jsonb.md +++ b/v23.1/jsonb.md @@ -460,7 +460,7 @@ SELECT '100.50'::JSONB::DECIMAL; (1 row) ~~~ -You use the [`parse_timestamp` function](functions-and-operators.html) to parse strings in `TIMESTAMP` format. +You can use the [`parse_timestamp` function](functions-and-operators.html) to parse strings in `TIMESTAMP` format. 
{% include_cached copy-clipboard.html %} ~~~ sql diff --git a/v23.1/logging-use-cases.md b/v23.1/logging-use-cases.md index 4272c75969f..d5974b8eb77 100644 --- a/v23.1/logging-use-cases.md +++ b/v23.1/logging-use-cases.md @@ -394,7 +394,7 @@ I210323 20:02:12.095253 59168 10@util/log/event_log.go:32 ⋮ [n1,client=‹[::1 - Preceding the `=` character is the `crdb-v2` event metadata. See the [reference documentation](log-formats.html#format-crdb-v2) for details on the fields. - `ApplicationName` shows that the events originated from an application named `bank`. You can use this field to filter the logging output by application. -- `ErrorText` shows that this query encountered a type of [transaction retry error](transaction-retry-error-reference.html#retry_write_too_old). For details on transaction retry errors and how to resolve them, see the [Transaction retry error reference](transaction-retry-error-reference.html). +- `ErrorText` shows that this query encountered a [type of transaction retry error](transaction-retry-error-reference.html#retry_write_too_old). For details on transaction retry errors and how to resolve them, see the [Transaction Retry Error Reference](transaction-retry-error-reference.html#actions-to-take). - `NumRetries` shows that the transaction was retried once before succeeding. {{site.data.alerts.callout_info}} diff --git a/v23.1/manage-a-backup-schedule.md b/v23.1/manage-a-backup-schedule.md index e69197d0ca6..ecd59a16796 100644 --- a/v23.1/manage-a-backup-schedule.md +++ b/v23.1/manage-a-backup-schedule.md @@ -40,17 +40,19 @@ Further guidance on connecting to Amazon S3, Google Cloud Storage, Azure Storage ## Set up monitoring for the backup schedule -We recommend that you [monitor your backup schedule with Prometheus](monitoring-and-alerting.html#prometheus-endpoint), and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time— at which point, you can inspect schedules by running [`SHOW SCHEDULES`](show-schedules.html). +We recommend that you [monitor your backup schedule with Prometheus](monitoring-and-alerting.html#prometheus-endpoint), and alert when there are anomalies such as backups that have failed or no backups succeeding over a certain amount of time—at which point, you can inspect schedules by running [`SHOW SCHEDULES`](show-schedules.html). Metrics for scheduled backups fall into two categories: - Backup schedule-specific metrics, aggregated across all schedules: - - `schedules_BACKUP_started`: A counter for the total number of backups started by a schedule - - `schedules_BACKUP_succeeded`: A counter for the number of backups started by a schedule that succeeded - - `schedules_BACKUP_failed`: A counter for the number of backups started by a schedule that failed + - `schedules.BACKUP.started`: The total number of backups started by a schedule. + - `schedules.BACKUP.succeeded`: The number of backups started by a schedule that succeeded. + - `schedules.BACKUP.failed`: The number of backups started by a schedule that failed. - When `schedules_BACKUP_failed` increments, run [`SHOW SCHEDULES`](show-schedules.html) to check which schedule is affected and to inspect the error in the `status` column. + When `schedules.BACKUP.failed` increments, run [`SHOW SCHEDULES`](show-schedules.html) to check which schedule is affected and to inspect the error in the `status` column. 
+ - {% include_cached new-in.html version="v23.1" %} `schedules.BACKUP.protected_age_sec`: The age of the oldest [protected timestamp](architecture/storage-layer.html#protected-timestamps) record protected by backup schedules. + - {% include_cached new-in.html version="v23.1" %} `schedules.BACKUP.protected_record_count`: The number of [protected timestamp](architecture/storage-layer.html#protected-timestamps) records held by backup schedules. - Scheduler-specific metrics: diff --git a/v23.1/metrics.md b/v23.1/metrics.md index a0763850111..cf7f85eb6de 100644 --- a/v23.1/metrics.md +++ b/v23.1/metrics.md @@ -5,7 +5,7 @@ toc: false docs_area: reference.metrics --- -As part of normal operation, CockroachDB continuously records metrics that track performance, latency, usage, and many other runtime indicators. These metrics are often useful in diagnosing problems, troubleshooting performance, or planning cluster infrastructure modifications. This page documents locations where metrics are exposed for analysis, and includes the full list of available metrics in CockroachDB. +As part of normal operation, CockroachDB continuously records metrics that track performance, latency, usage, and many other runtime indicators. These metrics are often useful in diagnosing problems, troubleshooting performance, or planning cluster infrastructure modifications. This page documents locations where metrics are exposed for analysis. ## Available metrics diff --git a/v23.1/monitor-and-debug-changefeeds.md b/v23.1/monitor-and-debug-changefeeds.md index 5bd4de1facf..4a15388054c 100644 --- a/v23.1/monitor-and-debug-changefeeds.md +++ b/v23.1/monitor-and-debug-changefeeds.md @@ -20,7 +20,7 @@ The following define the categories of non-retryable errors: - The changefeed cannot convert the data to the specified [output format](changefeed-messages.html). For example, there are [Avro](changefeed-messages.html#avro) types that changefeeds do not support, or a [CDC query](cdc-queries.html) is using an unsupported or malformed expression. - The terminal error happens as part of established changefeed behavior. For example, you have specified the [`schema_change_policy=stop` option](create-changefeed.html#schema-policy) and a schema change happens. -We recommend monitoring changefeeds with [Prometheus](monitoring-and-alerting.html#prometheus-endpoint) to avoid accumulation of garbage after a changefeed encounters an error. See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for more detail on how changefeeds interact with protected timestamps and garbage collection. In addition, see the [Recommended changefeed metrics to track](#recommended-changefeed-metrics-to-track) section for the essential metrics to track on a changefeed. +We recommend monitoring changefeeds with [Prometheus](monitoring-and-alerting.html#prometheus-endpoint) to avoid accumulation of garbage after a changefeed encounters an error. See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for more detail on how changefeeds interact with [protected timestamps](architecture/storage-layer.html#protected-timestamps) and garbage collection. In addition, see the [Recommended changefeed metrics to track](#recommended-changefeed-metrics-to-track) section for the essential metrics to track on a changefeed. 
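+As a complement to metrics-based monitoring, you can also spot-check changefeed jobs in SQL. The following is a minimal sketch (not a substitute for Prometheus alerting) that assumes the `job_id`, `status`, and `error` columns of the `SHOW CHANGEFEED JOBS` output, and lists jobs that are not currently running so you can investigate paused or failed changefeeds:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Illustrative query: list changefeed jobs that are not currently running,
+-- along with any error text, to help investigate paused or failed jobs.
+SELECT job_id, status, error FROM [SHOW CHANGEFEED JOBS] WHERE status != 'running';
+~~~
+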
## Monitor a changefeed @@ -59,6 +59,17 @@ By default, changefeeds will retry errors with [some exceptions](#changefeed-ret - `changefeed.error_retries`: The total number of retryable errors encountered by all changefeeds. - `changefeed.failures`: The total number of changefeed jobs that have failed. +#### Protected timestamp and garbage collection monitoring + +[Protected timestamps](architecture/storage-layer.html#protected-timestamps) will protect changefeed data from garbage collection in particular scenarios, but if a changefeed lags too far behind, the protected changes could cause data storage issues. See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for detail on when changefeed data is protected from garbage collection. + +{% include_cached new-in.html version="v23.1" %} You can monitor changefeed jobs for [protected timestamp](architecture/storage-layer.html#protected-timestamps) usage. We recommend setting up monitoring for the following metrics: + +- `jobs.changefeed.protected_age_sec`: Tracks the age of the oldest [protected timestamp](architecture/storage-layer.html#protected-timestamps) record protected by changefeed jobs. We recommend monitoring if `protected_age_sec` is greater than [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds). As `protected_age_sec` increases, garbage accumulation increases. Garbage collection will not progress on a table, database, or cluster if the protected timestamp record is present. +- `jobs.changefeed.currently_paused`: Tracks the number of changefeed jobs currently considered [paused](pause-job.html). Since paused changefeed jobs can accumulate garbage, it is important to monitor the number of changefeeds paused. +- `jobs.changefeed.expired_pts_records`: Tracks the number of expired [protected timestamp](architecture/storage-layer.html#protected-timestamps) records owned by changefeed jobs. You can monitor this metric in conjunction with the [`gc_protect_expires_after` option](create-changefeed.html#gc-protect-expire). +- `jobs.changefeed.protected_record_count`: Tracks the number of [protected timestamp](architecture/storage-layer.html#protected-timestamps) records held by changefeed jobs. + ### Using changefeed metrics labels {{site.data.alerts.callout_info}} @@ -160,7 +171,7 @@ I190312 18:56:53.537686 585 vendor/github.com/Shopify/sarama/client.go:170 [kaf {% include_cached copy-clipboard.html %} ~~~ sql -> SHOW CHANGEFEED JOBS; +SHOW CHANGEFEED JOBS; ~~~ ~~~ diff --git a/v23.1/node-shutdown.md b/v23.1/node-shutdown.md index 3959aba829d..e1188d541e5 100644 --- a/v23.1/node-shutdown.md +++ b/v23.1/node-shutdown.md @@ -345,6 +345,8 @@ If the rebalancing stalls during decommissioning, replicas that have yet to move Do **not** terminate the node process, delete the storage volume, or remove the VM before a `decommissioning` node has [changed its membership status](#status-change) to `decommissioned`. Prematurely terminating the process will prevent the node from rebalancing all of its range replicas onto other nodes gracefully, cause transient query errors in client applications, and leave the remaining ranges under-replicated and vulnerable to loss of [quorum](architecture/replication-layer.html#overview) if another node goes down. {{site.data.alerts.end}} +{% include {{page.version.version}}/prod-deployment/decommission-pre-flight-checks.md %} + ### Terminate the node process @@ -577,6 +579,15 @@ You can use [`cockroach node drain`](cockroach-node.html) to drain a node separa
### Remove nodes +- [Prerequisites](#prerequisites) +- [Step 1. Get the IDs of the nodes to decommission](#step-1-get-the-ids-of-the-nodes-to-decommission) +- [Step 2. Drain the nodes manually](#step-2-drain-the-nodes-manually) +- [Step 3. Decommission the nodes](#step-3-decommission-the-nodes) +- [Step 4. Confirm the nodes are decommissioned](#step-4-confirm-the-nodes-are-decommissioned) +- [Step 5. Terminate the process on decommissioned nodes](#step-5-terminate-the-process-on-decommissioned-nodes) + +#### Prerequisites + In addition to the [graceful node shutdown](#prepare-for-graceful-shutdown) requirements, observe the following guidelines: - Before decommissioning nodes, verify that there are no [under-replicated or unavailable ranges](ui-cluster-overview-page.html#cluster-overview-panel) on the cluster. @@ -653,6 +664,8 @@ The `is_decommissioning` field remains `true` after all replicas have been remov Do **not** terminate the node process, delete the storage volume, or remove the VM before a `decommissioning` node has [changed its membership status](#status-change) to `decommissioned`. Prematurely terminating the process will prevent the node from rebalancing all of its range replicas onto other nodes gracefully, cause transient query errors in client applications, and leave the remaining ranges under-replicated and vulnerable to loss of [quorum](architecture/replication-layer.html#overview) if another node goes down. {{site.data.alerts.end}} +{% include {{page.version.version}}/prod-deployment/decommission-pre-flight-checks.md %} + #### Step 4. Confirm the nodes are decommissioned Check the status of the decommissioned nodes: diff --git a/v23.1/pause-job.md b/v23.1/pause-job.md index e830fbda9b5..5b714f3e18d 100644 --- a/v23.1/pause-job.md +++ b/v23.1/pause-job.md @@ -42,6 +42,28 @@ Parameter | Description `for_schedules_clause` | The schedule you want to pause jobs for. You can pause jobs for a specific schedule (`FOR SCHEDULE id`) or pause jobs for multiple schedules by nesting a [`SELECT` clause](select-clause.html) in the statement (`FOR SCHEDULES `). See the [examples](#pause-jobs-for-a-schedule) below. `WITH REASON = ...` | The reason to pause the job. CockroachDB stores the reason in the job's metadata, but there is no way to display it. +## Monitoring paused jobs + +We recommend monitoring paused jobs. Jobs that are paused for a long period of time can start to affect the cluster in the following ways: + +- A paused [backup](backup.html), [restore](restore.html), or index backfill job ([schema change](online-schema-changes.html)) will continue to hold a [protected timestamp](architecture/storage-layer.html#protected-timestamps) record on the data the job is operating on. This could result in data accumulation as the older versions of the keys cannot be [garbage collected](architecture/storage-layer.html#garbage-collection). In turn, this may cause increased disk usage and degraded performance for some workloads. See [Protected timestamps and scheduled backups](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) for more detail. +- A paused [changefeed](create-changefeed.html) job, if [`protect_data_from_gc_on_pause`](create-changefeed.html#protect-pause) is set, will also hold a protected timestamp record on the data the job is operating on. Depending on the value of [`gc_protect_expires_after`](create-changefeed.html#gc-protect-expire), this can lead to data accumulation. 
Once `gc_protect_expires_after` elapses, the protected timestamp record will be released and the changefeed job will be canceled. See [Garbage collection and changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds) for more detail. + +{% include_cached new-in.html version="v23.1" %} To avoid these issues, use the `jobs.{job_type}.currently_paused` metric to track the number of jobs (for each job type) that are currently considered paused. + +You can monitor protected timestamps relating to particular CockroachDB jobs with the following metrics: + +- `jobs.{job_type}.protected_age_sec` tracks the age of the oldest protected timestamp record protecting `{job_type}` jobs. As this metric increases, garbage accumulation increases. Garbage collection will not progress on a table, database, or cluster if the protected timestamp record is present. +- `jobs.{job_type}.protected_record_count` tracks the number of protected timestamp records held by `{job_type}` jobs. + +For a full list of the available job types, access your cluster's [`/_status/vars`](monitoring-and-alerting.html#prometheus-endpoint) endpoint. + +See the following pages for details on metrics: + +- [Monitor and Debug Changefeeds](monitor-and-debug-changefeeds.html) +- [Backup and Restore Monitoring](backup-and-restore-monitoring.html) +- [Metrics](metrics.html) + ## Examples ### Pause a single job diff --git a/v23.1/performance-best-practices-overview.md b/v23.1/performance-best-practices-overview.md index 44304f05419..c7484887031 100644 --- a/v23.1/performance-best-practices-overview.md +++ b/v23.1/performance-best-practices-overview.md @@ -47,7 +47,7 @@ You can also use the [`IMPORT INTO`](import-into.html) statement to bulk-insert For more information, see [Insert Multiple Rows](insert.html#insert-multiple-rows-into-an-existing-table). {{site.data.alerts.callout_info}} -Large multi-row `INSERT` queries can lead to long-running transactions that result in [transaction retry errors](transaction-retry-error-reference.html). If a multi-row `INSERT` query results in an error code [`40001` with the message `"transaction deadline exceeded"`](transaction-retry-error-reference.html#retry_commit_deadline_exceeded), we recommend breaking up the query up into smaller batches of rows. +Large multi-row `INSERT` queries can lead to long-running transactions that result in [transaction retry errors](transaction-retry-error-reference.html). If a multi-row `INSERT` query results in an error code [`40001` with the message `transaction deadline exceeded`](transaction-retry-error-reference.html#retry_commit_deadline_exceeded), we recommend breaking the query up into smaller batches of rows. {{site.data.alerts.end}} ### Use `IMPORT` instead of `INSERT` for bulk-inserts into new tables @@ -312,104 +312,55 @@ If you have long-running queries (such as analytics queries that perform full ta However, because `AS OF SYSTEM TIME` returns historical data, your reads might be stale. -## Hot spots - -A *hot spot* is any location on the cluster receiving significantly more requests than another. Hot spots can cause problems as requests increase. - -They commonly occur with transactions that operate on the **same range but different index keys**, which are limited by the overall hardware capacity of [the range leaseholder](architecture/overview.html#cockroachdb-architecture-terms) node.
- -A hot spot can occur on a range that is indexed on a column of data that is sequential in nature (e.g., [an ordered sequence](sql-faqs.html#what-are-the-differences-between-uuid-sequences-and-unique_rowid), or a series of increasing, non-repeating [`TIMESTAMP`s](timestamp.html)), such that all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. Because the system is unable to find a split point in the range that evenly divides the traffic, the range cannot benefit from [load-based splitting](load-based-splitting.html). This creates a hot spot at the single range. - -Read hot spots can occur if you perform lots of scans of a portion of a table index or a single key. - -### Find hot spots - -To track down nodes experiencing hot spots, use the [Hot Ranges page](ui-hot-ranges-page.html) and the [Range Report](ui-hot-ranges-page.html#range-report). To track down ranges experiencing hot spots, use the [Key visualizer](ui-key-visualizer.html). - -### Reduce hot spots - -To reduce hot spots: - -- Use index keys with a random distribution of values, so that transactions over different rows are more likely to operate on separate data ranges. See the [SQL FAQs](sql-faqs.html#how-do-i-auto-generate-unique-row-ids-in-cockroachdb) on row IDs for suggestions. - -- Place parts of the records that are modified by different transactions in different tables. That is, increase [normalization](https://en.wikipedia.org/wiki/Database_normalization). However, there are benefits and drawbacks to increasing normalization. - - - Benefits: - - - Allows separate transactions to modify related underlying data without causing [contention](#transaction-contention). - - Can improve performance for read-heavy workloads. - - - Drawbacks: - - - More complex data model. - - Increases the chance of data inconsistency. - - Increases data redundancy. - - Can degrade performance for write-heavy workloads. - -- If the application strictly requires operating on very few different index keys, consider using [`ALTER ... SPLIT AT`](alter-table.html#split-at) so that each index key can be served by a separate group of nodes in the cluster. - -- If you are working with a table that **must** be indexed on sequential keys, consider using [hash-sharded indexes](hash-sharded-indexes.html). For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see the blog post [Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads](https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/). As part of this, we recommend doing thorough performance testing with and without hash-sharded indexes to see which works best for your application. - -- To avoid read hot spots: - - - Increase data distribution, which will allow for more ranges. The hot spot exists because the data being accessed is all co-located in one range. - - Increase load balancing across more nodes in the same range. Most transactional reads must go to the leaseholder in CockroachDB, which means that opportunities for load balancing over replicas are minimal. - - However, the following features do permit load balancing over replicas: - - - Global tables - - Follower reads (both the bounded staleness and the exact staleness kinds) - - In these cases, more replicas will help, up to the number of nodes in the cluster. They all only help with reads, and they all come with their own tradeoffs. 
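As a minimal sketch of the hash-sharded index approach recommended above: the `events` table, its columns, and the `bucket_count` value are hypothetical, so test this pattern against your own workload before adopting it.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Hypothetical table whose writes arrive in timestamp order and would
-- otherwise pile up at the tail of a single range.
CREATE TABLE events (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    ts TIMESTAMP NOT NULL,
    payload STRING
);

-- A hash-sharded secondary index spreads sequential ts values across
-- multiple shards, so writes land on several ranges instead of one.
CREATE INDEX events_ts_hash_idx ON events (ts) USING HASH WITH (bucket_count = 8);
~~~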
- ## Transaction contention -Transactions that operate on the _same index key values_ (specifically, that operate on the same [column family](column-families.html) for a given index key) are strictly serialized to obey transaction isolation semantics. To maintain this isolation, writing transactions ["lock" rows](architecture/transaction-layer.html#writing) to prevent hazardous interactions with concurrent transactions. However, locking can lead to processing delays if multiple transactions are trying to access the same "locked" data at the same time. This is referred to as _transaction_ (or _lock_) _contention_. +Transactions that operate on the *same index key values* (specifically, that operate on the same [column family](column-families.html) for a given index key) are strictly serialized to obey transaction isolation semantics. To maintain this isolation, writing transactions ["lock" rows](architecture/transaction-layer.html#writing) to prevent hazardous interactions with concurrent transactions. -Transaction contention occurs when the following three conditions are met: +*Transaction contention* occurs when the following three conditions are met: - There are multiple concurrent transactions or statements (sent by multiple clients connected simultaneously to a single CockroachDB cluster). - They operate on table rows with the _same index key values_ (either on [primary keys](primary-key.html) or secondary [indexes](indexes.html)). -- At least one of the transactions modify the data. - -Transactions that experience contention typically show [delays in completion](query-behavior-troubleshooting.html#hanging-or-stuck-queries) or [`restart transaction` errors with the error code `40001`](common-errors.html#restart-transaction). The possibility of transaction restarts requires clients to implement [client-side transaction retries](transaction-retry-error-reference.html#client-side-retry-handling). +- At least one of the transactions modifies the data. -For further background on transaction contention, see [What is Database Contention, and Why Should You Care?](https://www.cockroachlabs.com/blog/what-is-database-contention/). +[When transactions are experiencing contention](performance-recipes.html#indicators-that-your-application-is-experiencing-transaction-contention), you may observe: -### Indicators your application is experiencing transaction contention +- [Delays in query completion](query-behavior-troubleshooting.html#hanging-or-stuck-queries). This occurs when multiple transactions are trying to write to the same "locked" data at the same time, making a transaction unable to complete. This is also known as *lock contention*. +- [Transaction retries](transactions.html#automatic-retries) performed automatically by CockroachDB. This occurs if a transaction cannot be placed into a [serializable ordering](demo-serializable.html) among all of the currently-executing transactions. +- [Transaction retry errors](transaction-retry-error-reference.html), which are emitted to your client when an automatic retry is not possible or fails. Your application must address transaction retry errors with [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling). +- [Cluster hot spots](#hot-spots). -{% include {{page.version.version}}/performance/contention-indicators.md %} +To mitigate these effects, [reduce the causes of transaction contention](performance-best-practices-overview.html#reduce-transaction-contention) and [reduce hot spots](#reduce-hot-spots). 
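As a rough illustration of spotting lock contention while it is happening, the following sketch queries the `crdb_internal.cluster_locks` table referenced in the performance recipes; the selected column names are assumed from the v23.1 `crdb_internal` catalog and may differ in other versions.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch: show locks that other transactions are currently waiting on.
-- A row with granted = false indicates a waiter blocked by a lock holder.
SELECT database_name, table_name, lock_key_pretty, txn_id, granted
FROM crdb_internal.cluster_locks
WHERE granted = false;
~~~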
For further background on transaction contention, see [What is Database Contention, and Why Should You Care?](https://www.cockroachlabs.com/blog/what-is-database-contention/). -### Find transaction contention +### Reduce transaction contention -{% include {{ page.version.version }}/performance/statement-contention.md %} +You can reduce the causes of transaction contention: - +{% include {{ page.version.version }}/performance/reduce-contention.md %} -### Reduce transaction contention +### Improve transaction performance by sizing and configuring the cluster -To reduce transaction contention: +To maximize transaction performance, you'll need to maximize the performance of a single [range](architecture/glossary.html#architecture-range). To achieve this, you can apply multiple strategies: -- Make transactions smaller, so that each transaction has less work to do. In particular, avoid multiple client-server exchanges per transaction. For example, use [common table expressions](common-table-expressions.html) to group multiple [`SELECT`](select-clause.html) and [`INSERT`](insert.html), [`UPDATE`](update.html), [`DELETE`](delete.html), and [`UPSERT`](upsert.html) clauses together in a single SQL statement. +- Minimize the network distance between the [replicas of a range](architecture/overview.html#architecture-replica), possibly using [zone configs](configure-replication-zones.html) and [partitioning](partitioning.html), or the newer [Multi-region SQL capabilities](multiregion-overview.html). +- Use the fastest [storage devices](recommended-production-settings.html#storage) available. +- If the contending transactions operate on different keys within the same range, add [more CPU power (more cores) per node](recommended-production-settings.html#sizing). However, if the transactions all operate on the same key, this may not provide an improvement. - - For an example showing how to break up large transactions in an application, see [Break up large transactions into smaller units of work](build-a-python-app-with-cockroachdb-sqlalchemy.html#break-up-large-transactions-into-smaller-units-of-work). - - If you are experiencing contention (retries) when doing bulk deletes, see [Bulk-delete data](bulk-delete-data.html). +## Hot spots -- [Send all of the statements in your transaction in a single batch](transactions.html#batched-statements) so that CockroachDB can automatically retry the transaction for you. +A *hot spot* is any location on the cluster receiving significantly more requests than another. Hot spots are a symptom of *resource contention* and can create problems as requests increase, including excessive [transaction contention](#transaction-contention). -- Use the [`SELECT FOR UPDATE`](select-for-update.html) statement in scenarios where a transaction performs a read and then updates the row(s) it just read. The statement orders transactions by controlling concurrent access to one or more rows of a table. It works by locking the rows returned by a [selection query](selection-queries.html), such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue that is ordered based on when they try to read the value of the locked row(s). +[Hot spots occur](performance-recipes.html#indicators-that-your-cluster-has-hot-spots) when an imbalanced workload access pattern causes significantly more reads and writes on a subset of data. 
For example: -- When replacing values in a row, use [`UPSERT`](upsert.html) and specify values for all columns in the inserted rows. This will usually have the best performance under [contention](#transaction-contention), compared to combinations of [`SELECT`](select-clause.html), [`INSERT`](insert.html), and [`UPDATE`](update.html). +- Transactions operate on the **same range but different index keys**. These operations are limited by the overall hardware capacity of [the range leaseholder](architecture/overview.html#cockroachdb-architecture-terms) node. +- A range is indexed on a column of data that is sequential in nature (e.g., [an ordered sequence](sql-faqs.html#what-are-the-differences-between-uuid-sequences-and-unique_rowid), or a series of increasing, non-repeating [`TIMESTAMP`s](timestamp.html)), such that all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. Because the system is unable to find a split point in the range that evenly divides the traffic, the range cannot benefit from [load-based splitting](load-based-splitting.html). This creates a hot spot at the single range. -### Improve transaction performance by sizing and configuring the cluster +Read hot spots can occur if you perform lots of scans of a portion of a table index or a single key. -To maximize transaction performance, you'll need to maximize the performance of a single [range](architecture/glossary.html#architecture-range). To achieve this, you can apply multiple strategies: +### Reduce hot spots -- Minimize the network distance between the [replicas of a range](architecture/overview.html#architecture-replica), possibly using [zone configs](configure-replication-zones.html) and [partitioning](partitioning.html), or the newer [Multi-region SQL capabilities](multiregion-overview.html). -- Use the fastest [storage devices](recommended-production-settings.html#storage) available. -- If the contending transactions operate on different keys within the same range, add [more CPU power (more cores) per node](recommended-production-settings.html#sizing). However, if the transactions all operate on the same key, this may not provide an improvement. +{% include {{ page.version.version }}/performance/reduce-hot-spots.md %} ## See also diff --git a/v23.1/performance-recipes.md b/v23.1/performance-recipes.md index 47603cccc15..14c08839043 100644 --- a/v23.1/performance-recipes.md +++ b/v23.1/performance-recipes.md @@ -25,18 +25,25 @@ This section describes how to use CockroachDB commands and dashboards to identif
-      <ul>
-        <li>Your application is experiencing degraded performance with the following transaction retry errors:
-          <ul>
-            <li>SQLSTATE: 40001</li>
-            <li>RETRY_WRITE_TOO_OLD</li>
-            <li>RETRY_SERIALIZABLE</li>
-          </ul>
-        </li>
-        <li>The SQL Statement Contention dashboard in the DB Console is showing spikes over time.</li>
-        <li>The SQL Statement Errors graph in the DB Console is showing spikes in retries over time.</li>
-      </ul>
+      <ul>
+        <li>The Transactions page in the {{ site.data.products.db }} Console or DB Console shows transactions with Waiting status.</li>
+        <li>Your application is experiencing degraded performance with SQLSTATE: 40001 and a transaction retry error message.</li>
+        <li>Querying the crdb_internal.transaction_contention_events table indicates that your transactions have experienced contention.</li>
+        <li>The SQL Statement Contention graph in the {{ site.data.products.db }} Console or DB Console is showing spikes over time.</li>
+        <li>The Transaction Restarts graph in the {{ site.data.products.db }} Console or DB Console is showing spikes in retries over time.</li>
+      </ul>

+      <ul>
+        <li>The Hot Ranges page (DB Console) displays a higher-than-expected QPS for a range.</li>
+        <li>The Key Visualizer (DB Console) shows ranges with much higher-than-average write rates for the cluster.</li>
+      </ul>
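The `crdb_internal.transaction_contention_events` indicator listed above can be checked with a query along these lines; this is a sketch, and the column names are assumed from the v23.1 `crdb_internal` catalog.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch: list the most recent contention events, newest first,
-- including which transaction blocked which and for how long.
SELECT collection_ts,
       blocking_txn_id,
       waiting_txn_id,
       contention_duration
FROM crdb_internal.transaction_contention_events
ORDER BY collection_ts DESC
LIMIT 10;
~~~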
      @@ -77,15 +84,113 @@ This section provides solutions for common performance issues in your applicatio ### Transaction contention -[Transaction contention](performance-best-practices-overview.html#transaction-contention) occurs when transactions issued from multiple clients at the same time operate on the same data. This can cause transactions to wait on each other (like when many people try to check out with the same cashier at a store) and decrease performance. +[Transaction contention](performance-best-practices-overview.html#transaction-contention) is a state of conflict that occurs when: + +- A [transaction](transactions.html) is unable to complete due to another concurrent or recent transaction attempting to write to the same data. This is also called *lock contention*. +- A transaction is [automatically retried](transactions.html#automatic-retries) because it could not be placed into a [serializable ordering](demo-serializable.html) among all of the currently-executing transactions. If the automatic retry is not possible or fails, a [*transaction retry error*](transaction-retry-error-reference.html) is emitted to the client, requiring the client application to [retry the transaction](transaction-retry-error-reference.html#client-side-retry-handling). #### Indicators that your application is experiencing transaction contention -{% include {{page.version.version}}/performance/contention-indicators.md %} +##### Waiting transaction + +These are indicators that a transaction is trying to access a row that has been ["locked"](architecture/transaction-layer.html#writing) by another, concurrent write transaction. + +- The **Active Executions** table on the **Transactions** page ([{{ site.data.products.db }} Console](../cockroachcloud/transactions-page.html) or [DB Console](ui-transactions-page.html#active-executions-table)) shows transactions with `Waiting` in the **Status** column. You can sort the table by **Time Spent Waiting**. +- Querying the [`crdb_internal.cluster_locks`](crdb-internal.html#cluster_locks) table shows transactions where [`granted`](crdb-internal.html#cluster-locks-columns) is `false`. + +These are indicators that lock contention occurred in the past: + +- Querying the [`crdb_internal.transaction_contention_events`](crdb-internal.html#transaction_contention_events) table indicates that your transactions have experienced lock contention. + + - This is also shown in the **Transaction Executions** view on the **Insights** page ([{{ site.data.products.db }} Console](../cockroachcloud/insights-page.html#transaction-executions-view) and [DB Console](ui-insights-page.html#transaction-executions-view)). Transaction executions will display the **High Contention** insight. + {{site.data.alerts.callout_info}} + {% include {{ page.version.version }}/performance/sql-trace-txn-enable-threshold.md %} + {{site.data.alerts.end}} + +- The **SQL Statement Contention** graph ([{{ site.data.products.db }} Console](../cockroachcloud/metrics-page.html#sql-statement-contention) and [DB Console](ui-sql-dashboard.html#sql-statement-contention)) is showing spikes over time. + SQL Statement Contention graph in DB Console + +If a long-running transaction is waiting due to [lock contention](performance-best-practices-overview.html#transaction-contention): + +1. [Identify the blocking transaction](#identify-conflicting-transactions). +1. Evaluate whether you can cancel the transaction. If so, [cancel it](#cancel-a-blocking-transaction) to unblock the waiting transaction. +1. 
Optimize the transaction to [reduce further contention](#reduce-transaction-contention). In particular, break down larger transactions such as [bulk deletes](bulk-delete-data.html) into smaller ones to have transactions hold locks for a shorter duration, and use [historical reads](as-of-system-time.html) when possible to reduce conflicts with other writes. + +If lock contention occurred in the past, you can [identify the transactions and objects that experienced lock contention](#identify-transactions-and-objects-that-experienced-lock-contention). + +##### Transaction retry error + +These are indicators that a transaction has failed due to [contention](performance-best-practices-overview.html#transaction-contention). + +- A [transaction retry error](transaction-retry-error-reference.html) with `SQLSTATE: 40001`, the string [`restart transaction`](common-errors.html#restart-transaction), and an error code such as [`RETRY_WRITE_TOO_OLD`](transaction-retry-error-reference.html#retry_write_too_old) or [`RETRY_SERIALIZABLE`](transaction-retry-error-reference.html#retry_serializable), is emitted to the client. +- An event with `TransactionRetryWithProtoRefreshError` is emitted to the CockroachDB [logs](logging-use-cases.html#example-slow-sql-query). + +These are indicators that transaction retries occurred in the past: + +- The **Transaction Restarts** graph ([{{ site.data.products.db }} Console](../cockroachcloud/metrics-page.html#transaction-restarts) and [DB Console](ui-sql-dashboard.html#transaction-restarts)) is showing spikes in transaction retries over time. + +{% include {{ page.version.version }}/performance/transaction-retry-error-actions.md %} #### Fix transaction contention problems -{% include {{ page.version.version }}/performance/statement-contention.md %} +Identify the transactions that are in conflict, and unblock them if possible. In general, take steps to [reduce transaction contention](#reduce-transaction-contention). + +In addition, implement [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling) so that your application can respond to [transaction retry errors](transaction-retry-error-reference.html) that are emitted when CockroachDB cannot [automatically retry](transactions.html#automatic-retries) a transaction. + +##### Identify conflicting transactions + +- In the **Active Executions** table on the **Transactions** page ([{{ site.data.products.db }} Console](../cockroachcloud/transactions-page.html) or [DB Console](ui-transactions-page.html#active-executions-table)), look for a **waiting** transaction (`Waiting` status). + {{site.data.alerts.callout_success}} + If you see many waiting transactions, a single long-running transaction may be blocking transactions that are, in turn, blocking others. In this case, sort the table by **Time Spent Waiting** to find the transaction that has been waiting for the longest amount of time. Unblocking this transaction may unblock the other transactions. + {{site.data.alerts.end}} + Click the transaction's execution ID and view the following transaction execution details: + Movr rides transactions + - **Last Retry Reason** shows the last [transaction retry error](#transaction-retry-error) received for the transaction, if applicable. + - The details of the **blocking** transaction, directly below the **Contention Insights** section. Click the blocking transaction to view its details. + +##### Cancel a blocking transaction + +1. 
[Identify the **blocking** transaction](#identify-conflicting-transactions) and view its transaction execution details. +1. Click its **Session ID** to open the **Session Details** page. + Sessions Details Page +1. Click **Cancel Statement** to cancel the **Most Recent Statement** and thus the transaction, or click **Cancel Session** to cancel the session issuing the transaction. + +##### Identify transactions and objects that experienced lock contention + +To identify transactions that experienced [lock contention](performance-best-practices-overview.html#transaction-contention) in the past: + +- In the **Transaction Executions** view on the **Insights** page ([{{ site.data.products.db }} Console](../cockroachcloud/insights-page.html#transaction-executions-view) and [DB Console](ui-insights-page.html#transaction-executions-view)), look for a transaction with the **High Contention** insight. Click the transaction's execution ID and view the transaction execution details, including the details of the blocking transaction. +- Visit the **Transactions** page ([{{ site.data.products.db }} Console](../cockroachcloud/transactions-page.html) and [DB Console](ui-transactions-page.html)) and sort transactions by **Contention Time**. + +To view tables and indexes that experienced [contention](performance-best-practices-overview.html#transaction-contention): + +- Query the [`crdb_internal.transaction_contention_events`](crdb-internal.html#transaction_contention_events) table to view [transactions that have blocked other transactions](crdb-internal.html#transaction-contention-example). +- Query the [`crdb_internal.cluster_contended_tables`](crdb-internal.html#cluster_contended_tables) table to [view all tables that have experienced contention](crdb-internal.html#view-all-tables-that-have-experienced-contention). +- Query the [`crdb_internal.cluster_contended_indexes`](crdb-internal.html#cluster_contended_indexes) table to [view all indexes that have experienced contention](crdb-internal.html#view-all-indexes-that-have-experienced-contention). +- Query the [`crdb_internal.cluster_contention_events`](crdb-internal.html#cluster_contention_events) table +to [view the tables, indexes, and transactions with the most time under contention](crdb-internal.html#view-the-tables-indexes-with-the-most-time-under-contention). + +##### Reduce transaction contention + +[Contention](performance-best-practices-overview.html#transaction-contention) is often reported after it has already resolved. Therefore, preventing contention before it affects your cluster's performance is a more effective approach: + +{% include {{ page.version.version }}/performance/reduce-contention.md %} + +### Hot spots + +[Hot spots](performance-best-practices-overview.html#hot-spots) are a symptom of *resource contention* and can create problems as requests increase, including excessive [transaction contention](#transaction-contention). + +#### Indicators that your cluster has hot spots + +- The **CPU Percent** graph on the [**Hardware**](ui-hardware-dashboard.html) and [**Overload**](ui-overload-dashboard.html) dashboards (DB Console) shows spikes in CPU usage. +- The **Hot Ranges** list on the [**Hot Ranges** page](ui-hot-ranges-page.html) (DB Console) displays a higher-than-expected QPS for a range. +- The [**Key Visualizer**](ui-key-visualizer.html) (DB Console) shows [ranges with much higher-than-average write rates](ui-key-visualizer.html#identifying-hot-spots) for the cluster. 
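If the indicators above point at a particular table, one way to see which ranges back it and where their leaseholders are placed is the `SHOW RANGES ... WITH DETAILS` statement documented later in this changeset; `movr.users` here is simply the demo table used in those examples.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Sketch: inspect range boundaries, sizes, and leaseholders for a table
-- you suspect is receiving a disproportionate share of traffic.
SHOW RANGES FROM TABLE movr.users WITH DETAILS;
~~~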
+ +If you find hot spots, use the [**Range Report**](ui-hot-ranges-page.html#range-report) and [**Key Visualizer**](ui-key-visualizer.html) to identify the ranges with excessive traffic. Then take steps to [reduce hot spots](#reduce-hot-spots). + +#### Reduce hot spots + +{% include {{ page.version.version }}/performance/reduce-hot-spots.md %} ### Statements with full table scans diff --git a/v23.1/query-behavior-troubleshooting.md b/v23.1/query-behavior-troubleshooting.md index 3d83963e9aa..f5c0fc2065d 100644 --- a/v23.1/query-behavior-troubleshooting.md +++ b/v23.1/query-behavior-troubleshooting.md @@ -17,42 +17,9 @@ For a developer-centric overview of optimizing SQL statement performance, see [O When you experience a hanging or stuck query and the cluster is healthy (i.e., no [unavailable ranges](ui-replication-dashboard.html#unavailable-ranges), [network partitions](cluster-setup-troubleshooting.html#network-partition), etc), the cause could be a long-running transaction holding [write intents](architecture/transaction-layer.html#write-intents) open against the same rows as your query. -Such long-running queries can hold intents open for (practically) unlimited durations. If your query tries to access those rows, it may have to wait for that transaction to complete (by [committing](commit-transaction.html) or [rolling back](rollback-transaction.html)) before it can make progress. +Such long-running queries can hold intents open for (practically) unlimited durations. If your query tries to access those rows, it may have to wait for that transaction to complete (by [committing](commit-transaction.html) or [rolling back](rollback-transaction.html)) before it can make progress. Until the transaction is committed or rolled back, the chances of concurrent transactions internally retrying and throwing a retry error increase. -This situation is hard to diagnose via the [Transactions](ui-transactions-page.html) and [Statements](ui-statements-page.html) pages in the [DB Console](ui-overview.html) since [contention](performance-best-practices-overview.html#transaction-contention) is only reported after the conflict has been resolved (which in this scenario may be never). - -In these cases, you will need to take the following steps. - -1. [Find long running transactions](#step-1-find-long-running-transactions) -1. [Find client sessions for those transactions](#step-2-find-the-client-session) -1. [Cancel the transaction or session](#step-3-cancel-the-transaction-or-session) - -#### Step 1. Find long-running transactions - -Run the following query against the [`crdb_internal.cluster_transactions`](crdb-internal.html#cluster_transactions) table to list transactions that have been running longer than 10 minutes. - -{% include_cached copy-clipboard.html %} -~~~ sql -SELECT now() - start AS dur, * FROM crdb_internal.cluster_transactions WHERE now() - start > '10m'::INTERVAL ORDER BY dur DESC LIMIT 10 -~~~ - -For each row in the results, if the `txn_string` column shows `lock=true` (or `seq > 0`), the transaction associated with that row is a writing transaction, and its open write intents will block access for other transactions. -If the query returns lots of transactions, it is often the case than a single transaction is blocking others, and those may be blocking yet others. Try to look for the oldest, longest-running transaction and cancel that one first; that may be sufficient to unblock all of the others. - -#### Step 2. 
Find the client session - -Next, find the client session owning the long-running transaction by querying the [`crdb_internal.cluster_sessions`](crdb-internal.html#cluster_sessions) table. You will need the value of the `id` column from the query in the previous step. -This step is necessary if you want to cancel the entire session the transaction is associated with. - -~~~ sql -SELECT * FROM crdb_internal.cluster_sessions WHERE kv_txn = {id_column_from_previous_query} -~~~ - -#### Step 3. Cancel the transaction or session - -Finally, cancel the longest-running transaction you found in [Step 1](#step-1-find-long-running-transactions) using [`CANCEL QUERY`](cancel-query.html) and check if that resolves the problem. - -If you want to cancel the whole session for that transaction, use [`CANCEL SESSION`](cancel-session.html) using the session ID you found in [Step 2](#step-2-find-the-client-session). +Refer to the performance tuning recipe for [identifying and unblocking a waiting transaction](performance-recipes.html#waiting-transaction). ### Identify slow queries diff --git a/v23.1/recommended-production-settings.md b/v23.1/recommended-production-settings.md index d04d897c4e8..c717960a1fb 100644 --- a/v23.1/recommended-production-settings.md +++ b/v23.1/recommended-production-settings.md @@ -670,4 +670,6 @@ When running CockroachDB on Kubernetes, making the following minimal customizati For more information and additional customization suggestions, see our full detailed guide to [CockroachDB Performance on Kubernetes](kubernetes-performance.html). -{% include common/transaction-retries.md %} +## Transaction retries + +When several transactions try to modify the same underlying data concurrently, they may experience [contention](performance-best-practices-overview.html#transaction-contention) that leads to [transaction retries](transactions.html#transaction-retries). To avoid failures in production, your application should be engineered to handle transaction retries using [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling). \ No newline at end of file diff --git a/v23.1/schema-design-table.md b/v23.1/schema-design-table.md index 4f84636e873..1a6405e91cb 100644 --- a/v23.1/schema-design-table.md +++ b/v23.1/schema-design-table.md @@ -211,7 +211,7 @@ Here are some best practices to follow when selecting primary key columns: - Avoid defining primary keys over a single column of sequential data. - Querying a table with a primary key on a single sequential column (e.g., an auto-incrementing [`INT`](int.html) column, or a [`TIMESTAMP`](timestamp.html) value) can result in single-range [hot spots](performance-best-practices-overview.html#hot-spots) that negatively affect performance, or cause [transaction contention](transactions.html#transaction-contention). + Querying a table with a primary key on a single sequential column (e.g., an auto-incrementing [`INT`](int.html) column, or a [`TIMESTAMP`](timestamp.html) value) can result in single-range [hot spots](performance-best-practices-overview.html#hot-spots) that negatively affect performance, or cause [transaction contention](performance-best-practices-overview.html#transaction-contention). If you are working with a table that *must* be indexed on sequential keys, use [hash-sharded indexes](hash-sharded-indexes.html). 
For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our [Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads](https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/) blog post. diff --git a/v23.1/show-jobs.md b/v23.1/show-jobs.md index f677ad830e1..c8cda4bc9c6 100644 --- a/v23.1/show-jobs.md +++ b/v23.1/show-jobs.md @@ -101,6 +101,10 @@ Status | Description `revert-failed` | Job encountered a non-retryable error when reverting the changes. It is necessary to manually clean up a job with this status. `retrying` | Job is retrying another job that failed. +{{site.data.alerts.callout_info}} +We recommend monitoring paused jobs to protect historical data from [garbage collection](architecture/storage-layer.html#garbage-collection), or potential data accumulation in the case of [changefeeds](changefeed-messages.html#garbage-collection-and-changefeeds). See [Monitoring paused jobs](pause-job.html#monitoring-paused-jobs) for detail on metrics to track paused jobs and [protected timestamps](architecture/storage-layer.html#protected-timestamps). +{{site.data.alerts.end}} + ## Examples ### Show jobs diff --git a/v23.1/show-ranges.md b/v23.1/show-ranges.md index 3d92344750d..4086361435f 100644 --- a/v23.1/show-ranges.md +++ b/v23.1/show-ranges.md @@ -5,9 +5,13 @@ toc: true docs_area: reference.sql --- -The `SHOW RANGES` [statement](sql-statements.html) shows information about the [ranges](architecture/overview.html#architecture-range) that comprise the data for a table, index, or entire database. This information is useful for verifying how SQL data maps to underlying ranges, and where the replicas for ranges are located. If `SHOW RANGES` displays `NULL` for both the start and end keys of a range, the range is empty and has no splits. +The `SHOW RANGES` [statement](sql-statements.html) shows information about the [ranges](architecture/overview.html#architecture-range) that comprise the data for a table, index, database, or the current catalog. This information is useful for verifying how SQL data maps to underlying [ranges](architecture/overview.html#architecture-range), and where the [replicas](architecture/glossary.html#replica) for those ranges are located. {{site.data.alerts.callout_info}} +{% include {{page.version.version}}/sql/show-ranges-output-deprecation-notice.md %} +{{site.data.alerts.end}} + +{{site.data.alerts.callout_success}} To show range information for a specific row in a table or index, use the [`SHOW RANGE ... FOR ROW`](show-range-for-row.html) statement. {{site.data.alerts.end}} @@ -25,96 +29,388 @@ To use the `SHOW RANGES` statement, a user must either be a member of the [`admi Parameter | Description ----------|------------ -[`table_name`](sql-grammar.html#table_name) | The name of the table you want range information about. -[`table_index_name`](sql-grammar.html#table_index_name) | The name of the index you want range information about. -[`database_name`](sql-grammar.html#database_name) | The name of the database you want range information about. +[`table_name`](sql-grammar.html#table_name) | The name of the [table](show-tables.html) you want [range](architecture/overview.html#architecture-range) information about. +[`table_index_name`](sql-grammar.html#table_index_name) | The name of the [index](indexes.html) you want [range](architecture/overview.html#architecture-range) information about. 
+[`database_name`](sql-grammar.html#database_name) | The name of the [database](show-databases.html) you want [range](architecture/overview.html#architecture-range) information about. +[`opt_show_ranges_options`](sql-grammar.html#show_ranges_options) | The [options](#options) used to configure what fields appear in the [response](#response). + +## Options + +The following [options](sql-grammar.html#show_ranges_options) are available to affect the output. Multiple options can be passed at once, separated by commas. + +- `TABLES`: List [tables](show-tables.html) contained per [range](architecture/overview.html#architecture-range). +- `INDEXES`: List [indexes](indexes.html) contained per [range](architecture/overview.html#architecture-range). +- `DETAILS`: Add [range](architecture/overview.html#architecture-range) size, [leaseholder](architecture/glossary.html#leaseholder) and other details. Note that this incurs a large computational overhead because it needs to fetch data across nodes. +- `KEYS`: Include binary [start and end keys](#start-key). ## Response -The following fields are returned for each partition: +The specific fields in the response vary depending on the values passed as [options](#options). The following fields may be returned: -Field | Description -------|------------ -`table_name` | The name of the table. -`start_key` | The start key for the range. -`end_key` | The end key for the range. -`range_id` | The range ID. -`range_size_mb` | The size of the range. -`lease_holder` | The node that contains the range's [leaseholder](architecture/overview.html#architecture-range). -`lease_holder_locality` | The [locality](cockroach-start.html#locality) of the leaseholder. -`replicas` | The nodes that contain the range [replicas](architecture/overview.html#architecture-range). -`replica_localities` | The [locality](cockroach-start.html#locality) of the range. +Field | Description | Emitted for option(s) +------|-------------|---------------------- +`start_key` | The start key for the [range](architecture/overview.html#architecture-range). | Always emitted. +`end_key` | The end key for the [range](architecture/overview.html#architecture-range). | Always emitted. +`raw_start_key` | The start key for the [range](architecture/overview.html#architecture-range), displayed as a [hexadecimal byte value](sql-constants.html#string-literals-with-character-escapes). | `KEYS` +`raw_end_key` | The end key for the [range](architecture/overview.html#architecture-range), displayed as a [hexadecimal byte value](sql-constants.html#string-literals-with-character-escapes). | `KEYS` +`range_id` | The internal [range](architecture/overview.html#architecture-range) ID. | Always emitted. +`voting_replicas` | The [nodes](architecture/glossary.html#node) that contain the range's voting replicas (that is, the replicas that participate in [Raft](architecture/replication-layer.html#raft) elections). | Always emitted. +`non_voting_replicas` | The [nodes](architecture/glossary.html#node) that contain the range's [non-voting replicas](architecture/replication-layer.html#non-voting-replicas). | Always emitted. +`replicas` | The [nodes](architecture/glossary.html#node) that contain the range's [replicas](architecture/glossary.html#replica). | Always emitted. +`replica_localities` | The [localities](cockroach-start.html#locality) of the range's [replicas](architecture/glossary.html#replica). | Always emitted. +`range_size` | The size of the [range](architecture/overview.html#architecture-range) in bytes. 
| `DETAILS` +`range_size_mb` | The size of the [range](architecture/overview.html#architecture-range) in MiB. | `DETAILS` +`lease_holder` | The [node](architecture/glossary.html#node) that contains the range's [leaseholder](architecture/glossary.html#leaseholder). | `DETAILS` +`lease_holder_locality` | The [locality](cockroach-start.html#locality) of the range's [leaseholder](architecture/glossary.html#leaseholder). | `DETAILS` +`learner_replicas` | The _learner replicas_ of the range. A learner replica is a replica that has just been added to a range, and is thus in an interim state. It accepts messages but doesn't vote in [Raft](architecture/replication-layer.html#raft) elections. This means it doesn't affect quorum and thus doesn't affect the stability of the range, even if it's very far behind. | Always emitted. +`split_enforced_until` | The time a [range split](architecture/distribution-layer.html#range-splits) is enforced until. This can be set using [`ALTER TABLE ... SPLIT AT`](alter-table.html#split-at) using the [`WITH EXPIRATION` clause](alter-table.html#set-the-expiration-on-a-split-enforcement). Example: `2262-04-11 23:47:16.854776` (this is a default value which means "never"). | Always emitted. +`schema_name` | The name of the [schema](create-schema.html) this [range](architecture/overview.html#architecture-range) holds data for. | `TABLES`, `INDEXES` +`table_name` | The name of the [table](create-table.html) this [range](architecture/overview.html#architecture-range) holds data for. | `TABLES`, `INDEXES` +`table_id` | The internal ID of the [table](create-table.html) this [range](architecture/overview.html#architecture-range) holds data for. | `TABLES`, `INDEXES` +`table_start_key` | The start key of the first [range](architecture/overview.html#architecture-range) that holds data for this table. | `TABLES` +`table_end_key` | The end key of the last [range](architecture/overview.html#architecture-range) that holds data for this table. | `TABLES` +`raw_table_start_key` | The start key of the first [range](architecture/overview.html#architecture-range) that holds data for this table, expressed as [`BYTES`](bytes.html). | `TABLES`, `KEYS` +`raw_table_end_key` | The end key of the last [range](architecture/overview.html#architecture-range) that holds data for this table, expressed as [`BYTES`](bytes.html). | `TABLES`, `KEYS` +`index_name` | The name of the [index](indexes.html) this [range](architecture/overview.html#architecture-range) holds data for. | `INDEXES` +`index_id` | The internal ID of the [index](indexes.html) this [range](architecture/overview.html#architecture-range) holds data for. | `INDEXES` +`index_start_key` | The start key of the first [range](architecture/overview.html#architecture-range) of [index](indexes.html) data. | `INDEXES` +`index_end_key` | The end key of the last [range](architecture/overview.html#architecture-range) of [index](indexes.html) data. | `INDEXES` +`raw_index_start_key` | The start key of the first [range](architecture/overview.html#architecture-range) of [index](indexes.html) data, expressed as [`BYTES`](bytes.html). | `INDEXES`, `KEYS` +`raw_index_end_key` | The end key of the last [range](architecture/overview.html#architecture-range) of [index](indexes.html) data, expressed as [`BYTES`](bytes.html). | `INDEXES`, `KEYS` + +## Examples {{site.data.alerts.callout_info}} -If both `start_key` and `end_key` show `NULL`, the range is empty and has no splits. 
+{% include {{page.version.version}}/sql/show-ranges-output-deprecation-notice.md %} {{site.data.alerts.end}} -## Examples - {% include {{page.version.version}}/sql/movr-statements-geo-partitioned-replicas.md %} -### Show ranges for a table (primary index) +### Show ranges for a database + +- [Show ranges for a database (without options)](#show-ranges-for-a-database-without-options) +- [Show ranges for a database (with tables, keys, details)](#show-ranges-for-a-database-with-tables-keys-details) +- [Show ranges for a database (with tables)](#show-ranges-for-a-database-with-tables) +- [Show ranges for a database (with indexes)](#show-ranges-for-a-database-with-indexes) +- [Show ranges for a database (with details)](#show-ranges-for-a-database-with-details) +- [Show ranges for a database (with keys)](#show-ranges-for-a-database-with-keys) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW DATABASES; +~~~ + +~~~ + database_name | owner | primary_region | secondary_region | regions | survival_goal +----------------+-------+----------------+------------------+---------+---------------- + defaultdb | root | NULL | NULL | {} | NULL + movr | demo | NULL | NULL | {} | NULL + postgres | root | NULL | NULL | {} | NULL + system | node | NULL | NULL | {} | NULL +(4 rows) +~~~ + +#### Show ranges for a database (without options) {% include_cached copy-clipboard.html %} ~~~ sql -> WITH x as (SHOW RANGES FROM TABLE vehicles) SELECT * FROM x WHERE "start_key" NOT LIKE '%Prefix%'; +SHOW RANGES FROM DATABASE movr; ~~~ + +~~~ + start_key | end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + /Table/106 | /Table/106/1/"amsterdam" | 70 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + /Table/106/1/"amsterdam" | /Table/106/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... 
+ /Table/111/1/"washington dc"/PrefixEnd | /Max | 309 | {3,5,9} | {"region=us-east1,az=d","region=us-west1,az=b","region=europe-west1,az=d"} | {3,5,9} | {} | {} | NULL +(178 rows) +~~~ + +#### Show ranges for a database (with tables, keys, details) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM DATABASE movr WITH TABLES, KEYS, DETAILS; +~~~ + +~~~ + start_key | end_key | raw_start_key | raw_end_key | range_id | schema_name | table_name | table_id | table_start_key | table_end_key | raw_table_start_key | raw_table_end_key | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size +-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+-------------+----------------------------+----------+-----------------+----------------------+---------------------+-------------------+----------------------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------+------------- + /Table/106 | /Table/106/1/"amsterdam" | \xf2 | \xf28912616d7374657264616d0001 | 174 | public | users | 106 | /Table/106 | /Table/107 | \xf2 | \xf3 | 0 | 3 | region=us-east1,az=d | {3,6,9} | {"region=us-east1,az=d","region=us-west1,az=c","region=europe-west1,az=d"} | {3,9,6} | {} | {} | NULL | 0 + /Table/106/1/"amsterdam" | /Table/106/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | \xf28912616d7374657264616d0001 | \xf28912616d7374657264616d000112b333333333334000ff8000ff00ff00ff00ff00ff00ff230001 | 175 | public | users | 106 | /Table/106 | /Table/107 | \xf2 | \xf3 | 0.00011900000000000000000 | 3 | region=us-east1,az=d | {3,7,8} | {"region=us-east1,az=d","region=europe-west1,az=b","region=europe-west1,az=c"} | {3,7,8} | {} | {} | NULL | 119 + ... 
+ /Table/111/1/"washington dc"/PrefixEnd | /Max | \xf66f891277617368696e67746f6e2064630002 | \xffff | 295 | public | user_promo_codes | 111 | /Table/111 | /Table/112 | \xf66f | \xf670 | 0 | 8 | region=europe-west1,az=c | {3,4,8} | {"region=us-east1,az=d","region=us-west1,az=a","region=europe-west1,az=c"} | {3,8,4} | {} | {} | NULL | 0 +(145 rows) +~~~ + +#### Show ranges for a database (with tables) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM DATABASE movr WITH TABLES; +~~~ + +~~~ + start_key | end_key | range_id | schema_name | table_name | table_id | table_start_key | table_end_key | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+-------------+----------------------------+----------+-----------------+----------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + /Table/106 | /Table/106/1/"amsterdam" | 67 | public | users | 106 | /Table/106 | /Table/107 | {1,4,9} | {"region=us-east1,az=b","region=us-west1,az=a","region=europe-west1,az=d"} | {1,9,4} | {} | {} | NULL + /Table/106/1/"amsterdam" | /Table/106/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 68 | public | users | 106 | /Table/106 | /Table/107 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {7,9,8} | {} | {} | NULL + ... + /Table/111/1/"washington dc"/PrefixEnd | /Max | 311 | public | user_promo_codes | 111 | /Table/111 | /Table/112 | {1,5,7} | {"region=us-east1,az=b","region=us-west1,az=b","region=europe-west1,az=b"} | {1,7,5} | {} | {} | NULL +(178 rows) +~~~ + +#### Show ranges for a database (with indexes) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM DATABASE movr WITH INDEXES; +~~~ + +~~~ + start_key | end_key | range_id | schema_name | table_name | table_id | index_name | index_id | index_start_key | index_end_key | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+-------------+----------------------------+----------+-----------------------------------------------+----------+-----------------+---------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + /Table/106 | /Table/106/1/"amsterdam" | 70 | public | users | 106 | users_pkey | 1 | /Table/106/1 | /Table/106/2 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + /Table/106/1/"amsterdam" | /Table/106/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | public | users | 106 | users_pkey | 1 | /Table/106/1 | /Table/106/2 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... 
+ /Table/111/1/"washington dc"/PrefixEnd | /Max | 309 | public | user_promo_codes | 111 | user_promo_codes_pkey | 1 | /Table/111/1 | /Table/111/2 | {3,5,9} | {"region=us-east1,az=d","region=us-west1,az=b","region=europe-west1,az=d"} | {3,5,9} | {} | {} | NULL +(179 rows) +~~~ + +#### Show ranges for a database (with details) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM DATABASE movr WITH DETAILS; +~~~ + +~~~ + start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size +-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+----------------------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------+------------- + /Table/106 | /Table/106/1/"amsterdam" | 70 | 0 | 1 | region=us-east1,az=b | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL | 0 + /Table/106/1/"amsterdam" | /Table/106/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | 0.00011800000000000000000 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL | 118 + ... + /Table/111/1/"washington dc"/PrefixEnd | /Max | 309 | 0 | 9 | region=europe-west1,az=d | {3,5,9} | {"region=us-east1,az=d","region=us-west1,az=b","region=europe-west1,az=d"} | {3,5,9} | {} | {} | NULL | 0 +(178 rows) +~~~ + +#### Show ranges for a database (with keys) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM DATABASE movr WITH KEYS; +~~~ + +~~~ + start_key | end_key | raw_start_key | raw_end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +-----------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + /Table/106 | /Table/106/1/"amsterdam" | \xf2 | \xf28912616d7374657264616d0001 | 70 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + /Table/106/1/"amsterdam" | /Table/106/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | \xf28912616d7374657264616d0001 | \xf28912616d7374657264616d000112b333333333334000ff8000ff00ff00ff00ff00ff00ff230001 | 71 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... 
+ /Table/111/1/"washington dc"/PrefixEnd | /Max | \xf66f891277617368696e67746f6e2064630002 | \xffff | 309 | {3,5,9} | {"region=us-east1,az=d","region=us-west1,az=b","region=europe-west1,az=d"} | {3,5,9} | {} | {} | NULL +(178 rows) +~~~ + +### Show ranges for a table + +- [Show ranges for a table (without options)](#show-ranges-for-a-table-without-options) +- [Show ranges for a table (with indexes, keys, details)](#show-ranges-for-a-table-with-indexes-keys-details) +- [Show ranges for a table (with indexes)](#show-ranges-for-a-table-with-indexes) +- [Show ranges for a table (with details)](#show-ranges-for-a-table-with-details) +- [Show ranges for a table (with keys)](#show-ranges-for-a-table-with-keys) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW TABLES; +~~~ + ~~~ - start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities -+------------------+----------------------------+----------+---------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+ - /"new york" | /"new york"/PrefixEnd | 58 | 0.000304 | 2 | region=us-east1,az=c | {1,2,5} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-west1,az=b"} - /"washington dc" | /"washington dc"/PrefixEnd | 102 | 0.000173 | 2 | region=us-east1,az=c | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - /"boston" | /"boston"/PrefixEnd | 63 | 0.000288 | 3 | region=us-east1,az=d | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - /"seattle" | /"seattle"/PrefixEnd | 97 | 0.000295 | 4 | region=us-west1,az=a | {4,5,6} | {"region=us-west1,az=a","region=us-west1,az=b","region=us-west1,az=c"} - /"los angeles" | /"los angeles"/PrefixEnd | 55 | 0.000156 | 5 | region=us-west1,az=b | {4,5,6} | {"region=us-west1,az=a","region=us-west1,az=b","region=us-west1,az=c"} - /"san francisco" | /"san francisco"/PrefixEnd | 71 | 0.000309 | 6 | region=us-west1,az=c | {1,5,6} | {"region=us-east1,az=b","region=us-west1,az=b","region=us-west1,az=c"} - /"amsterdam" | /"amsterdam"/PrefixEnd | 59 | 0.000305 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} - /"paris" | /"paris"/PrefixEnd | 62 | 0.000299 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} - /"rome" | /"rome"/PrefixEnd | 67 | 0.000168 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} -(9 rows) + schema_name | table_name | type | owner | estimated_row_count | locality +--------------+----------------------------+-------+-------+---------------------+----------- + public | promo_codes | table | demo | 1000 | NULL + public | rides | table | demo | 500 | NULL + public | user_promo_codes | table | demo | 5 | NULL + public | users | table | demo | 50 | NULL + public | vehicle_location_histories | table | demo | 1000 | NULL + public | vehicles | table | demo | 15 | NULL +(6 rows) +~~~ + +#### Show ranges for a table (without options) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM TABLE movr.users; +~~~ + +~~~ + start_key | end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until 
+--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+----------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + …/ | …/1/"amsterdam" | 70 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + …/1/"amsterdam" | …/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... + …/1/"washington dc"/PrefixEnd | …/ | 154 | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL +(27 rows) +~~~ + +#### Show ranges for a table (with indexes, keys, details) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM TABLE movr.users with INDEXES, KEYS, DETAILS; +~~~ + +~~~ + start_key | end_key | raw_start_key | raw_end_key | range_id | index_name | index_id | index_start_key | index_end_key | raw_index_start_key | raw_index_end_key | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size +--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+------------+----------+-----------------+---------------+---------------------+-------------------+---------------------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------+------------- + …/ | …/1/"amsterdam" | \xf2 | \xf28912616d7374657264616d0001 | 174 | users_pkey | 1 | …/1 | …/2 | \xf289 | \xf28a | 0 | 3 | region=us-east1,az=d | {3,6,9} | {"region=us-east1,az=d","region=us-west1,az=c","region=europe-west1,az=d"} | {3,9,6} | {} | {} | NULL | 0 + …/1/"amsterdam" | …/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | \xf28912616d7374657264616d0001 | \xf28912616d7374657264616d000112b333333333334000ff8000ff00ff00ff00ff00ff00ff230001 | 175 | users_pkey | 1 | …/1 | …/2 | \xf289 | \xf28a | 0.00011900000000000000000 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL | 119 + ... 
+ …/1/"washington dc"/PrefixEnd | …/ | \xf2891277617368696e67746f6e2064630002 | \xf3 | 111 | users_pkey | 1 | …/1 | …/2 | \xf289 | \xf28a | 0 | 9 | region=europe-west1,az=d | {1,5,9} | {"region=us-east1,az=b","region=us-west1,az=b","region=europe-west1,az=d"} | {1,9,5} | {} | {} | NULL | 0 +(27 rows) +~~~ + +#### Show ranges for a table (with indexes) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM TABLE movr.users WITH INDEXES; +~~~ + +~~~ + start_key | end_key | range_id | index_name | index_id | index_start_key | index_end_key | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+----------+------------+----------+-----------------+---------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + …/ | …/1/"amsterdam" | 70 | users_pkey | 1 | …/1 | …/2 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + …/1/"amsterdam" | …/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | users_pkey | 1 | …/1 | …/2 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... + …/1/"washington dc"/PrefixEnd | …/ | 154 | users_pkey | 1 | …/1 | …/2 | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL +(27 rows) +~~~ + +#### Show ranges for a table (with details) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM TABLE movr.users WITH DETAILS; +~~~ + +~~~ + start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size +--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+----------+----------------------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------+------------- + …/ | …/1/"amsterdam" | 70 | 0 | 1 | region=us-east1,az=b | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL | 0 + …/1/"amsterdam" | …/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | 0.00011800000000000000000 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL | 118 + ... 
+ …/1/"washington dc"/PrefixEnd | …/ | 154 | 0 | 4 | region=us-west1,az=a | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL | 0 +(27 rows) +~~~ + +#### Show ranges for a table (with keys) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM TABLE movr.users WITH KEYS; +~~~ + +~~~ + start_key | end_key | raw_start_key | raw_end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +--------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + …/ | …/1/"amsterdam" | \xf2 | \xf28912616d7374657264616d0001 | 70 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + …/1/"amsterdam" | …/1/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | \xf28912616d7374657264616d0001 | \xf28912616d7374657264616d000112b333333333334000ff8000ff00ff00ff00ff00ff00ff230001 | 71 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... + …/1/"washington dc"/PrefixEnd | …/ | \xf2891277617368696e67746f6e2064630002 | \xf3 | 154 | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL +(27 rows) ~~~ ### Show ranges for an index +- [Show ranges for an index (without options)](#show-ranges-for-an-index-without-options) +- [Show ranges for an index (with keys, details)](#show-ranges-for-an-index-with-keys-details) +- [Show ranges for an index (with details)](#show-ranges-for-an-index-with-details) +- [Show ranges for an index (with keys)](#show-ranges-for-an-index-with-keys) + {% include_cached copy-clipboard.html %} ~~~ sql -> WITH x AS (SHOW RANGES FROM INDEX vehicles_auto_index_fk_city_ref_users) SELECT * FROM x WHERE "start_key" NOT LIKE '%Prefix%'; +SHOW INDEXES FROM movr.users; ~~~ + ~~~ - start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities -+------------------+----------------------------+----------+---------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+ - /"washington dc" | /"washington dc"/PrefixEnd | 188 | 0.000089 | 2 | region=us-east1,az=c | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - /"boston" | /"boston"/PrefixEnd | 141 | 0.000164 | 3 | region=us-east1,az=d | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - /"new york" | /"new york"/PrefixEnd | 168 | 0.000174 | 3 | region=us-east1,az=d | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - /"los angeles" | /"los angeles"/PrefixEnd | 165 | 0.000087 | 6 | region=us-west1,az=c | {4,5,6} | {"region=us-west1,az=a","region=us-west1,az=b","region=us-west1,az=c"} - /"san francisco" | /"san francisco"/PrefixEnd | 174 | 0.000183 | 6 | region=us-west1,az=c 
| {4,5,6} | {"region=us-west1,az=a","region=us-west1,az=b","region=us-west1,az=c"} - /"seattle" | /"seattle"/PrefixEnd | 186 | 0.000166 | 6 | region=us-west1,az=c | {4,5,6} | {"region=us-west1,az=a","region=us-west1,az=b","region=us-west1,az=c"} - /"amsterdam" | /"amsterdam"/PrefixEnd | 137 | 0.00017 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} - /"paris" | /"paris"/PrefixEnd | 170 | 0.000162 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} - /"rome" | /"rome"/PrefixEnd | 172 | 0.00008 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} -(9 rows) + table_name | index_name | non_unique | seq_in_index | column_name | definition | direction | storing | implicit | visible +-------------+------------+------------+--------------+-------------+-------------+-----------+---------+----------+---------- + users | users_pkey | f | 1 | city | city | ASC | f | f | t + users | users_pkey | f | 2 | id | id | ASC | f | f | t + users | users_pkey | f | 3 | name | name | N/A | t | f | t + users | users_pkey | f | 4 | address | address | N/A | t | f | t + users | users_pkey | f | 5 | credit_card | credit_card | N/A | t | f | t +(5 rows) ~~~ -### Show ranges for a database +#### Show ranges for an index (without options) {% include_cached copy-clipboard.html %} ~~~ sql -> WITH x as (SHOW RANGES FROM database movr) SELECT * FROM x WHERE "start_key" NOT LIKE '%Prefix%'; +SHOW RANGES FROM INDEX movr.users_pkey; +~~~ + +~~~ + start_key | end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+----------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + …/TableMin | …/"amsterdam" | 70 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + …/"amsterdam" | …/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... 
+ …/"washington dc"/PrefixEnd | …/ | 154 | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL +(27 rows) ~~~ + +#### Show ranges for an index (with keys, details) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM INDEX movr.users_pkey WITH KEYS, DETAILS; ~~~ - table_name | start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities -+----------------------------+------------------+----------------------------+----------+---------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+ - users | /"amsterdam" | /"amsterdam"/PrefixEnd | 47 | 0.000562 | 7 | region=europe-west1,az=b | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} - users | /"boston" | /"boston"/PrefixEnd | 51 | 0.000665 | 3 | region=us-east1,az=d | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - users | /"chicago" | /"los angeles" | 83 | 0 | 4 | region=us-west1,az=a | {2,4,8} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=c"} - users | /"los angeles" | /"los angeles"/PrefixEnd | 45 | 0.000697 | 4 | region=us-west1,az=a | {4,5,6} | {"region=us-west1,az=a","region=us-west1,az=b","region=us-west1,az=c"} - users | /"new york" | /"new york"/PrefixEnd | 48 | 0.000664 | 1 | region=us-east1,az=b | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} - users | /"paris" | /"paris"/PrefixEnd | 52 | 0.000628 | 8 | region=europe-west1,az=c | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} +~~~ + start_key | end_key | raw_start_key | raw_end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size +------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+---------------------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------+------------- + …/TableMin | …/"amsterdam" | \xf2 | \xf28912616d7374657264616d0001 | 174 | 0 | 3 | region=us-east1,az=d | {3,6,9} | {"region=us-east1,az=d","region=us-west1,az=c","region=europe-west1,az=d"} | {3,9,6} | {} | {} | NULL | 0 + …/"amsterdam" | …/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | \xf28912616d7374657264616d0001 | \xf28912616d7374657264616d000112b333333333334000ff8000ff00ff00ff00ff00ff00ff230001 | 175 | 0.00011900000000000000000 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL | 119 + ... 
+ …/"washington dc"/PrefixEnd | …/ | \xf2891277617368696e67746f6e2064630002 | \xf3 | 111 | 0 | 9 | region=europe-west1,az=d | {1,5,9} | {"region=us-east1,az=b","region=us-west1,az=b","region=europe-west1,az=d"} | {1,9,5} | {} | {} | NULL | 0 +(27 rows) +~~~ + +#### Show ranges for an index (with details) + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM INDEX movr.users_pkey WITH DETAILS; +~~~ + +~~~ + start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size +------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+----------+----------------------------+--------------+--------------------------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------+------------- + …/TableMin | …/"amsterdam" | 70 | 0 | 1 | region=us-east1,az=b | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL | 0 + …/"amsterdam" | …/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | 71 | 0.00011800000000000000000 | 9 | region=europe-west1,az=d | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL | 118 ... + …/"washington dc"/PrefixEnd | …/ | 154 | 0 | 4 | region=us-west1,az=a | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL | 0 +(27 rows) +~~~ + +#### Show ranges for an index (with keys) - user_promo_codes | /"washington dc" | /"washington dc"/PrefixEnd | 144 | 0 | 2 | region=us-east1,az=c | {1,2,3} | {"region=us-east1,az=b","region=us-east1,az=c","region=us-east1,az=d"} -(73 rows) +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW RANGES FROM INDEX movr.users_pkey WITH KEYS; +~~~ + +~~~ + start_key | end_key | raw_start_key | raw_end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until +------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+----------+------------------------------------------------------------------------------------+-----------------+---------------------+------------------+----------------------------- + …/TableMin | …/"amsterdam" | \xf2 | \xf28912616d7374657264616d0001 | 70 | {1,6,8} | {"region=us-east1,az=b","region=us-west1,az=c","region=europe-west1,az=c"} | {1,6,8} | {} | {} | NULL + …/"amsterdam" | …/"amsterdam"/"\xb333333@\x00\x80\x00\x00\x00\x00\x00\x00#" | \xf28912616d7374657264616d0001 | \xf28912616d7374657264616d000112b333333333334000ff8000ff00ff00ff00ff00ff00ff230001 | 71 | {7,8,9} | {"region=europe-west1,az=b","region=europe-west1,az=c","region=europe-west1,az=d"} | {9,7,8} | {} | {} | NULL + ... 
+ …/"washington dc"/PrefixEnd | …/ | \xf2891277617368696e67746f6e2064630002 | \xf3 | 154 | {2,4,7} | {"region=us-east1,az=c","region=us-west1,az=a","region=europe-west1,az=b"} | {2,4,7} | {} | {} | NULL +(27 rows) ~~~ ## See also diff --git a/v23.1/sql-faqs.md b/v23.1/sql-faqs.md index b1cd108f4e2..4c5939d1d48 100644 --- a/v23.1/sql-faqs.md +++ b/v23.1/sql-faqs.md @@ -131,13 +131,11 @@ require('long').fromString(idString).add(1).toString(); // GOOD: returns '235191 {% include {{ page.version.version }}/faq/simulate-key-value-store.html %} -## Does CockroachDB support full text search? +## Does CockroachDB support full-text search? -If you need full text search in a production environment, Cockroach Labs recommends that you use a search engine like [Elasticsearch](https://www.elastic.co/elasticsearch) or [Solr](https://solr.apache.org/). You can use CockroachDB [change data capture (CDC)](change-data-capture-overview.html) to set up a [changefeed](changefeed-messages.html) to keep Elasticsearch and Solr indexes synchronized to your CockroachDB tables. +Yes. For more information, see [Full-Text Search](full-text-search.html). -Depending on your use case, you may be able to get by using [trigram indexes](trigram-indexes.html) to do fuzzy string matching and pattern matching. For more information about use cases for trigram indexes that could make having full text search unnecessary, see the 2022 blog post [Use cases for trigram indexes (when not to use Full Text Search)](https://www.cockroachlabs.com/blog/use-cases-trigram-indexes/). - -For an example showing how to build a simplified full text indexing and search solution using CockroachDB's support for [generalized inverted indexes (also known as GIN indexes)](inverted-indexes.html), see the 2020 blog post [Full Text Indexing and Search in CockroachDB](https://www.cockroachlabs.com/blog/full-text-indexing-search/). +{% include {{ page.version.version }}/sql/use-case-trigram-indexes.md %} ## See also diff --git a/v23.1/sql-feature-support.md b/v23.1/sql-feature-support.md index 2e0777b95c0..fa8b5e5905a 100644 --- a/v23.1/sql-feature-support.md +++ b/v23.1/sql-feature-support.md @@ -87,7 +87,7 @@ table tr td:nth-child(2) { Partial indexes | ✓ | Common Extension | [Partial indexes documentation](partial-indexes.html) Spatial indexes | ✓ | Common Extension | [Spatial indexes documentation](spatial-indexes.html) Multiple indexes per query | Partial | Common Extension | [Index selection](indexes.html#selection) - Full-text indexes | ✗ | Common Extension | [GitHub issue tracking full-text index support](https://github.com/cockroachdb/cockroach/issues/7821) + Full-text indexes | ✓ | Common Extension | [Full-text search documentation](full-text-search.html) Expression indexes | ✓ | Common Extension | [Expression indexes](expression-indexes.html) Prefix indexes | ✗ | Common Extension | Implement using [Expression indexes](expression-indexes.html) Hash indexes | ✗ | Common Extension | Improves performance of queries looking for single, exact values diff --git a/v23.1/string.md b/v23.1/string.md index 9dbf7d5dd01..8f8767aa6ec 100644 --- a/v23.1/string.md +++ b/v23.1/string.md @@ -142,6 +142,8 @@ Type | Details `INTERVAL` | Requires supported [`INTERVAL`](interval.html) string format, e.g., `'1h2m3s4ms5us6ns'`. `TIME` | Requires supported [`TIME`](time.html) string format, e.g., `'01:22:12'` (microsecond precision). `TIMESTAMP` | Requires supported [`TIMESTAMP`](timestamp.html) string format, e.g., `'2016-01-25 10:10:10.555555'`. 
+`TSQUERY` | Requires supported [`TSQUERY`](tsquery.html) string format, e.g., `'Requires & supported & TSQUERY & string & format'`.
      Note that casting a string to a `TSQUERY` will not normalize the tokens into lexemes. To do so, [use `to_tsquery()`, `plainto_tsquery()`, or `phraseto_tsquery()`](#convert-string-to-tsquery). +`TSVECTOR` | Requires supported [`TSVECTOR`](tsvector.html) string format, e.g., `'Requires supported TSVECTOR string format.'`.
      Note that casting a string to a `TSVECTOR` will not normalize the tokens into lexemes. To do so, [use `to_tsvector()`](#convert-string-to-tsvector). `UUID` | Requires supported [`UUID`](uuid.html) string format, e.g., `'63616665-6630-3064-6465-616462656562'`. ### `STRING` vs. `BYTES` @@ -176,12 +178,12 @@ In this case, [`LENGTH(string)`](functions-and-operators.html#string-and-byte-fu A literal entered through a SQL client will be translated into a different value based on the type: -+ `BYTES` give a special meaning to the pair `\x` at the beginning, and translates the rest by substituting pairs of hexadecimal digits to a single byte. For example, `\xff` is equivalent to a single byte with the value of 255. For more information, see [SQL Constants: String literals with character escapes](sql-constants.html#string-literals-with-character-escapes). ++ `BYTES` gives a special meaning to the pair `\x` at the beginning, and translates the rest by substituting pairs of hexadecimal digits to a single byte. For example, `\xff` is equivalent to a single byte with the value of 255. For more information, see [SQL Constants: String literals with character escapes](sql-constants.html#string-literals-with-character-escapes). + `STRING` does not give a special meaning to `\x`, so all characters are treated as distinct Unicode code points. For example, `\xff` is treated as a `STRING` with length 4 (`\`, `x`, `f`, and `f`). ### Cast hexadecimal digits to `BIT` - You can cast a `STRING` value of hexadecimal digits prefixed by `x` or `X` to a `BIT` value. +You can cast a `STRING` value of hexadecimal digits prefixed by `x` or `X` to a `BIT` value. For example: @@ -243,6 +245,58 @@ For example: (1 row) ~~~ +### Convert `STRING` to `TIMESTAMP` + +You can use the [`parse_timestamp()` function](functions-and-operators.html) to parse strings in `TIMESTAMP` format. + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT parse_timestamp ('2022-05-28T10:53:25.160Z'); +~~~ + +~~~ + parse_timestamp +-------------------------- +2022-05-28 10:53:25.16 +(1 row) +~~~ + +### Convert `STRING` to `TSVECTOR` + +You can use the [`to_tsvector()` function](functions-and-operators.html#full-text-search-functions) to parse strings in [`TSVECTOR`](tsvector.html) format. This will normalize the tokens into lexemes, and will add an integer position to each lexeme. + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('How do trees get on the internet?'); +~~~ + +~~~ + to_tsvector +--------------------------------- + 'get':4 'internet':7 'tree':3 +~~~ + +For more information on usage, see [Full-Text Search](full-text-search.html). + +### Convert `STRING` to `TSQUERY` + +You can use the [`to_tsquery()`, `plainto_tsquery()`, and `phraseto_tsquery()` functions](functions-and-operators.html#full-text-search-functions) to parse strings in [`TSQUERY`](tsquery.html) format. This will normalize the tokens into lexemes. + +When using `to_tsquery()`, the string input must be formatted as a [`TSQUERY`](tsquery.html#syntax), with operators separating tokens. + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsquery('How & do & trees & get & on & the & internet?'); +~~~ + +~~~ + to_tsquery +------------------------------- + 'tree' & 'get' & 'internet' +~~~ + +For more information on usage, see [Full-Text Search](full-text-search.html). 
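+
+If the input is plain text rather than operator-separated tokens, `plainto_tsquery()` (mentioned above) accepts it directly. The following is an illustrative sketch rather than captured output; it assumes the default text search configuration, so the exact result may vary:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SELECT plainto_tsquery('How do trees get on the internet?');
+~~~
+
+~~~
+       plainto_tsquery
+-------------------------------
+  'tree' & 'get' & 'internet'
+~~~
+
+Stop words and punctuation are removed, the remaining tokens are normalized into lexemes, and the lexemes are joined with the `&` operator, matching the `to_tsquery()` result shown above.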
+ ## See also - [Data Types](data-types.html) diff --git a/v23.1/transaction-retry-error-example.md b/v23.1/transaction-retry-error-example.md index a1be5ee8e10..7080d61bd75 100644 --- a/v23.1/transaction-retry-error-example.md +++ b/v23.1/transaction-retry-error-example.md @@ -5,13 +5,40 @@ toc: true docs_area: reference.transaction_retry_error_example --- -When a [transaction](transactions.html) is unable to complete due to [contention](architecture/overview.html#architecture-overview-contention) with another concurrent or recent transaction attempting to write to the same data, CockroachDB will [automatically attempt to retry the failed transaction](transactions.html#automatic-retries) without involving the client (i.e., silently). If the automatic retry is not possible or fails, a [transaction retry error](transaction-retry-error-reference.html)is emitted to the client. +When a [transaction](transactions.html) is unable to complete due to [contention](performance-best-practices-overview.html#transaction-contention) with another concurrent or recent transaction attempting to write to the same data, CockroachDB will [automatically attempt to retry the failed transaction](transactions.html#automatic-retries) without involving the client (i.e., silently). If the automatic retry is not possible or fails, a [transaction retry error](transaction-retry-error-reference.html) is emitted to the client. This page presents an [example of an application's transaction retry logic](#client-side-retry-handling-example), as well as a manner by which that logic can be [tested and verified](#testing-transaction-retry-logic) against your application's needs. ## Client-side retry handling example -{% include {{page.version.version}}/misc/client-side-intervention-example.md %} +The Python-like pseudocode below shows how to implement an application-level retry loop; it does not require your driver or ORM to implement [advanced retry handling logic](advanced-client-side-transaction-retries.html), so it can be used from any programming language or environment. In particular, your retry loop must: + +- Raise an error if the `max_retries` limit is reached +- Retry on `40001` error codes +- [`COMMIT`](commit-transaction.html) at the end of the `try` block +- Implement [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff) logic as shown below for best performance + +~~~ python +while true: + n++ + if n == max_retries: + throw Error("did not succeed within N retries") + try: + # add logic here to run all your statements + conn.exec('COMMIT') + break + catch error: + if error.code != "40001": + throw error + else: + # This is a retry error, so we roll back the current transaction + # and sleep for a bit before retrying. The sleep time increases + # for each failed transaction. 
Adapted from + # https://colintemple.com/2017/03/java-exponential-backoff/ + conn.exec('ROLLBACK'); + sleep_ms = int(((2**n) * 100) + rand( 100 - 1 ) + 1) + sleep(sleep_ms) # Assumes your sleep() takes milliseconds +~~~ ## Testing transaction retry logic diff --git a/v23.1/transaction-retry-error-reference.md b/v23.1/transaction-retry-error-reference.md index 1287083fade..af786bd9c0b 100644 --- a/v23.1/transaction-retry-error-reference.md +++ b/v23.1/transaction-retry-error-reference.md @@ -5,7 +5,7 @@ toc: true docs_area: reference.transaction_retry_error_reference --- -When a [transaction](transactions.html) is unable to complete due to [contention](architecture/overview.html#architecture-overview-contention) with another concurrent or recent transaction attempting to write to the same data, CockroachDB will [automatically attempt to retry the failed transaction](transactions.html#automatic-retries) without involving the client (i.e., silently). If the automatic retry is not possible or fails, a _transaction retry error_ is emitted to the client. +When a [transaction](transactions.html) is unable to complete due to [contention](performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention) with another concurrent or recent transaction attempting to write to the same data, CockroachDB will [automatically attempt to retry the failed transaction](transactions.html#automatic-retries) without involving the client (i.e., silently). If the automatic retry is not possible or fails, a _transaction retry error_ is emitted to the client. Transaction retry errors fall into two categories: @@ -26,11 +26,7 @@ The main reason why CockroachDB cannot auto-retry every serialization error with ## Actions to take -In most cases, the correct actions to take when encountering transaction retry errors are: - -1. Update your application to support [client-side retry handling](#client-side-retry-handling) when transaction retry errors are encountered. - -1. Adjust your application logic to [minimize transaction retry errors](#minimize-transaction-retry-errors) in the first place. +{% include {{ page.version.version }}/performance/transaction-retry-error-actions.md %} ### Client-side retry handling @@ -65,32 +61,24 @@ For a conceptual example of application-defined retry logic, and testing that lo ### Minimize transaction retry errors -In addition to the steps described in [Client-side retry handling](#client-side-retry-handling), which detail how to configure your application to restart a failed transaction, there are also a number of changes you can make to your application logic to increase the chance that CockroachDB can [automatically retry](transactions.html#automatic-retries) a failed transaction, and to reduce the number of transaction retry errors that reach the client application in the first place: - -1. Limit the number of affected rows by following [performance-tuning best practices](apply-statement-performance-rules.html) (e.g., query performance tuning, index design and maintenance, etc.). Not only will transactions run faster and hold locks for a shorter duration, but the chances of [read invalidation](architecture/transaction-layer.html#read-refreshing) when the transaction’s [timestamp is pushed](architecture/transaction-layer.html#timestamp-cache) due to a conflicting write is decreased due to a smaller read set (i.e., a smaller number of rows read). - -1. 
Break down larger transactions into smaller ones (e.g., [bulk deletes](bulk-delete-data.html)) to have transactions hold locks for a shorter duration. This will also decrease the likelihood of [pushed timestamps](architecture/transaction-layer.html#timestamp-cache) and retry errors. For instance, as the size of writes (number of rows written) decreases, the chances of the (bulk delete) transaction’s timestamp getting bumped by concurrent reads decreases. - -1. Design your applications to reduce network round trips by [sending statements in transactions as a single batch](transactions.html#batched-statements) (e.g., using [common table expressions](common-table-expressions.html)). Batching allows CockroachDB to [automatically retry](transactions.html#automatic-retries) a transaction when [previous reads are invalidated](architecture/transaction-layer.html#read-refreshing) at a [pushed timestamp](architecture/transaction-layer.html#timestamp-cache). When a multi-statement transaction is not batched, and takes more than a single round trip, CockroachDB cannot automatically retry the transaction. - -1. Limit the size of the result sets of your transactions to under 16KB, so that CockroachDB is more likely to [automatically retry](transactions.html#automatic-retries) when [previous reads are invalidated](architecture/transaction-layer.html#read-refreshing) at a [pushed timestamp](architecture/transaction-layer.html#timestamp-cache). When a transaction returns a result set over 16KB, even if that transaction has been sent as a single batch, CockroachDB cannot automatically retry the transaction. +In addition to the steps described in [Client-side retry handling](#client-side-retry-handling), which detail how to configure your application to restart a failed transaction, there are also a number of changes you can make to your application logic to reduce the number of transaction retry errors that reach the client application in the first place. -1. Use [`SELECT FOR UPDATE`](select-for-update.html) to aggressively lock rows that will later be updated in the transaction. Locking (blocking) earlier in the transaction will not allow other concurrent write transactions to conflict which leads to a situation where we would return out-of-date information subsequently returning a retry error ([`RETRY_WRITE_TOO_OLD`](#retry_write_too_old)). See [When and why to use `SELECT FOR UPDATE` in CockroachDB](https://www.cockroachlabs.com/blog/when-and-why-to-use-select-for-update-in-cockroachdb/) for more information. +Reduce failed transactions caused by [timestamp pushes](architecture/transaction-layer.html#timestamp-cache) or [read invalidation](architecture/transaction-layer.html#read-refreshing): -1. Use historical reads ([`SELECT ... AS OF SYSTEM TIME`](as-of-system-time.html)), preferably [bounded staleness reads](follower-reads.html#when-to-use-bounded-staleness-reads) or [exact staleness with follower reads](follower-reads.html#run-queries-that-use-exact-staleness-follower-reads) when possible to reduce conflicts with other writes. This reduces the likelihood of conflicts as fewer writes will happen at the historical timestamp. More specifically, writes’ timestamps are less likely to be pushed by historical reads as they would [when the read has a higher priority level](architecture/transaction-layer.html#transaction-conflicts). +{% include {{ page.version.version }}/performance/reduce-contention.md %} -1. 
If applicable to your workload, assign [column families](column-families.html#default-behavior) and separate columns that are frequently read and written into separate columns. Transactions will operate on disjoint column families and reduce the likelihood of conflicts. +Increase the chance that CockroachDB can [automatically retry](transactions.html#automatic-retries) a failed transaction: -1. As a last resort, consider adjusting the [closed timestamp interval](architecture/transaction-layer.html#closed-timestamps) using the `kv.closed_timestamp.target_duration` [cluster setting](cluster-settings.html) to reduce the likelihood of long-running write transactions having their [timestamps pushed](architecture/transaction-layer.html#timestamp-cache). This setting should be carefully adjusted if **no other mitigations are available** because there can be downstream implications (e.g., Historical reads, change data capture feeds, Stats collection, handling zone configurations, etc.). For example, a transaction _A_ is forced to refresh (i.e., change its timestamp) due to hitting the maximum [_closed timestamp_](architecture/transaction-layer.html#closed-timestamps) interval (closed timestamps enable [Follower Reads](follower-reads.html#how-stale-follower-reads-work) and [Change Data Capture (CDC)](change-data-capture-overview.html)). This can happen when transaction _A_ is a long-running transaction, and there is a write by another transaction to data that _A_ has already read. See the reference entry for [`RETRY_SERIALIZABLE`](#retry_serializable) for more information. +{% include {{ page.version.version }}/performance/increase-server-side-retries.md %} ## Transaction retry error reference -Note that your application's retry logic does not need to distinguish between the different types of serialization errors. They are listed here for reference during advanced troubleshooting. +Note that your application's retry logic does not need to distinguish between the different types of serialization errors. They are listed here for reference during [advanced troubleshooting](performance-recipes.html#transaction-contention). - [RETRY_WRITE_TOO_OLD](#retry_write_too_old) - [RETRY_SERIALIZABLE](#retry_serializable) - [RETRY_ASYNC_WRITE_FAILURE](#retry_async_write_failure) -- [ReadWithinUncertaintyInterval](#readwithinuncertaintyinterval) +- [ReadWithinUncertaintyIntervalError](#readwithinuncertaintyintervalerror) - [RETRY_COMMIT_DEADLINE_EXCEEDED](#retry_commit_deadline_exceeded) - [ABORT_REASON_ABORTED_RECORD_FOUND](#abort_reason_aborted_record_found) - [ABORT_REASON_CLIENT_REJECT](#abort_reason_client_reject) @@ -182,7 +170,7 @@ The `RETRY_ASYNC_WRITE_FAILURE` error occurs when some kind of problem with your See [Minimize transaction retry errors](#minimize-transaction-retry-errors) for the full list of recommended remediations. -### ReadWithinUncertaintyInterval +### ReadWithinUncertaintyIntervalError ``` TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: @@ -213,7 +201,7 @@ The solution is to do one of the following: 1. If you [trust your clocks](operational-faqs.html#what-happens-when-node-clocks-are-not-properly-synchronized), you can try lowering the [`--max-offset` option to `cockroach start`](cockroach-start.html#flags), which provides an upper limit on how long a transaction can continue to restart due to uncertainty. {{site.data.alerts.callout_info}} -Uncertainty errors are a form of transaction conflict. 
For more information about transaction conflicts, see [Transaction conflicts](architecture/transaction-layer.html#transaction-conflicts). +Uncertainty errors are a sign of transaction conflict. For more information about transaction conflicts, see [Transaction conflicts](architecture/transaction-layer.html#transaction-conflicts). {{site.data.alerts.end}} See [Minimize transaction retry errors](#minimize-transaction-retry-errors) for the full list of recommended remediations. diff --git a/v23.1/transactions.md b/v23.1/transactions.md index ed4fc980fc0..6837d917ec5 100644 --- a/v23.1/transactions.md +++ b/v23.1/transactions.md @@ -57,29 +57,28 @@ To handle errors in transactions, you should check for the following types of se Type | Description -----|------------ -**Retry Errors** | Errors with [the code `40001` or string `restart transaction`](common-errors.html#restart-transaction), which indicate that a transaction failed because it could not be placed in a serializable ordering of transactions by CockroachDB. This is often due to [contention](#transaction-contention): conflicts with another concurrent or recent transaction accessing the same data. In such cases, the transaction needs to be retried by the client as described in [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling). For a reference listing all of the retry error codes emitted by CockroachDB, see the [Transaction Retry Error Reference](transaction-retry-error-reference.html#transaction-retry-error-reference). +**Transaction Retry Errors** | Errors with the code `40001` and string `restart transaction`, which indicate that a transaction failed because it could not be placed in a [serializable ordering](demo-serializable.html) of transactions by CockroachDB. For details on transaction retry errors and how to resolve them, see the [Transaction Retry Error Reference](transaction-retry-error-reference.html#actions-to-take). **Ambiguous Errors** | Errors with the code `40003` which indicate that the state of the transaction is ambiguous, i.e., you cannot assume it either committed or failed. How you handle these errors depends on how you want to resolve the ambiguity. For information about how to handle ambiguous errors, see [here](common-errors.html#result-is-ambiguous). **SQL Errors** | All other errors, which indicate that a statement in the transaction failed. For example, violating the `UNIQUE` constraint generates a `23505` error. After encountering these errors, you can either issue a [`COMMIT`](commit-transaction.html) or [`ROLLBACK`](rollback-transaction.html) to abort the transaction and revert the database to its state before the transaction began.

      If you want to attempt the same set of statements again, you must begin a completely new transaction. ## Transaction retries -Transactions may require retries if they experience deadlock or [read/write contention](performance-best-practices-overview.html#transaction-contention) with other concurrent transactions which cannot be resolved without allowing potential [serializable anomalies](https://en.wikipedia.org/wiki/Serializability). - -To mitigate read-write contention and reduce the need for transaction retries, use the following techniques: - -- Perform reads using [`AS OF SYSTEM TIME`](performance-best-practices-overview.html#use-as-of-system-time-to-decrease-conflicts-with-long-running-queries). -- Use [`SELECT FOR UPDATE`](select-for-update.html) to order transactions by controlling concurrent access to one or more rows of a table. This reduces retries in scenarios where a transaction performs a read and then updates the same row it just read. +Transactions may require retries due to [contention](performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention) with another concurrent or recent transaction attempting to write to the same data. There are two cases in which transaction retries can occur: -- [Automatic retries](#automatic-retries), which CockroachDB processes for you. -- [Client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling), which your application must handle. +- [Automatic retries](#automatic-retries), which CockroachDB silently processes for you. +- [Client-side retries](transaction-retry-error-reference.html#client-side-retry-handling), which your application must handle after receiving a [*transaction retry error*](transaction-retry-error-reference.html). + +To reduce the need for transaction retries, see [Reduce transaction contention](performance-best-practices-overview.html#reduce-transaction-contention). ### Automatic retries -CockroachDB automatically retries individual statements (implicit transactions) and transactions sent from the client as a single batch, as long as the size of the results being produced for the client, including protocol overhead, is less than 16KiB by default. Once that buffer overflows, CockroachDB starts streaming results back to the client, at which point automatic retries cannot be performed any more. As long as the results of a single statement or batch of statements are known to stay clear of this limit, the client does not need to worry about transaction retries. +CockroachDB automatically retries individual statements (implicit transactions) and [transactions sent from the client as a single batch](#batched-statements), as long as the size of the results being produced for the client, including protocol overhead, is less than 16KiB by default. Once that buffer overflows, CockroachDB starts streaming results back to the client, at which point automatic retries cannot be performed any more. As long as the results of a single statement or batch of statements are known to stay clear of this limit, the client does not need to worry about transaction retries. -You can change the results buffer size for all new sessions using the `sql.defaults.results_buffer.size` [cluster setting](cluster-settings.html), or for a specific session using the `results_buffer_size` [session variable](set-vars.html). 
Decreasing the buffer size can increase the number of transaction retry errors a client receives, whereas increasing the buffer size can increase the delay until the client receives the first result row. +You can increase the occurrence of automatic retries as a way to [minimize transaction retry errors](transaction-retry-error-reference.html#minimize-transaction-retry-errors): + +{% include {{ page.version.version }}/performance/increase-server-side-retries.md %} {% include {{page.version.version}}/sql/sql-defaults-cluster-settings-deprecation-notice.md %} @@ -134,31 +133,6 @@ The [`enable_implicit_transaction_for_batch_statements` session variable](set-va In the event [bounded staleness reads](follower-reads.html#bounded-staleness-reads) are used along with either the [`with_min_timestamp` function or the `with_max_staleness` function](functions-and-operators.html#date-and-time-functions) and the `nearest_only` parameter is set to `true`, the query will throw an error if it can't be served by a nearby replica. -### Client-side intervention - -Your application should include client-side retry handling when the statements are sent individually, such as: - -~~~ -> BEGIN; - -> UPDATE products SET inventory = 0 WHERE sku = '8675309'; - -> INSERT INTO orders (customer, status) VALUES (1, 'new'); - -> COMMIT; -~~~ - -To indicate that a transaction must be retried, CockroachDB signals an error with the `SQLSTATE` error code `40001` (serialization error) and an error message that begins with the string `"restart transaction"`. These errors **cannot** be [resolved automatically](#automatic-retries), and require client-side intervention: - -- See [client-side retry handling](transaction-retry-error-reference.html#client-side-retry-handling) for a list of actions to take when encountering transaction retry errors. - -- See [Transaction retry error reference](transaction-retry-error-reference.html#transaction-retry-error-reference) for a complete list of transaction retry error codes. - -## Transaction contention - -Transactions in CockroachDB [lock](crdb-internal.html#cluster_locks) data resources that are written during their execution. When a pending write from one transaction conflicts with a write of a concurrent transaction, the concurrent transaction must wait for the earlier transaction to complete before proceeding. When a dependency cycle is detected between transactions, the transaction with the higher priority aborts the dependent transaction to avoid deadlock, which must be [retried](#client-side-intervention). - -For more details about transaction contention and best practices for avoiding contention, see [Transaction Contention](performance-best-practices-overview.html#transaction-contention). - ## Nested transactions CockroachDB supports the nesting of transactions using [savepoints](savepoint.html). These nested transactions are also known as sub-transactions. Nested transactions can be rolled back without discarding the state of the entire surrounding transaction. diff --git a/v23.1/tsquery.md b/v23.1/tsquery.md new file mode 100644 index 00000000000..e1d5d4827e8 --- /dev/null +++ b/v23.1/tsquery.md @@ -0,0 +1,63 @@ +--- +title: TSQUERY +summary: The TSQUERY data type stores a list of lexemes separated by operators, and is used in full-text search. +toc: true +docs_area: reference.sql +--- + +The `TSQUERY` [data type](data-types.html) stores a list of lexemes separated by operators. `TSQUERY` values are used in [full-text search](full-text-search.html).
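+
+As a brief illustration of how the type is used (a hedged sketch rather than captured output; the `@@` match operator and single-argument function forms follow the usage described in [Full-Text Search](full-text-search.html)), a `TSQUERY` produced by `to_tsquery()` can be matched against a [`TSVECTOR`](tsvector.html):
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+SELECT to_tsvector('How do trees get on the internet?') @@ to_tsquery('tree & internet');
+~~~
+
+This comparison should return `true`, because the normalized lexemes `'tree'` and `'internet'` both appear in the `TSVECTOR` produced from the sentence.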
+ +## Syntax + +A `TSQUERY` comprises individual lexemes and operators in the form: `'These' & 'lexemes' & 'are' & 'not' & 'normalized' & 'lexemes.'`. + +The operators in a `TSQUERY` are used to [match a `TSQUERY` to a `TSVECTOR`](full-text-search.html#match-queries-to-documents). Valid `TSQUERY` operators are: + +- `&` (AND). Given `'one' & 'two'`, both `one` and `two` must be present in the matching `TSVECTOR`. +- `|` (OR). Given `'one' | 'two'`, either `one` or `two` must be present in the matching `TSVECTOR`. +- `!` (NOT). Given `'one' & ! 'two'`, `one` must be present and `two` must **not** be present in the matching `TSVECTOR`. +- `<->` (FOLLOWED BY). Given `'one' <-> 'two'`, `one` must be followed by `two` in the matching `TSVECTOR`. + - `<->` is equivalent to `<1>`. You can specify an integer `<n>` to indicate that lexemes must be separated by `n-1` other lexemes. Given `'one' <4> 'two'`, `one` must be followed by three lexemes and then followed by `two` in the matching `TSVECTOR`. + +You can optionally add the following to each lexeme: + +- One or more weight letters (`A`, `B`, `C`, or `D`): + + `'These' & 'lexemes':B & 'are' & 'not' & 'normalized':A & 'lexemes':B` + + If not specified, a lexeme's weight defaults to `D`. It is only necessary to specify weights in a `TSQUERY` if they are also [specified in a `TSVECTOR`](tsvector.html#syntax) to be used in a comparison. The lexemes in a `TSQUERY` and `TSVECTOR` will only match if they have matching weights. For more information about weights, see the [PostgreSQL documentation](https://www.postgresql.org/docs/15/datatype-textsearch.html#DATATYPE-TSQUERY). + +To be usable in [full-text search](full-text-search.html), the lexemes **must be normalized**. You can do this by using the `to_tsquery()`, `plainto_tsquery()`, or `phraseto_tsquery()` [built-in functions](functions-and-operators.html#full-text-search-functions) to convert a string input to `TSQUERY`: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsquery('These & lexemes & are & not & normalized & lexemes.'); +~~~ + +~~~ + to_tsquery +-------------------------------- + 'lexem' & 'normal' & 'lexem' +~~~ + +Normalization removes the following from the input: + +- Derivatives of words, which are reduced using a [stemming](https://en.wikipedia.org/wiki/Stemming) algorithm. +- *Stop words*. These are words that are considered not useful for indexing and searching, based on the [text search configuration](full-text-search.html#text-search-configuration). In the preceding example, "These", "are", and "not" are identified as stop words. +- Punctuation and capitalization. + +{% comment %} +## PostgreSQL compatibility + +`TSQUERY` values in CockroachDB are fully [PostgreSQL-compatible](https://www.postgresql.org/docs/15/datatype-textsearch.html#DATATYPE-TSQUERY) for [full-text search](full-text-search.html). +{% endcomment %} + +## Examples + +For usage examples, see [Full-Text Search](full-text-search.html). + +## See also + +- [Full-Text Search](full-text-search.html) +- [`TSVECTOR`](tsvector.html) +- [Data Types](data-types.html) \ No newline at end of file diff --git a/v23.1/tsvector.md b/v23.1/tsvector.md new file mode 100644 index 00000000000..fc955b08b03 --- /dev/null +++ b/v23.1/tsvector.md @@ -0,0 +1,61 @@ +--- +title: TSVECTOR +summary: The TSVECTOR data type stores a list of lexemes, optionally with integer positions and weights, and is used in full-text search.
+toc: true +docs_area: reference.sql +--- + +The `TSVECTOR` [data type](data-types.html) stores a list of lexemes, optionally with integer positions and weights. `TSVECTOR` values are used in [full-text search](full-text-search.html). + +## Syntax + +A `TSVECTOR` comprises individual lexemes in the form: `'These' 'lexemes' 'are' 'not' 'normalized' 'lexemes.'`. + +You can optionally add the following to each lexeme: + +- One or more comma-separated integer positions: + + `'These':1 'lexemes':2 'are':3 'not':4 'normalized':5 'lexemes.':6` + +- A weight letter (`A`, `B`, `C`, or `D`), in combination with an integer position: + + `'These':1 'lexemes':2B 'are':3 'not':4 'normalized':5A 'lexemes.':6B` + + If not specified, a lexeme's weight defaults to `D`. The lexemes in a `TSQUERY` and `TSVECTOR` will only match if they have matching weights. For more information about weights, see the [PostgreSQL documentation](https://www.postgresql.org/docs/15/datatype-textsearch.html#DATATYPE-TSVECTOR). + +To be usable in [full-text search](full-text-search.html), the lexemes **must be normalized**. You can do this by using the `to_tsvector()` [built-in function](functions-and-operators.html#full-text-search-functions) to convert a string input to `TSVECTOR`: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT to_tsvector('These lexemes are not normalized lexemes.'); +~~~ + +~~~ + to_tsvector +-------------------------- + 'lexem':2,6 'normal':5 +~~~ + +Normalization removes the following from the input: + +- Derivatives of words, which are reduced using a [stemming](https://en.wikipedia.org/wiki/Stemming) algorithm. +- *Stop words*. These are words that are considered not useful for indexing and searching, based on the [text search configuration](full-text-search.html#text-search-configuration). In the preceding example, "These", "are", and "not" are identified as stop words. +- Punctuation and capitalization. + +In the preceding output, the integers indicate that `normal` is in the fifth position and `lexem` is in both the second and sixth position in the input. + +{% comment %} +## PostgreSQL compatibility + +`TSVECTOR` values in CockroachDB are fully [PostgreSQL-compatible](https://www.postgresql.org/docs/15/datatype-textsearch.html#DATATYPE-TSVECTOR) for [full-text search](full-text-search.html). +{% endcomment %} + +## Examples + +For usage examples, see [Full-Text Search](full-text-search.html). + +## See also + +- [Full-Text Search](full-text-search.html) +- [`TSQUERY`](tsquery.html) +- [Data Types](data-types.html) \ No newline at end of file diff --git a/v23.1/ui-overview-dashboard.md b/v23.1/ui-overview-dashboard.md index 09cd0b6d480..8aa015428ac 100644 --- a/v23.1/ui-overview-dashboard.md +++ b/v23.1/ui-overview-dashboard.md @@ -31,9 +31,9 @@ See the [Statements page](ui-statements-page.html) for more details on the clust The statement contention metric is a counter that represents the number of statements that have experienced [contention](performance-best-practices-overview.html#transaction-contention). If a statement experiences at least one contention "event" (i.e., the statement is forced to wait for another transaction), the counter is incremented at most once. -- In the node view, the graph shows the total number of SQL statements that experienced [contention](transactions.html#transaction-contention) on that node. 
+- In the node view, the graph shows the total number of SQL statements that experienced [contention](performance-best-practices-overview.html#transaction-contention) on that node. -- In the cluster view, the graph shows the total number of SQL statements that experienced [contention](transactions.html#transaction-contention) across all nodes in the cluster. +- In the cluster view, the graph shows the total number of SQL statements that experienced [contention](performance-best-practices-overview.html#transaction-contention) across all nodes in the cluster. See the [Statements page](ui-statements-page.html) for more details on the cluster's SQL statements. diff --git a/v23.1/ui-sql-dashboard.md b/v23.1/ui-sql-dashboard.md index 48767dfe1af..64161472609 100644 --- a/v23.1/ui-sql-dashboard.md +++ b/v23.1/ui-sql-dashboard.md @@ -65,9 +65,9 @@ See the [Statements page](ui-statements-page.html) for more details on the clust The statement contention metric is a counter that represents the number of statements that have experienced [contention](performance-best-practices-overview.html#transaction-contention). If a statement experiences at least one contention "event" (i.e., the statement is forced to wait for another transaction), the counter is incremented at most once. -- In the node view, the graph shows the total number of SQL statements that experienced [contention](transactions.html#transaction-contention) on that node. +- In the node view, the graph shows the total number of SQL statements that experienced [contention](performance-best-practices-overview.html#transaction-contention) on that node. -- In the cluster view, the graph shows the total number of SQL statements that experienced [contention](transactions.html#transaction-contention) across all nodes in the cluster. +- In the cluster view, the graph shows the total number of SQL statements that experienced [contention](performance-best-practices-overview.html#transaction-contention) across all nodes in the cluster. See the [Statements page](ui-statements-page.html) for more details on the cluster's SQL statements.