Merge remote-tracking branch 'upstream/master' into add_v6.4_release_notes

qiancai committed Nov 16, 2022
2 parents d41b93f + fadbfa6 commit 3f2bbaf
Showing 197 changed files with 1,130 additions and 691 deletions.
4 changes: 1 addition & 3 deletions .github/ISSUE_TEMPLATE/change-request.md
@@ -17,6 +17,4 @@ Please answer the following questions before submitting your issue. Thanks!
2. Describe your suggestion or addition.


3. Provide some reference materials (documents, websites, etc) if you could.


3. Provide some reference materials (such as documents and websites) if you could.
1 change: 1 addition & 0 deletions .github/pull_request_template.md
@@ -19,6 +19,7 @@ By default, **CHOOSE MASTER ONLY** so your changes will be applied to the next T
For details, see [tips for choosing the affected versions](https://github.com/pingcap/docs/blob/master/CONTRIBUTING.md#guideline-for-choosing-the-affected-versions).

- [ ] master (the latest development version)
- [ ] v6.5 (TiDB 6.5 versions)
- [ ] v6.4 (TiDB 6.4 versions)
- [ ] v6.3 (TiDB 6.3 versions)
- [ ] v6.1 (TiDB 6.1 versions)
5 changes: 3 additions & 2 deletions TOC-tidb-cloud.md
@@ -164,9 +164,11 @@
- [Overview](/garbage-collection-overview.md)
- [Configuration](/garbage-collection-configuration.md)
- [Tune TiFlash performance](/tiflash/tune-tiflash-performance.md)
- Manage User Access
- Security
- [Manage Console User Access](/tidb-cloud/manage-user-access.md)
- [Configure Cluster Security Settings](/tidb-cloud/configure-security-settings.md)
- [Database Audit Logging](/tidb-cloud/tidb-cloud-auditing.md)
- [Secure Connections to Serverless Tier Clusters](/tidb-cloud/secure-connections-to-serverless-tier-clusters.md)
- Billing
- [Invoices](/tidb-cloud/tidb-cloud-billing.md#invoices)
- [Billing Details](/tidb-cloud/tidb-cloud-billing.md#billing-details)
@@ -422,7 +424,6 @@
- [Dumpling](/dumpling-overview.md)
- [Table Filter](/table-filter.md)
- [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md)
- [Secure Connections to Serverless Tier Clusters](/tidb-cloud/secure-connections-to-serverless-tier-clusters.md)
- [FAQs](/tidb-cloud/tidb-cloud-faq.md)
- Release Notes
- [2022](/tidb-cloud/release-notes-2022.md)
5 changes: 3 additions & 2 deletions TOC.md
@@ -97,7 +97,7 @@
- [Hybrid Topology](/hybrid-deployment-topology.md)
- Install and Start
- [Use TiUP](/production-deployment-using-tiup.md)
- [Deploy in Kubernetes](/tidb-in-kubernetes.md)
- [Deploy on Kubernetes](/tidb-in-kubernetes.md)
- [Verify Cluster Status](/post-installation-check.md)
- Test Cluster Performance
- [Test TiDB Using Sysbench](/benchmark/benchmark-tidb-using-sysbench.md)
@@ -142,7 +142,7 @@
- [Daily Checklist](/daily-check.md)
- [Maintain TiFlash](/tiflash/maintain-tiflash.md)
- [Maintain TiDB Using TiUP](/maintain-tidb-using-tiup.md)
- [Modify Configuration Online](/dynamic-config.md)
- [Modify Configuration Dynamically](/dynamic-config.md)
- [Online Unsafe Recovery](/online-unsafe-recovery.md)
- [Replicate Data Between Primary and Secondary Clusters](/replicate-between-primary-and-secondary-clusters.md)
- Monitor and Alert
@@ -632,6 +632,7 @@
- [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md)
- [`EXPLAIN`](/sql-statements/sql-statement-explain.md)
- [`FLASHBACK CLUSTER TO TIMESTAMP`](/sql-statements/sql-statement-flashback-to-timestamp.md)
- [`FLASHBACK DATABASE`](/sql-statements/sql-statement-flashback-database.md)
- [`FLASHBACK TABLE`](/sql-statements/sql-statement-flashback-table.md)
- [`FLUSH PRIVILEGES`](/sql-statements/sql-statement-flush-privileges.md)
- [`FLUSH STATUS`](/sql-statements/sql-statement-flush-status.md)
2 changes: 1 addition & 1 deletion _index.md
@@ -43,7 +43,7 @@ hide_commit: true

[Deploy a TiDB Cluster Using TiUP](https://docs.pingcap.com/tidb/dev/production-deployment-using-tiup)

[Deploy a TiDB Cluster in Kubernetes](https://docs.pingcap.com/tidb/dev/tidb-in-kubernetes)
[Deploy a TiDB Cluster on Kubernetes](https://docs.pingcap.com/tidb/dev/tidb-in-kubernetes)

</LearningPath>

22 changes: 18 additions & 4 deletions alert-rules.md
@@ -260,6 +260,20 @@ This section gives the alert rules for the PD component.
* If you confirm that the TiKV/TiFlash instance cannot be recovered, you can make it offline.
* If you confirm that the TiKV/TiFlash instance can be recovered, but not in the short term, consider increasing the value of `max-down-time`. This prevents the TiKV/TiFlash instance from being considered irrecoverable and its data from being removed from TiKV/TiFlash.

#### `PD_cluster_unhealthy_tikv_nums`

* Alert rule:

`(sum(pd_cluster_status{type="store_unhealth_count"}) by (instance) > 0) and (sum(etcd_server_is_leader) by (instance) > 0)`

* Description:

Indicates that there are unhealthy stores. If the situation persists for some time (configured by [`max-store-down-time`](/pd-configuration-file.md#max-store-down-time), defaults to `30m`), the store is likely to change to `Offline` state, which triggers the [`PD_cluster_down_store_nums`](#pd_cluster_down_store_nums) alert.

* Solution:

Check the state of the TiKV stores.
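
    For reference, one way to inspect store states is with `pd-ctl`. The following is a minimal sketch, assuming PD is reachable at `127.0.0.1:2379`, that the `tiup ctl` version matches your cluster, and that `jq` is installed; the exact JSON layout may vary between PD versions.

    ```bash
    # List all stores and their states (Up, Offline, Down, Tombstone).
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 store

    # Show only the ID, address, and state of each store (assumes jq is available).
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 store \
        | jq '.stores[] | {id: .store.id, address: .store.address, state: .store.state_name}'
    ```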

#### `PD_cluster_low_space`

* Alert rule:
@@ -274,7 +288,7 @@ This section gives the alert rules for the PD component.

* Check whether the space in the cluster is generally insufficient. If so, increase its capacity.
* Check whether there is any issue with Region balance scheduling. If so, it will lead to uneven data distribution.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, core dump, etc.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, and core dump.
* Lower the Region weight of the node to reduce the data volume.
* When it is not possible to release the space, consider proactively making the node offline. This prevents insufficient disk space that leads to downtime.
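
    As a hedged illustration of the Region-weight suggestion above (a sketch only; the store ID and weight values are placeholders, and it assumes `pd-ctl` access via `tiup ctl`):

    ```bash
    # Lower the Region weight of store 1 (leader weight stays 1, Region weight drops to 0.5)
    # so that PD schedules less data onto the node that is running out of space.
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 store weight 1 1 0.5
    ```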

@@ -353,7 +367,7 @@ This section gives the alert rules for the PD component.

* Solution:

* Exclude the human factors, such as restarting PD, manually transferring leader, adjusting leader priority, etc.
* Exclude the human factors, such as restarting PD, manually transferring leader, and adjusting leader priority.
* Check the network and system load status.
* If the problematic PD instance cannot be recovered due to environmental factors, make it offline and replace it.

@@ -370,7 +384,7 @@ This section gives the alert rules for the PD component.
* Solution:

* Check whether it is needed to increase capacity.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, core dump, etc.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, and core dump.

#### `PD_system_time_slow`

@@ -1140,4 +1154,4 @@ This section gives the alert rules for the Blackbox_exporter TCP, ICMP, and HTTP
* Solution:

* View the ping latency between the two nodes on the Grafana Blackbox Exporter page to check whether it is too high.
* Check the TCP panel on the Grafana Node Exporter page to check whether there is any packet loss.
* Check the TCP panel on the Grafana Node Exporter page to check whether there is any packet loss.
26 changes: 7 additions & 19 deletions benchmark/benchmark-tidb-using-sysbench.md
@@ -5,7 +5,7 @@ aliases: ['/docs/dev/benchmark/benchmark-tidb-using-sysbench/','/docs/dev/benchm

# How to Test TiDB Using Sysbench

It is recommended to use Sysbench 1.0 or later, which can be [downloaded here](https://github.com/akopytov/sysbench/releases/tag/1.0.14).
It is recommended to use Sysbench 1.0 or later, which can be [downloaded here](https://github.com/akopytov/sysbench/releases/tag/1.0.20).
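
If you build Sysbench from source, the steps are roughly as follows (a sketch, not part of the original document; it assumes a Linux host with gcc, make, automake, libtool, pkg-config, and the MySQL client development headers installed):

```bash
# Download and build Sysbench 1.0.20 from source, then verify the installation.
curl -L -o sysbench-1.0.20.tar.gz https://github.com/akopytov/sysbench/archive/refs/tags/1.0.20.tar.gz
tar xzf sysbench-1.0.20.tar.gz && cd sysbench-1.0.20
./autogen.sh && ./configure && make -j "$(nproc)" && sudo make install
sysbench --version
```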

## Test plan

@@ -19,6 +19,8 @@ server_configs:
log.level: "error"
```
It is also recommended to make sure [`tidb_enable_prepared_plan_cache`](/system-variables.md#tidb_enable_prepared_plan_cache-new-in-v610) is enabled and that you allow sysbench to use prepared statements by _not_ using `--db-ps-mode=disabled`. See [SQL Prepared Execution Plan Cache](/sql-prepared-plan-cache.md) for documentation about what the SQL plan cache does and how to monitor it.
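
As a quick pre-flight check (a sketch, assuming TiDB v6.1.0 or later where this variable exists), you can verify the setting from a MySQL client before starting the benchmark:

```sql
-- Check whether the prepared plan cache is enabled.
SHOW VARIABLES LIKE 'tidb_enable_prepared_plan_cache';

-- Enable it globally if needed; new connections pick up the change.
SET GLOBAL tidb_enable_prepared_plan_cache = ON;
```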

### TiKV configuration

A higher log level also means better performance for TiKV.
@@ -109,10 +111,10 @@ Restart MySQL client and execute the following SQL statement to create a databas
create database sbtest;
```

Adjust the order in which Sysbench scripts create indexes. Sysbench imports data in the order of "Build Table -> Insert Data -> Create Index", which takes more time for TiDB to import data. Users can adjust the order to speed up the import of data. Suppose that you use the Sysbench version [1.0.14](https://github.com/akopytov/sysbench/tree/1.0.14). You can adjust the order in either of the following two ways:
Adjust the order in which Sysbench scripts create indexes. Sysbench imports data in the order of "Build Table -> Insert Data -> Create Index", which takes more time for TiDB to import data. Users can adjust the order to speed up the import of data. Suppose that you use the Sysbench version [1.0.20](https://github.com/akopytov/sysbench/tree/1.0.20). You can adjust the order in either of the following two ways:

- Download the modified [oltp_common.lua](https://raw.githubusercontent.com/pingcap/tidb-bench/master/sysbench/sysbench-patch/oltp_common.lua) file for TiDB and overwrite the `/usr/share/sysbench/oltp_common.lua` file with it.
- In `/usr/share/sysbench/oltp_common.lua`, move the lines [235](https://github.com/akopytov/sysbench/blob/1.0.14/src/lua/oltp_common.lua#L235)-[240](https://github.com/akopytov/sysbench/blob/1.0.14/src/lua/oltp_common.lua#L240) to be right behind the line 198.
- In `/usr/share/sysbench/oltp_common.lua`, move the lines [235-240](https://github.com/akopytov/sysbench/blob/1.0.20/src/lua/oltp_common.lua#L235-L240) to be right behind the line 198.

> **Note:**
>
@@ -130,22 +132,8 @@ sysbench --config-file=config oltp_point_select --tables=32 --table-size=1000000

To warm data, load it from disk into the block cache of memory. Warming the data significantly improves the overall performance of the system. It is recommended to warm data once after restarting the cluster.

Sysbench 1.0.14 does not provide data warming, so it must be done manually. If you are using [Sysbench of the master version](https://github.com/akopytov/sysbench/tree/master), you can use the data warming feature included in the tool itself.

Take the table `sbtest7` in Sysbench as an example. Execute the following SQL to warm up the data:

{{< copyable "sql" >}}

```sql
SELECT COUNT(pad) FROM sbtest7 USE INDEX (k_7);
```

Collecting statistics helps the optimizer choose a more accurate execution plan. The `analyze` command can be used to collect statistics on the table sbtest. Each table needs statistics.

{{< copyable "sql" >}}

```sql
ANALYZE TABLE sbtest7;
```
```bash
sysbench --config-file=config oltp_point_select --tables=32 --table-size=10000000 warmup
```

### Point select test command
2 changes: 1 addition & 1 deletion benchmark/online-workloads-and-add-index-operations.md
@@ -345,5 +345,5 @@ When the target column of the `ADD INDEX` statement is irrelevant to online work

## Summary

- When you perform frequent write operations (including `INSERT`, `DELETE` and `UPDATE` operations) to the target column of the `ADD INDEX` statement, the default `ADD INDEX` configuration causes relatively frequent write conflicts, which has a great impact on online workloads. At the same time, the `ADD INDEX` operation takes a long time to complete due to continuous retry attempts. In this test, you can modify the product of `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size` to 1/32 of the default value. For example, you can set `tidb_ddl_reorg_worker_cnt` to `4` and `tidb_ddl_reorg_batch_size` to `256` for better performance.
- When you perform frequent write operations (including `INSERT`, `DELETE` and `UPDATE` operations) to the target column of the `ADD INDEX` statement, the default `ADD INDEX` configuration causes relatively frequent write conflicts, which has a great impact on online workloads. At the same time, the `ADD INDEX` operation takes a long time to complete due to continuous retry attempts. In this test, you can modify the product of `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size` to 1/32 of the default value. For example, you can set `tidb_ddl_reorg_worker_cnt` to `4` and `tidb_ddl_reorg_batch_size` to `256` for better performance.
- When only performing query operations to the target column of the `ADD INDEX` statement or the target column is not directly related to online workloads, you can use the default `ADD INDEX` configuration.
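
As a hedged sketch of the tuning described in the first bullet above (verify the variable names and value ranges against your TiDB version):

```sql
-- Reduce DDL reorganization concurrency and batch size so that ADD INDEX
-- interferes less with online writes (values taken from the test above).
SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
SET GLOBAL tidb_ddl_reorg_batch_size = 256;

-- Confirm the current settings.
SHOW VARIABLES LIKE 'tidb_ddl_reorg%';
```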
2 changes: 1 addition & 1 deletion best-practices/high-concurrency-best-practices.md
@@ -18,7 +18,7 @@ This document assumes that you have a basic understanding of TiDB. It is recomme

## Highly-concurrent write-intensive scenario

The highly concurrent write scenario often occurs when you perform batch tasks in applications, such as clearing, settlement and so on. This scenario has the following features:
The highly concurrent write scenario often occurs when you perform batch tasks in applications, such as clearing and settlement. This scenario has the following features:

+ A huge volume of data
+ The need to import historical data into the database in a short time
4 changes: 2 additions & 2 deletions best-practices/pd-scheduling-best-practices.md
@@ -90,7 +90,7 @@ For hot write regions, `hot-region-scheduler` attempts to redistribute both regi

Cluster topology awareness enables PD to distribute replicas of a region as much as possible. This is how TiKV ensures high availability and disaster recovery capability. PD continuously scans all regions in the background. When PD finds that the distribution of regions is not optimal, it generates an operator to replace peers and redistribute regions.

The component to check region distribution is `replicaChecker`, which is similar to a scheduler except that it cannot be disabled. `replicaChecker` schedules based on the the configuration of `location-labels`. For example, `[zone,rack,host]` defines a three-tier topology for a cluster. PD attempts to schedule region peers to different zones first, or to different racks when zones are insufficient (for example, 2 zones for 3 replicas), or to different hosts when racks are insufficient, and so on.
The component to check region distribution is `replicaChecker`, which is similar to a scheduler except that it cannot be disabled. `replicaChecker` schedules based on the configuration of `location-labels`. For example, `[zone,rack,host]` defines a three-tier topology for a cluster. PD attempts to schedule region peers to different zones first, or to different racks when zones are insufficient (for example, 2 zones for 3 replicas), or to different hosts when racks are insufficient.

### Scale-down and failure recovery

@@ -215,7 +215,7 @@ If there is a big difference in the rating of different stores, you need to exam

- The scheduling speed is limited by default for load balancing purposes. You can adjust `leader-schedule-limit` or `region-schedule-limit` to larger values without significantly impacting regular services. In addition, you can also appropriately relax the restrictions specified by `max-pending-peer-count` and `max-snapshot-count`.
- Other scheduling tasks are running concurrently, which slows down the balancing. In this case, if the balancing takes precedence over other scheduling tasks, you can stop other tasks or limit their speeds. For example, if you take some nodes offline when balancing is in progress, both operations consume the quota of `region-schedule-limit`. In this case, you can limit the speed of scheduler to remove nodes, or simply set `enable-replace-offline-replica = false` to temporarily disable it.
- The scheduling process is too slow. You can check the **Operator step duration** metric to confirm the cause. Generally, steps that do not involve sending and receiving snapshots (such as `TransferLeader`, `RemovePeer`, `PromoteLearner`) should be completed in milliseconds, while steps that involve snapshots (such as `AddLearner` and `AddPeer`) are expected to be completed in tens of seconds. If the duration is obviously too long, it could be caused by high pressure on TiKV or bottleneck in network, etc., which needs specific analysis.
- The scheduling process is too slow. You can check the **Operator step duration** metric to confirm the cause. Generally, steps that do not involve sending and receiving snapshots (such as `TransferLeader`, `RemovePeer`, and `PromoteLearner`) should be completed in milliseconds, while steps that involve snapshots (such as `AddLearner` and `AddPeer`) are expected to be completed in tens of seconds. If the duration is obviously too long, it could be caused by high pressure on TiKV or a network bottleneck, which needs specific analysis.

- PD fails to generate the corresponding balancing scheduler. Possible reasons include:

4 changes: 2 additions & 2 deletions br-usage-backup-for-maintain.md
@@ -105,7 +105,7 @@ In the preceding command, `--db` and `--table` specify the database name and tab

To back up multiple tables with more criteria, run the `br backup full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.

Example: Back up `db*.tbl*` data of a table to the `table-filter/2022-01-30/` directory in the `backup-data` bucket of Amazon S3.
Example: Back up `db*.tbl*` data of a table to the `table-filter/2022-01-30/` directory in the `backup-data` bucket of Amazon S3.

{{< copyable "shell-regular" >}}

@@ -172,7 +172,7 @@ BR supports encrypting backup data at the backup end and at the storage end when
Since TiDB v5.3.0, you can encrypt backup data by configuring the following parameters:

- `--crypter.method`: Encryption algorithm, which can be `aes128-ctr`, `aes192-ctr`, or `aes256-ctr`. The default value is `plaintext`, indicating that data is not encrypted.
- `--crypter.key`: Encryption key in hexadecimal string format. It is a 128-bit (16 bytes) key for the algorithm `aes128-ctr`, 24-byte key for the algorithm `aes192-ctr`, and 32-byte key for the algorithm `aes256-ctr`.
- `--crypter.key`: Encryption key in hexadecimal string format. It is a 128-bit (16 bytes) key for the algorithm `aes128-ctr`, 24-byte key for the algorithm `aes192-ctr`, and 32-byte key for the algorithm `aes256-ctr`.
- `--crypter.key-file`: The key file. You can directly pass in the file path where the key is stored as a parameter without passing in "crypter.key".

Example: Encrypt backup data at the backup end.
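
The concrete command is not visible in this diff view; the following is a minimal sketch of such an invocation, assuming Amazon S3 storage (the PD address, storage URL, and key are placeholders, and the key length must match the chosen algorithm, for example 32 hexadecimal characters for `aes128-ctr`):

```bash
br backup full \
    --pd "${PD_IP}:2379" \
    --storage "s3://backup-data/2022-01-30/" \
    --crypter.method aes128-ctr \
    --crypter.key 0123456789abcdef0123456789abcdef
```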
2 changes: 1 addition & 1 deletion br/backup-and-restore-faq.md
@@ -180,7 +180,7 @@ If the cluster backed up using BR has TiFlash, `TableInfo` stores the TiFlash in

No. BR does not support in-place full restoration of some historical backup.

## How can I use BR for incremental backup in the Kubernetes environment?
## How can I use BR for incremental backup on Kubernetes?

To get the `commitTs` field of the last BR backup, run the `kubectl -n ${namespace} get bk ${name}` command using kubectl. You can use the content of this field as `--lastbackupts`.
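
A minimal sketch of wiring the two together, assuming the TiDB Operator `Backup` resource exposes the field as `.status.commitTs` (verify the field path against your Operator version):

```bash
# Read the commitTs of the previous backup and pass it as --lastbackupts
# to the next incremental backup job.
last_ts=$(kubectl -n ${namespace} get bk ${name} -o jsonpath='{.status.commitTs}')
echo "Use --lastbackupts=${last_ts} for the next incremental backup."
```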

4 changes: 2 additions & 2 deletions br/br-deployment.md
@@ -11,9 +11,9 @@ This document describes the recommended deployment of Backup & Restore (BR) and

Recommended practices when deploying BR:

- In production environments, deploy BR on a node with at least 8 cores CPU and 16 GB memory. Select an appropriate OS version by following [Linux OS version requirements](/hardware-and-software-requirements.md#linux-os-version-requirements).
- In production environments, deploy BR on a node with at least 8 CPU cores and 16 GB of memory. Select an appropriate OS version by following [Linux OS version requirements](/hardware-and-software-requirements.md#os-and-platform-requirements).
- Save backup data to Amazon S3, GCS or Azure Blob Storage.
- Allocate sufficient resources for backup and restoration
- Allocate sufficient resources for backup and restoration:

- BR, TiKV nodes, and the backup storage system should provide network bandwidth that is greater than the backup speed. If the target cluster is particularly large, the threshold of backup and restoration speed is limited by the bandwidth of the backup network.
- The backup storage system should also provide sufficient write/read performance (IOPS). Otherwise, the IOPS might become a performance bottleneck during backup or restoration.