Merge remote-tracking branch 'upstream/master' into add_v6.4_release_notes

qiancai committed Nov 16, 2022
2 parents d41b93f + fadbfa6 commit 3f2bbaf
Showing 197 changed files with 1,130 additions and 691 deletions.
4 changes: 1 addition & 3 deletions .github/ISSUE_TEMPLATE/change-request.md
@@ -17,6 +17,4 @@ Please answer the following questions before submitting your issue. Thanks!
2. Describe your suggestion or addition.


3. Provide some reference materials (documents, websites, etc) if you could.


3. Provide some reference materials (such as documents and websites) if you could.
1 change: 1 addition & 0 deletions .github/pull_request_template.md
@@ -19,6 +19,7 @@ By default, **CHOOSE MASTER ONLY** so your changes will be applied to the next T
For details, see [tips for choosing the affected versions](https://github.com/pingcap/docs/blob/master/CONTRIBUTING.md#guideline-for-choosing-the-affected-versions).

- [ ] master (the latest development version)
- [ ] v6.5 (TiDB 6.5 versions)
- [ ] v6.4 (TiDB 6.4 versions)
- [ ] v6.3 (TiDB 6.3 versions)
- [ ] v6.1 (TiDB 6.1 versions)
5 changes: 3 additions & 2 deletions TOC-tidb-cloud.md
@@ -164,9 +164,11 @@
- [Overview](/garbage-collection-overview.md)
- [Configuration](/garbage-collection-configuration.md)
- [Tune TiFlash performance](/tiflash/tune-tiflash-performance.md)
- Manage User Access
- Security
- [Manage Console User Access](/tidb-cloud/manage-user-access.md)
- [Configure Cluster Security Settings](/tidb-cloud/configure-security-settings.md)
- [Database Audit Logging](/tidb-cloud/tidb-cloud-auditing.md)
- [Secure Connections to Serverless Tier Clusters](/tidb-cloud/secure-connections-to-serverless-tier-clusters.md)
- Billing
- [Invoices](/tidb-cloud/tidb-cloud-billing.md#invoices)
- [Billing Details](/tidb-cloud/tidb-cloud-billing.md#billing-details)
@@ -422,7 +424,6 @@
- [Dumpling](/dumpling-overview.md)
- [Table Filter](/table-filter.md)
- [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md)
- [Secure Connections to Serverless Tier Clusters](/tidb-cloud/secure-connections-to-serverless-tier-clusters.md)
- [FAQs](/tidb-cloud/tidb-cloud-faq.md)
- Release Notes
- [2022](/tidb-cloud/release-notes-2022.md)
5 changes: 3 additions & 2 deletions TOC.md
@@ -97,7 +97,7 @@
- [Hybrid Topology](/hybrid-deployment-topology.md)
- Install and Start
- [Use TiUP](/production-deployment-using-tiup.md)
- [Deploy in Kubernetes](/tidb-in-kubernetes.md)
- [Deploy on Kubernetes](/tidb-in-kubernetes.md)
- [Verify Cluster Status](/post-installation-check.md)
- Test Cluster Performance
- [Test TiDB Using Sysbench](/benchmark/benchmark-tidb-using-sysbench.md)
@@ -142,7 +142,7 @@
- [Daily Checklist](/daily-check.md)
- [Maintain TiFlash](/tiflash/maintain-tiflash.md)
- [Maintain TiDB Using TiUP](/maintain-tidb-using-tiup.md)
- [Modify Configuration Online](/dynamic-config.md)
- [Modify Configuration Dynamically](/dynamic-config.md)
- [Online Unsafe Recovery](/online-unsafe-recovery.md)
- [Replicate Data Between Primary and Secondary Clusters](/replicate-between-primary-and-secondary-clusters.md)
- Monitor and Alert
@@ -632,6 +632,7 @@
- [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md)
- [`EXPLAIN`](/sql-statements/sql-statement-explain.md)
- [`FLASHBACK CLUSTER TO TIMESTAMP`](/sql-statements/sql-statement-flashback-to-timestamp.md)
- [`FLASHBACK DATABASE`](/sql-statements/sql-statement-flashback-database.md)
- [`FLASHBACK TABLE`](/sql-statements/sql-statement-flashback-table.md)
- [`FLUSH PRIVILEGES`](/sql-statements/sql-statement-flush-privileges.md)
- [`FLUSH STATUS`](/sql-statements/sql-statement-flush-status.md)
2 changes: 1 addition & 1 deletion _index.md
@@ -43,7 +43,7 @@ hide_commit: true

[Deploy a TiDB Cluster Using TiUP](https://docs.pingcap.com/tidb/dev/production-deployment-using-tiup)

[Deploy a TiDB Cluster in Kubernetes](https://docs.pingcap.com/tidb/dev/tidb-in-kubernetes)
[Deploy a TiDB Cluster on Kubernetes](https://docs.pingcap.com/tidb/dev/tidb-in-kubernetes)

</LearningPath>

22 changes: 18 additions & 4 deletions alert-rules.md
@@ -260,6 +260,20 @@ This section gives the alert rules for the PD component.
* If you confirm that the TiKV/TiFlash instance cannot be recovered, you can make it offline.
* If you confirm that the TiKV/TiFlash instance can be recovered, but not in the short term, consider increasing the value of `max-down-time`. This prevents the TiKV/TiFlash instance from being considered irrecoverable and its data from being removed from TiKV/TiFlash.

#### `PD_cluster_unhealthy_tikv_nums`

* Alert rule:

`(sum(pd_cluster_status{type="store_unhealth_count"}) by (instance) > 0) and (sum(etcd_server_is_leader) by (instance) > 0)`

* Description:

Indicates that there are unhealthy stores. If the situation persists for some time (configured by [`max-store-down-time`](/pd-configuration-file.md#max-store-down-time), defaults to `30m`), the store is likely to change to `Offline` state, which triggers the [`PD_cluster_down_store_nums`](#pd_cluster_down_store_nums) alert.

* Solution:

Check the state of the TiKV stores.
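
    For reference, one way to inspect store states is with `pd-ctl`. The following is a minimal sketch, assuming PD is reachable at `127.0.0.1:2379`, that the `tiup ctl` version matches your cluster, and that `jq` is installed; the exact JSON layout may vary between PD versions.

    ```bash
    # List all stores and their states (Up, Offline, Down, Tombstone).
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 store

    # Show only the ID, address, and state of each store (assumes jq is available).
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 store \
        | jq '.stores[] | {id: .store.id, address: .store.address, state: .store.state_name}'
    ```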

#### `PD_cluster_low_space`

* Alert rule:
@@ -274,7 +288,7 @@ This section gives the alert rules for the PD component.

* Check whether the space in the cluster is generally insufficient. If so, increase its capacity.
* Check whether there is any issue with Region balance scheduling. If so, it will lead to uneven data distribution.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, core dump, etc.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, and core dump.
* Lower the Region weight of the node to reduce the data volume.
* When it is not possible to release the space, consider proactively making the node offline. This prevents insufficient disk space that leads to downtime.
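
    As a hedged illustration of the Region-weight suggestion above (a sketch only; the store ID and weight values are placeholders, and it assumes `pd-ctl` access via `tiup ctl`):

    ```bash
    # Lower the Region weight of store 1 (leader weight stays 1, Region weight drops to 0.5)
    # so that PD schedules less data onto the node that is running out of space.
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 store weight 1 1 0.5
    ```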

@@ -353,7 +367,7 @@ This section gives the alert rules for the PD component.

* Solution:

* Exclude the human factors, such as restarting PD, manually transferring leader, adjusting leader priority, etc.
* Exclude the human factors, such as restarting PD, manually transferring leader, and adjusting leader priority.
* Check the network and system load status.
* If the problematic PD instance cannot be recovered due to environmental factors, make it offline and replace it.

@@ -370,7 +384,7 @@ This section gives the alert rules for the PD component.
* Solution:

* Check whether it is needed to increase capacity.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, core dump, etc.
* Check whether there is any file that occupies a large amount of disk space, such as the log, snapshot, and core dump.

#### `PD_system_time_slow`

@@ -1140,4 +1154,4 @@ This section gives the alert rules for the Blackbox_exporter TCP, ICMP, and HTTP
* Solution:

* View the ping latency between the two nodes on the Grafana Blackbox Exporter page to check whether it is too high.
* Check the TCP panel on the Grafana Node Exporter page to check whether there is any packet loss.
* Check the TCP panel on the Grafana Node Exporter page to check whether there is any packet loss.
26 changes: 7 additions & 19 deletions benchmark/benchmark-tidb-using-sysbench.md
@@ -5,7 +5,7 @@ aliases: ['/docs/dev/benchmark/benchmark-tidb-using-sysbench/','/docs/dev/benchm

# How to Test TiDB Using Sysbench

It is recommended to use Sysbench 1.0 or later, which can be [downloaded here](https://github.com/akopytov/sysbench/releases/tag/1.0.14).
It is recommended to use Sysbench 1.0 or later, which can be [downloaded here](https://github.com/akopytov/sysbench/releases/tag/1.0.20).
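
If you build Sysbench from source, the steps are roughly as follows (a sketch, not part of the original document; it assumes a Linux host with gcc, make, automake, libtool, pkg-config, and the MySQL client development headers installed):

```bash
# Download and build Sysbench 1.0.20 from source, then verify the installation.
curl -L -o sysbench-1.0.20.tar.gz https://github.com/akopytov/sysbench/archive/refs/tags/1.0.20.tar.gz
tar xzf sysbench-1.0.20.tar.gz && cd sysbench-1.0.20
./autogen.sh && ./configure && make -j "$(nproc)" && sudo make install
sysbench --version
```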

## Test plan

@@ -19,6 +19,8 @@ server_configs:
log.level: "error"
```
It is also recommended to make sure [`tidb_enable_prepared_plan_cache`](/system-variables.md#tidb_enable_prepared_plan_cache-new-in-v610) is enabled and that you allow sysbench to use prepared statements by _not_ using `--db-ps-mode=disabled`. See [SQL Prepared Execution Plan Cache](/sql-prepared-plan-cache.md) for documentation about what the SQL plan cache does and how to monitor it.
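
As a quick pre-flight check (a sketch, assuming TiDB v6.1.0 or later where this variable exists), you can verify the setting from a MySQL client before starting the benchmark:

```sql
-- Check whether the prepared plan cache is enabled.
SHOW VARIABLES LIKE 'tidb_enable_prepared_plan_cache';

-- Enable it globally if needed; new connections pick up the change.
SET GLOBAL tidb_enable_prepared_plan_cache = ON;
```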

### TiKV configuration

A higher log level also means better performance for TiKV.
@@ -109,10 +111,10 @@ Restart MySQL client and execute the following SQL statement to create a databas
create database sbtest;
```

Adjust the order in which Sysbench scripts create indexes. Sysbench imports data in the order of "Build Table -> Insert Data -> Create Index", which takes more time for TiDB to import data. Users can adjust the order to speed up the import of data. Suppose that you use the Sysbench version [1.0.14](https://github.com/akopytov/sysbench/tree/1.0.14). You can adjust the order in either of the following two ways:
Adjust the order in which Sysbench scripts create indexes. Sysbench imports data in the order of "Build Table -> Insert Data -> Create Index", which takes more time for TiDB to import data. Users can adjust the order to speed up the import of data. Suppose that you use the Sysbench version [1.0.20](https://github.com/akopytov/sysbench/tree/1.0.20). You can adjust the order in either of the following two ways:

- Download the modified [oltp_common.lua](https://raw.githubusercontent.com/pingcap/tidb-bench/master/sysbench/sysbench-patch/oltp_common.lua) file for TiDB and overwrite the `/usr/share/sysbench/oltp_common.lua` file with it.
- In `/usr/share/sysbench/oltp_common.lua`, move the lines [235](https://github.com/akopytov/sysbench/blob/1.0.14/src/lua/oltp_common.lua#L235)-[240](https://github.com/akopytov/sysbench/blob/1.0.14/src/lua/oltp_common.lua#L240) to be right behind the line 198.
- In `/usr/share/sysbench/oltp_common.lua`, move the lines [235-240](https://github.com/akopytov/sysbench/blob/1.0.20/src/lua/oltp_common.lua#L235-L240) to be right behind the line 198.

> **Note:**
>
@@ -130,22 +132,8 @@ sysbench --config-file=config oltp_point_select --tables=32 --table-size=1000000

To warm data, load it from disk into the block cache of memory. Warming the data significantly improves the overall performance of the system. It is recommended to warm data once after restarting the cluster.

Sysbench 1.0.14 does not provide data warming, so it must be done manually. If you are using [Sysbench of the master version](https://github.com/akopytov/sysbench/tree/master), you can use the data warming feature included in the tool itself.

Take the table `sbtest7` in Sysbench as an example. Execute the following SQL to warm up the data:

{{< copyable "sql" >}}

```sql
SELECT COUNT(pad) FROM sbtest7 USE INDEX (k_7);
```

Collecting statistics helps the optimizer choose a more accurate execution plan. The `analyze` command can be used to collect statistics on the table sbtest. Each table needs statistics.

{{< copyable "sql" >}}

```sql
ANALYZE TABLE sbtest7;
```
```bash
sysbench --config-file=config oltp_point_select --tables=32 --table-size=10000000 warmup
```

### Point select test command
2 changes: 1 addition & 1 deletion benchmark/online-workloads-and-add-index-operations.md
@@ -345,5 +345,5 @@ When the target column of the `ADD INDEX` statement is irrelevant to online work

## Summary

- When you perform frequent write operations (including `INSERT`, `DELETE` and `UPDATE` operations) to the target column of the `ADD INDEX` statement, the default `ADD INDEX` configuration causes relatively frequent write conflicts, which has a great impact on online workloads. At the same time, the `ADD INDEX` operation takes a long time to complete due to continuous retry attempts. In this test, you can modify the product of `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size` to 1/32 of the default value. For example, you can set `tidb_ddl_reorg_worker_cnt` to `4` and `tidb_ddl_reorg_batch_size` to `256` for better performance.
- When you perform frequent write operations (including `INSERT`, `DELETE` and `UPDATE` operations) to the target column of the `ADD INDEX` statement, the default `ADD INDEX` configuration causes relatively frequent write conflicts, which has a great impact on online workloads. At the same time, the `ADD INDEX` operation takes a long time to complete due to continuous retry attempts. In this test, you can modify the product of `tidb_ddl_reorg_worker_cnt` and `tidb_ddl_reorg_batch_size` to 1/32 of the default value. For example, you can set `tidb_ddl_reorg_worker_cnt` to `4` and `tidb_ddl_reorg_batch_size` to `256` for better performance.
- When only performing query operations to the target column of the `ADD INDEX` statement or the target column is not directly related to online workloads, you can use the default `ADD INDEX` configuration.
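
As a hedged sketch of the tuning described in the first bullet above (verify the variable names and value ranges against your TiDB version):

```sql
-- Reduce DDL reorganization concurrency and batch size so that ADD INDEX
-- interferes less with online writes (values taken from the test above).
SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
SET GLOBAL tidb_ddl_reorg_batch_size = 256;

-- Confirm the current settings.
SHOW VARIABLES LIKE 'tidb_ddl_reorg%';
```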
2 changes: 1 addition & 1 deletion best-practices/high-concurrency-best-practices.md
@@ -18,7 +18,7 @@ This document assumes that you have a basic understanding of TiDB. It is recomme

## Highly-concurrent write-intensive scenario

The highly concurrent write scenario often occurs when you perform batch tasks in applications, such as clearing, settlement and so on. This scenario has the following features:
The highly concurrent write scenario often occurs when you perform batch tasks in applications, such as clearing and settlement. This scenario has the following features:

+ A huge volume of data
+ The need to import historical data into the database in a short time
4 changes: 2 additions & 2 deletions best-practices/pd-scheduling-best-practices.md
@@ -90,7 +90,7 @@ For hot write regions, `hot-region-scheduler` attempts to redistribute both regi

Cluster topology awareness enables PD to distribute replicas of a region as much as possible. This is how TiKV ensures high availability and disaster recovery capability. PD continuously scans all regions in the background. When PD finds that the distribution of regions is not optimal, it generates an operator to replace peers and redistribute regions.

The component to check region distribution is `replicaChecker`, which is similar to a scheduler except that it cannot be disabled. `replicaChecker` schedules based on the the configuration of `location-labels`. For example, `[zone,rack,host]` defines a three-tier topology for a cluster. PD attempts to schedule region peers to different zones first, or to different racks when zones are insufficient (for example, 2 zones for 3 replicas), or to different hosts when racks are insufficient, and so on.
The component to check region distribution is `replicaChecker`, which is similar to a scheduler except that it cannot be disabled. `replicaChecker` schedules based on the configuration of `location-labels`. For example, `[zone,rack,host]` defines a three-tier topology for a cluster. PD attempts to schedule region peers to different zones first, or to different racks when zones are insufficient (for example, 2 zones for 3 replicas), or to different hosts when racks are insufficient.

### Scale-down and failure recovery

@@ -215,7 +215,7 @@ If there is a big difference in the rating of different stores, you need to exam

- The scheduling speed is limited by default for load balancing purposes. You can adjust `leader-schedule-limit` or `region-schedule-limit` to larger values without significantly impacting regular services. In addition, you can also appropriately relax the restrictions specified by `max-pending-peer-count` and `max-snapshot-count`.
- Other scheduling tasks are running concurrently, which slows down the balancing. In this case, if the balancing takes precedence over other scheduling tasks, you can stop other tasks or limit their speeds. For example, if you take some nodes offline when balancing is in progress, both operations consume the quota of `region-schedule-limit`. In this case, you can limit the speed of scheduler to remove nodes, or simply set `enable-replace-offline-replica = false` to temporarily disable it.
- The scheduling process is too slow. You can check the **Operator step duration** metric to confirm the cause. Generally, steps that do not involve sending and receiving snapshots (such as `TransferLeader`, `RemovePeer`, `PromoteLearner`) should be completed in milliseconds, while steps that involve snapshots (such as `AddLearner` and `AddPeer`) are expected to be completed in tens of seconds. If the duration is obviously too long, it could be caused by high pressure on TiKV or bottleneck in network, etc., which needs specific analysis.
- The scheduling process is too slow. You can check the **Operator step duration** metric to confirm the cause. Generally, steps that do not involve sending and receiving snapshots (such as `TransferLeader`, `RemovePeer`, and `PromoteLearner`) should be completed in milliseconds, while steps that involve snapshots (such as `AddLearner` and `AddPeer`) are expected to be completed in tens of seconds. If the duration is obviously too long, it could be caused by high pressure on TiKV or a network bottleneck, which needs specific analysis.

- PD fails to generate the corresponding balancing scheduler. Possible reasons include:

4 changes: 2 additions & 2 deletions br-usage-backup-for-maintain.md
@@ -105,7 +105,7 @@ In the preceding command, `--db` and `--table` specify the database name and tab

To back up multiple tables with more criteria, run the `br backup full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.

Example: Back up `db*.tbl*` data of a table to the `table-filter/2022-01-30/` directory in the `backup-data` bucket of Amazon S3.
Example: Back up `db*.tbl*` data of a table to the `table-filter/2022-01-30/` directory in the `backup-data` bucket of Amazon S3.

{{< copyable "shell-regular" >}}

@@ -172,7 +172,7 @@ BR supports encrypting backup data at the backup end and at the storage end when
Since TiDB v5.3.0, you can encrypt backup data by configuring the following parameters:

- `--crypter.method`: Encryption algorithm, which can be `aes128-ctr`, `aes192-ctr`, or `aes256-ctr`. The default value is `plaintext`, indicating that data is not encrypted.
- `--crypter.key`: Encryption key in hexadecimal string format. It is a 128-bit (16 bytes) key for the algorithm `aes128-ctr`, 24-byte key for the algorithm `aes192-ctr`, and 32-byte key for the algorithm `aes256-ctr`.
- `--crypter.key`: Encryption key in hexadecimal string format. It is a 128-bit (16 bytes) key for the algorithm `aes128-ctr`, 24-byte key for the algorithm `aes192-ctr`, and 32-byte key for the algorithm `aes256-ctr`.
- `--crypter.key-file`: The key file. You can directly pass in the file path where the key is stored as a parameter without passing in "crypter.key".

Example: Encrypt backup data at the backup end.
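
The concrete command is not visible in this diff view; the following is a minimal sketch of such an invocation, assuming Amazon S3 storage (the PD address, storage URL, and key are placeholders, and the key length must match the chosen algorithm, for example 32 hexadecimal characters for `aes128-ctr`):

```bash
br backup full \
    --pd "${PD_IP}:2379" \
    --storage "s3://backup-data/2022-01-30/" \
    --crypter.method aes128-ctr \
    --crypter.key 0123456789abcdef0123456789abcdef
```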
2 changes: 1 addition & 1 deletion br/backup-and-restore-faq.md
@@ -180,7 +180,7 @@ If the cluster backed up using BR has TiFlash, `TableInfo` stores the TiFlash in

No. BR does not support in-place full restoration of some historical backup.

## How can I use BR for incremental backup in the Kubernetes environment?
## How can I use BR for incremental backup on Kubernetes?

To get the `commitTs` field of the last BR backup, run the `kubectl -n ${namespace} get bk ${name}` command using kubectl. You can use the content of this field as `--lastbackupts`.
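
A minimal sketch of wiring the two together, assuming the TiDB Operator `Backup` resource exposes the field as `.status.commitTs` (verify the field path against your Operator version):

```bash
# Read the commitTs of the previous backup and pass it as --lastbackupts
# to the next incremental backup job.
last_ts=$(kubectl -n ${namespace} get bk ${name} -o jsonpath='{.status.commitTs}')
echo "Use --lastbackupts=${last_ts} for the next incremental backup."
```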

4 changes: 2 additions & 2 deletions br/br-deployment.md
@@ -11,9 +11,9 @@ This document describes the recommended deployment of Backup & Restore (BR) and

Recommended practices when deploying BR:

- In production environments, deploy BR on a node with at least 8 cores CPU and 16 GB memory. Select an appropriate OS version by following [Linux OS version requirements](/hardware-and-software-requirements.md#linux-os-version-requirements).
- In production environments, deploy BR on a node with at least 8 CPU cores and 16 GB of memory. Select an appropriate OS version by following [Linux OS version requirements](/hardware-and-software-requirements.md#os-and-platform-requirements).
- Save backup data to Amazon S3, GCS or Azure Blob Storage.
- Allocate sufficient resources for backup and restoration
- Allocate sufficient resources for backup and restoration:

- BR, TiKV nodes, and the backup storage system should provide network bandwidth that is greater than the backup speed. If the target cluster is particularly large, the threshold of backup and restoration speed is limited by the bandwidth of the backup network.
- The backup storage system should also provide sufficient write/read performance (IOPS). Otherwise, the IOPS might become a performance bottleneck during backup or restoration.