From 042d8b040dde979343c7e100d1d21951783210ea Mon Sep 17 00:00:00 2001
From: Wish
Date: Tue, 22 Nov 2022 14:33:24 +0800
Subject: [PATCH 1/4] tiflash: Add instructions to improve replica sync performance

Signed-off-by: Wish
---
 tiflash/create-tiflash-replicas.md | 51 ++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md
index ec47a19a10d1c..e1505c3ee9667 100644
--- a/tiflash/create-tiflash-replicas.md
+++ b/tiflash/create-tiflash-replicas.md
@@ -131,6 +131,57 @@ To check tables without TiFlash replicas in the database, you can execute the fo
 SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = "<db_name>" and TABLE_NAME not in (SELECT TABLE_NAME FROM information_schema.tiflash_replica where TABLE_SCHEMA = "<db_name>");
 ```
 
+## Speed up TiFlash replication
+
+When TiFlash replicas are added, each TiKV instance performs a full table scan and sends the scanned data to TiFlash as a "snapshot" to create the replica. By default, TiFlash replicas are added slowly with less resource usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can increase the TiFlash replication speed by following these steps.
+
+1. Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](/dynamic-config.md):
+
+    ```sql
+    -- The default value for both configs is 100MiB, i.e. the maximum disk bandwidth used for writing snapshots is no more than 100MiB/s.
+    SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '300MiB';
+    SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '300MiB';
+    ```
+
+    After executing these SQL statements, the configuration changes take effect immediately without restarting the cluster. However, since the replication speed is still restricted by the PD limit globally, you cannot observe the speed up for now.
+
+2. Use [PD Control](/pd-control.md) to progressively ease the new replica speed limit.
+
+    The default new replica speed limit is 30, which means, approximately 30 regions will add TiFlash replicas every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed:
+
+    ```shell
+    tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 60 add-peer
+    ```
+
+    > In the above command, you need to replace `v<CLUSTER_VERSION>` with the cluster version and `<PD_ADDRESS>:2379` with the address of any PD node. For example:
+    >
+    > ```shell
+    > tiup ctl:v6.1.1 pd -u http://192.168.1.4:2379 store limit all engine tiflash 60 add-peer
+    > ```
+
+    Within a few minutes, you will observe a significant increase in CPU and disk IO resource usage of the TiFlash nodes, and TiFlash should create replicas faster. At the same time, the TiKV nodes' CPU and disk IO resource usage will also increase.
+
+    If the TiKV and TiFlash nodes still have spare resources at this point and the latency of your online service does not increase significantly, you may consider further easing the limit, for example, triples the original speed:
+
+    ```shell
+    tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 90 add-peer
+    ```
+
+3. After the TiFlash replication is complete, remember to revert to the default configuration to reduce the impact on online services.
+
+    Execute the following PD Control command to restore the default new replica speed limit:
+
+    ```shell
+    tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 30 add-peer
+    ```
+
+    Execute the following SQL statements to restore the default snapshot write speed limit:
+
+    ```sql
+    SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '100MiB';
+    SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '100MiB';
+    ```
+
 ## Set available zones

From cb54b11414996624763dc66f34b523b483bc3ee6 Mon Sep 17 00:00:00 2001
From: Wish
Date: Tue, 22 Nov 2022 14:35:38 +0800
Subject: [PATCH 2/4] grumble

Signed-off-by: Wish
---
 tiflash/create-tiflash-replicas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md
index e1505c3ee9667..6665e85c3646a 100644
--- a/tiflash/create-tiflash-replicas.md
+++ b/tiflash/create-tiflash-replicas.md
@@ -238,7 +238,7 @@ When configuring replicas, if you need to distribute TiFlash replicas to multipl
 3. PD schedules the replicas based on the labels. In this example, PD respectively schedules two replicas of the table `t` to two available zones. You can use pd-ctl to view the scheduling.
 
     ```shell
-    > tiup ctl:<version> pd -u<pd-address>:<port> store
+    > tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store
 
    ...
    "address": "172.16.5.82:23913",

From 8c8a8ebaec1577d36da0e32855bda4021a1c5fdf Mon Sep 17 00:00:00 2001
From: Wenxuan
Date: Tue, 22 Nov 2022 15:14:48 +0800
Subject: [PATCH 3/4] Apply suggestions from code review

Co-authored-by: shichun-0415 <89768198+shichun-0415@users.noreply.github.com>
---
 tiflash/create-tiflash-replicas.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md
index 6665e85c3646a..76017e39e8175 100644
--- a/tiflash/create-tiflash-replicas.md
+++ b/tiflash/create-tiflash-replicas.md
@@ -133,41 +133,41 @@ SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = "
 ## Speed up TiFlash replication
 
-When TiFlash replicas are added, each TiKV instance performs a full table scan and sends the scanned data to TiFlash as a "snapshot" to create the replica. By default, TiFlash replicas are added slowly with less resource usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can increase the TiFlash replication speed by following these steps.
+Before TiFlash replicas are added, each TiKV instance performs a full table scan and sends the scanned data to TiFlash as a "snapshot" to create replicas. By default, TiFlash replicas are added slowly with less resource usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps.
 
 1. Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](/dynamic-config.md):
 
     ```sql
-    -- The default value for both configs is 100MiB, i.e. the maximum disk bandwidth used for writing snapshots is no more than 100MiB/s.
+    -- The default value for both configurations is 100MiB, i.e. the maximum disk bandwidth used for writing snapshots is no more than 100MiB/s.
     SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '300MiB';
     SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '300MiB';
     ```
 
-    After executing these SQL statements, the configuration changes take effect immediately without restarting the cluster. However, since the replication speed is still restricted by the PD limit globally, you cannot observe the speed up for now.
+    After executing these SQL statements, the configuration changes take effect immediately without restarting the cluster. However, since the replication speed is still restricted by the PD limit globally, you cannot observe the acceleration for now.
 
 2. Use [PD Control](/pd-control.md) to progressively ease the new replica speed limit.
 
-    The default new replica speed limit is 30, which means, approximately 30 regions will add TiFlash replicas every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed:
+    The default new replica speed limit is 30, which means, approximately 30 Regions add TiFlash replicas every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed:
 
     ```shell
     tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 60 add-peer
     ```
 
-    > In the above command, you need to replace `v<CLUSTER_VERSION>` with the cluster version and `<PD_ADDRESS>:2379` with the address of any PD node. For example:
+    > In the preceding command, you need to replace `v<CLUSTER_VERSION>` with the actual cluster version and `<PD_ADDRESS>:2379` with the address of any PD node. For example:
     >
     > ```shell
     > tiup ctl:v6.1.1 pd -u http://192.168.1.4:2379 store limit all engine tiflash 60 add-peer
     > ```
 
-    Within a few minutes, you will observe a significant increase in CPU and disk IO resource usage of the TiFlash nodes, and TiFlash should create replicas faster. At the same time, the TiKV nodes' CPU and disk IO resource usage will also increase.
+    Within a few minutes, you will observe a significant increase in CPU and disk IO resource usage of the TiFlash nodes, and TiFlash should create replicas faster. At the same time, the TiKV nodes' CPU and disk IO resource usage increases as well.
 
-    If the TiKV and TiFlash nodes still have spare resources at this point and the latency of your online service does not increase significantly, you may consider further easing the limit, for example, triples the original speed:
+    If the TiKV and TiFlash nodes still have spare resources at this point and the latency of your online service does not increase significantly, you can further ease the limit, for example, triple the original speed:
 
     ```shell
     tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 90 add-peer
     ```
 
-3. After the TiFlash replication is complete, remember to revert to the default configuration to reduce the impact on online services.
+3. After the TiFlash replication is complete, revert to the default configuration to reduce the impact on online services.
     Execute the following PD Control command to restore the default new replica speed limit:

From 051479480f8f0f5ec6f27b8da5f56b2527ee5fea Mon Sep 17 00:00:00 2001
From: shichun-0415 <89768198+shichun-0415@users.noreply.github.com>
Date: Tue, 22 Nov 2022 18:47:06 +0800
Subject: [PATCH 4/4] Apply suggestions from code review

Co-authored-by: Grace Cai
---
 tiflash/create-tiflash-replicas.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md
index 76017e39e8175..9ab44f5332816 100644
--- a/tiflash/create-tiflash-replicas.md
+++ b/tiflash/create-tiflash-replicas.md
@@ -133,9 +133,17 @@ SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = "
 ## Speed up TiFlash replication
 
+<CustomContent platform="tidb-cloud">
+
+> **Note:**
+>
+> This section is not applicable to TiDB Cloud.
+
+</CustomContent>
+
 Before TiFlash replicas are added, each TiKV instance performs a full table scan and sends the scanned data to TiFlash as a "snapshot" to create replicas. By default, TiFlash replicas are added slowly with less resource usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps.
 
-1. Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](/dynamic-config.md):
+1. Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](https://docs.pingcap.com/tidb/stable/dynamic-config):
 
     ```sql
     -- The default value for both configurations is 100MiB, i.e. the maximum disk bandwidth used for writing snapshots is no more than 100MiB/s.
     SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '300MiB';
     SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '300MiB';
     ```
 
     After executing these SQL statements, the configuration changes take effect immediately without restarting the cluster. However, since the replication speed is still restricted by the PD limit globally, you cannot observe the acceleration for now.
 
-2. Use [PD Control](/pd-control.md) to progressively ease the new replica speed limit.
+2. Use [PD Control](https://docs.pingcap.com/tidb/stable/pd-control) to progressively ease the new replica speed limit.
 
     The default new replica speed limit is 30, which means, approximately 30 Regions add TiFlash replicas every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed:
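
Once the relaxed limits described in these patches are in place, a quick way to confirm that replication is actually progressing faster is to watch the per-table progress that TiDB exposes in `information_schema.tiflash_replica`, the same system table queried earlier in this document. The query below is a minimal sketch; `test` is a placeholder database name:

```sql
-- Check TiFlash replication progress for each table in one database.
-- PROGRESS ranges from 0 to 1; AVAILABLE becomes 1 once the replica can serve queries.
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test';
```

Running this query every minute or so while adjusting the PD limit gives a rough read on whether easing the limit further is worthwhile.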