tiflash: Add instructions to improve replica sync performance #11398

Merged 4 commits on Nov 22, 2022
53 changes: 52 additions & 1 deletion tiflash/create-tiflash-replicas.md
@@ -131,6 +131,57 @@

To check tables without TiFlash replicas in the database, you can execute the following SQL statement:

```sql
SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = "<db_name>" and TABLE_NAME not in (SELECT TABLE_NAME FROM information_schema.tiflash_replica where TABLE_SCHEMA = "<db_name>");
```

## Speed up TiFlash replication

When TiFlash replicas are added, each TiKV instance performs a full table scan and sends the scanned data to TiFlash as a "snapshot" to create the replica. By default, TiFlash replicas are added slowly with less resource usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps.
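While replicas are being created, you can track the progress at any time. A minimal check, assuming the target database is `<db_name>` (the `PROGRESS` column of `information_schema.tiflash_replica` ranges from 0 to 1, and `AVAILABLE` becomes 1 once the replica is ready):

```sql
-- List replication status per table; PROGRESS < 1 means the replica
-- is still being created, and AVAILABLE = 1 means it is ready to serve.
SELECT TABLE_NAME, REPLICA_COUNT, PROGRESS, AVAILABLE
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = '<db_name>';
```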

1. Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](/dynamic-config.md):

```sql
-- The default value of both configurations is 100MiB, that is, the maximum disk bandwidth used for writing snapshots is no more than 100MiB/s.
SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '300MiB';
SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '300MiB';
```

After executing these SQL statements, the configuration changes take effect immediately, without restarting the cluster. However, the replication speed is still capped globally by the PD limit, so you cannot observe any speedup yet.
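If you want to verify that the new limits are live, one option is to query the runtime configuration (a sanity check; the exact TiFlash configuration item name is matched with `LIKE` here because it may differ across versions):

```sql
SHOW CONFIG WHERE type = 'tikv' AND name = 'server.snap-max-write-bytes-per-sec';
SHOW CONFIG WHERE type = 'tiflash' AND name LIKE '%snap-max-write-bytes-per-sec%';
```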

2. Use [PD Control](/pd-control.md) to progressively ease the new replica speed limit.

The default new replica speed limit is 30, which means that approximately 30 Regions add TiFlash replicas every minute. Executing the following command adjusts the limit to 60 for all TiFlash instances, which doubles the original speed:

```shell
tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 60 add-peer
```

> In the above command, you need to replace `<CLUSTER_VERSION>` with the cluster version and `<PD_ADDRESS>:2379` with the address of any PD node. For example:
>
> ```shell
> tiup ctl:v6.1.1 pd -u http://192.168.1.4:2379 store limit all engine tiflash 60 add-peer
> ```
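To check the limit currently in effect for each store, you can run `store limit` with no arguments (a quick read-only check; the output lists the add-peer and remove-peer limits per store):

```shell
tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit
```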

Within a few minutes, you will observe a significant increase in CPU and disk IO resource usage of the TiFlash nodes, and TiFlash should create replicas faster. At the same time, the TiKV nodes' CPU and disk IO resource usage will also increase.

If the TiKV and TiFlash nodes still have spare resources at this point and the latency of your online service does not increase significantly, you may further ease the limit, for example, by tripling the original speed:

```shell
tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 90 add-peer
```
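As a rough sanity check on how the two limits interact (assuming Regions are near the default 96 MiB size, which varies by workload): an add-peer limit of 90 Regions per minute moves about 90 × 96 MiB ≈ 8.4 GiB of snapshot data per minute cluster-wide, roughly 144 MiB/s in total. Because this load is spread across your TiKV instances, it stays within the 300MiB per-instance snapshot write limit set in step 1.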

3. After the TiFlash replication is complete, remember to revert to the default configuration to reduce the impact on online services.

Execute the following PD Control command to restore the default new replica speed limit:

```shell
tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store limit all engine tiflash 30 add-peer
```

Execute the following SQL statements to restore the default snapshot write speed limit:

```sql
SET CONFIG tikv `server.snap-max-write-bytes-per-sec` = '100MiB';
SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '100MiB';
```
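After reverting, you can re-run the `SHOW CONFIG` and `store limit` checks shown above to confirm that both limits are back to their defaults.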

## Set available zones

<CustomContent platform="tidb-cloud">
@@ -187,7 +238,7 @@ When configuring replicas, if you need to distribute TiFlash replicas to multiple
3. PD schedules the replicas based on the labels. In this example, PD respectively schedules two replicas of the table `t` to two available zones. You can use pd-ctl to view the scheduling.

```shell
- > tiup ctl:<version> pd -u<pd-host>:<pd-port> store
+ > tiup ctl:v<CLUSTER_VERSION> pd -u http://<PD_ADDRESS>:2379 store

...
"address": "172.16.5.82:23913",