Add synchronous replication docs #4207

Merged 10 commits on Nov 20, 2020
1 change: 1 addition & 0 deletions TOC.md
@@ -141,6 +141,7 @@
+ Tutorials
+ [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md)
+ [Three Data Centers in Two Cities Deployment](/three-data-centers-in-two-cities-deployment.md)
+ [Synchronous Replication for Dual Data Centers](/synchronous-replication.md)
+ Best Practices
+ [Use TiDB](/tidb-best-practices.md)
+ [Java Application Development](/best-practices/java-app-best-practices.md)
4 changes: 4 additions & 0 deletions pd-configuration-file.md
@@ -375,3 +375,7 @@ Configuration items related to the [TiDB Dashboard](/dashboard/dashboard-intro.m
+ Determines whether to enable the telemetry collection feature in TiDB Dashboard.
+ Default value: `true`
+ See [Telemetry](/telemetry.md) for details.

## `replication-mode`

Configuration items related to the replication mode of all Regions. See [Enable synchronous replication in PD configuration file](/synchronous-replication.md#enable-synchronous-replication-in-pd-configuration-file) for details.
2 changes: 2 additions & 0 deletions pd-control.md
@@ -295,6 +295,8 @@ Usage:
config set cluster-version 1.0.8 // Set the version of the cluster to 1.0.8
```

- `replication-mode` controls the replication mode of a single Region in the dual data center scenario. See [Change replication mode manually](/synchronous-replication.md#change-replication-mode-manually) for details.

- `leader-schedule-policy` is used to select the scheduling strategy for the leader. You can schedule the leader according to `size` or `count`.

- `scheduler-max-waiting-operator` is used to control the number of waiting operators in each scheduler.
95 changes: 95 additions & 0 deletions synchronous-replication.md
@@ -0,0 +1,95 @@
---
title: Synchronous Replication for Dual Data Centers
summary: Learn how to configure synchronous replication.
---

# Synchronous Replication for Dual Data Centers

This document describes how to configure synchronous replication for dual data centers.

> **Warning:**
>
> Synchronous replication is still an experimental feature. Do not use it in a production environment.

In a dual data center deployment, one data center is the primary center and the other is the DR (disaster recovery) center. When a Region has an odd number of replicas, more replicas are placed in the primary center. If the DR center is down for longer than a specified period of time, replication between the two centers falls back to the asynchronous mode by default.

To use the synchronous mode, either configure it in the PD configuration file or change the replication mode manually using pd-ctl.

## Enable synchronous replication in PD configuration file

The replication mode is controlled by PD. You can set it in the PD configuration file when deploying a cluster. See the following example:

{{< copyable "" >}}

```toml
[replication-mode]
replication-mode = "dr-auto-sync"
[replication-mode.dr-auto-sync]
label-key = "zone"
primary = "z1"
dr = "z2"
primary-replicas = 2
dr-replicas = 1
wait-store-timeout = "1m"
wait-sync-timeout = "1m"
```

In the configuration above:

+ `dr-auto-sync` is the mode that enables synchronous replication.
+ The label key `zone` is used to distinguish different data centers.
+ TiKV instances whose `zone` label is `"z1"` belong to the primary data center, and TiKV instances whose `zone` label is `"z2"` belong to the DR data center.
+ `primary-replicas` is the number of replicas that should be placed in the primary data center.
+ `dr-replicas` is the number of replicas that should be placed in the DR data center.
+ `wait-store-timeout` is the time to wait before falling back to asynchronous replication.
+ `wait-sync-timeout` is the time to wait before forcing TiKV to change replication mode (currently not supported).
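
As a minimal sketch of how `label-key`, `primary`, and `dr` relate to TiKV labels, the following Python snippet classifies a store by its labels. The function name `classify_store` and the dictionary layout are hypothetical and used only for illustration; this is not PD's implementation.

```python
from typing import Dict

# Values mirror the [replication-mode.dr-auto-sync] example above.
DR_AUTO_SYNC = {
    "label-key": "zone",
    "primary": "z1",
    "dr": "z2",
    "primary-replicas": 2,
    "dr-replicas": 1,
}


def classify_store(labels: Dict[str, str],
                   config: Dict[str, object] = DR_AUTO_SYNC) -> str:
    """Return "primary", "dr", or "unknown" for a TiKV store's labels."""
    # Look up the label that distinguishes data centers ("zone" here).
    value = labels.get(str(config["label-key"]))
    if value == config["primary"]:
        return "primary"
    if value == config["dr"]:
        return "dr"
    # The store has no such label, or its value matches neither center.
    return "unknown"


print(classify_store({"zone": "z1", "host": "h1"}))  # primary
print(classify_store({"zone": "z2"}))                # dr
print(classify_store({"host": "h3"}))                # unknown
```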

To check the current replication state of the cluster, send a request to the following PD API endpoint:

{{< copyable "shell-regular" >}}

```bash
curl http://pd_ip:pd_port/pd/api/v1/replication_mode/status
```

```json
{
  "mode": "dr-auto-sync",
  "dr-auto-sync": {
    "label-key": "zone",
    "state": "sync"
  }
}
```

> **Note:**
>
> The replication mode indicates how a single Region is replicated, either `asynchronous` or `synchronous`. The replication state of the cluster indicates how all Regions are replicated, with the options of `async`, `sync-recover`, and `sync`.
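
The status check can also be scripted. The following Python sketch parses a response shaped like the example output above and returns the cluster state. The function names are hypothetical, and the handling of modes other than `dr-auto-sync` is an assumption rather than documented behavior.

```python
import json
import urllib.request

# Cluster states listed in this document.
VALID_STATES = {"async", "sync-recover", "sync"}


def parse_replication_state(body: str) -> str:
    """Extract the cluster replication state from a status response body."""
    status = json.loads(body)
    mode = status["mode"]
    if mode != "dr-auto-sync":
        # Assumption: other modes carry no "dr-auto-sync" section,
        # so the mode itself is reported.
        return mode
    state = status["dr-auto-sync"]["state"]
    if state not in VALID_STATES:
        raise ValueError(f"unexpected replication state: {state!r}")
    return state


def fetch_replication_state(pd_addr: str, timeout: float = 5.0) -> str:
    """Query PD (for example "http://pd_ip:pd_port") and return the state."""
    url = f"{pd_addr}/pd/api/v1/replication_mode/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_replication_state(resp.read().decode("utf-8"))


sample = ('{"mode": "dr-auto-sync", '
          '"dr-auto-sync": {"label-key": "zone", "state": "sync"}}')
print(parse_replication_state(sample))  # sync
```

`fetch_replication_state` needs a reachable PD instance, so only the parsing step is exercised in the example call.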

After the cluster state becomes `sync`, it does not become `async` unless the number of down instances is larger than the specified number of replicas in either data center. Once the cluster state becomes `async`, PD requests TiKV to change the replication mode to `asynchronous` and periodically checks whether the TiKV instances have recovered. When the number of down instances is smaller than the number of replicas in both data centers, the cluster enters the `sync-recover` state, and PD requests TiKV to change the replication mode to `synchronous`. After all Regions become `synchronous`, the cluster state becomes `sync` again.
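
The transitions described above can be sketched as a small state function. This is a literal reading of the paragraph, not PD's actual algorithm: the function name, its parameters, and the choice to leave the state unchanged in cases the text does not cover are all assumptions.

```python
def next_state(state: str,
               down_primary: int,
               down_dr: int,
               primary_replicas: int,
               dr_replicas: int,
               all_regions_synchronous: bool) -> str:
    """Return the next cluster replication state per the description above."""
    if state == "sync":
        # Fall back only when down instances exceed the replica count
        # in either data center.
        if down_primary > primary_replicas or down_dr > dr_replicas:
            return "async"
        return "sync"
    if state == "async":
        # Start recovering once down instances are below the replica
        # count in both data centers.
        if down_primary < primary_replicas and down_dr < dr_replicas:
            return "sync-recover"
        return "async"
    if state == "sync-recover":
        # The text only describes the move to "sync"; other outcomes
        # from this state are not covered, so the state is kept.
        if all_regions_synchronous:
            return "sync"
        return "sync-recover"
    raise ValueError(f"unknown state: {state!r}")


# With primary-replicas = 2 and dr-replicas = 1, as in the example config:
print(next_state("sync", 0, 2, 2, 1, True))           # async
print(next_state("async", 0, 0, 2, 1, False))         # sync-recover
print(next_state("sync-recover", 0, 0, 2, 1, True))   # sync
```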

## Change replication mode manually

You can use [`pd-ctl`](/pd-control.md) to change a cluster from `asynchronous` to `synchronous`.

{{< copyable "shell-regular" >}}

```bash
>> config set replication-mode dr-auto-sync
```

Or change back to `asynchronous`:

{{< copyable "shell-regular" >}}

```bash
>> config set replication-mode majority
```

You can also update the label key:

{{< copyable "shell-regular" >}}

```bash
>> config set replication-mode dr-auto-sync label-key dc
```
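
When these changes are scripted rather than typed interactively, the same commands can be built as argument lists. The following is a sketch under stated assumptions: the helper name `pd_ctl_args` is hypothetical, the binary is assumed to be on `PATH` as `pd-ctl`, the PD address is passed with the `-u` flag in single-command mode, and the rule that `label-key` is only accepted together with `dr-auto-sync` is an assumption based on the commands above.

```python
from typing import List, Optional


def pd_ctl_args(pd_addr: str, mode: str,
                label_key: Optional[str] = None) -> List[str]:
    """Build a pd-ctl invocation for `config set replication-mode`."""
    if mode not in ("dr-auto-sync", "majority"):
        raise ValueError(f"unknown replication mode: {mode!r}")
    args = ["pd-ctl", "-u", pd_addr,
            "config", "set", "replication-mode", mode]
    if label_key is not None:
        # Assumption: label-key is only meaningful for dr-auto-sync.
        if mode != "dr-auto-sync":
            raise ValueError("label-key only applies to dr-auto-sync")
        args += ["label-key", label_key]
    return args


# Print the command instead of running it; pass the list to
# subprocess.run() to execute it against a real cluster.
print(" ".join(pd_ctl_args("http://127.0.0.1:2379", "dr-auto-sync", "dc")))
```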