Skip to content

Commit

Permalink
ticdc: add more docs about unified sorter (#5719) (#5724)
Browse files Browse the repository at this point in the history
  • Loading branch information
ti-chi-bot authored May 31, 2021
1 parent 65fd84a commit a49dda7
Show file tree
Hide file tree
Showing 2 changed files with 24 additions and 8 deletions.
3 changes: 3 additions & 0 deletions ticdc/deploy-ticdc.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_2.log --addr=0.0.0.0:830
cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --addr=0.0.0.0:8303 --advertise-addr=127.0.0.1:8303
```

## Description of TiCDC `cdc server` command-line parameters

The following are descriptions of options available in the `cdc server` command:

- `gc-ttl`: The TTL (Time To Live) of the service level `GC safepoint` in PD set by TiCDC, in seconds. The default value is `86400`, which means 24 hours.
Expand All @@ -59,3 +61,4 @@ The following are descriptions of options available in the `cdc server` command:
- `cert`: The path of the certificate file used by TiCDC, in the PEM format (optional).
- `key`: The path of the certificate key file used by TiCDC, in the PEM format (optional).
- `config`: The address of the configuration file that TiCDC uses (optional). This option is supported since TiCDC v4.0.13. This option can be used in the TiCDC deployment since TiUP v1.4.0.
- `sort-dir`: Specifies the temporary file directory of the sorting engine. The default value of this configuration item is `/tmp/cdc_sort`. When Unified Sorter is enabled, if this directory on the server is not writable or the available space is insufficient, you need to manually specify a directory in `sort-dir`. Make sure that TiCDC can read and write data in the `sort-dir` path.
29 changes: 21 additions & 8 deletions ticdc/manage-ticdc.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,13 +80,13 @@ Execute the following commands to create a replication task:
{{< copyable "shell-regular" >}}

```shell
cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:[email protected]:3306/" --changefeed-id="simple-replication-task"
cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:[email protected]:3306/" --changefeed-id="simple-replication-task" --sort-engine="unified"
```

```shell
Create changefeed successfully!
ID: simple-replication-task
Info: {"sink-uri":"mysql://root:[email protected]:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"memory","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
Info: {"sink-uri":"mysql://root:[email protected]:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
```

- `--changefeed-id`: The ID of the replication task. The format must match the `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$` regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID.
Expand All @@ -102,13 +102,13 @@ Info: {"sink-uri":"mysql://root:[email protected]:3306/","opts":{},"create-time":

- `--start-ts`: Specifies the starting TSO of the `changefeed`. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time.
- `--target-ts`: Specifies the ending TSO of the `changefeed`. To this TSO, the TiCDC cluster stops pulling data. The default value is empty, which means that TiCDC does not automatically stop pulling data.
- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must sort the data changes before writing them to the sink. This option supports `memory`/`unified`/`file`.
- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must sort the data changes before writing them to the sink. This option supports `unified` (by default)/`memory`/`file`.

- `memory`: Sorts data changes in memory.
- `unified`: This feature is introduced since v4.0.9 and becomes a stable feature since v4.0.11. When `unified` is used, TiCDC prefers data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. It is recommended to enable this feature when there is a risk of memory shortage, but **NOT recommended** to enable it in v4.0.9 and v4.0.10.
- `unified`: When `unified` is used, TiCDC prefers data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. This is the default value of `--sort-engine` since v4.0.13.
- `memory`: Sorts data changes in memory. It is **NOT recommended** to use it, because memory overflow might occur when a large amount of data is replicated.
- `file`: Entirely uses the disk to store the temporary data. This feature is **deprecated**. It is **NOT recommended** to use it in **any** situation.

- `--sort-dir`: Specifies the temporary file directory of the sorting engine. For TiDB v4.0.12 and later versions, it is **NOT recommended** to use this option in the command `cdc cli changefeed create`. You are recommended to use this option in the command `cdc server` to set the temporary file directory. The default value of this option is `/tmp/cdc_sort`. When the unified sorter is enabled, if the default directory `/tmp/cdc_sort` on the sever is not writable or there is not enough space, you need to manually specify a directory in `sort-dir`. If the directory specified in `sort-dir` is not writable, `changefeed` stops automatically.
- `--sort-dir`: Specifies the temporary file directory of the sorting engine. For TiDB v4.0.12 and later versions, it is **NOT recommended** to use this option [in the command `cdc server` to set the temporary file directory](/ticdc/deploy-ticdc.md#description-of-ticdc-cdc-server-command-line-parameters). You are recommended to use this option in the command `cdc server` to set the temporary file directory. The default value of this option is `/tmp/cdc_sort`. When the unified sorter is enabled, if the default directory `/tmp/cdc_sort` on the sever is not writable or there is not enough space, you need to manually specify a directory in `sort-dir`. If the directory specified in `sort-dir` is not writable, `changefeed` stops automatically.

- `--config`: Specifies the configuration file of the `changefeed`.

Expand Down Expand Up @@ -304,7 +304,7 @@ cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id=simple-repl
"start-ts": 419036036249681921,
"target-ts": 0,
"admin-job-type": 0,
"sort-engine": "memory",
"sort-engine": "unified",
"sort-dir": ".",
"config": {
"case-sensitive": true,
Expand Down Expand Up @@ -795,8 +795,21 @@ Unified sorter is the sorting engine in TiCDC. This feature is introduced since
+ The data replication task in TiCDC is paused for a long time, during which a large amount of incremental data is accumulated and needs to be replicated.
+ The data replication task is started from an early timestamp so it becomes necessary to replicate a large amount of incremental data.
For the changefeeds created using `cdc cli` after v4.0.13, Unified Sorter is enabled by default; for the changefeeds that have existed before v4.0.13, the previous configuration is used.
To check whether or not the Unified Sorter feature is enabled on a changefeed, you can execute the following example command (assuming the IP address of the PD instance is `http://10.0.10.25:2379`):
{{< copyable "shell-regular" >}}
```shell
cdc cli --pd="http://10.0.10.25:2379" changefeed query --changefeed-id=simple-replication-task | grep 'sort-engine'
```
In the output of the above command, if the value of `sort-engine` is "unified", it means that Unified Sorter is enabled on the changefeed.
> **Note:**
>
> + If your servers use mechanical hard drives or other storage devices that have high latency or limited bandwidth, use the unified sorter with caution.
> + The total free capacity of hard drives must be greater than or equal to 128G. If you need to replicate a large amount of historical data, make sure that the free capacity on each node is greater than or equal to the size of the incremental data that needs to be replicated.
> + If your servers do not match the above requirements and you want to disable the unified sorter, you need to manually set `sort-engine` to `memory` for the changefeed.
> + Since v4.0.13, Unified sorter is enabled by default. If your servers do not match the above requirements and you want to disable the unified sorter, you need to manually set `sort-engine` to `memory` for the changefeed.
> + To enable Unified Sorter on an existing changefeed, see the methods provided in [How do I handle the OOM that occurs after TiCDC is restarted after a task interruption?](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-the-oom-that-occurs-after-ticdc-is-restarted-after-a-task-interruption).

0 comments on commit a49dda7

Please sign in to comment.