From 9806acbb6b564357a0b57203459c7bc2edef858e Mon Sep 17 00:00:00 2001
From: JoyinQ <56883733+Joyinqin@users.noreply.github.com>
Date: Mon, 31 May 2021 13:01:37 +0800
Subject: [PATCH] This is an automated cherry-pick of #5719

Signed-off-by: ti-chi-bot
---
 ticdc/deploy-ticdc.md |  7 +++++++
 ticdc/manage-ticdc.md | 37 ++++++++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/ticdc/deploy-ticdc.md b/ticdc/deploy-ticdc.md
index 88145680544d2..3d3942f47f081 100644
--- a/ticdc/deploy-ticdc.md
+++ b/ticdc/deploy-ticdc.md
@@ -46,6 +46,8 @@ cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_2.log --addr=0.0.0.0:830
 cdc server --pd=http://10.0.10.25:2379 --log-file=ticdc_3.log --addr=0.0.0.0:8303 --advertise-addr=127.0.0.1:8303
 ```
 
+## Description of TiCDC `cdc server` command-line parameters
+
 The following are descriptions of options available in the `cdc server` command:
 
 - `gc-ttl`: The TTL (Time To Live) of the service level `GC safepoint` in PD set by TiCDC, in seconds. The default value is `86400`, which means 24 hours.
@@ -58,4 +60,9 @@ The following are descriptions of options available in the `cdc server` command:
 - `ca`: The path of the CA certificate file used by TiCDC, in the PEM format (optional).
 - `cert`: The path of the certificate file used by TiCDC, in the PEM format (optional).
 - `key`: The path of the certificate key file used by TiCDC, in the PEM format (optional).
+<<<<<<< HEAD
 - `config`: The address of the configuration file that TiCDC uses (optional). This option is supported since TiCDC v4.0.13. This option can be used in the TiCDC deployment since TiUP v1.4.0.
+=======
+- `config`: The address of the configuration file that TiCDC uses (optional). This option is supported since TiCDC v5.0.0. This option can be used in the TiCDC deployment since TiUP v1.4.0.
+- `sort-dir`: Specifies the temporary file directory of the sorting engine. The default value of this configuration item is `/tmp/cdc_sort`. When Unified Sorter is enabled, if this directory on the server is not writable or the available space is insufficient, you need to manually specify a directory in `sort-dir`. Make sure that TiCDC can read and write data in the `sort-dir` path.
+>>>>>>> 73471fea6 (ticdc: add more docs about unified sorter (#5719))
diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md
index e483a6ff43cf6..3ce92a1538b92 100644
--- a/ticdc/manage-ticdc.md
+++ b/ticdc/manage-ticdc.md
@@ -80,13 +80,13 @@ Execute the following commands to create a replication task:
 {{< copyable "shell-regular" >}}
 
 ```shell
-cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task"
+cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" --sort-engine="unified"
 ```
 
 ```shell
 Create changefeed successfully!
 ID: simple-replication-task
-Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"memory","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
+Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":"2020-03-12T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":true,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
 ```
 
 - `--changefeed-id`: The ID of the replication task. The format must match the `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$` regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID.
@@ -102,6 +102,7 @@ Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":
 
 - `--start-ts`: Specifies the starting TSO of the `changefeed`. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time.
 - `--target-ts`: Specifies the ending TSO of the `changefeed`. To this TSO, the TiCDC cluster stops pulling data. The default value is empty, which means that TiCDC does not automatically stop pulling data.
+<<<<<<< HEAD
 - `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must sort the data changes before writing them to the sink. This option supports `memory`/`unified`/`file`.
 
     - `memory`: Sorts data changes in memory.
@@ -109,6 +110,15 @@ Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":
     - `file`: Entirely uses the disk to store the temporary data. This feature is **deprecated**. It is **NOT recommended** to use it in **any** situation.
 
 - `--sort-dir`: Specifies the temporary file directory of the sorting engine. For TiDB v4.0.12 and later versions, it is **NOT recommended** to use this option in the command `cdc cli changefeed create`. You are recommended to use this option in the command `cdc server` to set the temporary file directory. The default value of this option is `/tmp/cdc_sort`. When the unified sorter is enabled, if the default directory `/tmp/cdc_sort` on the sever is not writable or there is not enough space, you need to manually specify a directory in `sort-dir`. If the directory specified in `sort-dir` is not writable, `changefeed` stops automatically.
+=======
+- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must sort the data changes before writing them to the sink. This option supports `unified` (by default)/`memory`/`file`.
+
+    - `unified`: When `unified` is used, TiCDC prefers data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. This is the default value of `--sort-engine`.
+    - `memory`: Sorts data changes in memory. It is **NOT recommended** to use this sorting engine, because OOM is easily triggered when you replicate a large amount of data.
+    - `file`: Entirely uses the disk to store the temporary data. This feature is **deprecated**. It is **NOT recommended** to use it in **any** situation.
+
+- `--sort-dir`: Specifies the temporary file directory of the sorting engine. It is **NOT recommended** to use this option in the command `cdc cli changefeed create`. You are recommended to use this option [in the command `cdc server` to set the temporary file directory](/ticdc/deploy-ticdc.md#description-of-ticdc-cdc-server-command-line-parameters). The default value of this option is `/tmp/cdc_sort`. When the unified sorter is enabled, if the default directory `/tmp/cdc_sort` on the server is not writable or there is not enough space, you need to manually specify a directory in `sort-dir`. If the directory specified in `sort-dir` is not writable, `changefeed` stops automatically.
+>>>>>>> 73471fea6 (ticdc: add more docs about unified sorter (#5719))
 
 - `--config`: Specifies the configuration file of the `changefeed`.
 
@@ -304,7 +314,7 @@ cdc cli changefeed query --pd=http://10.0.10.25:2379 --changefeed-id=simple-repl
     "start-ts": 419036036249681921,
     "target-ts": 0,
     "admin-job-type": 0,
-    "sort-engine": "memory",
+    "sort-engine": "unified",
     "sort-dir": ".",
     "config": {
       "case-sensitive": true,
@@ -790,13 +800,34 @@ force-replicate = true
 
 ## Unified Sorter
 
+<<<<<<< HEAD
 Unified sorter is the sorting engine in TiCDC. This feature is introduced since v4.0.9. It can mitigate OOM problems caused by the following scenarios:
+=======
+Unified sorter is the sorting engine in TiCDC. It can mitigate OOM problems caused by the following scenarios:
+>>>>>>> 73471fea6 (ticdc: add more docs about unified sorter (#5719))
 
 + The data replication task in TiCDC is paused for a long time, during which a large amount of incremental data is accumulated and needs to be replicated.
 + The data replication task is started from an early timestamp so it becomes necessary to replicate a large amount of incremental data.
 
+For the changefeeds created using `cdc cli` after v4.0.13, Unified Sorter is enabled by default; for the changefeeds that have existed before v4.0.13, the previous configuration is used.
+
+To check whether or not the Unified Sorter feature is enabled on a changefeed, you can execute the following example command (assuming the IP address of the PD instance is `http://10.0.10.25:2379`):
+
+{{< copyable "shell-regular" >}}
+
+```shell
+cdc cli --pd="http://10.0.10.25:2379" changefeed query --changefeed-id=simple-replication-task | grep 'sort-engine'
+```
+
+In the output of the above command, if the value of `sort-engine` is "unified", it means that Unified Sorter is enabled on the changefeed.
+
 > **Note:**
 >
 > + If your servers use mechanical hard drives or other storage devices that have high latency or limited bandwidth, use the unified sorter with caution.
 > + The total free capacity of hard drives must be greater than or equal to 128G. If you need to replicate a large amount of historical data, make sure that the free capacity on each node is greater than or equal to the size of the incremental data that needs to be replicated.
+<<<<<<< HEAD
 > + If your servers do not match the above requirements and you want to disable the unified sorter, you need to manually set `sort-engine` to `memory` for the changefeed.
+=======
+> + Unified sorter is enabled by default. If your servers do not match the above requirements and you want to disable the unified sorter, you need to manually set `sort-engine` to `memory` for the changefeed.
+> + To enable Unified Sorter on an existing changefeed, see the methods provided in [How do I handle the OOM that occurs after TiCDC is restarted after a task interruption?](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-the-oom-that-occurs-after-ticdc-is-restarted-after-a-task-interruption).
+>>>>>>> 73471fea6 (ticdc: add more docs about unified sorter (#5719))
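
The Unified Sorter documentation in this patch asks operators to check three things by hand: the `--changefeed-id` format, the `sort-engine` value reported by `cdc cli changefeed query`, and whether `sort-dir` is writable and has enough free space. The POSIX shell sketch below bundles those checks. The function names `is_valid_changefeed_id`, `sort_engine_of`, and `check_sort_dir` are hypothetical helpers, not TiCDC commands. The 128 GiB default in `check_sort_dir` is an assumption: it reads the note's "128G" as GiB and applies it to a single file system rather than to the total capacity of all drives.

```shell
#!/bin/sh
# Pre-flight helpers for the Unified Sorter settings documented in this patch.
# All function names are hypothetical; they are not part of TiCDC or cdc cli.

# is_valid_changefeed_id ID
# Succeeds if ID matches ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$, the documented
# --changefeed-id format. Input containing a newline is rejected first,
# because grep would otherwise test each line separately.
is_valid_changefeed_id() {
  nl='
'
  case "$1" in
    *"$nl"*) return 1 ;;
  esac
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$'
}

# sort_engine_of
# Reads `cdc cli changefeed create` or `changefeed query` output on stdin and
# prints the first "sort-engine" value found (for example, unified or memory).
# It accepts both the compact ("sort-engine":"x") and the indented
# ("sort-engine": "x") JSON layouts shown in the docs.
sort_engine_of() {
  sed -n 's/.*"sort-engine": *"\([a-z]*\)".*/\1/p' | head -n 1
}

# check_sort_dir [DIR] [MIN_FREE_KB]
# Succeeds if DIR (default /tmp/cdc_sort) exists or can be created, is
# writable by the current user, and its file system reports at least
# MIN_FREE_KB kilobytes available. The default of 134217728 KB (128 GiB)
# is an assumption derived from the note's 128G requirement.
check_sort_dir() {
  dir="${1:-/tmp/cdc_sort}"
  min_kb="${2:-134217728}"
  mkdir -p "$dir" 2>/dev/null || return 1
  [ -w "$dir" ] || return 1
  # Column 4 of POSIX `df -P` output is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

For example, piping the output of `cdc cli --pd="http://10.0.10.25:2379" changefeed query --changefeed-id=simple-replication-task` into `sort_engine_of` should print `unified` when Unified Sorter is enabled on that changefeed. You can also run `check_sort_dir /tmp/cdc_sort` on each TiCDC node before starting `cdc server`.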