tools/tidb-lightning: document backend and that system DBs are filtered #1620

Merged: 15 commits, Nov 15, 2019
Changes from 6 commits
2 changes: 1 addition & 1 deletion dev/reference/tools/download.md
@@ -25,7 +25,7 @@ If you want to download the latest version of [TiDB Lightning](/dev/reference/to

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| [tidb-toolkit-latest-linux-amd64.tar.gz](http://download.pingcap.org/tidb-toolkit-latest-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-latest-linux-amd64.sha256](http://download.pingcap.org/tidb-toolkit-latest-linux-amd64.sha256) |
| [tidb-toolkit-latest-linux-amd64.tar.gz](https://download.pingcap.org/tidb-toolkit-latest-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-latest-linux-amd64.sha256](https://download.pingcap.org/tidb-toolkit-latest-linux-amd64.sha256) |

## DM (Data Migration)

10 changes: 9 additions & 1 deletion dev/reference/tools/tidb-lightning/config.md
@@ -89,8 +89,15 @@ driver = "file"
#keep-after-success = false

[tikv-importer]
# The listening address of tikv-importer. Change it to the actual address.
# Delivery back end, can be "importer" or "tidb".
#backend = "importer"
# The listening address of tikv-importer when back end is "importer". Change it to the actual address.
addr = "172.16.31.10:8287"
# Action to take when trying to insert a duplicate entry in the "tidb" back end.
# - replace: new entry replaces existing entry
# - ignore: keep existing entry, ignore new entry
# - error: report error and quit the program
#on-duplicate = "replace"

[mydumper]
# Block size for file reading. Keep it longer than the longest string of
@@ -288,6 +295,7 @@ min-available-ratio = 0.05
| -V | Prints program version | |
| -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` |
| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
| --backend *backend* | [Delivery back end](/dev/reference/tools/tidb-lightning/tidb_backend.md) (`importer` or `tidb`) | `tikv-importer.backend` |
| --log-file *file* | Log file path | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
| --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
4 changes: 3 additions & 1 deletion dev/reference/tools/tidb-lightning/deployment.md
@@ -6,7 +6,9 @@ category: reference

# TiDB Lightning Deployment

This document describes the hardware requirements of TiDB Lightning on separate deployment and mixed deployment, and how to deploy it using Ansible or manually.
This document describes the hardware requirements of TiDB Lightning using the default "Importer" back end, and how to deploy it using Ansible or manually.

If you wish to use the "TiDB" back end, also read [TiDB Lightning "TiDB" Back End](/dev/reference/tools/tidb-lightning/tidb_backend.md) for the changes to the deployment steps.

## Notes

2 changes: 2 additions & 0 deletions dev/reference/tools/tidb-lightning/overview.md
@@ -40,3 +40,5 @@ The complete import process is as follows:
The auto-increment ID of a table is computed by the estimated *upper bound* of the number of rows, which is proportional to the total file size of the data files of the table. Therefore, the final auto-increment ID is often much larger than the actual number of rows. This is expected since in TiDB auto-increment is [not necessarily allocated sequentially](/dev/reference/mysql-compatibility.md#auto-increment-id).

7. Finally, `tidb-lightning` switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.

TiDB Lightning also supports using "TiDB" instead of "Importer" as the back end. In this configuration, `tidb-lightning` transforms data into SQL `INSERT` statements and directly executes them on the target cluster, similar to Loader. See [TiDB Lightning "TiDB" Back End](/dev/reference/tools/tidb-lightning/tidb_backend.md) for details.
4 changes: 4 additions & 0 deletions dev/reference/tools/tidb-lightning/table-filter.md
@@ -26,6 +26,10 @@ ignore-dbs = ["pattern4", "pattern5"]

The pattern can either be a simple name, or a regular expression in [Go dialect](https://golang.org/pkg/regexp/syntax/#hdr-syntax) if it starts with a `~` character.

>**Note:**
>
> The system databases `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `mysql` and `sys` are always black-listed regardless of the table filter settings.
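For example, a filter pattern that matches every database still does not cause the system schemas to be imported. A hypothetical sketch, assuming the `[black-white-list]` section and `do-dbs` key used by the filter examples in this document:

```toml
[black-white-list]
# This pattern matches every database, including the system schemas...
do-dbs = ["~.*"]
# ...yet INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, mysql and sys are
# still skipped: the built-in black-list takes precedence over any
# user-supplied pattern.
```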

## Filtering tables

```toml
225 changes: 225 additions & 0 deletions dev/reference/tools/tidb-lightning/tidb_backend.md
@@ -0,0 +1,225 @@
---
title: TiDB Lightning "TiDB" Back End
summary: Choose how to write data into the TiDB cluster.
category: reference
---

# TiDB Lightning "TiDB" Back End

TiDB Lightning supports two back ends, "Importer" and "TiDB", which determine how `tidb-lightning` delivers data into the target cluster.

The "Importer" back end (default) requires `tidb-lightning` to first encode the SQL/CSV data into KV pairs, and relies on the external `tikv-importer` program to sort these KV pairs and ingest directly into the TiKV nodes.

The "TiDB" back end requires `tidb-lightning` to encode the data into SQL `INSERT` statements and execute them directly on the TiDB node.

| Back end | "Importer" | "TiDB" |
|:---|:---|:---|
| Speed | Fast (~300 GB/hr) | Slow (~50 GB/hr) |
| Resource usage | High | Low |
| ACID respected while importing | No | Yes |
| Target tables | Must be empty | Can be populated |
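Switching between the two is a single configuration change. A minimal sketch of each mode (the address below is a placeholder, matching the sample configuration in this document):

```toml
# "Importer" back end (the default): requires a running tikv-importer.
[tikv-importer]
#backend = "importer"
# Listening address of tikv-importer; change to the actual address.
addr = "172.16.31.10:8287"
```

```toml
# "TiDB" back end: no tikv-importer needed; data is written as SQL.
[tikv-importer]
backend = "tidb"
```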

## Deployment for "TiDB" back end

When using the "TiDB" back end, you no longer need `tikv-importer`. Compared with the [standard deployment procedure](/dev/reference/tools/tidb-lightning/deployment.md),

* steps involving `tikv-importer` can all be skipped
* the configuration must be changed to indicate that the "TiDB" back end is used

### Ansible deployment

1. The `[importer_server]` section in `inventory.ini` can be left blank.

```ini
...

[importer_server]
# keep empty

[lightning_server]
192.168.20.10

...
```

2. The `tikv_importer_port` setting in `group_vars/all.yml` is ignored, and the file `group_vars/importer_server.yml` does not need to be changed.

But you need to edit `conf/tidb-lightning.yml` and change the `backend` setting to `tidb`.

```yaml
...
tikv_importer:
backend: "tidb" # <-- change this
...
```

3. Bootstrap and deploy the cluster as usual.

4. Mount the data source for TiDB Lightning as usual.

5. Start `tidb-lightning` as usual.

### Manual deployment

You do not need to download and configure `tikv-importer`.

Before running `tidb-lightning`, add the following lines into the configuration file:

```toml
[tikv-importer]
backend = "tidb"
```

or supply the `--backend tidb` argument when executing `tidb-lightning`.

## Conflict resolution

The "TiDB" back end supports importing to an already-populated table. However, the new data may cause unique key conflicts with the old data. You can control how to resolve such conflicts with this task configuration:

```toml
[tikv-importer]
backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"
```

| Setting | Behavior on conflict | Equivalent SQL statement |
|:---|:---|:---|
| replace | New entries replace old ones | `REPLACE INTO ...` |
| ignore | Keep old entries and ignore new ones | `INSERT IGNORE INTO ...` |
| error | Abort import | `INSERT INTO ...` |

## Migrating from Loader to TiDB Lightning "TiDB" back end

TiDB Lightning using the "TiDB" back end can completely replace the functions of [Loader](/dev/reference/tools/loader.md). The following lists how to translate Loader configurations into [TiDB Lightning configurations](/dev/reference/tools/tidb-lightning/config.md).

<table>
<thead><tr><th>Loader</th><th>TiDB Lightning</th></tr></thead>
<tbody>
<tr><td>

```toml

# logging
log-level = "info"
log-file = "loader.log"

# Prometheus
status-addr = ":8272"

# concurrency
pool-size = 16
```

</td><td>

```toml
[lightning]
# logging
level = "info"
file = "tidb-lightning.log"

# Prometheus
pprof-port = 8289

# concurrency (better left as default)
#region-concurrency = 16
```

</td></tr>
<tr><td>

```toml

# checkpoint database

checkpoint-schema = "tidb_loader"






```

</td><td>

```toml
[checkpoint]
# checkpoint storage
enable = true
schema = "tidb_lightning_checkpoint"
# by default the checkpoint is stored in
# a local file, which is more efficient.
# but you could still choose to store the
# checkpoints in the target database with
# this setting:
#driver = "mysql"
```

</td></tr>
<tr><td>

```toml



```

</td><td>

```toml
[tikv-importer]
# use the "TiDB" back end
backend = "tidb"
```

</td></tr>
<tr><td>

```toml

# data source directory
dir = "/data/export/"
```

</td><td>

```toml
[mydumper]
# data source directory
data-source-dir = "/data/export"
```

</td></tr>

<tr><td>

```toml
[db]
# TiDB connection parameters
host = "127.0.0.1"
port = 4000

user = "root"
password = ""

#sql-mode = ""
```

</td><td>

```toml
[tidb]
# TiDB connection parameters
host = "127.0.0.1"
port = 4000
status-port = 10080 # <- this is required
user = "root"
password = ""

#sql-mode = ""
```

</td></tr>
</tbody>
</table>
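Putting the fragments above together, a fully translated configuration might look like the following sketch (all values are illustrative, taken from the table above):

```toml
# tidb-lightning.toml -- sketch of a Loader-equivalent configuration
[lightning]
level = "info"
file = "tidb-lightning.log"
pprof-port = 8289

[checkpoint]
enable = true
schema = "tidb_lightning_checkpoint"

[tikv-importer]
# use the "TiDB" back end
backend = "tidb"
on-duplicate = "replace"

[mydumper]
data-source-dir = "/data/export"

[tidb]
host = "127.0.0.1"
port = 4000
status-port = 10080   # required, unlike in Loader
user = "root"
password = ""
```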
2 changes: 1 addition & 1 deletion v2.1/reference/tools/download.md
@@ -16,7 +16,7 @@ In addition, the Kafka version of TiDB Binlog is also provided.

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| [tidb-v2.1.16-linux-amd64.tar.gz](http://download.pingcap.org/tidb-v2.1.16-linux-amd64.tar.gz) (TiDB Binlog, TiDB Lightning) | Linux | amd64 |[tidb-v2.1.16-linux-amd64.sha256](http://download.pingcap.org/tidb-v2.1.16-linux-amd64.sha256)|
| [tidb-v2.1.17-linux-amd64.tar.gz](https://download.pingcap.org/tidb-v2.1.17-linux-amd64.tar.gz) (TiDB Binlog, TiDB Lightning) | Linux | amd64 |[tidb-v2.1.17-linux-amd64.sha256](https://download.pingcap.org/tidb-v2.1.17-linux-amd64.sha256)|
| [tidb-binlog-kafka-linux-amd64.tar.gz](http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.tar.gz) (the Kafka version of TiDB Binlog) | Linux | amd64 |[tidb-binlog-kafka-linux-amd64.sha256](http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.sha256)|

## DM (Data Migration)
2 changes: 1 addition & 1 deletion v3.0/reference/tools/download.md
@@ -26,7 +26,7 @@ If you want to download the 3.0 version of [TiDB Lightning](/v3.0/reference/tool

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| [tidb-toolkit-v3.0.3-linux-amd64.tar.gz](http://download.pingcap.org/tidb-toolkit-v3.0.3-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-v3.0.3-linux-amd64.sha256](http://download.pingcap.org/tidb-toolkit-v3.0.3-linux-amd64.sha256) |
| [tidb-toolkit-v3.0.5-linux-amd64.tar.gz](https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-v3.0.5-linux-amd64.sha256](https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.sha256) |

## DM (Data Migration)

13 changes: 11 additions & 2 deletions v3.0/reference/tools/tidb-lightning/deployment.md
@@ -6,7 +6,9 @@ category: reference

# TiDB Lightning Deployment

This document describes the hardware requirements of TiDB Lightning on separate deployment and mixed deployment, and how to deploy it using Ansible or manually.
This document describes the hardware requirements of TiDB Lightning using the default "Importer" back end, and how to deploy it using Ansible or manually.

If you wish to use the "TiDB" back end, also read [TiDB Lightning "TiDB" Back End](/v3.0/reference/tools/tidb-lightning/tidb_backend.md) for the changes to the deployment steps.

## Notes

@@ -343,8 +345,15 @@ Follow the link to download the TiDB Lightning package (choose the same version
# keep-after-success = false

[tikv-importer]
# The listening address of tikv-importer. Change it to the actual address.
# Delivery back end, can be "importer" or "tidb".
# backend = "importer"
# The listening address of tikv-importer when back end is "importer". Change it to the actual address.
addr = "172.16.31.10:8287"
# Action to take when trying to insert a duplicate entry in the "tidb" back end.
# - replace: new entry replaces existing entry
# - ignore: keep existing entry, ignore new entry
# - error: report error and quit the program
# on-duplicate = "replace"

[mydumper]
# Block size for file reading. Keep it longer than the longest string of
6 changes: 5 additions & 1 deletion v3.0/reference/tools/tidb-lightning/overview.md
@@ -36,6 +36,10 @@ The complete import process is as follows:

There are two kinds of engine files: *data engines* and *index engines*, each corresponding to two kinds of KV pairs: the row data and secondary indices. Normally, the row data are entirely sorted in the data source, while the secondary indices are out of order. Because of this, the data engines are uploaded as soon as a batch is completed, while the index engines are imported only after all batches of the entire table are encoded.

6. After all engines associated to a table are imported, `tidb-lightning` performs a checksum comparison between the local data source and those calculated from the cluster, to ensure there is no data corruption in the process, and tells TiDB to `ANALYZE` all imported tables, to prepare for optimal query planning.
6. After all engines associated to a table are imported, `tidb-lightning` performs a checksum comparison between the local data source and those calculated from the cluster, to ensure there is no data corruption in the process; tells TiDB to `ANALYZE` all imported tables, to prepare for optimal query planning; and adjusts the `AUTO_INCREMENT` value so future insertions will not cause conflict.

The auto-increment ID of a table is computed by the estimated *upper bound* of the number of rows, which is proportional to the total file size of the data files of the table. Therefore, the final auto-increment ID is often much larger than the actual number of rows. This is expected since in TiDB auto-increment is [not necessarily allocated sequentially](/v3.0/reference/mysql-compatibility.md#auto-increment-id).

7. Finally, `tidb-lightning` switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.

TiDB Lightning also supports using "TiDB" instead of "Importer" as the back end. In this configuration, `tidb-lightning` transforms data into SQL `INSERT` statements and directly executes them on the target cluster, similar to Loader. See [TiDB Lightning "TiDB" Back End](/v3.0/reference/tools/tidb-lightning/tidb_backend.md) for details.
4 changes: 4 additions & 0 deletions v3.0/reference/tools/tidb-lightning/table-filter.md
@@ -27,6 +27,10 @@ ignore-dbs = ["pattern4", "pattern5"]

The pattern can either be a simple name, or a regular expression in [Go dialect](https://golang.org/pkg/regexp/syntax/#hdr-syntax) if it starts with a `~` character.

>**Note:**
>
> The system databases `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `mysql` and `sys` are always black-listed regardless of the table filter settings.

## Filtering tables

```toml