From 7d3eaeef125676893e949686044c5e8272a5c8ae Mon Sep 17 00:00:00 2001 From: shichun-0415 <89768198+shichun-0415@users.noreply.github.com> Date: Mon, 7 Mar 2022 17:53:49 +0800 Subject: [PATCH] Update lightning requirements and add memory info (#7768) --- TOC.md | 11 +- migrate-large-mysql-shards-to-tidb.md | 42 +----- tidb-lightning/deploy-tidb-lightning.md | 14 -- tidb-lightning/tidb-lightning-backends.md | 2 +- tidb-lightning/tidb-lightning-faq.md | 17 +-- tidb-lightning/tidb-lightning-requirements.md | 120 ++++++++++++++++++ 6 files changed, 135 insertions(+), 71 deletions(-) create mode 100644 tidb-lightning/tidb-lightning-requirements.md diff --git a/TOC.md b/TOC.md index e7a457019beed..a238905c9ec09 100644 --- a/TOC.md +++ b/TOC.md @@ -226,10 +226,10 @@ - [FAQ](/tidb-binlog/tidb-binlog-faq.md) - TiDB Lightning - [Overview](/tidb-lightning/tidb-lightning-overview.md) - - [Tutorial](/get-started-with-tidb-lightning.md) - - [Deploy](/tidb-lightning/deploy-tidb-lightning.md) - - [Precheck](/tidb-lightning/tidb-lightning-prechecks.md) - - [Configure](/tidb-lightning/tidb-lightning-configuration.md) + - Prechecks and requirements + - [Prechecks](/tidb-lightning/tidb-lightning-prechecks.md) + - [Downstream privilege requirements](/tidb-lightning/tidb-lightning-requirements.md) + - [Downstream storage space requirements](/tidb-lightning/tidb-lightning-requirements.md#downstream-storage-space-requirements) - Key Features - [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md) - [Table Filter](/table-filter.md) @@ -238,6 +238,9 @@ - [Import Data in Parallel](/tidb-lightning/tidb-lightning-distributed-import.md) - [Error Resolution](/tidb-lightning/tidb-lightning-error-resolution.md) - [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md) + - [Tutorial](/get-started-with-tidb-lightning.md) + - [Deploy](/tidb-lightning/deploy-tidb-lightning.md) + - [Configure](/tidb-lightning/tidb-lightning-configuration.md) - 
[Monitor](/tidb-lightning/monitor-tidb-lightning.md) - [FAQ](/tidb-lightning/tidb-lightning-faq.md) - [Glossary](/tidb-lightning/tidb-lightning-glossary.md) diff --git a/migrate-large-mysql-shards-to-tidb.md b/migrate-large-mysql-shards-to-tidb.md index 518458707d092..cc3ecd9477aa5 100644 --- a/migrate-large-mysql-shards-to-tidb.md +++ b/migrate-large-mysql-shards-to-tidb.md @@ -21,8 +21,8 @@ In this document, you can migrate data following this procedure: 1. Use Dumpling to export full data. In this example, you export 2 tables respectively from 2 upstream databases: - - Export `table1` and `table2` from `my_db1` - - Export `table3` and `table4` from `my_db2` + - Export `table1` and `table2` from `my_db1` + - Export `table3` and `table4` from `my_db2` 2. Start TiDB Lightning to migrate data to `mydb.table5` in TiDB. @@ -34,44 +34,14 @@ Before getting started, see the following documents to prepare for the migration - [Deploy a DM Cluster Using TiUP](/dm/deploy-a-dm-cluster-using-tiup.md) - [Use TiUP to Deploy Dumpling and Lightning](/migration-tools.md) +- [Downstream privilege requirements for Dumpling](/dumpling-overview.md#export-data-from-tidbmysql) +- [Downstream privilege requirements for TiDB Lightning](/tidb-lightning/tidb-lightning-requirements.md#downstream-privilege-requirements) +- [Downstream storage space for TiDB Lightning](/tidb-lightning/tidb-lightning-requirements.md#downstream-storage-space-requirements) - [Privileges required by DM-worker](/dm/dm-worker-intro.md) -- [Upstream Permissions for Lightning](/tidb-lightning/tidb-lightning-faq.md#what-are-the-privilege-requirements-for-the-target-database) -- [Downstream Permissions for Dumpling](/dumpling-overview.md#export-data-from-tidbmysql) - -### Resource requirements - -**Operating system**: Examples in this document use new, clean CentOS 7 instances. You can deploy a virtual machine on your own host locally, or on a vendor-provided cloud platform. 
TiDB Lightning consumes as much CPU resources as needed by default, so it is recommended to deploy TiDB Lightning on a dedicated machine. If you do not have a dedicated machine for TiDB Lightning, you can deploy TiDB Lightning on a shared machine with other components (such as `tikv-server`) and limit TiDB Lightning's CPU usage by configuring `region-concurrency` to 75% of the number of logical CPUs. - -**Memory and CPU**: TiDB Lightning consumes high resources, so it is recommended to allocate more than 64 GB of memory and 32-core CPU for TiDB Lightning. To get the best performance, make sure the CPU core to memory (GB) ratio is more than 1:2. - -**Disk space**: - -- Dumpling requires enough disk space to store the whole data source. SSD is recommended. -- During the import, TiDB Lightning needs temporary space to store the sorted key-value pairs. The disk space should be enough to hold the largest single table from the data source. -- If the full data volume is large, you can increase the binlog storage time in the upstream. This is to ensure that the binlogs are not lost during the incremental replication. - -**Note**: You cannot calculate the exact data volume exported by Dumpling from MySQL, but you can estimate the data volume by using the following SQL statement to summarize the `data-length` field in the `information_schema.tables` table: - -{{< copyable "" >}} - -```sql -/* Calculate the size of all schemas, in MiB. Replace ${schema_name} with your schema name. */ -SELECT table_schema,SUM(data_length)/1024/1024 AS data_length,SUM(index_length)/1024/1024 AS index_length,SUM(data_length+index_length)/1024/1024 AS SUM FROM information_schema.tables WHERE table_schema = "${schema_name}" GROUP BY table_schema; - -/* Calculate the size of the largest table, in MiB. Replace ${schema_name} with your schema name. 
*/ -SELECT table_name,table_schema,SUM(data_length)/1024/1024 AS data_length,SUM(index_length)/1024/1024 AS index_length,SUM(data_length+index_length)/1024/1024 AS SUM from information_schema.tables WHERE table_schema = "${schema_name}" GROUP BY table_name,table_schema ORDER BY SUM DESC LIMIT 5; -``` - -### Disk space for the target TiKV cluster - -The target TiKV cluster must have enough disk space to store the imported data. In addition to [the standard hardware requirements](/hardware-and-software-requirements.md), the storage space of the target TiKV cluster must be larger than **the size of the data source x [the number of replicas](/faq/deploy-and-maintain-faq.md#is-the-number-of-replicas-in-each-region-configurable-if-yes-how-to-configure-it) x 2**. For example, if the cluster uses 3 replicas by default, the target TiKV cluster must have a storage space larger than 6 times the size of the data source. The formula has `x 2` because: - -- Index might take extra space. -- RocksDB has a space amplification effect. ### Check conflicts for Sharded Tables -If the migration involves merging data from different sharded tables, primary key or unique index conflicts may occur during the merge. Therefore, before migration, you need to take a deep look at the current sharding scheme from the business point of view, and find a way to avoid the conflicts. For more details, see [Handle conflicts between primary keys or unique indexes across multiple sharded tables](/dm/shard-merge-best-practices.md#handle-conflicts-between-primary-keys-or-unique-indexes-across-multiple-sharded-tables). The following is a brief description. +If the migration involves merging data from different sharded tables, primary key or unique index conflicts may occur during the merge. Therefore, before migration, you need to take a deep look at the current sharding scheme from the business point of view, and find a way to avoid conflicts. 
For more details, see [Handle conflicts between primary keys or unique indexes across multiple sharded tables](/dm/shard-merge-best-practices.md#handle-conflicts-between-primary-keys-or-unique-indexes-across-multiple-sharded-tables). The following is a brief description. Assume that tables 1~4 have the same table structure as follows. diff --git a/tidb-lightning/deploy-tidb-lightning.md b/tidb-lightning/deploy-tidb-lightning.md index b5c6f9a3f9663..7a93615ed5452 100644 --- a/tidb-lightning/deploy-tidb-lightning.md +++ b/tidb-lightning/deploy-tidb-lightning.md @@ -20,20 +20,6 @@ Before starting TiDB Lightning, note that: bin/tidb-lightning-ctl --switch-mode=normal ``` -- TiDB Lightning is required to have the following privileges in the downstream TiDB: - - | Privilege | Scope | - |----:|:------| - | SELECT | Tables | - | INSERT | Tables | - | UPDATE | Tables | - | DELETE | Tables | - | CREATE | Databases, tables | - | DROP | Databases, tables | - | ALTER | Tables | - - If the `checksum` configuration item of TiDB Lightning is set to `true`, then the admin user privileges in the downstream TiDB need to be granted to TiDB Lightning. - ## Hardware requirements `tidb-lightning` is a resource-intensive program. It is recommended to deploy it as follows. diff --git a/tidb-lightning/tidb-lightning-backends.md b/tidb-lightning/tidb-lightning-backends.md index 6c9e57131d0bd..84f5e205a36c3 100644 --- a/tidb-lightning/tidb-lightning-backends.md +++ b/tidb-lightning/tidb-lightning-backends.md @@ -62,7 +62,7 @@ When using the TiDB-backend, deploying `tikv-importer` is not necessary. Compare The speed of TiDB Lightning using TiDB-backend is limited by the SQL processing speed of TiDB. Therefore, even a lower-end machine may max out the possible performance. 
The recommended hardware configuration is: -* 16 logical cores CPU +* 4 logical cores CPU * An SSD large enough to store the entire data source, preferring higher read speed * 1 Gigabit network card diff --git a/tidb-lightning/tidb-lightning-faq.md b/tidb-lightning/tidb-lightning-faq.md index ddb0afc8a1ea5..f12af703887db 100644 --- a/tidb-lightning/tidb-lightning-faq.md +++ b/tidb-lightning/tidb-lightning-faq.md @@ -16,22 +16,7 @@ Yes. ## What are the privilege requirements for the target database? -TiDB Lightning requires the following privileges: - -* SELECT -* UPDATE -* ALTER -* CREATE -* DROP - -If the [TiDB-backend](/tidb-lightning/tidb-lightning-backends.md#tidb-lightning-tidb-backend) is chosen, or the target database is used to store checkpoints, it additionally requires these privileges: - -* INSERT -* DELETE - -The Local-backend and Importer-backend do not require these two privileges because data is ingested into TiKV directly, which bypasses the entire TiDB privilege system. This is secure as long as the ports of TiKV, TiKV Importer and TiDB Lightning are not reachable outside the cluster. - -If the `checksum` configuration of TiDB Lightning is set to `true`, then the admin user privileges in the downstream TiDB need to be granted to TiDB Lightning. +For details about the permissions, see [Prerequisites for using TiDB Lightning](/tidb-lightning/tidb-lightning-requirements.md). ## TiDB Lightning encountered an error when importing one table. Will it affect other tables? Will the process be terminated? diff --git a/tidb-lightning/tidb-lightning-requirements.md b/tidb-lightning/tidb-lightning-requirements.md new file mode 100644 index 0000000000000..9b61d292a32c9 --- /dev/null +++ b/tidb-lightning/tidb-lightning-requirements.md @@ -0,0 +1,120 @@ +--- +title: Prerequisites for using TiDB Lightning +summary: Learn prerequisites for running TiDB Lightning. 
+---
+
+# Prerequisites for using TiDB Lightning
+
+Before using TiDB Lightning, check whether the environment meets the requirements. This helps reduce errors during import and makes a successful import more likely.
+
+## Downstream privilege requirements
+
+Depending on the import mode and the features enabled, downstream database users need different privileges. The following table provides a reference.
+
+| | Feature | Scope | Required privilege | Remarks |
+|:----|:----|:----|:----|:----|
+| Mandatory | Basic functions | Target table | CREATE, SELECT, INSERT, UPDATE, DELETE, DROP, ALTER | DROP is required only when tidb-lightning-ctl runs the checkpoint-destroy-all command |
+| Mandatory | Basic functions | Target database | CREATE | |
+| Mandatory | tidb-backend | information_schema.columns | SELECT | |
+| Mandatory | local-backend | mysql.tidb | SELECT | |
+| Mandatory | local-backend | - | SUPER | |
+| Mandatory | local-backend | - | RESTRICTED_VARIABLES_ADMIN, RESTRICTED_TABLES_ADMIN | Required when the target TiDB enables SEM |
+| Recommended | Conflict detection, max-error | Schema configured for lightning.task-info-schema-name | SELECT, INSERT, UPDATE, DELETE, CREATE, DROP | If not required, the value must be set to "" |
+| Optional | Parallel import | Schema configured for lightning.meta-schema-name | SELECT, INSERT, UPDATE, DELETE, CREATE, DROP | If not required, the value must be set to "" |
+| Optional | checkpoint.driver = "mysql" | checkpoint.schema setting | SELECT, INSERT, UPDATE, DELETE, CREATE, DROP | Required when checkpoint information is stored in databases, instead of files |
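
As a rough illustration of the table above, the following statements sketch the grants for a local-backend import. Everything named here is an assumption made for the example: the account `lightning_user`, its host and password, and the target database `target_db` are hypothetical, and `lightning_task_info`, `lightning_metadata`, and `tidb_lightning_checkpoint` stand in for whatever `lightning.task-info-schema-name`, `lightning.meta-schema-name`, and `checkpoint.schema` are set to in your configuration.

```sql
-- Hypothetical account; replace the name, host, and password with your own.
CREATE USER 'lightning_user'@'%' IDENTIFIED BY '<password>';

-- Basic functions: privileges on the target database and its tables.
GRANT CREATE, SELECT, INSERT, UPDATE, DELETE, DROP, ALTER ON target_db.* TO 'lightning_user'@'%';

-- local-backend: read mysql.tidb, plus SUPER.
GRANT SELECT ON mysql.tidb TO 'lightning_user'@'%';
GRANT SUPER ON *.* TO 'lightning_user'@'%';

-- Needed only when the target TiDB enables SEM.
GRANT RESTRICTED_VARIABLES_ADMIN, RESTRICTED_TABLES_ADMIN ON *.* TO 'lightning_user'@'%';

-- Conflict detection and max-error (assumed schema name).
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON lightning_task_info.* TO 'lightning_user'@'%';

-- Parallel import (assumed schema name).
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON lightning_metadata.* TO 'lightning_user'@'%';

-- Checkpoints stored in the database (assumed schema name).
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON tidb_lightning_checkpoint.* TO 'lightning_user'@'%';
```

For a tidb-backend import, the local-backend grants would be replaced by `SELECT` on `information_schema.columns`, as listed in the table. Skip the grants for any optional feature you have disabled.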
+
+## Downstream storage space requirements
+
+The target TiKV cluster must have enough disk space to store the imported data. In addition to the [standard hardware requirements](/hardware-and-software-requirements.md), the storage space of the target TiKV cluster must be larger than **the size of the data source x the number of replicas x 2**. For example, if the cluster uses 3 replicas by default, the target TiKV cluster must have a storage space larger than 6 times the size of the data source. The formula has `x 2` because:
+
+- Indexes might take extra space.
+- RocksDB has a space amplification effect.
+
+It is difficult to calculate the exact data volume exported by Dumpling from MySQL. However, you can estimate the data volume by using the following SQL statements to sum the `data_length` field in the `information_schema.tables` table:
+
+Calculate the total size of a schema, in MiB. Replace ${schema_name} with your schema name.
+
+```sql
+select table_schema,sum(data_length)/1024/1024 as data_length,sum(index_length)/1024/1024 as index_length,sum(data_length+index_length)/1024/1024 as sum from information_schema.tables where table_schema = "${schema_name}" group by table_schema;
+```
+
+List the five largest tables in a schema, in MiB. Replace ${schema_name} with your schema name.
+
+{{< copyable "sql" >}}
+
+```sql
+select table_name,table_schema,sum(data_length)/1024/1024 as data_length,sum(index_length)/1024/1024 as index_length,sum(data_length+index_length)/1024/1024 as sum from information_schema.tables where table_schema = "${schema_name}" group by table_name,table_schema order by sum desc limit 5;
+```
+
+## Resource requirements
+
+**Operating system**: The examples in this document use fresh CentOS 7 instances. You can deploy a virtual machine either on your local host or in the cloud. Because TiDB Lightning consumes as many CPU resources as needed by default, it is recommended that you deploy it on a dedicated server. If this is not possible, you can deploy it on a single server together with other TiDB components (for example, `tikv-server`) and then configure `region-concurrency` to limit the CPU usage of TiDB Lightning. Usually, you can set the value to 75% of the number of logical CPUs.
+
+**Memory and CPU**:
+
+The CPU and memory consumed by TiDB Lightning vary with the backend mode. Run TiDB Lightning in an environment that supports the optimal import performance for the backend you use.
+
+- Local-backend: TiDB Lightning consumes a large amount of CPU and memory in this mode. It is recommended that you allocate more than 32 CPU cores and more than 64 GiB of memory.
+
+> **Note**:
+>
+> When the data to be imported is large, one parallel import may consume about 2 GiB of memory. In this case, the total memory usage can be `region-concurrency` x 2 GiB. By default, `region-concurrency` equals the number of logical CPUs. If the memory size (GiB) is less than twice the number of logical CPUs, or if OOM occurs during the import, you can decrease `region-concurrency` to address the OOM.
+
+- TiDB-backend: In this mode, the performance bottleneck lies in TiDB. It is recommended that you allocate a 4-core CPU and 8 GiB of memory for TiDB Lightning. If the TiDB cluster does not reach the write threshold in an import, you can increase `region-concurrency`.
+- Importer-backend: In this mode, resource consumption is nearly the same as that in Local-backend. Importer-backend is not recommended; use Local-backend if you have no particular requirements.
+
+**Storage space**: The `sorted-kv-dir` configuration item specifies the temporary storage directory for the sorted key-value files. The directory must be empty, and the storage space must be enough to store the largest single table from the data source. For better import performance, it is recommended to use a directory different from `data-source-dir` and use flash storage and exclusive I/O for the directory.
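
The sizing rules in this document can be sketched as queries: downstream storage larger than the data source x the number of replicas x 2, and roughly `region-concurrency` x 2 GiB of memory for Local-backend. This is only an estimate under stated assumptions. The replica count of 3 comes from the example above, the `region-concurrency` value of 32 is hypothetical, and the column aliases are illustrative names.

```sql
-- Estimate the minimum TiKV storage for one schema, in GiB.
-- Assumes 3 replicas; change the factor to match your cluster.
select table_schema,
       sum(data_length + index_length) / 1024 / 1024 / 1024 as source_gib,
       sum(data_length + index_length) / 1024 / 1024 / 1024 * 3 * 2 as min_tikv_gib
from information_schema.tables
where table_schema = "${schema_name}"
group by table_schema;

-- Rough peak memory for Local-backend: region-concurrency x 2 GiB.
-- 32 is a hypothetical region-concurrency value.
select 32 as region_concurrency, 32 * 2 as estimated_peak_memory_gib;
```

If the estimated peak memory exceeds what the host provides, lowering `region-concurrency` is the adjustment the note above suggests.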