From a661e3ee38f8cfbc037a4c22ea047c493c95a167 Mon Sep 17 00:00:00 2001 From: yikeke Date: Mon, 27 Aug 2018 13:48:08 +0800 Subject: [PATCH 1/2] *: add slow-query doc Via: https://github.com/pingcap/docs-cn/pull/849/ --- FAQ.md | 2 +- README.md | 1 + op-guide/tidb-config-file.md | 6 +++ sql/slow-query.md | 94 ++++++++++++++++++++++++++++++++++++ tools/tikv-control.md | 2 +- trouble-shooting.md | 3 +- 6 files changed, 105 insertions(+), 3 deletions(-) create mode 100644 sql/slow-query.md diff --git a/FAQ.md b/FAQ.md index 05b92ac818f6a..caf43da05f9ad 100644 --- a/FAQ.md +++ b/FAQ.md @@ -1052,6 +1052,6 @@ When you execute the `LOAD DATA LOCAL` statement but the MySQL client does not a The solution is to use the `--local-infile=1` option when you start the MySQL client. For example, use command like `mysql --local-infile=1 -u root -h 127.0.0.1 -P 4000`. The default value of `local-infile` is different in different versions of MySQL client, therefore you need to configure it in some MySQL clients and do not need to configure it in some others. -#### ERROR 9001 (HY000): PD server timeoutstart timestamp may fall behind safepoint +#### ERROR 9001 (HY000): PD server timeout start timestamp may fall behind safe point This error occurs when TiDB fails to access PD. A worker in the TiDB background continuously queries the safepoint from PD and this error occurs if it fails to query within 100s. Generally it is because the PD failure or network failure between TiDB and PD. For the details of common errors, see [Error Number and Fault Diagnosis](sql/error.md). diff --git a/README.md b/README.md index 3ad35c4f2079b..b5057591503ae 100644 --- a/README.md +++ b/README.md @@ -71,6 +71,7 @@ - [Error Codes and Troubleshooting](sql/error.md) - [Compatibility with MySQL](sql/mysql-compatibility.md) - [TiDB Memory Control](sql/tidb-memory-control.md) + - [SLOW-QUERY](sql/slow-query.md) + Advanced Usage - [Read Data From History Versions](op-guide/history-read.md) - [Garbage Collection (GC)](op-guide/gc.md) diff --git a/op-guide/tidb-config-file.md b/op-guide/tidb-config-file.md index eadb42a3821ab..9e615b509c73f 100644 --- a/op-guide/tidb-config-file.md +++ b/op-guide/tidb-config-file.md @@ -50,6 +50,12 @@ Configuration about log. - Default: false - If you set the value to true, the log does not output timestamp +### `slow-query-file` + +- The file name of the slow query log +- Default: "" +- After you set it, the slow query log outputs to this file separately + ### `slow-threshold` - To output the threshold value of consumed time in the slow log diff --git a/sql/slow-query.md b/sql/slow-query.md new file mode 100644 index 0000000000000..37969c4027955 --- /dev/null +++ b/sql/slow-query.md @@ -0,0 +1,94 @@ +--- + title: Slow Query Log + summary: Use the slow query log to locate inappropriate SQL statements + category: user guide + --- + + # Slow Query Log + + An inappropriate SQL statement can increase the pressure on the entire cluster, resulting in a longer response time. To solve this problem, you can use the slow query log to locate the questionable statements and thus improve the performance. + + ### Obtain the log + +By grepping the keyword `SLOW_QUERY` in the log file of TiDB, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold). + +`slow-threshold` can be edited in the configuration file and its default value is 300ms. 
If you configure the [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all of the slow query log will be written in this file. + + ### Usage + +``` +2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s +process_time:1m6.768s wait_time:12m11.212s backoff_time:600ms request_count:2058 +total_keys:1869712 processed_keys:1869710 succ:true con:3 user:root@127.0.0.1 +txn_start_ts:402329674704224261 database:test table_ids:[31],index_ids:[1], +sql:select count(c) from sbtest1 use index (k_1) +``` + +### Fields Explanation + +#### `cost_time` + +The execution time of this statement. Only the statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) output the slow query log. + +#### `process_time` + +The total processing time that TiKV spends on this statement. Since data is sent to TiKV concurrently for execution, this value may exceed `cost_time`. + +#### `wait_time` + +The total waiting time that TiKV spends on this statement. Since the `coprocessor` of TiKV runs a limited number of threads, requests may queue up when all threads of `coprocessor` are working. When some request in the queue takes a long time to process, the waiting time of the following requests will increase. + +#### `backoff_time` + +The waiting time that TiKV spends before retrying when the mistakes in need of retrying occur. The common mistakes which requires retrying include: lock, Region split, TiKV server is busy, etc. + +#### `request_count` + +The number of `coprocessor` requests that this statement sends. + +#### `total_keys` + +The number of keys that `coprocessor` has scanned. + +#### `processed_keys` + +The number of keys that `coprocessor` has processed. Compared with `total_keys`, `processed_keys` +does not include the old versions of MVCC or the MVCC delete flags. A great difference between `processed_keys` and `total_keys` indicates that the number of old versions are relatively large. + +#### `succ` + +Whether the execution of the request succeeds or not. + +#### `con` + +Connection ID or session ID. For example, you can use the keyword `con:3` to grep the log whose session ID is 3. + +#### `user` + +The username of the executed statement. + +#### `txn_start_ts` + +The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to grep the transaction-related logs. + +#### `database` + +The current database. + +#### `table_ids` + +The IDs of the tables involved in the statement. + +#### `index_ids` + +The IDs of the indexes involved in the statement. + +#### `sql` + +The SQL statements. + +### Ways of locating questionable statements + +Not all of the `SLOW_QUERY` statements are questionable. Only those whose `process_time` is very large will increase the pressure on the entire cluster. + +The statements whose `wait_time` is very large and `process_time` is very small are usually not questionable. They are blocked by the real questionable statements so that they have to wait in the execution queue and thus have a much longer response time. diff --git a/tools/tikv-control.md b/tools/tikv-control.md index 144ea870682d8..c183beb76a857 100644 --- a/tools/tikv-control.md +++ b/tools/tikv-control.md @@ -243,7 +243,7 @@ success! Use the `recover-mvcc` command in circumstances where TiKV cannot run normally caused by MVCC data corruption. It cross-checks 3 CFs ("default", "write", "lock") to recover from various kinds of inconsistency. 
-Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd`option to specify PD endpoints. +Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd` option to specify PD endpoints. ```bash $ tikv-ctl --db /path/to/tikv/db recover-mvcc --regions 1001,1002 --pd 127.0.0.1:2379 diff --git a/trouble-shooting.md b/trouble-shooting.md index 3e29392e6f763..db1b7edfe26c3 100644 --- a/trouble-shooting.md +++ b/trouble-shooting.md @@ -90,7 +90,8 @@ Before starting the process, make sure the result of `ulimit -n` is large enough ## Database access times out and the system load is too high -Provide the following information: +First, check the [SLOW-QUERY](./sql/slow-query.md) log and see if it's because of some inappropriate SQL statement. +If you failed to solve the problem, provide the following information: + The deployment topology - How many `tidb-server`/`pd-server`/`tikv-server` instances are deployed? From 3f1d64753f5361229c9c6d20e17df745aa199cc4 Mon Sep 17 00:00:00 2001 From: yikeke Date: Tue, 28 Aug 2018 17:19:13 +0800 Subject: [PATCH 2/2] *: address the comment --- README.md | 2 +- op-guide/tidb-config-file.md | 2 +- sql/slow-query.md | 55 +++++++++++++++++++----------------- trouble-shooting.md | 2 +- 4 files changed, 32 insertions(+), 29 deletions(-) diff --git a/README.md b/README.md index b5057591503ae..8701406e7571f 100644 --- a/README.md +++ b/README.md @@ -71,7 +71,7 @@ - [Error Codes and Troubleshooting](sql/error.md) - [Compatibility with MySQL](sql/mysql-compatibility.md) - [TiDB Memory Control](sql/tidb-memory-control.md) - - [SLOW-QUERY](sql/slow-query.md) + - [Slow Query Log](sql/slow-query.md) + Advanced Usage - [Read Data From History Versions](op-guide/history-read.md) - [Garbage Collection (GC)](op-guide/gc.md) diff --git a/op-guide/tidb-config-file.md b/op-guide/tidb-config-file.md index 9e615b509c73f..2a1eaaa5340fb 100644 --- a/op-guide/tidb-config-file.md +++ b/op-guide/tidb-config-file.md @@ -54,7 +54,7 @@ Configuration about log. - The file name of the slow query log - Default: "" -- After you set it, the slow query log outputs to this file separately +- After you set it, the slow query log is output to this file separately ### `slow-threshold` diff --git a/sql/slow-query.md b/sql/slow-query.md index 37969c4027955..0a7378cced0f8 100644 --- a/sql/slow-query.md +++ b/sql/slow-query.md @@ -1,20 +1,22 @@ --- - title: Slow Query Log - summary: Use the slow query log to locate inappropriate SQL statements - category: user guide - --- +title: Slow Query Log +summary: Use the slow query log to identify problematic SQL statements. +category: user guide +--- + +# Slow Query Log - # Slow Query Log +The slow query log is a record of SQL statements that took a long time to perform. - An inappropriate SQL statement can increase the pressure on the entire cluster, resulting in a longer response time. To solve this problem, you can use the slow query log to locate the questionable statements and thus improve the performance. +A problematic SQL statement can increase the pressure on the entire cluster, resulting in a longer response time. To solve this problem, you can use the slow query log to identify the problematic statements and thus improve the performance. - ### Obtain the log +### Obtain the log -By grepping the keyword `SLOW_QUERY` in the log file of TiDB, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold). 
+By running `grep` on the keyword `SLOW_QUERY` in the TiDB log file, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold).

-`slow-threshold` can be edited in the configuration file and its default value is 300ms. If you configure the [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all of the slow query log will be written in this file.
+You can edit `slow-threshold` in the configuration file and its default value is 300ms. If you configure the [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all the slow query logs will be written to this file.

- ### Usage
+### Usage example

```
2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s
@@ -24,36 +26,37 @@ txn_start_ts:402329674704224261 database:test table_ids:[31],index_ids:[1],
sql:select count(c) from sbtest1 use index (k_1)
```

-### Fields Explanation
+### Fields description
+
+This section describes fields in the slow query log based on the usage example above.

#### `cost_time`

-The execution time of this statement. Only the statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) output the slow query log.
+The execution time of this statement. Only the statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) output this log.

#### `process_time`

-The total processing time that TiKV spends on this statement. Since data is sent to TiKV concurrently for execution, this value may exceed `cost_time`.
+The total processing time of this statement in TiKV. Because data is sent to TiKV concurrently for execution, this value might exceed `cost_time`.

#### `wait_time`

-The total waiting time that TiKV spends on this statement. Since the `coprocessor` of TiKV runs a limited number of threads, requests may queue up when all threads of `coprocessor` are working. When some request in the queue takes a long time to process, the waiting time of the following requests will increase.
+The total waiting time of this statement in TiKV. Because the Coprocessor of TiKV runs a limited number of threads, requests might queue up when all threads of Coprocessor are working. When a request in the queue takes a long time to process, the waiting time of the subsequent requests will increase.

#### `backoff_time`

-The waiting time that TiKV spends before retrying when the mistakes in need of retrying occur. The common mistakes which requires retrying include: lock, Region split, TiKV server is busy, etc.
+The waiting time before retry when this statement encounters errors that require a retry. Common errors of this kind include: a lock is encountered, a Region splits, and the TiKV server is busy.

#### `request_count`

-The number of `coprocessor` requests that this statement sends.
+The number of Coprocessor requests that this statement sends.

#### `total_keys`

-The number of keys that `coprocessor` has scanned.
+The number of keys that Coprocessor has scanned.

#### `processed_keys`

-The number of keys that `coprocessor` has processed. Compared with `total_keys`, `processed_keys`
-does not include the old versions of MVCC or the MVCC delete flags. A great difference between `processed_keys` and `total_keys` indicates that the number of old versions are relatively large.
+The number of keys that Coprocessor has processed. Compared with `total_keys`, `processed_keys` does not include the old versions of MVCC or the MVCC `delete` marks.
A great difference between `processed_keys` and `total_keys` indicates that the number of old versions is relatively large.

#### `succ`

@@ -61,15 +64,15 @@ Whether the execution of the request succeeds or not.

#### `con`

-Connection ID or session ID. For example, you can use the keyword `con:3` to grep the log whose session ID is 3.
+Connection ID (session ID). For example, you can use the keyword `con:3` to `grep` the log whose session ID is 3.

#### `user`

-The username of the executed statement.
+The name of the user who executes this statement.

#### `txn_start_ts`

-The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to grep the transaction-related logs.
+The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to `grep` the transaction-related logs.

#### `database`

The current database.

@@ -85,10 +88,10 @@ The IDs of the indexes involved in the statement.

#### `sql`

-The SQL statements.
+The SQL statement.

-### Ways of locating questionable statements
+### Identify problematic SQL statements

-Not all of the `SLOW_QUERY` statements are questionable. Only those whose `process_time` is very large will increase the pressure on the entire cluster.
+Not all of the `SLOW_QUERY` statements are problematic. Only those whose `process_time` is very large will increase the pressure on the entire cluster.

-The statements whose `wait_time` is very large and `process_time` is very small are usually not questionable. They are blocked by the real questionable statements so that they have to wait in the execution queue and thus have a much longer response time.
+The statements whose `wait_time` is very large and `process_time` is very small are usually not problematic. The large `wait_time` is because the statement is blocked by real problematic statements and it has to wait in the execution queue, which leads to a much longer response time.
diff --git a/trouble-shooting.md b/trouble-shooting.md
index db1b7edfe26c3..f9986f4bbc2ca 100644
--- a/trouble-shooting.md
+++ b/trouble-shooting.md
@@ -90,7 +90,7 @@ Before starting the process, make sure the result of `ulimit -n` is large enough

## Database access times out and the system load is too high

-First, check the [SLOW-QUERY](./sql/slow-query.md) log and see if it's because of some inappropriate SQL statement.
+First, check the [slow query log](./sql/slow-query.md) and see if it is because of some inappropriate SQL statement.
If you failed to solve the problem, provide the following information:

+ The deployment topology
- How many `tidb-server`/`pd-server`/`tikv-server` instances are deployed?
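
For the triage step above, the following is a minimal sketch of how the slow query log documented in this patch can be pulled out of the TiDB log. The log path `tidb.log` and the session ID `3` are assumptions for illustration only; use the file configured by `slow-query-file` if that option is set in your deployment.

```bash
# Collect all slow-query entries from the TiDB log.
# "tidb.log" is an assumed path; adjust it to your deployment,
# or point it at the file configured by `slow-query-file`.
grep 'SLOW_QUERY' tidb.log

# Narrow the output to a single session (session ID 3 in this example),
# using the `con` field described in sql/slow-query.md.
grep 'SLOW_QUERY' tidb.log | grep 'con:3'
```

Each matching line carries the `cost_time`, `process_time`, and `wait_time` fields, which is usually enough to tell whether a statement is genuinely problematic or merely queued behind one.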