*: add slow-query doc #587

Merged
merged 2 commits on Aug 29, 2018
2 changes: 1 addition & 1 deletion FAQ.md
@@ -1052,6 +1052,6 @@ When you execute the `LOAD DATA LOCAL` statement but the MySQL client does not a

The solution is to use the `--local-infile=1` option when you start the MySQL client. For example, use a command like `mysql --local-infile=1 -u root -h 127.0.0.1 -P 4000`. The default value of `local-infile` differs among MySQL client versions, so you need to configure it in some MySQL clients but not in others.

#### ERROR 9001 (HY000): PD server timeoutstart timestamp may fall behind safepoint
#### ERROR 9001 (HY000): PD server timeout start timestamp may fall behind safe point

This error occurs when TiDB fails to access PD. A worker in the TiDB background continuously queries the safe point from PD, and this error occurs if it fails to get a result within 100s. Generally, it is caused by a PD failure or a network failure between TiDB and PD. For details of common errors, see [Error Number and Fault Diagnosis](sql/error.md).
1 change: 1 addition & 0 deletions README.md
@@ -71,6 +71,7 @@
- [Error Codes and Troubleshooting](sql/error.md)
- [Compatibility with MySQL](sql/mysql-compatibility.md)
- [TiDB Memory Control](sql/tidb-memory-control.md)
- [Slow Query Log](sql/slow-query.md)
+ Advanced Usage
- [Read Data From History Versions](op-guide/history-read.md)
- [Garbage Collection (GC)](op-guide/gc.md)
6 changes: 6 additions & 0 deletions op-guide/tidb-config-file.md
@@ -50,6 +50,12 @@ Configuration about log.
- Default: false
- If you set the value to true, the log does not output the timestamp

### `slow-query-file`

- The file name of the slow query log
- Default: ""
- After you set it, the slow query log is output separately to this file
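
For illustration, a minimal sketch of how these log settings might look in the TiDB configuration file (the file path and values below are assumptions for this sketch; adjust them for your deployment):

```toml
[log]
# Write slow query logs to a separate file (hypothetical path)
slow-query-file = "/var/log/tidb/tidb_slow_query.log"
# Queries consuming more than this threshold (in milliseconds) are logged
slow-threshold = 300
```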

### `slow-threshold`

- The threshold value of consumed time for outputting a statement to the slow query log
97 changes: 97 additions & 0 deletions sql/slow-query.md
@@ -0,0 +1,97 @@
---
title: Slow Query Log
summary: Use the slow query log to identify problematic SQL statements.
category: user guide
---

# Slow Query Log

The slow query log is a record of SQL statements that take a long time to execute.

A problematic SQL statement can increase the pressure on the entire cluster, resulting in a longer response time. To solve this problem, you can use the slow query log to identify the problematic statements and thus improve performance.

### Obtain the log

By using `grep` with the keyword `SLOW_QUERY` in the TiDB log file, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold).

You can edit `slow-threshold` in the configuration file; its default value is 300ms. If you configure [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all slow query logs are written to this file.
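
The following is a self-contained sketch of this step; the log entries are fabricated for illustration (the first one is adapted from the usage example below), and the file name `tidb.log` is an assumption, since the actual log location depends on your deployment:

```shell
# Write a sample TiDB log with one slow-query entry and one ordinary entry
# (tidb.log is a hypothetical file name for this sketch)
cat > tidb.log <<'EOF'
2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s con:3 sql:select count(c) from sbtest1 use index (k_1)
2018/08/20 19:52:09.001 server.go:120: [info] connection established
EOF

# Keep only the slow-query entries
grep 'SLOW_QUERY' tidb.log
```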

### Usage example

```
2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s
process_time:1m6.768s wait_time:12m11.212s backoff_time:600ms request_count:2058
total_keys:1869712 processed_keys:1869710 succ:true con:3 user:[email protected]
txn_start_ts:402329674704224261 database:test table_ids:[31],index_ids:[1],
sql:select count(c) from sbtest1 use index (k_1)
```

### Field descriptions

This section describes fields in the slow query log based on the usage example above.

#### `cost_time`

The execution time of this statement. Only statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) are output to this log.

#### `process_time`

The total processing time of this statement in TiKV. Because data is sent to TiKV concurrently for execution, this value might exceed `cost_time`.

#### `wait_time`

The total waiting time of this statement in TiKV. Because the Coprocessor of TiKV runs a limited number of threads, requests might queue up when all threads of Coprocessor are working. When a request in the queue takes a long time to process, the waiting time of the subsequent requests will increase.

#### `backoff_time`

The waiting time before retrying when this statement encounters an error that requires a retry. Common errors of this kind include: lock conflicts, Region splits, and the TiKV server being busy.

#### `request_count`

The number of Coprocessor requests that this statement sends.

#### `total_keys`

The number of keys that Coprocessor has scanned.

#### `processed_keys`

The number of keys that Coprocessor has processed. Compared with `total_keys`, `processed_keys` does not include the old versions of MVCC or the MVCC `delete` marks. A large difference between `processed_keys` and `total_keys` indicates that many old versions exist.

#### `succ`

Whether the execution of the request succeeds or not.

#### `con`

Connection ID (session ID). For example, you can use the keyword `con:3` to `grep` the logs of the session whose ID is 3.
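
For instance, a self-contained sketch of such filtering; the file `slow.log` and its entries are fabricated for illustration:

```shell
# Build a small sample of slow-query entries (fabricated for this sketch)
printf '%s\n' \
  '[SLOW_QUERY] cost_time:18.6s con:3 sql:select count(c) from sbtest1' \
  '[SLOW_QUERY] cost_time:0.4s con:7 sql:select 1' > slow.log

# Keep only the entries of the session whose connection ID is 3
grep 'con:3' slow.log
```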

#### `user`

The name of the user who executes this statement.

#### `txn_start_ts`

The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to `grep` the transaction-related logs.

#### `database`

The current database.

#### `table_ids`

The IDs of the tables involved in the statement.

#### `index_ids`

The IDs of the indexes involved in the statement.

#### `sql`

The SQL statement.

### Identify problematic SQL statements

Not all of the `SLOW_QUERY` statements are problematic. Only those whose `process_time` is very large will increase the pressure on the entire cluster.

Statements whose `wait_time` is very large but whose `process_time` is very small are usually not problematic themselves. Such a statement has a large `wait_time` because it is blocked by truly problematic statements and has to wait in the execution queue, which leads to a much longer response time.
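
As a rough illustration of this triage, the following self-contained sketch ranks fabricated slow-query entries by `process_time`; the file name and values are made up, and real `process_time` values may include minute components such as `1m6.768s`, which this simple numeric sort does not handle:

```shell
# Build a small sample of slow-query entries (fabricated for this sketch)
printf '%s\n' \
  '[SLOW_QUERY] cost_time:18.6s process_time:66.7s wait_time:12.2s con:3' \
  '[SLOW_QUERY] cost_time:15.1s process_time:0.3s wait_time:14.5s con:5' > slow.log

# Sort entries by the numeric part of process_time, largest first;
# the top entries are the most likely to pressure the whole cluster
grep -o 'process_time:[0-9.]*s' slow.log | sort -t: -k2 -rn
```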
2 changes: 1 addition & 1 deletion tools/tikv-control.md
@@ -243,7 +243,7 @@ success!

Use the `recover-mvcc` command in circumstances where TiKV cannot run normally caused by MVCC data corruption. It cross-checks 3 CFs ("default", "write", "lock") to recover from various kinds of inconsistency.

Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd`option to specify PD endpoints.
Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd` option to specify PD endpoints.

```bash
$ tikv-ctl --db /path/to/tikv/db recover-mvcc --regions 1001,1002 --pd 127.0.0.1:2379
3 changes: 2 additions & 1 deletion trouble-shooting.md
@@ -90,7 +90,8 @@ Before starting the process, make sure the result of `ulimit -n` is large enough

## Database access times out and the system load is too high

Provide the following information:
First, check the [SLOW-QUERY](./sql/slow-query.md) log to see whether it is caused by an inappropriate SQL statement.
If you cannot solve the problem, provide the following information:

+ The deployment topology
- How many `tidb-server`/`pd-server`/`tikv-server` instances are deployed?