Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: add slow-query doc #587

Merged
merged 2 commits into from
Aug 29, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -1052,6 +1052,6 @@ When you execute the `LOAD DATA LOCAL` statement but the MySQL client does not a

The solution is to use the `--local-infile=1` option when you start the MySQL client. For example, use command like `mysql --local-infile=1 -u root -h 127.0.0.1 -P 4000`. The default value of `local-infile` is different in different versions of MySQL client, therefore you need to configure it in some MySQL clients and do not need to configure it in some others.

#### ERROR 9001 (HY000): PD server timeoutstart timestamp may fall behind safepoint
#### ERROR 9001 (HY000): PD server timeout start timestamp may fall behind safe point

This error occurs when TiDB fails to access PD. A worker in the TiDB background continuously queries the safepoint from PD and this error occurs if it fails to query within 100s. Generally it is because the PD failure or network failure between TiDB and PD. For the details of common errors, see [Error Number and Fault Diagnosis](sql/error.md).
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@
- [Error Codes and Troubleshooting](sql/error.md)
- [Compatibility with MySQL](sql/mysql-compatibility.md)
- [TiDB Memory Control](sql/tidb-memory-control.md)
- [SLOW-QUERY](sql/slow-query.md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Slow Query Log

+ Advanced Usage
- [Read Data From History Versions](op-guide/history-read.md)
- [Garbage Collection (GC)](op-guide/gc.md)
Expand Down
6 changes: 6 additions & 0 deletions op-guide/tidb-config-file.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,12 @@ Configuration about log.
- Default: false
- If you set the value to true, the log does not output timestamp

### `slow-query-file`

- The file name of the slow query log
- Default: ""
- After you set it, the slow query log outputs to this file separately
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outputs -> is output


### `slow-threshold`

- To output the threshold value of consumed time in the slow log
Expand Down
94 changes: 94 additions & 0 deletions sql/slow-query.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
---
title: Slow Query Log
summary: Use the slow query log to locate inappropriate SQL statements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a period at the end of the sentence.

category: user guide
---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata cannot display correctly. You can click View to preview. Delete the extra space at the beginning of each line from line 2 to line 5.


# Slow Query Log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls also delete the extra space at the beginning of the line. Other similar issues in this doc:

  • line 9
  • line 11
  • line 17


An inappropriate SQL statement can increase the pressure on the entire cluster, resulting in a longer response time. To solve this problem, you can use the slow query log to locate the questionable statements and thus improve the performance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think we'd better describe what is "Slow Query Log" first. Add the following sentence to the beginning:
    The slow query log is a record of SQL statements that took a long time to perform.
  • The word "questionable" is not appropriate, which is usually used to express "not true or correct; not likely to be good/honest". Use the word "problematic" instead. Ref: http://www.oracle.com/technetwork/tutorials/tuningadv-1921876.html


### Obtain the log

By grepping the keyword `SLOW_QUERY` in the log file of TiDB, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grepping -> grep (with the inline code format)


`slow-threshold` can be edited in the configuration file and its default value is 300ms. If you configure the [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all of the slow query log will be written in this file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • slow-threshold can be edited -> You can edit slow-threshold
  • all of the slow query log -> all the slow query logs


### Usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Usage/Example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OR you can use Usage Example


```
2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s
process_time:1m6.768s wait_time:12m11.212s backoff_time:600ms request_count:2058
total_keys:1869712 processed_keys:1869710 succ:true con:3 user:[email protected]
txn_start_ts:402329674704224261 database:test table_ids:[31],index_ids:[1],
sql:select count(c) from sbtest1 use index (k_1)
```

### Fields Explanation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this sentence here: "This section explains fields in the slow query log based on the usage example above."

#### `cost_time`

The execution time of this statement. Only the statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) output the slow query log.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output the slow query log -> output this log


#### `process_time`

The total processing time that TiKV spends on this statement. Since data is sent to TiKV concurrently for execution, this value may exceed `cost_time`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • that TiKV spends on this statement -> of this statement in TiKV
  • Since -> Because (Pls also update other similar "may" involved in this PR.)
  • may -> might (Generally, we use "might" in our documentation, because "may" has other meanings. Pls also update other similar "may" involved in this PR.)


#### `wait_time`

The total waiting time that TiKV spends on this statement. Since the `coprocessor` of TiKV runs a limited number of threads, requests may queue up when all threads of `coprocessor` are working. When some request in the queue takes a long time to process, the waiting time of the following requests will increase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • that TiKV spends on this statement -> of this statement in TiKV
  • coprocessor -> Coprocessor (without inline code format, consistent with its occurrences in our doc/blog)
  • some request -> some requests
  • following -> subsequent


#### `backoff_time`

The waiting time that TiKV spends before retrying when the mistakes in need of retrying occur. The common mistakes which requires retrying include: lock, Region split, TiKV server is busy, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> The waiting time before retry when this statement encounters errors that require a retry. The common such errors include: lock occurs, Region split, the TiKV server is busy.


#### `request_count`

The number of `coprocessor` requests that this statement sends.

#### `total_keys`

The number of keys that `coprocessor` has scanned.

#### `processed_keys`

The number of keys that `coprocessor` has processed. Compared with `total_keys`, `processed_keys`
does not include the old versions of MVCC or the MVCC delete flags. A great difference between `processed_keys` and `total_keys` indicates that the number of old versions are relatively large.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Pls move this line to the end of the last line. Do not break a line manually.
  • MVCC delete flags -> MVCC delete marks


#### `succ`

Whether the execution of the request succeeds or not.

#### `con`

Connection ID or session ID. For example, you can use the keyword `con:3` to grep the log whose session ID is 3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connection ID or session ID. -> Connection ID (session ID).

Note: To avoid the user think they are two different IDs.


#### `user`

The username of the executed statement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> The username of the user who executes this statement.


#### `txn_start_ts`

The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to grep the transaction-related logs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep -> grep


#### `database`

The current database.

#### `table_ids`

The IDs of the tables involved in the statement.

#### `index_ids`

The IDs of the indexes involved in the statement.

#### `sql`

The SQL statements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statements -> statement


### Ways of locating questionable statements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Identify problematic SQL statements


Not all of the `SLOW_QUERY` statements are questionable. Only those whose `process_time` is very large will increase the pressure on the entire cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questionable -> problematic


The statements whose `wait_time` is very large and `process_time` is very small are usually not questionable. They are blocked by the real questionable statements so that they have to wait in the execution queue and thus have a much longer response time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • questionable -> problematic
  • Modify the 2nd sentence to: The large wait_time is because the statement is blocked by real problematic statements and it has to wait in the execution queue, which leads to a much longer response time.

2 changes: 1 addition & 1 deletion tools/tikv-control.md
Original file line number Diff line number Diff line change
Expand Up @@ -243,7 +243,7 @@ success!

Use the `recover-mvcc` command in circumstances where TiKV cannot run normally caused by MVCC data corruption. It cross-checks 3 CFs ("default", "write", "lock") to recover from various kinds of inconsistency.

Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd`option to specify PD endpoints.
Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd` option to specify PD endpoints.

```bash
$ tikv-ctl --db /path/to/tikv/db recover-mvcc --regions 1001,1002 --pd 127.0.0.1:2379
Expand Down
3 changes: 2 additions & 1 deletion trouble-shooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,8 @@ Before starting the process, make sure the result of `ulimit -n` is large enough

## Database access times out and the system load is too high

Provide the following information:
First, check the [SLOW-QUERY](./sql/slow-query.md) log and see if it's because of some inappropriate SQL statement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's -> it is

If you failed to solve the problem, provide the following information:

+ The deployment topology
- How many `tidb-server`/`pd-server`/`tikv-server` instances are deployed?
Expand Down