This is an enhancement task following #34106 and tikv/tikv#12362. As more information will be collected by those two tasks, the slow query log can also be enhanced so that users can locate the root cause of slowness within 5 minutes. Previously it was quite difficult to find out what really slows down a specific query, as in the issue described in #28937.
Background
The current slow log mainly contains two parts: the first-level fields such as Query_time and Compile_time, and the executor runtime information, which can be decoded using select tidb_decode_plan(). The problems are:
Some important execution paths are not tracked, for example the kv request processing in the tidb batch client.
The displayed information lacks a user-perceivable latency perspective. The execution model is a waterfall driven by the Next call, while the coprocessor tasks and 2pc prewrite processing run in separate pipelines, so recording a single cop task's processing duration does not directly reflect the latency the user perceives.
Target
Users can locate the root cause of slowness from the slow log record within 5 minutes, and the TiDB diagnose engine can provide improvement advice accordingly.
Design
The overall idea
We could split query execution into three phases and record their main durations as slow log first-level fields, so it is easy to tell which phase is the major source of latency. The task details can then be recorded in the execution plan tree; there is already an execution plan tree in which the details of the root and cop tasks are recorded.
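For illustration only, a slow log record with the proposed first-level fields (described below) might then look roughly like the following; the values and the exact layout are made up, and the existing fields keep their current meaning:

```
# Query_time: 2.73
# Protocol_time: 0.02
# Token_wait_time: 0.018
# Load_variable_time: 0.0003
# Executor_build_time: 0.001
# Executor_open_time: 0.012
# Loop_time: 2.65
# Response_time: 0.03
# Close_time: 0.008
select * from t where ...;
```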
The execution preparing phase
Add Protocol_time. It represents the duration between receiving the mysql packet and starting to process the mysql command.
Add Token_wait_time. It is a sub-part of Protocol_time that represents the time spent waiting for the token that allows the mysql command to be processed.
Add Load_variable_time. It is the time spent reading the global variables for the current execution; usually it should be small.
So the duration of this phase is:
protocol process -> parse -> load global variables -> compile -> wait ts
These are user-perceivable durations, and all of these steps run within a single tidb goroutine. A rough sketch of where these timings could be measured follows.
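This is a minimal sketch with hypothetical names, not TiDB's actual connection-handling code; it only shows where the preparation-phase measurements could sit in that single goroutine:

```go
package conntrace

import "time"

// PrepareStats holds the proposed preparation-phase fields.
type PrepareStats struct {
	ProtocolTime     time.Duration // from packet received until command processing starts
	TokenWaitTime    time.Duration // sub-part of ProtocolTime: waiting for the execution token
	LoadVariableTime time.Duration // reading global variables for this execution
}

// HandleCommand sketches where the measurements would sit in the single
// goroutine that processes a mysql command.
func HandleCommand(recvTime time.Time, acquireToken, loadGlobalVars func()) PrepareStats {
	var st PrepareStats

	tokenStart := time.Now()
	acquireToken() // blocks until this session is allowed to process the command
	st.TokenWaitTime = time.Since(tokenStart)

	// Everything between receiving the packet and starting to process the
	// command counts as protocol processing.
	st.ProtocolTime = time.Since(recvTime)

	varStart := time.Now()
	loadGlobalVars() // e.g. refresh global system variables for this execution
	st.LoadVariableTime = time.Since(varStart)

	// parse -> compile -> wait ts would follow here, each timed the same way.
	return st
}
```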
The execution phase
Add Executor_build_time. It is the time spent building the executors. In some situations the for_update_ts needs to be updated, so there may be a synchronous wait_ts duration in this stage.
Add Executor_open_time. It is the time spent opening the executors; for example, the CopClient may start a certain number of cop workers to send cop requests and process responses.
Add Loop_time. It is the actual time spent driving the Next calls. This is the most important part of execution handling; different executors schedule different tasks here, and for pessimistic transaction statements there is an extra phase that locks the related keys. A sketch of these three timings follows this list.
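A minimal sketch with a hypothetical Executor interface and stats struct (not TiDB's real executor code), showing where the three execution-phase measurements could be taken:

```go
package exectrace

import "time"

// Executor is a stand-in for TiDB's volcano-style executor interface.
type Executor interface {
	Open() error
	Next() (rows int, err error) // rows == 0 means the executor is drained
	Close() error
}

// ExecStats holds the proposed execution-phase fields.
type ExecStats struct {
	BuildTime time.Duration // building the executor tree (may include a for_update_ts wait)
	OpenTime  time.Duration // Open(), e.g. cop workers being started
	LoopTime  time.Duration // total wall time spent driving the Next loop
}

// RunStatement shows where the three measurements would sit.
func RunStatement(build func() Executor) (ExecStats, error) {
	var st ExecStats

	start := time.Now()
	exec := build()
	st.BuildTime = time.Since(start)

	start = time.Now()
	if err := exec.Open(); err != nil {
		return st, err
	}
	st.OpenTime = time.Since(start)

	start = time.Now()
	for {
		rows, err := exec.Next()
		if err != nil {
			return st, err
		}
		if rows == 0 {
			break
		}
		// Each chunk would be encoded and flushed to the client here.
	}
	st.LoopTime = time.Since(start)
	return st, exec.Close()
}
```

The Response_time and Close_time of the finishing phase below could be measured the same way, around writing the result packets and around the final Close call.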
The execution finishing phase
Add Response_time. It is the time spent sending the response mysql packets to the mysql client.
Add Close_time. It is the time spent finishing the current query; for a select statement the internal result_set needs to be closed. If the close processing is slow, the current session or connection cannot process the next mysql command or packet in time.
The detailed durations within Loop_time
As mentioned above, the most important duration is the Loop_time phase, which may contain complex calculations and kv request round trips. We also need to find a way to display this information from a user-perceivable perspective.
Note that these detail enhancements can be implemented in the execution plan tree that the slow log already records. They would not change the slow log first-level fields, only the content returned by select tidb_decode_plan().
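A rough sketch of a per-operator stats holder, with hypothetical names (TiDB's real runtime statistics code differs); the point is only that the breakdown is attached to plan tree operators and surfaces through tidb_decode_plan() rather than through new first-level fields:

```go
package plantrace

import (
	"fmt"
	"sync"
	"time"
)

// OperatorStats collects the extra Loop_time breakdown for one plan operator.
type OperatorStats struct {
	mu         sync.Mutex
	UserWait   time.Duration // time the Next caller actually blocked on this operator
	Background time.Duration // time spent by this operator's background workers
}

// Record accumulates one measurement; executors would call this from their
// Next path and from their worker goroutines.
func (s *OperatorStats) Record(userWait, background time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.UserWait += userWait
	s.Background += background
}

// String renders the stats in the style of the execution-info column of a
// plan row, which is what tidb_decode_plan() would then show.
func (s *OperatorStats) String() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return fmt.Sprintf("user_wait:%s, background:%s", s.UserWait, s.Background)
}
```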
The read path
Each time a Next is triggered, root type tasks and cop type tasks behave differently.
The root tasks
Record the time used to prepare a new row chunk. There may be a producer-consumer model doing complex computations such as aggregation or join.
Refactor the time field. Try to separate the user-perceivable duration from the background task processing duration, as well as the data preparation duration when consumption is faster than production.
Add the detailed task durations for background tasks, for example the parallel processing of aggregation or join, as sketched below.
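A sketch of how the user-perceivable wait could be separated from background work in such a producer-consumer root operator; all names are hypothetical:

```go
package roottrace

import (
	"sync/atomic"
	"time"
)

type chunk struct{ rows int }

// rootOperator sketches a producer-consumer root task such as a parallel
// aggregation: workers produce chunks in the background while the session
// goroutine consumes them through Next.
type rootOperator struct {
	results chan chunk

	workerBusy int64 // nanoseconds spent by background workers producing chunks
	callerWait int64 // nanoseconds the Next caller spent blocked waiting for a chunk
}

// worker runs in the background; its busy time is only user-perceivable when
// consumption outpaces production and Next has to wait for it.
func (op *rootOperator) worker(input <-chan chunk) {
	for in := range input {
		start := time.Now()
		out := process(in) // expensive computation: aggregation, join, ...
		atomic.AddInt64(&op.workerBusy, int64(time.Since(start)))
		op.results <- out
	}
}

// Next is driven by the session goroutine; the time it blocks here is the
// user-perceivable part of this operator's latency.
func (op *rootOperator) Next() (chunk, bool) {
	start := time.Now()
	out, ok := <-op.results
	atomic.AddInt64(&op.callerWait, int64(time.Since(start)))
	return out, ok
}

func process(in chunk) chunk { return in }
```

With such counters, callerWait is what would be surfaced as the user-perceivable duration, while workerBusy mainly explains it when production is the bottleneck.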
The cop tasks
Refactor the time field. Try to separate the user-perceivable duration from the background task processing duration, as well as the data preparation duration when consumption is faster than production.
Add the cop task details for a single task based on the information from #34106 and tikv/tikv#12362; a sketch of such per-task details follows.
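A sketch of the kind of per-task detail a cop task entry could carry; the field names are assumptions here, since the exact content depends on what #34106 and tikv/tikv#12362 end up exposing:

```go
package coptrace

import "time"

// CopTaskDetail describes one coprocessor task as seen from the tidb client.
type CopTaskDetail struct {
	Region      uint64
	RPCTime     time.Duration // full round trip observed by the tidb client
	QueueWait   time.Duration // queueing on the tikv side before the task is picked up
	ProcessTime time.Duration // actual processing time on tikv
}

// CopStats aggregates the tasks of one plan node but also keeps the slowest
// task, since a single straggler is often what the user actually waited for.
type CopStats struct {
	NumTasks int
	Total    time.Duration
	Slowest  CopTaskDetail
}

func (s *CopStats) Add(d CopTaskDetail) {
	s.NumTasks++
	s.Total += d.RPCTime
	if d.RPCTime > s.Slowest.RPCTime {
		s.Slowest = d
	}
}
```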
The write path
Split the execution into two phases: the transaction ongoing phase and the transaction committing phase.
Transaction ongoing phase
Refactor the time field. Try to separate the user-perceivable duration from the concurrent key locking phase.
Add pessimistic lock kv request details based on the information from #34106 and tikv/tikv#12362.
Transaction committing phase
Refactor the time field. Try to separate the user-perceivable duration from the commit duration of the two-phase committer. A sketch of this split follows.
Add prewrite kv request details based on the information from #34106 and tikv/tikv#12362.
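A sketch of this split with hypothetical names (not the actual two-phase committer code), timing the prewrite and commit kv requests separately so the slow log can tell a slow prewrite from a slow commit, while still recording how long the user-facing goroutine was blocked overall:

```go
package committrace

import "time"

// CommitStats holds the proposed split for the transaction committing phase.
type CommitStats struct {
	PrewriteTime time.Duration // prewrite kv requests of the two-phase committer
	CommitTime   time.Duration // commit kv requests of the two-phase committer
	UserWait     time.Duration // how long the user-facing goroutine was blocked overall
}

func commitTxn(prewrite, commit func() error) (CommitStats, error) {
	var st CommitStats
	start := time.Now()

	t := time.Now()
	if err := prewrite(); err != nil {
		return st, err
	}
	st.PrewriteTime = time.Since(t)

	t = time.Now()
	err := commit()
	st.CommitTime = time.Since(t)

	st.UserWait = time.Since(start)
	return st, err
}
```

The per-request details inside each phase would then come from the information exposed by #34106 and tikv/tikv#12362.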