Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tikv: log slow coprocessor task in detail #6344

Merged
merged 12 commits into from
Apr 26, 2018
57 changes: 54 additions & 3 deletions store/tikv/coprocessor.go
Original file line number Diff line number Diff line change
Expand Up @@ -611,14 +611,14 @@ func (worker *copIteratorWorker) handleTaskOnce(bo *Backoffer, task *copTask, ch
if err != nil {
return nil, errors.Trace(err)
}
// Set task.storeAddr field so its task.String() method have the store address information.
task.storeAddr = sender.storeAddr
costTime := time.Since(startTime)
if costTime > minLogCopTaskTime {
log.Infof("[TIME_COP_TASK] %s%s %s", costTime, bo, task)
worker.logTimeCopTask(costTime, task, bo, resp)
}
metrics.TiKVCoprocessorCounter.WithLabelValues("handle_task").Inc()
metrics.TiKVCoprocessorHistogram.Observe(costTime.Seconds())
// Set task.storeAddr field so its task.String() method have the store address information.
task.storeAddr = sender.storeAddr

if task.cmdType == tikvrpc.CmdCopStream {
return worker.handleCopStreamResult(bo, resp.CopStream, task, ch)
Expand All @@ -628,6 +628,57 @@ func (worker *copIteratorWorker) handleTaskOnce(bo *Backoffer, task *copTask, ch
return worker.handleCopResponse(bo, resp.Cop, task, ch, nil)
}

const (
minLogBackoffTime = 100
minLogKVProcessTime = 100
minLogKVWaitTime = 200
minLogScanProcessedInfo = 100000
minLogScanTotalInfo = 200000
)

func (worker *copIteratorWorker) logTimeCopTask(costTime time.Duration, task *copTask, bo *Backoffer, resp *tikvrpc.Response) {
logStr := fmt.Sprintf("[TIME_COP_TASK] cop_resp_time:%s, cop_start_ts:%d, cop_region_id:%d, cop_store_addr:%s", costTime, worker.req.StartTs, task.region.id, task.storeAddr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add time unit for the cop_resp_time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's time.Duration type, which unit is included.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/cop_start_ts/txnID/ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to print the 'cop_' prefix

if bo.totalSleep > minLogBackoffTime {
logStr += fmt.Sprintf(", cop_backoff_ms:%d", bo.totalSleep)
}
var detail *kvrpcpb.ExecDetails
if task.cmdType == tikvrpc.CmdCopStream {
detail = resp.CopStream.ExecDetails
} else {
detail = resp.Cop.ExecDetails
}
if detail != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so many if in if.. Could we log.Info through defer function, and return directly if detail==nil?

Copy link
Member Author

@coocood coocood Apr 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defer is used for release resource.
It looks weird if the normal execution code is done in defer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Yuxing.

if detail.HandleTime != nil {
processMs := detail.HandleTime.ProcessMs
waitMs := detail.HandleTime.WaitMs
if processMs > minLogKVProcessTime {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreMouche Does coprocessor also has a threshold to fill HandleTime.ProcessMs and HandleTime.WaitMs for slow cop requests ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coocood Maybe we just need to print HandleTime.ProcessMs and HandleTime.WaitMs once they are set, no need to use another thresholds like minLogKVProcessTime and minLogKVWaitTime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we set the HandleTime and ScanDetial to true, every request will return them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will we set ScanDetail to true ? @coocood

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScanDetail is not set to true now. it's set only when the duration exceeds the threshold(1s) in TiKV.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should remove these minXXXX thresholds defined in this file and do not check them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TiDB should control what to log, without depending on TiKV's implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zz-jason If we log everything, the most important ones will be obscure.

logStr += fmt.Sprintf(", cop_kv_process_ms:%d", processMs)
}
if waitMs > minLogKVWaitTime {
logStr += fmt.Sprintf(", cop_kv_wait_ms:%d", waitMs)
}
}
if detail.ScanDetail != nil {
logStr = appendScanDetail(logStr, "write", detail.ScanDetail.Write)
logStr = appendScanDetail(logStr, "data", detail.ScanDetail.Data)
logStr = appendScanDetail(logStr, "lock", detail.ScanDetail.Lock)
}
}
log.Info(logStr)
}

func appendScanDetail(logStr string, columnFamily string, scanInfo *kvrpcpb.ScanInfo) string {
if scanInfo != nil {
if scanInfo.Total > minLogScanTotalInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to check this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not log messages that is not interesting.

logStr += fmt.Sprintf(", cop_scan_total_%s:%d", columnFamily, scanInfo.Total)
}
if scanInfo.Processed > minLogScanProcessedInfo {
logStr += fmt.Sprintf(", cop_scan_processed_%s:%d", columnFamily, scanInfo.Processed)
}
}
return logStr
}

func (worker *copIteratorWorker) handleCopStreamResult(bo *Backoffer, stream *tikvrpc.CopStreamResponse, task *copTask, ch chan<- copResponse) ([]*copTask, error) {
defer stream.Close()
var resp *coprocessor.Response
Expand Down