This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

checksum fails easily by tikv timeout when restore big table with local backend #365

Closed
glorv opened this issue Aug 6, 2020 · 1 comment · Fixed by #369
Labels
difficulty/3-hard Hard issue priority/P2 Medium priority issue severity/major status/WIP Work in progress type/bug This issue is a bug report

Comments

@glorv
Contributor

glorv commented Aug 6, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
    When using Lightning to restore a big table with the local backend, I have seen the following error in the Lightning logs several times:
[2020/08/05 21:05:39.713 +08:00] [ERROR] [main.go:82] ["tidb lightning encountered error stack info"] [error="restore table `test`.`t` failed: compute remote checksum failed: Error 9002: TiKV server timeout"] 

Furthermore, in some benchmarks, after the data finished loading, manually running select count(*) on a big table may fail with:

mysql> select count(*) from t;
ERROR 1105 (HY000): Execution terminated due to exceeding the deadline

After consulting @breeswish: the root cause is that a TiKV subtask processing a single region exceeded 1 minute, either because the region was too large or because the task waited too long before being processed.

I think the primary cause is that with the local backend, region key ranges are not accurately split at the target size of 96 MiB, so some regions may end up too big.

  2. What did you expect to see?
    The checksum should succeed without retrying, and select count(*) should return successfully.

  3. What did you see instead?

  4. Versions of the cluster

  5. Operation logs

  6. Configuration of the cluster and the task

  7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

@glorv glorv added the type/bug This issue is a bug report label Aug 6, 2020
@kennytm kennytm added difficulty/3-hard Hard issue priority/P2 Medium priority issue labels Aug 6, 2020
@kennytm kennytm added the status/WIP Work in progress label Aug 7, 2020
@lance6716
Contributor

https://internal.pingcap.net/jira/browse/TIDB-4758

for your reference

@glorv glorv closed this as completed in #369 Sep 3, 2020