This repository has been archived by the owner on Dec 8, 2021. It is now read-only.
Checksum often fails with a TiKV timeout when restoring a big table with the local backend #365
Labels
difficulty/3-hard
Hard issue
priority/P2
Medium priority issue
severity/major
status/WIP
Work in progress
type/bug
This issue is a bug report
Bug Report
When using Lightning to restore a big table with the local backend, I have several times seen the following in the Lightning logs:
Furthermore, in some benchmarks, after the data load has finished, manually executing
select count(*) from table
on a big table may fail with:
According to @breeswish, the root cause is that TiKV took more than 1 minute to process a subtask in one region, either because the region is too large or because the task waited too long.
I think the primary cause is that, with the local backend, region key ranges are not split at exactly the 96M region size, so some regions may be too big.
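To illustrate the idea of keeping each region's key range close to the size limit, here is a minimal sketch that groups key-sorted key-value pairs into ranges by accumulated byte size. It is not Lightning's actual implementation: the function name `split_ranges_by_size`, the constant `REGION_SIZE_LIMIT`, and the way size is measured (key length plus value length, with 96M taken as 96 MiB) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed target region size: the 96M from the report, taken here as 96 MiB.
REGION_SIZE_LIMIT = 96 * 1024 * 1024


def split_ranges_by_size(
    pairs: Iterable[Tuple[bytes, bytes]],
    limit: int = REGION_SIZE_LIMIT,
) -> List[Tuple[bytes, bytes, int]]:
    """Group key-sorted (key, value) pairs into ranges of at most `limit` bytes.

    Hypothetical helper. Returns one (first_key, last_key, total_bytes) tuple
    per range. A single pair larger than `limit` still gets a range of its own.
    """
    ranges: List[Tuple[bytes, bytes, int]] = []
    first = last = None
    size = 0
    for key, value in pairs:
        pair_size = len(key) + len(value)
        # Close the current range if adding this pair would exceed the limit.
        if first is not None and size + pair_size > limit:
            ranges.append((first, last, size))
            first, size = None, 0
        if first is None:
            first = key
        last = key
        size += pair_size
    # Flush the final, possibly partial, range.
    if first is not None:
        ranges.append((first, last, size))
    return ranges
```

If the ranges used for splitting are estimated rather than computed from the actual data size like this, a range can end up holding far more than the limit, which matches the oversized regions suspected above.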
What did you expect to see?
Checksum should succeed without retries, and select count(*) should return successfully.
What did you see instead?
Versions of the cluster
Operation logs
Configuration of the cluster and the task
Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible