Improve error handling when reading request and Region meta change in concurrency #1101
Signed-off-by: JaySon-Huang <[email protected]>
f9fd60b to fd082d8
/run-all-tests
{
    /// Recover from region exception when super batch is enable
    if (dag.isBatchCop())
    {
So even if only one region hits a region error, TiFlash still needs to read all the regions remotely?
I think we can optimize it.
On the first read of regions [1,2,...,10], if regions [1,2] fail validation, read [3,4,...,10] from local storage again and push [1,2] to region_retry.
If the second local read fails again, no matter how many regions fail, push [3,4,...,10] to region_retry.
What do you think about it? @windtalker
You mean add an extra retry to handle this? Why can't we retry more than one time?
I worry that too many retries will make the total processing time unpredictable... We already retry during learner read and when reading from remote.
OK, let's retry a limited number of times.
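The strategy discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code in this PR: `RegionID`, `LocalReadResult`, `readLocalWithRetry`, the `read_local` callback, and the default of two local attempts are all hypothetical names and values chosen for the example.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not TiFlash's actual API.
using RegionID = unsigned long long;

struct LocalReadResult
{
    std::set<RegionID> failed; // regions whose validation failed during this read
};

// Reads `regions` from local storage with a bounded number of attempts.
// Returns the regions that should be read from remote storage instead.
std::vector<RegionID> readLocalWithRetry(
    std::vector<RegionID> regions,
    const std::function<LocalReadResult(const std::vector<RegionID> &)> & read_local,
    std::size_t max_local_attempts = 2)
{
    std::vector<RegionID> region_retry;
    for (std::size_t attempt = 1; attempt <= max_local_attempts && !regions.empty(); ++attempt)
    {
        LocalReadResult result = read_local(regions);
        if (result.failed.empty())
            return region_retry; // all remaining regions were read locally

        if (attempt == max_local_attempts)
        {
            // Last attempt failed: stop retrying locally, no matter how many regions failed.
            region_retry.insert(region_retry.end(), regions.begin(), regions.end());
            break;
        }

        // Move only the failed regions to region_retry and retry the rest locally.
        std::vector<RegionID> remaining;
        for (RegionID id : regions)
        {
            if (result.failed.count(id))
                region_retry.push_back(id);
            else
                remaining.push_back(id);
        }
        regions = std::move(remaining);
    }
    return region_retry;
}
```

Capping the number of local attempts keeps the added latency bounded, which is the concern raised above about stacking retries on top of the existing learner-read and remote-read retries.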
I will add some tests about retrying to read from local storage later.
/run-all-tests
LGTM
/run-all-tests
… concurrency (#1101) (#1109)
* Improve error handling when reading request and Region meta change in concurrency
* Add retry from local storage
* Move definitions of FailPoints into cpp file
Signed-off-by: JaySon-Huang <[email protected]>
What problem does this PR solve?
Issue Number: close #1095
Problem Summary:
If the region meta changes between the learner read and getting streams from storage, we cannot ensure the correctness of the data read. We should retry those key ranges.
Before this PR, if super batch is enabled and this error occurs, the error is thrown directly to users, who then need to retry their queries. We should handle those retries inside TiFlash.
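A minimal sketch of handling this inside TiFlash instead of surfacing the error might look like the following. Everything here is hypothetical: `RegionExceptionStandIn` is a stand-in for the real exception type, and `readOrDefer` and its parameters are invented for the example. What happens when super batch is disabled is not stated here, so the sketch simply rethrows in that case.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not TiFlash's actual types.
using RegionID = unsigned long long;

struct RegionExceptionStandIn : std::runtime_error
{
    std::vector<RegionID> region_ids; // regions whose meta changed during the read

    explicit RegionExceptionStandIn(std::vector<RegionID> ids)
        : std::runtime_error("region meta changed"), region_ids(std::move(ids))
    {
    }
};

// Runs `read_from_storage`. On a region exception with super batch enabled,
// records the affected regions in `region_retry` so they can be read from
// remote storage later, and returns false. Returns true if the read succeeded.
template <typename ReadFn>
bool readOrDefer(bool is_batch_cop, ReadFn && read_from_storage, std::vector<RegionID> & region_retry)
{
    try
    {
        read_from_storage();
        return true;
    }
    catch (const RegionExceptionStandIn & e)
    {
        if (!is_batch_cop)
            throw; // assumption: outside super batch the error is left to the caller

        region_retry.insert(region_retry.end(), e.region_ids.begin(), e.region_ids.end());
        return false;
    }
}
```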
What is changed and how it works?
If a RegionException is thrown after reading from storage and super batch is enabled, the affected regions are added to region_retry and read from remote storage later.
Related changes
Check List
Tests
Side effects
Release note