br/lightning: change `KvPair`'s row ID type from `int64` to `[]bytes` #41787
rest lgtm
Co-authored-by: lance6716 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 9101b3a
What problem does this PR solve?
Issue Number: ref #37119
Problem Summary:
In the distributed execution framework, an add-index job is split into multiple sub-tasks. Different TiDB instances may pick up different sub-tasks and execute them. A sub-task may be rescheduled to another TiDB instance if the previous instance halts for some reason.
From the perspective of a TiDB instance, it must be able to handle re-entrancy for the same range. Otherwise, an unexpected "duplicate key" error is reported even though no duplicate row actually exists.
To detect duplicate keys, lightning provides a field `RowID` (`int64`) that identifies the row in the data source. It appends the row ID to the key slice and stores the key slice in the local engine. Before importing the data to TiKV, the iterator reads the keys, strips their row IDs, and compares the keys to decide whether they are duplicates.
Thus, it is the caller's responsibility to guarantee that the same index key always has the same `RowID`. For lightning, it is natural to use the "file offset" as the `RowID`. However, `int64` is not enough to identify a row when adding an index.
What is changed and how it works?
This PR changes the `RowID` type to `[]byte` so that handles (including int handles and common handles) can be used as the `RowID`.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.