lightning: pick the first file to check schema #27607

Merged
merged 12 commits into from
Aug 27, 2021

Conversation

3pointer
Contributor

@3pointer 3pointer commented Aug 26, 2021

What problem does this PR solve?

Issue Number: close #27605
Problem Summary:
Checking the schema of every source file takes too much time.

What is changed and how it works?

What's Changed:
Pick a single file and check that its schema is valid, instead of checking every file.
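
As a rough illustration of the change described above (not the actual code in br/pkg/lightning/restore/check_info.go), the sketch below validates only the first file of a table. The names `sourceFile`, `checkColumns`, and `checkFirstFile` are hypothetical, and treating "schema is valid" as "every file column exists in the target table" is an assumption made for the example.

```go
package main

import "fmt"

// sourceFile is a hypothetical stand-in for one data file of a table.
// Columns holds the column names found in the file.
type sourceFile struct {
	Path    string
	Columns []string
}

// checkColumns returns an error if the file contains a column that the
// target table does not have.
func checkColumns(tableColumns []string, f sourceFile) error {
	known := make(map[string]struct{}, len(tableColumns))
	for _, c := range tableColumns {
		known[c] = struct{}{}
	}
	for _, c := range f.Columns {
		if _, ok := known[c]; !ok {
			return fmt.Errorf("file %s: unknown column %q", f.Path, c)
		}
	}
	return nil
}

// checkFirstFile validates only files[0] instead of every file, so the
// pre-check cost no longer grows with the number of files per table.
func checkFirstFile(tableColumns []string, files []sourceFile) error {
	if len(files) == 0 {
		return nil // nothing to check
	}
	return checkColumns(tableColumns, files[0])
}

func main() {
	files := []sourceFile{
		{Path: "t.0.csv", Columns: []string{"id", "name"}},
		{Path: "t.1.csv", Columns: []string{"id", "name"}},
	}
	if err := checkFirstFile([]string{"id", "name"}, files); err != nil {
		fmt.Println("pre-check failed:", err)
		return
	}
	fmt.Println("pre-check passed")
}
```

Taking the first file rather than a random one keeps the result deterministic across runs, which matches the final PR title.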

Check List

Tests

  • Unit test

Release note

Fix the issue that the pre-check takes too much time when importing many files for a table.

@ti-chi-bot
Member

ti-chi-bot commented Aug 26, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • glorv
  • kennytm

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 26, 2021
@3pointer
Contributor Author

/component lightning

@ti-chi-bot ti-chi-bot added the component/lightning This issue is related to Lightning of TiDB. label Aug 26, 2021
@3pointer
Contributor Author

/sig migrate

br/pkg/lightning/restore/check_info.go (outdated, resolved)
"reflect"
"sort"
"strconv"
"strings"
Contributor

Why were these imports moved? Please move them back.

Contributor Author

Fixed

@ti-chi-bot ti-chi-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 26, 2021
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 26, 2021
@3pointer
Contributor Author

/label needs-cherry-pick-5.2

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 26, 2021
@3pointer 3pointer changed the title from "lightning: random pick file to check schema" to "lightning: pick the first file to check schema" Aug 26, 2021
@kennytm
Contributor

kennytm commented Aug 26, 2021

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 772016b

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 26, 2021
@ti-chi-bot ti-chi-bot merged commit be44d2c into pingcap:master Aug 27, 2021
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Aug 27, 2021
@ti-srebot
Contributor

cherry pick to release-5.2 in PR #27623

joccau pushed a commit to joccau/tidb that referenced this pull request Sep 7, 2021
glorv pushed a commit to glorv/tidb that referenced this pull request Sep 8, 2021
@zhangjinpeng87
Contributor

Is it safe to check schema for one random file? cc @lonng

@glorv
Contributor

glorv commented Oct 11, 2021

> Is it safe to check schema for one random file? cc @lonng

In the common case, where the source files are exported by mydumper, dumpling, or a similar tool, all source files of a table should share the same schema, so checking only one file should be enough.

In other cases, e.g. when a user manually creates source files with different columns, this check may miss a potential error, but Lightning will still raise an error once it parses the incompatible file. Since this situation is rare, I think this trade-off between performance and correctness is OK.

cc @IANTHEREAL
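
To make the trade-off in the reply above concrete, here is a small sketch under stated assumptions: the `file` type and the `firstMismatch` helper are hypothetical, and comparing column names in order stands in for whatever schema check Lightning actually performs. Checking one file misses a hand-written file with different columns that a full scan would catch; per the reply, Lightning would still report such a file later, when it parses it.

```go
package main

import "fmt"

// file is a hypothetical source file described only by its column names.
type file struct {
	name    string
	columns []string
}

// sameColumns reports whether a and b hold the same names in the same order.
func sameColumns(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// firstMismatch inspects at most limit files and returns the index of the
// first one whose columns differ from want, or -1 if none differ.
func firstMismatch(want []string, files []file, limit int) int {
	if limit > len(files) {
		limit = len(files)
	}
	for i := 0; i < limit; i++ {
		if !sameColumns(want, files[i].columns) {
			return i
		}
	}
	return -1
}

func main() {
	want := []string{"id", "name"}
	files := []file{
		{"t.0.csv", []string{"id", "name"}},
		{"t.1.csv", []string{"id", "email"}}, // hand-written, different columns
	}
	fmt.Println(firstMismatch(want, files, 1))          // one-file pre-check: prints -1
	fmt.Println(firstMismatch(want, files, len(files))) // full scan: prints 1
}
```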

Labels
component/lightning This issue is related to Lightning of TiDB. needs-cherry-pick-release-5.2 release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/migrate size/M Denotes a PR that changes 30-99 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DBaaS : v5.2.0 lightning import parquet file in pending
6 participants