Fix potential data inconsistency under heavy ddl operation #5046

Merged

Conversation

@lidezhu lidezhu commented Jun 2, 2022

What problem does this PR solve?

Issue Number: ref #5032

Problem Summary: Currently we cache the decoding schema used to decode raft data as long as the table schema doesn't change, and we judge whether the table schema has changed by comparing the table schema version.

But the schema version is not strictly consistent with the actual table schema, as can be seen at https://github.com/pingcap/tiflash/blob/master/dbms/src/TiDB/Schema/SchemaBuilder.cpp#L362. That is, when a schema diff contains several schema changes, the table schema version is set to the latest schema version as soon as the first schema change is applied.

More concretely, a lossy ddl change triggers a drop column and an add column schema change, and also rewrites some data at the same time.
The schema changes are applied one by one, and because tiflash updates the table schema version ahead of time when applying a schema diff, the table schema version is already set to the latest schema version once the drop column schema change has been applied.

If the decode thread obtains the current schema for decoding data before the add column is applied, it caches that intermediate schema together with the latest schema version. When the subsequent add column operation is applied, the table schema version does not change, so the cache of the decode thread is not invalidated.

Therefore, the decode thread decodes the new data with an older schema, treats the newly added column as a dropped column, and discards its data.

In addition, this problem can be triggered, with a lower probability, by frequent add column operations combined with data inserts.
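
The interleaving described above can be replayed in a small single-threaded model. This is only a sketch under assumptions: `Table`, `DecodingSnapshot`, `getSnapshot` and the version numbers are hypothetical stand-ins and not the actual tiflash types; the only thing taken from the description is that the cache is validated by comparing the table schema version.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified table: a column list plus the table schema version.
struct Table
{
    std::vector<std::string> columns;
    int64_t schema_version;
};

// What a decode thread caches: the columns and the version they were read at.
struct DecodingSnapshot
{
    std::vector<std::string> columns;
    int64_t schema_version;
};

// Rebuild the snapshot only when the table schema version differs from the cached one.
std::shared_ptr<DecodingSnapshot> getSnapshot(const Table & table, std::shared_ptr<DecodingSnapshot> cached)
{
    if (cached && cached->schema_version == table.schema_version)
        return cached;
    return std::make_shared<DecodingSnapshot>(DecodingSnapshot{table.columns, table.schema_version});
}

// Replays the problematic order of events and reports whether the decoder
// ends up with a snapshot that lacks the re-added column.
bool decoderMissesReaddedColumn()
{
    Table table{{"id", "c"}, 10};
    const int64_t latest_version = 11;

    // First change in the diff: drop column. The version jumps to the latest one early.
    table.columns = {"id"};
    table.schema_version = latest_version;

    // The decode thread caches the intermediate schema at this point.
    auto cached = getSnapshot(table, nullptr);

    // Second change in the diff: add column. The version is already the latest, so it stays.
    table.columns.push_back("c");
    assert(table.schema_version == latest_version);

    // The version comparison passes, so the stale snapshot is reused.
    cached = getSnapshot(table, cached);
    return cached->columns.size() != table.columns.size();
}
```

In this model `decoderMissesReaddedColumn()` returns `true`: the cached snapshot still holds only `id`, so a row that carries a value for `c` would be decoded without it.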

cherry pick of #5044

What is changed and how it works?

Add an internal schema version for DecodingStorageSchemaSnapshot.
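
One way to picture the change is a separate counter that is bumped on every structural change and used for cache validation instead of the table schema version. The sketch below is an assumption about the general idea only: `decoding_schema_version`, `applyColumnChange` and the other names are hypothetical and are not taken from the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified table with an additional internal counter.
struct Table
{
    std::vector<std::string> columns;
    int64_t schema_version;          // may be set to the latest version ahead of time
    int64_t decoding_schema_version; // bumped on every structural change

    void applyColumnChange(std::vector<std::string> new_columns, int64_t latest_version)
    {
        columns = std::move(new_columns);
        schema_version = latest_version;
        ++decoding_schema_version;
    }
};

// The snapshot remembers the internal version it was built from.
struct DecodingSnapshot
{
    std::vector<std::string> columns;
    int64_t decoding_schema_version;
};

// Rebuild the snapshot whenever the internal version has moved.
std::shared_ptr<DecodingSnapshot> getSnapshot(const Table & table, std::shared_ptr<DecodingSnapshot> cached)
{
    if (cached && cached->decoding_schema_version == table.decoding_schema_version)
        return cached;
    return std::make_shared<DecodingSnapshot>(DecodingSnapshot{table.columns, table.decoding_schema_version});
}

// Same order of events as in the problem summary, now with the internal version.
bool decoderSeesReaddedColumn()
{
    Table table{{"id", "c"}, 10, 0};
    const int64_t latest_version = 11;

    table.applyColumnChange({"id"}, latest_version);      // drop column
    auto cached = getSnapshot(table, nullptr);            // decoder caches the intermediate schema
    table.applyColumnChange({"id", "c"}, latest_version); // add column, schema_version unchanged
    assert(table.schema_version == latest_version);

    cached = getSnapshot(table, cached);                  // internal version differs, so rebuild
    return cached->columns == table.columns;
}
```

Here `decoderSeesReaddedColumn()` returns `true`, because the second column change moves the internal counter even though the table schema version stays the same.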

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix potential data inconsistency under heavy ddl operation

ti-chi-bot commented Jun 2, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JaySon-Huang
  • flowbehappy

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/cherry-pick-not-approved size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 2, 2022
@lidezhu lidezhu self-assigned this Jun 2, 2022
@lidezhu lidezhu added type/bugfix This PR fixes a bug. CHERRY-PICK cherry pick labels Jun 2, 2022
@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl-6.1 branch 3 times, most recently from 443ba75 to 4d64111 Compare June 2, 2022 02:39
@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl-6.1 branch from 4d64111 to 05643a5 Compare June 2, 2022 02:40
@lidezhu lidezhu changed the title from "Fix potential decoding error under heavy ddl operation" to "Fix potential data inconsistency under heavy ddl operation" Jun 2, 2022

lidezhu commented Jun 2, 2022

/run-all-tests

@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl-6.1 branch from 1ca99aa to bea8447 Compare June 2, 2022 03:32
@sre-bot sre-bot added the cherry-pick-approved Cherry pick PR approved by release team. label Jun 2, 2022

sre-bot commented Jun 2, 2022

Coverage for changed files

Filename                                                 Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/FailPoint.cpp                                         458                53    88.43%           6                 0   100.00%          56                 4    92.86%         152                53    65.13%
Debug/dbgFuncSchema.cpp                                       50                50     0.00%           5                 5     0.00%          77                77     0.00%          30                30     0.00%
Storages/IManageableStorage.h                                 20                18    10.00%          20                18    10.00%          38                36     5.26%           0                 0         -
Storages/StorageDeltaMerge.cpp                               679               328    51.69%          58                26    55.17%        1307               725    44.53%         378               243    35.71%
Storages/StorageDeltaMerge.h                                  11                 6    45.45%          11                 6    45.45%          17                 8    52.94%           0                 0         -
Storages/Transaction/DecodingStorageSchemaSnapshot.h          35                 1    97.14%           1                 0   100.00%          61                 1    98.36%          26                 2    92.31%
Storages/Transaction/PartitionStreams.cpp                    262               213    18.70%          21                13    38.10%         569               391    31.28%         138               117    15.22%
Storages/Transaction/SchemaBuilder.cpp                       846               805     4.85%          47                43     8.51%        1065               993     6.76%         492               472     4.07%
Storages/Transaction/TiDBSchemaSyncer.h                      140               132     5.71%          13                 9    30.77%         125               100    20.00%          52                51     1.92%
Storages/Transaction/tests/RowCodecTestUtils.h                80                 4    95.00%          14                 0   100.00%         168                 1    99.40%          30                 2    93.33%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                       2581              1610    37.62%         196               120    38.78%        3483              2336    32.93%        1298               970    25.27%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18239      9791             46.32%    204265  98086        51.98%

full coverage report (for internal network access only)

@lidezhu lidezhu force-pushed the fix-decode-under-heavy-ddl-6.1 branch from bea8447 to 57d9908 Compare June 2, 2022 03:44

@JaySon-Huang JaySon-Huang left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 2, 2022

lidezhu commented Jun 2, 2022

/run-all-tests


sre-bot commented Jun 2, 2022

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18239      9791             46.32%    204265  98052        52.00%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot removed the status/LGT1 Indicates that a PR has LGTM 1. label Jun 2, 2022
@ti-chi-bot ti-chi-bot added the status/LGT2 Indicates that a PR has LGTM 2. label Jun 2, 2022

lidezhu commented Jun 2, 2022

/merge

@ti-chi-bot

@lidezhu: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot

This pull request has been accepted and is ready to merge.

Commit hash: 57d9908

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 2, 2022

sre-bot commented Jun 2, 2022

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18239      9791             46.32%    204265  98070        51.99%

full coverage report (for internal network access only)

@ti-chi-bot ti-chi-bot merged commit a547a53 into pingcap:release-6.1 Jun 2, 2022
@lidezhu lidezhu deleted the fix-decode-under-heavy-ddl-6.1 branch June 2, 2022 04:55
Labels
CHERRY-PICK cherry pick cherry-pick-approved Cherry pick PR approved by release team. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
5 participants