Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch read-index request by start-ts to reduce time cost under high concurrency scenarios #3971

Merged
merged 12 commits into from
Feb 17, 2022
Merged

Conversation

solotzg
Copy link
Contributor

@solotzg solotzg commented Feb 7, 2022

What problem does this PR solve?

Issue Number: close #3555

Problem Summary:

  • If tiflash node contains 30k region and it meet a sql with 10 union. Because each union will result in a whole table scan. There will be 300k read-index tasks at same time.
  • Under high concurrency scenarios, a large amount of read-index tasks will cause the channel between proxy and tikv becomes full easily. It may result to request timeout finally because such task might be drop by any side.

What is changed and how it works?

  • For any region, if a read-index task with start-ts ts_x can get result from leader successfully(without region error or locked info). All other read-index task with start-ts <= ts_x can reuse such result directly. We can sacrifice delay in exchange for concurrency stability.
  • support async read-index-task & timer tidb-engine-ext#43, implementation in proxy side.
  • We introduces a new structure ReadIndexWorker to collect and batch read-index tasks asynchronously.
  • New config items
[flash]
## The number of context to run read-index worker. Set 0 to disable async read-index worker.
# read_index_runner_count = 1
## The minimum duration to handle read-index tasks in each worker.
# read_index_worker_tick_ms = 10

Benchmark

  • set read_index_runner_count = 1 & read_index_worker_tick_ms = 10
  • batch-read-index-v1 is the original implementation of read index
  • async-read-index is the new one.
  • With low concurrency, batch-read-index-v1 has better performance
region count each round 5000, loop count 20, concurrency 1, type `async-read-index`, time cost min 31ms, max 52ms, avg 39ms, median 39ms, 
region count each round 5000, loop count 20, concurrency 4, type `async-read-index`, time cost min 39ms, max 65ms, avg 50ms, median 49ms, 
region count each round 5000, loop count 20, concurrency 16, type `async-read-index`, time cost min 58ms, max 124ms, avg 74ms, median 72ms, 
region count each round 5000, loop count 20, concurrency 32, type `async-read-index`, time cost min 84ms, max 397ms, avg 129ms, median 108ms, 
region count each round 5000, loop count 20, concurrency 64, type `async-read-index`, time cost min 39ms, max 660ms, avg 164ms, median 163ms, 

region count each round 30000, loop count 10, concurrency 1, type `async-read-index`, time cost min 191ms, max 322ms, avg 248ms, median 237ms, 
region count each round 30000, loop count 10, concurrency 4, type `async-read-index`, time cost min 224ms, max 388ms, avg 264ms, median 242ms, 
region count each round 30000, loop count 10, concurrency 8, type `async-read-index`, time cost min 254ms, max 415ms, avg 335ms, median 327ms, 
region count each round 30000, loop count 10, concurrency 16, type `async-read-index`, time cost min 245ms, max 817ms, avg 458ms, median 439ms, 
region count each round 30000, loop count 10, concurrency 32, type `async-read-index`, time cost min 532ms, max 1000ms, avg 749ms, median 738ms, 
region count each round 30000, loop count 10, concurrency 64, type `async-read-index`, time cost min 575ms, max 1189ms, avg 1045ms, median 1074ms, 


// may meet timeout easily if concurrency >= 32 
region count each round 5000, loop count 20, concurrency 1, type `batch-read-index-v1`, time cost min 15ms, max 26ms, avg 16ms, median 17ms, 
region count each round 5000, loop count 20, concurrency 4, type `batch-read-index-v1`, time cost min 21ms, max 85ms, avg 48ms, median 50ms, 
region count each round 5000, loop count 20, concurrency 16, type `batch-read-index-v1`, time cost min 24ms, max 419ms, avg 195ms, median 195ms, 
region count each round 5000, loop count 20, concurrency 32, type `batch-read-index-v1`, time cost min 91ms, max 1015ms, avg 542ms, median 560ms, 
region count each round 5000, loop count 20, concurrency 64, type `batch-read-index-v1`, time cost min 69ms, max 2244ms, avg 1103ms, median 1095ms, 

// may meet timeout easily if concurrency >= 8
region count each round 30000, loop count 10, concurrency 1, type `batch-read-index-v1`, time cost min 98ms, max 123ms, avg 108ms, median 105ms, 
region count each round 30000, loop count 10, concurrency 4, type `batch-read-index-v1`, time cost min 107ms, max 449ms, avg 280ms, median 280ms, 
region count each round 30000, loop count 10, concurrency 8, type `batch-read-index-v1`, time cost min 109ms, max 1675ms, avg 788ms, median 758ms, 
region count each round 30000, loop count 10, concurrency 16, type `batch-read-index-v1`, time cost min 160ms, max 3926ms, avg 2223ms, median 2333ms, 
region count each round 30000, loop count 10, concurrency 32, type `batch-read-index-v1`, time cost min 365ms, max 7432ms, avg 4810ms, median 5699ms, 

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix the issue that learner read process takes too much time under high concurrency scenarios

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Feb 7, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • birdstorm

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 7, 2022
@solotzg
Copy link
Contributor Author

solotzg commented Feb 7, 2022

/run-all-tests

@pingcap pingcap deleted a comment from sre-bot Feb 7, 2022
@solotzg
Copy link
Contributor Author

solotzg commented Feb 7, 2022

/run-unit-test

@solotzg solotzg changed the title [DNM] Batch read-index request by start-ts to reduce time cost under high concurrency scenarios Batch read-index request by start-ts to reduce time cost under high concurrency scenarios Feb 7, 2022
@solotzg
Copy link
Contributor Author

solotzg commented Feb 7, 2022

/hold

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2022
@solotzg solotzg added the type/enhancement The issue or PR belongs to an enhancement. label Feb 7, 2022
@solotzg
Copy link
Contributor Author

solotzg commented Feb 7, 2022

/run-unit-test

std::shared_ptr<RawMockReadIndexTask> data;
};

struct MockRaftStoreProxy : MutexLockWrap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we still have no common Mock class for , I wonder if we can move MockRaftStoreProxy and its auxiliary classes into KVStore, just like MockTiDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by adding dbms/src/Debug/MockRaftStoreProxy.h

@ti-chi-bot ti-chi-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 13, 2022
@pingcap pingcap deleted a comment from sre-bot Feb 14, 2022
@solotzg solotzg requested a review from CalvinNeo February 14, 2022 10:01
@solotzg solotzg added the type/code-quality-improvement PR that can improve the code quality label Feb 14, 2022
@solotzg
Copy link
Contributor Author

solotzg commented Feb 14, 2022

/run-all-tests

@pingcap pingcap deleted a comment from sre-bot Feb 15, 2022
@pingcap pingcap deleted a comment from sre-bot Feb 15, 2022
@pingcap pingcap deleted a comment from sre-bot Feb 15, 2022
@pingcap pingcap deleted a comment from sre-bot Feb 15, 2022
@pingcap pingcap deleted a comment from sre-bot Feb 15, 2022
Signed-off-by: Zhigao Tong <[email protected]>
Signed-off-by: Zhigao Tong <[email protected]>
@solotzg
Copy link
Contributor Author

solotzg commented Feb 15, 2022

/run-unit-test

@pingcap pingcap deleted a comment from sre-bot Feb 16, 2022
@pingcap pingcap deleted a comment from sre-bot Feb 16, 2022
Copy link
Contributor

@birdstorm birdstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added status/LGT1 Indicates that a PR has LGTM 1. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 16, 2022
Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@solotzg
Copy link
Contributor Author

solotzg commented Feb 17, 2022

/run-all-tests

@solotzg solotzg removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 17, 2022
@sre-bot
Copy link
Collaborator

sre-bot commented Feb 17, 2022

Coverage for changed files

Filename                                                   Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Debug/DBGInvoker.cpp                                            30                29     3.33%          10                 9    10.00%         137                75    45.26%          14                14     0.00%
Debug/MockRaftStoreProxy.cpp                                    80                 5    93.75%          28                 2    92.86%         210                12    94.29%          46                 4    91.30%
Debug/MockRaftStoreProxy.h                                       2                 0   100.00%           2                 0   100.00%           6                 0   100.00%           0                 0         -
Debug/ReadIndexStressTest.cpp                                  120               120     0.00%          11                11     0.00%         220               220     0.00%          64                64     0.00%
Storages/Transaction/ApplySnapshot.cpp                         298               129    56.71%          14                 4    71.43%         464               215    53.66%         148                91    38.51%
Storages/Transaction/KVStore.cpp                               389                78    79.95%          40                 2    95.00%         695                80    88.49%         212                80    62.26%
Storages/Transaction/KVStore.h                                   5                 3    40.00%           5                 3    40.00%           5                 3    40.00%           0                 0         -
Storages/Transaction/LearnerRead.cpp                           200               200     0.00%          24                24     0.00%         558               558     0.00%         116               116     0.00%
Storages/Transaction/ProxyFFI.cpp                              130                87    33.08%          41                26    36.59%         328               246    25.00%          58                33    43.10%
Storages/Transaction/ProxyFFI.h                                  3                 2    33.33%           3                 2    33.33%          27                26     3.70%           0                 0         -
Storages/Transaction/ReadIndexWorker.cpp                       298                 8    97.32%          81                 3    96.30%         697                 9    98.71%         168                11    93.45%
Storages/Transaction/ReadIndexWorker.h                          15                 0   100.00%          12                 0   100.00%          36                 0   100.00%           2                 0   100.00%
Storages/Transaction/Region.cpp                                344                46    86.63%          60                 1    98.33%         673                41    93.91%         186                62    66.67%
Storages/Transaction/Region.h                                   20                 2    90.00%          10                 2    80.00%          28                 2    92.86%          14                 4    71.43%
Storages/Transaction/RegionData.cpp                             65                 7    89.23%          20                 1    95.00%         172                12    93.02%          50                16    68.00%
Storages/Transaction/RegionMeta.cpp                             94                 6    93.62%          43                 1    97.67%         294                21    92.86%          48                10    79.17%
Storages/Transaction/RegionMeta.h                                1                 0   100.00%           1                 0   100.00%           7                 0   100.00%           0                 0         -
Storages/Transaction/TMTContext.cpp                             53                37    30.19%          34                19    44.12%         149               102    31.54%          12                11     8.33%
Storages/Transaction/TMTContext.h                                4                 3    25.00%           4                 3    25.00%           4                 3    25.00%           0                 0         -
Storages/Transaction/Utils.h                                     7                 1    85.71%           7                 1    85.71%          17                 1    94.12%           0                 0         -
Storages/Transaction/tests/gtest_kvstore.cpp                  3548               640    81.96%          25                 2    92.00%        1181                29    97.54%        1060               554    47.74%
Storages/Transaction/tests/gtest_read_index_worker.cpp        1204               186    84.55%          19                 2    89.47%         561                11    98.04%         392               169    56.89%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                         6910              1589    77.00%         494               118    76.11%        6469              1666    74.25%        2590              1239    52.16%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16545      9501             42.57%    182852  96102        47.44%

full coverage report (for internal network access only)

@pingcap pingcap deleted a comment from sre-bot Feb 17, 2022
@solotzg solotzg added the status/can-merge Indicates a PR has been approved by a committer. label Feb 17, 2022
@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Feb 17, 2022
@solotzg
Copy link
Contributor Author

solotzg commented Feb 17, 2022

/merge

@ti-chi-bot
Copy link
Member

@solotzg: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 2144bfe

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 17, 2022
@sre-bot
Copy link
Collaborator

sre-bot commented Feb 17, 2022

Coverage for changed files

Filename                                                   Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Debug/DBGInvoker.cpp                                            30                29     3.33%          10                 9    10.00%         137                75    45.26%          14                14     0.00%
Debug/MockRaftStoreProxy.cpp                                    80                 5    93.75%          28                 2    92.86%         210                12    94.29%          46                 4    91.30%
Debug/MockRaftStoreProxy.h                                       2                 0   100.00%           2                 0   100.00%           6                 0   100.00%           0                 0         -
Debug/ReadIndexStressTest.cpp                                  120               120     0.00%          11                11     0.00%         220               220     0.00%          64                64     0.00%
Storages/Transaction/ApplySnapshot.cpp                         298               129    56.71%          14                 4    71.43%         464               215    53.66%         148                91    38.51%
Storages/Transaction/KVStore.cpp                               389                78    79.95%          40                 2    95.00%         695                80    88.49%         212                80    62.26%
Storages/Transaction/KVStore.h                                   5                 3    40.00%           5                 3    40.00%           5                 3    40.00%           0                 0         -
Storages/Transaction/LearnerRead.cpp                           200               200     0.00%          24                24     0.00%         558               558     0.00%         116               116     0.00%
Storages/Transaction/ProxyFFI.cpp                              130                87    33.08%          41                26    36.59%         328               246    25.00%          58                33    43.10%
Storages/Transaction/ProxyFFI.h                                  3                 2    33.33%           3                 2    33.33%          27                26     3.70%           0                 0         -
Storages/Transaction/ReadIndexWorker.cpp                       298                 8    97.32%          81                 3    96.30%         697                 9    98.71%         168                11    93.45%
Storages/Transaction/ReadIndexWorker.h                          15                 0   100.00%          12                 0   100.00%          36                 0   100.00%           2                 0   100.00%
Storages/Transaction/Region.cpp                                344                46    86.63%          60                 1    98.33%         673                41    93.91%         186                62    66.67%
Storages/Transaction/Region.h                                   20                 2    90.00%          10                 2    80.00%          28                 2    92.86%          14                 4    71.43%
Storages/Transaction/RegionData.cpp                             65                 7    89.23%          20                 1    95.00%         172                12    93.02%          50                16    68.00%
Storages/Transaction/RegionMeta.cpp                             94                 6    93.62%          43                 1    97.67%         294                21    92.86%          48                10    79.17%
Storages/Transaction/RegionMeta.h                                1                 0   100.00%           1                 0   100.00%           7                 0   100.00%           0                 0         -
Storages/Transaction/TMTContext.cpp                             53                37    30.19%          34                19    44.12%         149               102    31.54%          12                11     8.33%
Storages/Transaction/TMTContext.h                                4                 3    25.00%           4                 3    25.00%           4                 3    25.00%           0                 0         -
Storages/Transaction/Utils.h                                     7                 1    85.71%           7                 1    85.71%          17                 1    94.12%           0                 0         -
Storages/Transaction/tests/gtest_kvstore.cpp                  3548               640    81.96%          25                 2    92.00%        1181                29    97.54%        1060               554    47.74%
Storages/Transaction/tests/gtest_read_index_worker.cpp        1204               186    84.55%          19                 2    89.47%         561                11    98.04%         392               169    56.89%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                         6910              1589    77.00%         494               118    76.11%        6469              1666    74.25%        2590              1239    52.16%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
16545      9501             42.57%    182852  96101        47.44%

full coverage report (for internal network access only)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1. type/code-quality-improvement PR that can improve the code quality type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

High read index duration
5 participants