
Make Ccr recovery file chunk size configurable #38370

Merged · 6 commits · Feb 5, 2019

Conversation

Tim-Brooks
Contributor

This commit adds a byte setting `ccr.indices.recovery.chunk_size`. This
setting configures the size of the file chunks requested while recovering
from remote.

@Tim-Brooks Tim-Brooks added >non-issue v7.0.0 :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.7.0 labels Feb 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Contributor Author

I don't know if our plan is to actually merge this, but I pushed it up here. It removes the awaits-fix annotation from a test that only failed due to a different test failing (because I use that test here). If we do not actually merge this PR, I will make that change elsewhere.

@dliappis - You can use this branch for benchmarking. The setting is ccr.indices.recovery.chunk_size. Theoretically you should see increases in performance as you increase that setting. I sanity-checked the setting by running in debug mode and ensuring that it is used if set.
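Assuming the setting is dynamically updatable and exposed through the cluster settings API (this thread does not confirm either), updating it on the follower cluster for a benchmark run might look like:

```
PUT _cluster/settings
{
  "persistent": {
    "ccr.indices.recovery.chunk_size": "1mb"
  }
}
```

This is a sketch of the usual Elasticsearch byte-size setting convention ("64kb", "1mb", "5mb"), not a confirmed API surface for this PR.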

@Tim-Brooks
Contributor Author

If we do want to add this setting for production use, I will probably need to add a maximum size (since I call Math.toIntExact).
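A quick illustration (plain Java, not Elasticsearch code; class and method names are made up for the example) of why an upper bound is needed: `Math.toIntExact` throws `ArithmeticException` once a configured byte value no longer fits in an `int`.

```java
public class ChunkSizeBound {
    // Convert a configured chunk size in bytes to an int, as code using
    // Math.toIntExact would. Values above Integer.MAX_VALUE (~2 GB)
    // throw ArithmeticException at runtime, which is why the setting
    // would need an explicit maximum instead of failing mid-recovery.
    static int toChunkBytes(long bytes) {
        return Math.toIntExact(bytes);
    }

    public static void main(String[] args) {
        System.out.println(toChunkBytes(1024L * 1024)); // prints 1048576
        try {
            toChunkBytes(3L * 1024 * 1024 * 1024);      // 3 GB: overflows int
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```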

@dliappis
Contributor

dliappis commented Feb 5, 2019

Using the geopoint track and a 1MB chunk size, the time for the entire recovery-from-remote task (this includes recovery from remote, plus whatever remaining docs with ops > global checkpoint that got fetched by CCR) went down to:

0:14:04.368000, i.e. ~14 minutes.

Earlier, with the default 64KB, the same operation took 3:59:44.900000 (approx. 4 hrs).

The latency of the connection is ~100-130ms.

The doc counts and store sizes can be seen below:

green open leader tzH5IiTRQEKEGOQTRkuIiQ 3 1 60844404 0 8.3gb 4.9gb

green open follower y4trGB0kTNSeum5h109XxA 3 1 60844404 0 6.8gb 3.4gb
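A back-of-envelope model shows why the chunk size dominates at ~115 ms latency. This is an assumption, not how CCR actually pipelines requests: it treats chunk requests as strictly serial, one round trip each, over the ~4.9 GB of primary data above.

```java
public class RecoveryTimeEstimate {
    // Naive model: every file chunk costs one round trip and chunks are
    // fetched serially. Real recovery overlaps requests and shards, so
    // this only gives an order-of-magnitude feel.
    static double estimateSeconds(long storeBytes, long chunkBytes, double rttSeconds) {
        long chunks = (storeBytes + chunkBytes - 1) / chunkBytes; // ceiling division
        return chunks * rttSeconds;
    }

    public static void main(String[] args) {
        long store = (long) (4.9 * (1L << 30)); // ~4.9 GB of primary store
        double rtt = 0.115;                     // midpoint of the 100-130 ms latency
        System.out.printf("64KB chunks: ~%.0f s%n", estimateSeconds(store, 64L * 1024, rtt));
        System.out.printf("1MB chunks:  ~%.0f s%n", estimateSeconds(store, 1L << 20, rtt));
    }
}
```

The estimates (~9,200 s for 64KB, ~580 s for 1MB) land in the same ballpark as the measured ~4 h and ~14 min, which is consistent with the transfer being round-trip bound.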

Network stats on follower

Node 0: (network graph)

Node 1: (network graph)

Node 2: (network graph)

Rally summary

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/


[INFO] Racing on track [geopoint], challenge [autogenerated-ids-then-recovery-from-remote] and car ['external'] with version [7.0.0-SNAPSHOT].

Running setup-delete-remote-follower-index                                     [100% done]
Running setup-delete-leader-index                                              [100% done]
Running check-cluster-health                                                   [100% done]
Running create-leader-indices                                                  [100% done]
Running wait-green                                                             [100% done]
Running bulk-leader-index-autogenerated-ids                                    [100% done]
Running refresh-after-bulk                                                     [100% done]
Running flush-all-indices                                                      [100% done]
Running join-ccr-clusters                                                      [100% done]
Running disable-recovery-rate-limiting-on-leader                               [100% done]
Running disable-recovery-rate-limiting-on-follower                             [100% done]
Running set-recovery-file-chunk-size-to-1MB-on-follower                        [100% done]
Running start-following-from-leader                                            [100% done]
Running wait-for-followers-to-sync                                             [100% done]
[INFO] Keeping benchmark candidate including index at (may need several GB).

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|   Lap |                                                         Metric |                                Task |      Value |   Unit |
|------:|---------------------------------------------------------------:|------------------------------------:|-----------:|-------:|
|   All |                     Cumulative indexing time of primary shards |                                     |    11.5354 |    min |
|   All |             Min cumulative indexing time across primary shards |                                     |     3.6826 |    min |
|   All |          Median cumulative indexing time across primary shards |                                     |    3.88147 |    min |
|   All |             Max cumulative indexing time across primary shards |                                     |    3.97132 |    min |
|   All |            Cumulative indexing throttle time of primary shards |                                     |          0 |    min |
|   All |    Min cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All | Median cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |    Max cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |                        Cumulative merge time of primary shards |                                     |    10.2602 |    min |
|   All |                       Cumulative merge count of primary shards |                                     |        196 |        |
|   All |                Min cumulative merge time across primary shards |                                     |    2.26245 |    min |
|   All |             Median cumulative merge time across primary shards |                                     |    3.85265 |    min |
|   All |                Max cumulative merge time across primary shards |                                     |     4.1451 |    min |
|   All |               Cumulative merge throttle time of primary shards |                                     |     4.7776 |    min |
|   All |       Min cumulative merge throttle time across primary shards |                                     |   0.789417 |    min |
|   All |    Median cumulative merge throttle time across primary shards |                                     |    1.87313 |    min |
|   All |       Max cumulative merge throttle time across primary shards |                                     |    2.11505 |    min |
|   All |                      Cumulative refresh time of primary shards |                                     |   0.633883 |    min |
|   All |                     Cumulative refresh count of primary shards |                                     |        364 |        |
|   All |              Min cumulative refresh time across primary shards |                                     |   0.206733 |    min |
|   All |           Median cumulative refresh time across primary shards |                                     |   0.210983 |    min |
|   All |              Max cumulative refresh time across primary shards |                                     |   0.216167 |    min |
|   All |                        Cumulative flush time of primary shards |                                     |       0.32 |    min |
|   All |                       Cumulative flush count of primary shards |                                     |         18 |        |
|   All |                Min cumulative flush time across primary shards |                                     |     0.0934 |    min |
|   All |             Median cumulative flush time across primary shards |                                     |   0.101533 |    min |
|   All |                Max cumulative flush time across primary shards |                                     |   0.125067 |    min |
|   All |                                             Total Young Gen GC |                                     |     28.026 |      s |
|   All |                                               Total Old Gen GC |                                     |          0 |      s |
|   All |                                                     Store size |                                     |    8.38973 |     GB |
|   All |                                                  Translog size |                                     |    3.18834 |     GB |
|   All |                                         Heap used for segments |                                     |    31.3554 |     MB |
|   All |                                       Heap used for doc values |                                     | 0.00350189 |     MB |
|   All |                                            Heap used for terms |                                     |    29.2354 |     MB |
|   All |                                            Heap used for norms |                                     |          0 |     MB |
|   All |                                           Heap used for points |                                     |   0.810921 |     MB |
|   All |                                    Heap used for stored fields |                                     |    1.30553 |     MB |
|   All |                                                  Segment count |                                     |         54 |        |
|   All |                                                 Min Throughput | bulk-leader-index-autogenerated-ids |     265589 | docs/s |
|   All |                                              Median Throughput | bulk-leader-index-autogenerated-ids |     269994 | docs/s |
|   All |                                                 Max Throughput | bulk-leader-index-autogenerated-ids |     273376 | docs/s |
|   All |                                        50th percentile latency | bulk-leader-index-autogenerated-ids |     131.36 |     ms |
|   All |                                        90th percentile latency | bulk-leader-index-autogenerated-ids |    182.432 |     ms |
|   All |                                        99th percentile latency | bulk-leader-index-autogenerated-ids |    249.803 |     ms |
|   All |                                      99.9th percentile latency | bulk-leader-index-autogenerated-ids |     457.52 |     ms |
|   All |                                       100th percentile latency | bulk-leader-index-autogenerated-ids |    567.282 |     ms |
|   All |                                   50th percentile service time | bulk-leader-index-autogenerated-ids |     131.36 |     ms |
|   All |                                   90th percentile service time | bulk-leader-index-autogenerated-ids |    182.432 |     ms |
|   All |                                   99th percentile service time | bulk-leader-index-autogenerated-ids |    249.803 |     ms |
|   All |                                 99.9th percentile service time | bulk-leader-index-autogenerated-ids |     457.52 |     ms |
|   All |                                  100th percentile service time | bulk-leader-index-autogenerated-ids |    567.282 |     ms |
|   All |                                                     error rate | bulk-leader-index-autogenerated-ids |          0 |      % |
|   All |                                                 Min Throughput |                   flush-all-indices |       1.94 |  ops/s |
|   All |                                              Median Throughput |                   flush-all-indices |       1.94 |  ops/s |
|   All |                                                 Max Throughput |                   flush-all-indices |       1.94 |  ops/s |
|   All |                                       100th percentile latency |                   flush-all-indices |    515.485 |     ms |
|   All |                                  100th percentile service time |                   flush-all-indices |    515.485 |     ms |
|   All |                                                     error rate |                   flush-all-indices |          0 |      % |
|   All |                                                 Min Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                              Median Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                                 Max Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                       100th percentile latency |          wait-for-followers-to-sync |     844368 |     ms |
|   All |                                  100th percentile service time |          wait-for-followers-to-sync |     844368 |     ms |
|   All |                                                     error rate |          wait-for-followers-to-sync |          0 |      % |


----------------------------------
[INFO] SUCCESS (took 1167 seconds)
----------------------------------

@dliappis
Contributor

dliappis commented Feb 5, 2019

One clarification is that the small peak in the network graphs towards the end is due to peer recovery (within the network of the followers) and not related to recovery from remote or CCR.

@dliappis
Contributor

dliappis commented Feb 5, 2019

Using a 5MB chunk size

An additional experiment with a 5MB chunk size (same track, same settings, except for using 0 replicas this time to avoid the network spikes at the end of the benchmark).

  • Time taken for recovery from remote: 0:04:47.406000 (down from ~14 min when using the 1MB chunk). Each node hovered between 5.6 and 6.7 MB/s of network traffic.
Rally summary

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/


[INFO] Racing on track [geopoint], challenge [autogenerated-ids-then-recovery-from-remote] and car ['external'] with version [7.0.0-SNAPSHOT].

Running setup-delete-remote-follower-index                                     [100% done]
Running setup-delete-leader-index                                              [100% done]
Running check-cluster-health                                                   [100% done]
Running create-leader-indices                                                  [100% done]
Running wait-green                                                             [100% done]
Running bulk-leader-index-autogenerated-ids                                    [100% done]
Running refresh-after-bulk                                                     [100% done]
Running flush-all-indices                                                      [100% done]
Running join-ccr-clusters                                                      [100% done]
Running disable-recovery-rate-limiting-on-leader                               [100% done]
Running disable-recovery-rate-limiting-on-follower                             [100% done]
Running set-recovery-file-chunk-size-on-follower                               [100% done]
Running start-following-from-leader                                            [100% done]
Running wait-for-followers-to-sync                                             [100% done]
[INFO] Keeping benchmark candidate including index at (may need several GB).

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|   Lap |                                                         Metric |                                Task |      Value |   Unit |
|------:|---------------------------------------------------------------:|------------------------------------:|-----------:|-------:|
|   All |                     Cumulative indexing time of primary shards |                                     |     11.493 |    min |
|   All |             Min cumulative indexing time across primary shards |                                     |    3.64602 |    min |
|   All |          Median cumulative indexing time across primary shards |                                     |    3.89948 |    min |
|   All |             Max cumulative indexing time across primary shards |                                     |    3.94748 |    min |
|   All |            Cumulative indexing throttle time of primary shards |                                     |          0 |    min |
|   All |    Min cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All | Median cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |    Max cumulative indexing throttle time across primary shards |                                     |          0 |    min |
|   All |                        Cumulative merge time of primary shards |                                     |    3.88052 |    min |
|   All |                       Cumulative merge count of primary shards |                                     |         41 |        |
|   All |                Min cumulative merge time across primary shards |                                     |    1.24525 |    min |
|   All |             Median cumulative merge time across primary shards |                                     |    1.28603 |    min |
|   All |                Max cumulative merge time across primary shards |                                     |    1.34923 |    min |
|   All |               Cumulative merge throttle time of primary shards |                                     |      1.408 |    min |
|   All |       Min cumulative merge throttle time across primary shards |                                     |     0.4442 |    min |
|   All |    Median cumulative merge throttle time across primary shards |                                     |     0.4598 |    min |
|   All |       Max cumulative merge throttle time across primary shards |                                     |      0.504 |    min |
|   All |                      Cumulative refresh time of primary shards |                                     |   0.187767 |    min |
|   All |                     Cumulative refresh count of primary shards |                                     |         84 |        |
|   All |              Min cumulative refresh time across primary shards |                                     |  0.0553833 |    min |
|   All |           Median cumulative refresh time across primary shards |                                     |  0.0585667 |    min |
|   All |              Max cumulative refresh time across primary shards |                                     |  0.0738167 |    min |
|   All |                        Cumulative flush time of primary shards |                                     |   0.509167 |    min |
|   All |                       Cumulative flush count of primary shards |                                     |         18 |        |
|   All |                Min cumulative flush time across primary shards |                                     |     0.1652 |    min |
|   All |             Median cumulative flush time across primary shards |                                     |   0.167733 |    min |
|   All |                Max cumulative flush time across primary shards |                                     |   0.176233 |    min |
|   All |                                             Total Young Gen GC |                                     |     14.692 |      s |
|   All |                                               Total Old Gen GC |                                     |          0 |      s |
|   All |                                                     Store size |                                     |    4.12151 |     GB |
|   All |                                                  Translog size |                                     |     1.5017 |     GB |
|   All |                                         Heap used for segments |                                     |    27.0411 |     MB |
|   All |                                       Heap used for doc values |                                     | 0.00583649 |     MB |
|   All |                                            Heap used for terms |                                     |    25.1515 |     MB |
|   All |                                            Heap used for norms |                                     |          0 |     MB |
|   All |                                           Heap used for points |                                     |   0.784356 |     MB |
|   All |                                    Heap used for stored fields |                                     |    1.09942 |     MB |
|   All |                                                  Segment count |                                     |         62 |        |
|   All |                                                 Min Throughput | bulk-leader-index-autogenerated-ids |     362048 | docs/s |
|   All |                                              Median Throughput | bulk-leader-index-autogenerated-ids |     378572 | docs/s |
|   All |                                                 Max Throughput | bulk-leader-index-autogenerated-ids |     381745 | docs/s |
|   All |                                        50th percentile latency | bulk-leader-index-autogenerated-ids |     83.327 |     ms |
|   All |                                        90th percentile latency | bulk-leader-index-autogenerated-ids |    106.878 |     ms |
|   All |                                        99th percentile latency | bulk-leader-index-autogenerated-ids |    164.788 |     ms |
|   All |                                      99.9th percentile latency | bulk-leader-index-autogenerated-ids |    2031.68 |     ms |
|   All |                                       100th percentile latency | bulk-leader-index-autogenerated-ids |    2324.91 |     ms |
|   All |                                   50th percentile service time | bulk-leader-index-autogenerated-ids |     83.327 |     ms |
|   All |                                   90th percentile service time | bulk-leader-index-autogenerated-ids |    106.878 |     ms |
|   All |                                   99th percentile service time | bulk-leader-index-autogenerated-ids |    164.788 |     ms |
|   All |                                 99.9th percentile service time | bulk-leader-index-autogenerated-ids |    2031.68 |     ms |
|   All |                                  100th percentile service time | bulk-leader-index-autogenerated-ids |    2324.91 |     ms |
|   All |                                                     error rate | bulk-leader-index-autogenerated-ids |          0 |      % |
|   All |                                                 Min Throughput |                   flush-all-indices |       3.33 |  ops/s |
|   All |                                              Median Throughput |                   flush-all-indices |       3.33 |  ops/s |
|   All |                                                 Max Throughput |                   flush-all-indices |       3.33 |  ops/s |
|   All |                                       100th percentile latency |                   flush-all-indices |     300.37 |     ms |
|   All |                                  100th percentile service time |                   flush-all-indices |     300.37 |     ms |
|   All |                                                     error rate |                   flush-all-indices |          0 |      % |
|   All |                                                 Min Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                              Median Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                                 Max Throughput |          wait-for-followers-to-sync |          0 |  ops/s |
|   All |                                       100th percentile latency |          wait-for-followers-to-sync |     287406 |     ms |
|   All |                                  100th percentile service time |          wait-for-followers-to-sync |     287406 |     ms |
|   All |                                                     error rate |          wait-for-followers-to-sync |          0 |      % |


---------------------------------
[INFO] SUCCESS (took 546 seconds)
---------------------------------

Network stats on the follower

Node 0: (network graph)

Node 1: (network graph)

Node 2: (network graph)

Contributor

@dliappis dliappis left a comment

LGTM (also see the benchmarking results).

@Tim-Brooks
Contributor Author

@elasticmachine run elasticsearch-ci/default-distro

@Tim-Brooks Tim-Brooks merged commit 4a15e2b into elastic:master Feb 5, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 5, 2019
* master: (23 commits)
  Lift retention lease expiration to index shard (elastic#38380)
  Make Ccr recovery file chunk size configurable (elastic#38370)
  Prevent CCR recovery from missing documents (elastic#38237)
  re-enables awaitsfixed datemath tests (elastic#38376)
  Types removal fix FullClusterRestartIT warnings (elastic#38445)
  Make sure to reject mappings with type _doc when include_type_name is false. (elastic#38270)
  Updates the grok patterns to be consistent with logstash (elastic#27181)
  Ignore type-removal warnings in XPackRestTestHelper (elastic#38431)
  testHlrcFromXContent() should respect assertToXContentEquivalence() (elastic#38232)
  add basic REST test for geohash_grid (elastic#37996)
  Remove DiscoveryPlugin#getDiscoveryTypes (elastic#38414)
  Fix the clock resolution to millis in GetWatchResponseTests (elastic#38405)
  Throw AssertionError when no master (elastic#38432)
  `if_seq_no` and `if_primary_term` parameters aren't wired correctly in REST Client's CRUD API (elastic#38411)
  Enable CronEvalToolTest.testEnsureDateIsShownInRootLocale (elastic#38394)
  Fix failures in BulkProcessorIT#testGlobalParametersAndBulkProcessor. (elastic#38129)
  SQL: Implement CURRENT_DATE (elastic#38175)
  Mute testReadRequestsReturnLatestMappingVersion (elastic#38438)
  [ML] Report index unavailable instead of waiting for lazy node (elastic#38423)
  Update Rollup Caps to allow unknown fields (elastic#38339)
  ...
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Feb 8, 2019
This commit adds a byte setting `ccr.indices.recovery.chunk_size`. This
setting configures the size of the file chunks requested while recovering
from remote.
@Tim-Brooks Tim-Brooks deleted the configurable_buffers branch December 18, 2019 14:48