Reduce recovery time with compress or secure transport #36981
Conversation
Pinging @elastic/es-distributed
run gradle build tests 2
LGTM, just some minor suggestions
(probably still best to wait for a second opinion from someone else :))
(Resolved review threads on: RecoveryTarget.java, RemoteRecoveryTargetHandler.java, RecoverySettings.java, RecoverySourceHandler.java, RecoverySourceHandlerTests.java, RelocationIT.java, PeerRecoveryTargetServiceTests.java)
@original-brownbear Thanks for looking. I've addressed all your comments :).
Thanks @dnhatn. I like the simplicity of this PR. I've left some comments and would like to ask you to also run tests with `tcp.compress` enabled (both with TLS enabled and disabled). It would also be good to see how much throughput we get at these numbers. For example, can we saturate a 10Gbit network with `num_chunks = 2`?
(Further review threads on: RecoverySettings.java, RecoverySourceHandler.java, RecoveryTarget.java)
@ywelsch I have addressed all your comments. Would you please have another look?
I ran a more realistic benchmark which consists of two GCP instances (8 CPUs, 32GB RAM, 10Gbit bandwidth, and local SSD). Below are the results for the PMC and NYC_taxis datasets.
The current approach does not reduce the recovery time much with compression because the compression is too expensive: it takes 20 minutes to compress 20GB of Lucene index (roughly 17MB/s, far below the network bandwidth), and compression happens on (and blocks) the recovery thread. The recovery time is reduced linearly with the […]
Note that by default we still throttle the sending of chunks; this lets the user control how much CPU to trade for recovery throughput.
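For reference, a sketch of how these two knobs might be declared in `RecoverySettings` (the setting names follow this discussion, but the exact bounds shown are assumptions, not a quote of the merged code):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public final class RecoverySettingsSketch {
    // Cap on recovery traffic per node; concurrency trades CPU for throughput
    // only within this limit. 40mb is the long-standing default.
    public static final Setting<ByteSizeValue> INDICES_RECOVERY_MAX_BYTES_PER_SEC =
        Setting.byteSizeSetting("indices.recovery.max_bytes_per_sec",
            new ByteSizeValue(40, ByteSizeUnit.MB), Property.Dynamic, Property.NodeScope);

    // Number of file chunks a source node may have in flight per recovery
    // before waiting for acknowledgments; the [1, 5] bounds are an assumption.
    public static final Setting<Integer> INDICES_RECOVERY_MAX_CONCURRENT_FILE_CHUNKS =
        Setting.intSetting("indices.recovery.max_concurrent_file_chunks", 2, 1, 5,
            Property.Dynamic, Property.NodeScope);
}
```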
Is the index using `best_compression`?
No, the index uses the default `index_codec` and no force_merge was called.
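For context, `index.codec` controls stored-field compression: the default codec uses LZ4-style compression, while `best_compression` uses DEFLATE and produces denser files on disk. A minimal sketch of creating an index with the non-default codec via the high-level REST client (host and index name are placeholders):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.settings.Settings;

public class BestCompressionExample {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // best_compression switches stored fields from LZ4 to DEFLATE; the
            // benchmark above used the default codec, so the files on disk were
            // not already densely compressed.
            CreateIndexRequest request = new CreateIndexRequest("my-index")
                .settings(Settings.builder().put("index.codec", "best_compression"));
            client.indices().create(request, RequestOptions.DEFAULT);
        }
    }
}
```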
With the recent change, it does not pipeline sending requests for different files (i.e., one file needs to be completed before we start with the next one)?
Yes, both the previous change and this change wait for the completion of the current file before sending the next file. This is because we wait for all outstanding requests when we close the current `RecoveryOutputStream`.
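A rough illustration of that barrier (the class name and the `ChunkSender` hook are hypothetical, not the actual implementation): each write takes a permit that is returned on the target's acknowledgment, and `close()` drains the whole window, so the next file cannot start until the current one is fully acknowledged.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Semaphore;

// Sketch: write() sends a chunk asynchronously after taking a permit; the
// permit is returned when the target acknowledges the chunk. close() drains
// the whole window, creating the per-file barrier described above.
class AckBoundedOutputStream extends OutputStream {
    interface ChunkSender { // hypothetical async transport hook
        void send(byte[] chunk, Runnable onAck) throws IOException;
    }

    private final int windowSize;
    private final Semaphore window;
    private final ChunkSender sender;

    AckBoundedOutputStream(int windowSize, ChunkSender sender) {
        this.windowSize = windowSize;
        this.window = new Semaphore(windowSize);
        this.sender = sender;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        byte[] chunk = new byte[len];
        System.arraycopy(b, off, chunk, 0, len);
        window.acquireUninterruptibly();      // at most windowSize chunks in flight
        sender.send(chunk, window::release);  // permit comes back on the target's ack
    }

    @Override
    public void close() {
        // Block until every outstanding chunk of this file is acknowledged.
        window.acquireUninterruptibly(windowSize);
        window.release(windowSize);
    }
}
```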
How difficult would it be to lift that limitation?
@ywelsch I have responded to your comments. Please give this PR another look. Thank you.
@ywelsch Thanks so much for proposing this idea and reviewing. Thanks @original-brownbear.
* master: (28 commits)
  Introduce retention lease serialization (elastic#37447)
  Update Delete Watch to allow unknown fields (elastic#37435)
  Make finalize step of recovery source non-blocking (elastic#37388)
  Update the default for include_type_name to false. (elastic#37285)
  Security: remove SSL settings fallback (elastic#36846)
  Adding mapping for hostname field (elastic#37288)
  Relax assertSameDocIdsOnShards assertion
  Reduce recovery time with compress or secure transport (elastic#36981)
  Implement ccr file restore (elastic#37130)
  Fix Eclipse specific compilation issue (elastic#37419)
  Performance fix. Reduce deprecation calls for the same bulk request (elastic#37415)
  [ML] Use String rep of Version in map for serialisation (elastic#37416)
  Cleanup Deadcode in Rest Tests (elastic#37418)
  Mute IndexShardRetentionLeaseTests.testCommit elastic#37420
  unmuted test
  Remove unused index store in directory service
  Improve CloseWhileRelocatingShardsIT (elastic#37348)
  Fix ClusterBlock serialization and Close Index API logic after backport to 6.x (elastic#37360)
  Update the scroll example in the docs (elastic#37394)
  Update analysis.asciidoc (elastic#37404)
  ...
Today file-chunks are sent sequentially one by one in peer-recovery. This is a correct choice since the implementation is straightforward and recovery is network-bound most of the time. However, if the connection is encrypted, we might not be able to saturate the network pipe because encryption and decryption are CPU-bound rather than network-bound. With this commit, a source node can send multiple file-chunks (defaulting to 2) without waiting for acknowledgments from the target. Below are the benchmark results for PMC and NYC_taxis.

- PMC (20.2 GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
|-----------|----------|----------|----------|----------|----------|
| Plain     | 184s     | 137s     | 106s     | 105s     | 106s     |
| TLS       | 346s     | 294s     | 176s     | 153s     | 117s     |
| Compress  | 1556s    | 1407s    | 1193s    | 1183s    | 1211s    |

- NYC_Taxis (38.6GB)

| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
|-----------|----------|----------|----------|----------|----------|
| Plain     | 321s     | 249s     | 191s     | *        | *        |
| TLS       | 618s     | 539s     | 323s     | 290s     | 213s     |
| Compress  | 2622s    | 2421s    | 2018s    | 2029s    | n/a      |

Relates #33844
Today file-chunks are sent sequentially one by one in peer-recovery. This is a correct choice since the implementation is straightforward and recovery is network-bound most of the time. However, if the transport communication is secured, we might not be able to saturate the network bandwidth because encryption and decryption are compute-intensive.
With this commit, a source node can send multiple file-chunks (defaulting to 2) without waiting for acknowledgments from the target.
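As a sketch of the mechanism (the `Transport` hook is hypothetical; the 512kb chunk size matches the RecoverySettings default): the source keeps up to N chunk requests in flight and blocks only when the window is full, overlapping encryption and compression with network I/O.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.Semaphore;

// Sketch of the source-side send loop: up to maxConcurrentChunks file-chunk
// requests may be outstanding at once; acknowledgments from the target free
// slots in the window.
final class ChunkedFileSender {
    private static final int CHUNK_SIZE = 512 * 1024; // 512kb, the RecoverySettings default

    interface Transport { // hypothetical async transport hook
        void sendFileChunk(long position, byte[] chunk, Runnable onAck);
    }

    static void sendFile(InputStream file, int maxConcurrentChunks, Transport transport)
            throws IOException, InterruptedException {
        Semaphore window = new Semaphore(maxConcurrentChunks);
        byte[] buffer = new byte[CHUNK_SIZE];
        long position = 0;
        int read;
        while ((read = file.read(buffer)) != -1) {
            byte[] chunk = Arrays.copyOf(buffer, read);
            window.acquire();                    // block only when the window is full
            transport.sendFileChunk(position, chunk, window::release);
            position += read;
        }
        window.acquire(maxConcurrentChunks);     // wait for the final acks of this file
    }
}
```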
Below are the benchmark results for the PMC and NYC_taxis datasets. The benchmark uses two GCP instances (8 CPUs, 32GB RAM, 12Gbit bandwidth, and local SSD).
Relates #33844