Do not force refresh when write indexing buffer #50769

dnhatn · 2020-01-08T22:59:50Z

Today we periodically check the indexing buffer memory every 5 seconds or after we have used 1/30 of the configured memory. If the total used memory is over the threshold, then we refresh the "largest" shards. If refreshing takes longer than these intervals (i.e., 5s or 1/30 of the buffer), then we keep enqueuing refreshes for these shards. This leads to two issues:

- The refresh thread pool can be exhausted, so other shards can't refresh
- We execute too many refreshes for the "largest" shards
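The two triggers described above (a 5-second interval, or 1/30 of the configured buffer written since the last check) can be sketched as follows. This is a hypothetical illustration: `MemoryCheckTrigger` and `onWrite` are made-up names, not the Elasticsearch API, and for simplicity both triggers are evaluated on each write, whereas a real controller would more likely run the time-based check from a scheduled task.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of when the indexing-buffer memory check runs.
 * The 5 s interval and the 1/30 fraction come from the PR description;
 * everything else is illustrative.
 */
final class MemoryCheckTrigger {
    static final long CHECK_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(5);

    private final long bytesPerCheck; // 1/30 of the configured indexing buffer
    private long bytesSinceLastCheck;
    private long lastCheckNanos;

    MemoryCheckTrigger(long indexingBufferBytes, long nowNanos) {
        this.bytesPerCheck = Math.max(1L, indexingBufferBytes / 30);
        this.lastCheckNanos = nowNanos;
    }

    /** Records bytes written by an indexing operation and reports whether a check is due. */
    synchronized boolean onWrite(long bytes, long nowNanos) {
        bytesSinceLastCheck += bytes;
        boolean due = bytesSinceLastCheck >= bytesPerCheck
            || nowNanos - lastCheckNanos >= CHECK_INTERVAL_NANOS;
        if (due) {
            // Reset both triggers once a check has been scheduled.
            bytesSinceLastCheck = 0;
            lastCheckNanos = nowNanos;
        }
        return due;
    }
}
```

With a 30,000-byte buffer, a check becomes due after 1,000 bytes have been written or 5 seconds have passed since the previous check, whichever comes first.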

With this change, we only refresh the largest shards if they are not already refreshing. Here we rely on the periodic check to trigger another refresh if needed. We could harden this by making the ongoing refresh trigger the memory check when it completes. I opted out of this option in this PR for simplicity.
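The difference between a forced refresh and "only refresh if not already refreshing" can be sketched with a lock: the old path waits for any in-flight refresh and then refreshes again, while the new path gives up immediately. This mirrors the semantics of Lucene's `ReferenceManager.maybeRefreshBlocking()` versus `maybeRefresh()`, but the `NonBlockingRefresher` class below is a hypothetical illustration, not the Elasticsearch implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical shard-level refresher; names are illustrative only.
 */
final class NonBlockingRefresher {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private final AtomicInteger refreshCount = new AtomicInteger();
    private final Runnable doRefresh;

    NonBlockingRefresher(Runnable doRefresh) {
        this.doRefresh = doRefresh;
    }

    /** Forced refresh: waits for any in-flight refresh, then refreshes again. */
    void refreshBlocking() {
        refreshLock.lock();
        try {
            doRefresh.run();
            refreshCount.incrementAndGet();
        } finally {
            refreshLock.unlock();
        }
    }

    /** Returns false immediately if another thread is already refreshing. */
    boolean maybeRefresh() {
        if (refreshLock.tryLock() == false) {
            // A refresh is in flight; the next periodic memory check can retry.
            return false;
        }
        try {
            doRefresh.run();
            refreshCount.incrementAndGet();
            return true;
        } finally {
            refreshLock.unlock();
        }
    }

    int refreshCount() {
        return refreshCount.get();
    }
}
```

Because a skipped call never queues behind the running refresh, repeated memory checks against a slow shard no longer pile up tasks in the refresh thread pool.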

See: https://discuss.elastic.co/t/write-queue-continue-to-rise/213652/

elasticmachine · 2020-01-08T22:59:53Z

Pinging @elastic/es-distributed (:Distributed/Engine)

dnhatn · 2020-01-08T23:04:01Z

server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerTests.java


-        // TODO: would be cleaner if I could pass this 1kb setting to the single node this test created....
-        IndexingMemoryController imc = new IndexingMemoryController(settings, null, null) {
+    public void testSkipRefreshIfShardIsRefreshingAlready() throws Exception {


I only added this new test. Other tests are unchanged.

ywelsch

LGTM

henningandersen

LGTM.

I left a few smaller comments. Also, it would be good to have a few extra test runs of the server module to ensure it does not introduce spurious test failures.

server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerTests.java

server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerIT.java

dnhatn · 2020-01-09T18:05:59Z

Also, it would be good to have a few extra test runs of the server module to ensure it does not introduce spurious test failures.

Yep, I am running it on my CI now.

dnhatn · 2020-01-09T23:17:07Z

@ywelsch @henningandersen Thanks for reviewing.

The test checked the queue size and active count; however, ThreadPoolExecutor removes the request from the queue before marking the worker active, so we risk concluding that all tasks are done when they are not. Now check the completed-tasks metric instead, which is guaranteed to be monotonic. Relates elastic#50769
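The race and its fix can be sketched with a plain `java.util.concurrent.ThreadPoolExecutor`. `getCompletedTaskCount()` is documented as approximate but never decreasing across successive calls, so once a target count is observed it stays reached. The `PoolQuiescence` helper below is hypothetical; the actual test reads Elasticsearch thread pool stats rather than the executor directly.

```java
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper; not the code from the Elasticsearch test. */
final class PoolQuiescence {
    /**
     * Racy: a worker takes a task off the queue before it is counted as
     * active, so both checks can pass while that task is still pending.
     */
    static boolean looksIdleRacy(ThreadPoolExecutor pool) {
        return pool.getQueue().isEmpty() && pool.getActiveCount() == 0;
    }

    /** Polls until at least {@code expected} tasks have completed, or the timeout expires. */
    static boolean awaitCompleted(ThreadPoolExecutor pool, long expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pool.getCompletedTaskCount() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }
}
```

The test must know how many tasks it submitted, which is what makes the completed-task target well defined.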

Do not force refresh when write indexing buffer

2f1f61e

dnhatn added >enhancement :Distributed Indexing/Engine v8.0.0 v7.6.0 labels Jan 8, 2020

dnhatn requested review from ywelsch and henningandersen January 8, 2020 22:59

dnhatn requested a review from original-brownbear January 8, 2020 23:00

Merge branch 'master' into indexing-memory

2261b3f

dnhatn commented Jan 8, 2020

dnhatn added 2 commits January 8, 2020 21:51

do not check for rejected

e683909

Merge branch 'master' into indexing-memory

f0a05f9

ywelsch approved these changes Jan 9, 2020

henningandersen approved these changes Jan 9, 2020

server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerTests.java Outdated

server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerIT.java Outdated

dnhatn added 2 commits January 9, 2020 08:52

ensure refresh threadpool stats exist

7475515

arrange tests

4d04e97

dnhatn requested a review from henningandersen January 9, 2020 18:06

dnhatn merged commit 0510af8 into elastic:master Jan 9, 2020

dnhatn deleted the indexing-memory branch January 9, 2020 23:18

dnhatn added the backport pending label Jan 9, 2020

henningandersen mentioned this pull request Jan 10, 2020

Fix testSkipRefreshIfShardIsRefreshingAlready #50856

Merged

dnhatn removed the backport pending label Jan 11, 2020

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021