Block bulk request's index metadata retrieval until cluster state is recovered #46085
Conversation
If a bulk request hits a node that has not recovered the CS yet, index templates (and their ingest pipelines) are not loaded yet. The actual bulk request will wait/retry until the state is recovered, but the resolution of index templates will yield nothing, which means `default-pipeline` processors will be omitted and this can result in incorrect documents being indexed. This commit adds a blocking/retry mechanism so that index metadata resolution waits until the CS has been recovered.
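For readers skimming the thread, here is a minimal, hedged sketch of the kind of wait/retry described above, using the `ClusterStateObserver` API that appears in the review hunks below and the WRITE-level global block the reviewers converge on. The helper class and method names are illustrative only, not the actual `TransportBulkAction` change, and package locations (e.g. `TimeValue`) reflect the 7.x line this PR targets.

import java.util.function.Consumer;

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.ClusterStateObserver;
import org.elasticsearch.cluster.block.ClusterBlockException;
import org.elasticsearch.cluster.block.ClusterBlockLevel;
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.unit.TimeValue;

// Hypothetical helper: not the PR's code, just the shape of the wait/retry it describes.
final class WaitForClusterRecoverySketch {

    /**
     * Runs onRecovered once the global WRITE block has cleared, retrying through the
     * observer until the given timeout; fails the listener on timeout or node close.
     */
    static void whenWriteUnblocked(ClusterService clusterService,
                                   ClusterStateObserver observer,
                                   TimeValue timeout,
                                   Consumer<ClusterState> onRecovered,
                                   ActionListener<?> listener) {
        ClusterState state = clusterService.state();
        final ClusterBlockException blocked =
            state.blocks().globalBlockedException(ClusterBlockLevel.WRITE);
        if (blocked == null) {
            onRecovered.accept(state);          // nothing blocking writes, proceed immediately
            return;
        }
        observer.waitForNextChange(new ClusterStateObserver.Listener() {
            @Override
            public void onNewClusterState(ClusterState newState) {
                onRecovered.accept(newState);   // predicate below guarantees the block is gone
            }

            @Override
            public void onClusterServiceClose() {
                listener.onFailure(blocked);    // node shutting down: fail the bulk
            }

            @Override
            public void onTimeout(TimeValue t) {
                listener.onFailure(blocked);    // cluster never recovered within the timeout
            }
        }, newState -> newState.blocks().global(ClusterBlockLevel.WRITE).isEmpty(), timeout);
    }
}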
Pinging @elastic/es-distributed
if (startTime == -1) {
    startTime = relativeTime();
}
ClusterBlockException blockException = clusterService.state().blocks().globalBlockedException(ClusterBlockLevel.METADATA_READ);
Is `METADATA_READ` the correct block to look for?
I think that `ClusterBlockLevel.WRITE` is more appropriate, since writing data is what we're trying to do.
Approach looks good. I left a few initial comments on my way out the door.
// so the cluster state can be recovered
internalCluster()
    .startNode(Settings.builder().put(GatewayService.RECOVER_AFTER_NODES_SETTING.getKey(), "1"));
I think this can just be `.startNode()` - the other node is the master, so this node doesn't do any state recovery.
So I thought so too, but if I didn't reset the recover_after, the test would time out because it never recovered. I tried to dig into why that was happening but couldn't figure it out... resetting recover_after seemed to be the only fix I could find :/
I tried applying the following patch on top of 676e941 and ran >100 iterations of the test with no failures:
diff --git a/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/IngestRestartIT.java b/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/IngestRestartIT.java
index f406ccc965c..7ff9832e81a 100644
--- a/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/IngestRestartIT.java
+++ b/modules/ingest-common/src/test/java/org/elasticsearch/ingest/common/IngestRestartIT.java
@@ -249,9 +249,7 @@ public class IngestRestartIT extends ESIntegTestCase {
}
});
- // so the cluster state can be recovered
- internalCluster()
- .startNode(Settings.builder().put(GatewayService.RECOVER_AFTER_NODES_SETTING.getKey(), "1"));
+ internalCluster().startNode();
ensureYellow("index");
assertTrue(latch.await(5, TimeUnit.SECONDS));
Maybe it was a different change. If this still fails for you then I'd like to investigate further, maybe on Zoom?
Thanks @DaveCTurner! Addressed review comments, the predicate definitely made it cleaner. I think everything is constructed correctly, but the global block methods/semantics confuse me a little so extra eyeballs there might be warranted :) I also moved the start time into the ctor because it was simpler, and should probably be run right after instantiation anyway. Edit: seems it's breaking some tests. Looking...
Looks like you're bitten by an excessively fake mock :(
I left a few more small points.
        listener.onFailure(blockException);
    }
}, newState -> newState.blocks().global(ClusterBlockLevel.WRITE).isEmpty());
return;
nit: I think `if ... else` is a bit clearer than `if ... return` :)
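For illustration, a hedged sketch of the suggested if/else shape; `waitForRecovery` and `prepForBulk` are stand-ins for the PR's actual methods, and only the block check mirrors the diff.

import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.block.ClusterBlockLevel;

// Hypothetical wrapper showing the control flow only.
final class ControlFlowSketch {
    void execute(ClusterState state) {
        if (state.blocks().global(ClusterBlockLevel.WRITE).isEmpty() == false) {
            waitForRecovery(state);   // cluster state not recovered yet: observe and retry
        } else {
            prepForBulk(state);       // no global WRITE block: proceed immediately
        }
    }

    private void waitForRecovery(ClusterState state) { /* observer wiring elided */ }

    private void prepForBulk(ClusterState state) { /* bulk preparation elided */ }
}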
recoveredObserver.waitForNextChange(new ClusterStateObserver.Listener() {
    @Override
    public void onNewClusterState(ClusterState newState) {
        // predicate passed, begin preparing for the bulk
Rather than a comment, we could assert `newState.blocks().global(ClusterBlockLevel.WRITE).isEmpty()`.
 * A runnable that will ensure the cluster state has been recovered enough to
 * read index metadata and templates/pipelines. Will retry up to the bulk's timeout
 */
private final class BulkExecutor extends ActionRunnable<BulkResponse> {
Do we need this class now that we aren't doing any retries ourself? I get that it wraps up the three parameters and the start time but 4 parameters to another method isn't so bad.
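For what it's worth, the flattened alternative hinted at above might look roughly like the following. The parameter names are assumptions based on the usual `TransportBulkAction.doExecute` signature (task, request, listener) plus the start time, not a quote from the PR.

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.tasks.Task;

// Hypothetical sketch: pass the three request parameters plus the start time
// straight to a helper method instead of wrapping them in a BulkExecutor class.
final class FlattenedExecuteSketch {
    void doInternalExecute(Task task,
                           BulkRequest bulkRequest,
                           ActionListener<BulkResponse> listener,
                           long relativeStartTimeNanos) {
        // ... resolve pipelines/templates and dispatch the bulk from here ...
    }
}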
@Override
protected void doRun() {
    ClusterState currentState = clusterService.state();
Technically I think we should get the state from `recoveredObserver.setAndGetObservedState()` to keep the state we observe in sync with the state that the `ClusterStateObserver` is observing. Alternatively, we could avoid constructing the `ClusterStateObserver` until we discover we need it, and give it the state we got from the cluster service - there's no need for it to be tracked in a field.
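A hedged sketch of the second suggestion, constructing the observer lazily from the state already in hand. The class and method names are illustrative, and the `ClusterStateObserver` constructor arguments reflect my understanding of the API of that era and may differ slightly.

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.ClusterStateObserver;
import org.elasticsearch.cluster.block.ClusterBlockLevel;
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.util.concurrent.ThreadContext;

// Hypothetical sketch: only build the observer once we know we have to wait.
final class LazyObserverSketch {

    private static final Logger logger = LogManager.getLogger(LazyObserverSketch.class);

    void execute(ClusterService clusterService, ThreadContext threadContext, TimeValue timeout) {
        ClusterState state = clusterService.state();      // single read of the current state
        if (state.blocks().global(ClusterBlockLevel.WRITE).isEmpty()) {
            prepForBulk(state);                            // nothing blocked: no observer needed
            return;
        }
        // Seed the observer with the state we just looked at so the two stay in sync.
        ClusterStateObserver observer =
            new ClusterStateObserver(state, clusterService, timeout, logger, threadContext);
        // ... observer.waitForNextChange(...) as in the hunks above ...
    }

    private void prepForBulk(ClusterState state) { /* elided */ }
}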
@Override
public void onNewClusterState(ClusterState newState) {
    // predicate passed, begin preparing for the bulk
    prepForBulk(newState);
This is now sometimes on the cluster applier thread, which is probably not what we want. We should add this to the top of `prepForBulk`, and then make sure to call `prepForBulk` on the `WRITE` threadpool here:

assert ClusterApplierService.assertNotClusterStateUpdateThread("TransportBulkAction#prepForBulk");
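A hedged sketch of forking off the applier thread onto the WRITE pool, as suggested. The assertion and pool name come from the comment above; the surrounding class and method names are illustrative, not the PR's code.

import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.service.ClusterApplierService;
import org.elasticsearch.threadpool.ThreadPool;

// Hypothetical sketch: hop off the cluster applier thread before doing bulk preparation.
final class ForkToWritePoolSketch {

    private final ThreadPool threadPool;

    ForkToWritePoolSketch(ThreadPool threadPool) {
        this.threadPool = threadPool;
    }

    // Called from the ClusterStateObserver listener once the predicate passes.
    void onNewClusterState(ClusterState newState) {
        threadPool.executor(ThreadPool.Names.WRITE).execute(() -> prepForBulk(newState));
    }

    private void prepForBulk(ClusterState state) {
        // guard against ever running this on the cluster state update/applier thread
        assert ClusterApplierService.assertNotClusterStateUpdateThread("TransportBulkAction#prepForBulk");
        // ... resolve index metadata / templates and dispatch the bulk ...
    }
}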
Urgh, this PR has languished. I just don't have time to dig in and solve these issues I'm afraid, given my relative lack of experience in this part of the code. When I started to fix the tests at the time, it became clear that the code needed a bit more adjusting than just fixing fragile mocks (e.g. to fix some of the mocks would have required pulling in half the world or even more excessive mocking). I'm going to close this for now and open a ticket about the issue so it isn't lost. I'll revisit this PR if I find some time but for now it's probably best to admit defeat :(
This PR makes the bulk index action wait for the cluster to recover before resolving index templates, so that ingest pipelines are correctly applied when the cluster is recovering. Resolves: elastic#49499 Supersedes: elastic#46085
If a bulk request hits a node that has not recovered the CS yet, index metadata will not be available. The actual bulk request will wait/retry until the CS is recovered (due to the retry mechanism in `BulkOperation`), but the resolution of index templates has no retry mechanism. This means `default-pipeline` processors will be omitted, and this can result in incorrect documents being indexed.

This commit adds a blocking/retry mechanism so that index metadata resolution waits until the CS has been recovered.
Questions/concerns
I'm assuming we want to retry if the CS isn't ready... is that a correct assumption?
This follows a similar pattern to `BulkOperation`/`executeBulk()`, since it seems the `TransportBulkAction` is reused (at least in the integ tests), so we need to store state somewhere other than on the action itself.

It does make `BulkOperation` somewhat redundant, since that also does cluster state checking. But I was hesitant to change more than was necessary. If desired I can try to meld the two together so there is less redundancy.

Not familiar with this code, so any/all suggestions welcome :)