Force execution of finish shard bulk request #51957

Tim-Brooks · 2020-02-05T17:52:04Z

Currently the shard bulk request can be rejected by the write threadpool
after a mapping update. This introduces a scenario where the mapping
listener thread will attempt to finish the request and fsync. This
thread can potentially be a transport thread. This commit fixes this
issue by forcing the finish action to happen on the write threadpool.

Fixes #51904.
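For readers unfamiliar with forced execution, here is a minimal, JDK-only sketch of the idea. It is not the Elasticsearch implementation: `ForcedRunnable`, `SoftBoundedQueue`, and `runDemo` are hypothetical names made up for illustration, and the pool named `write-0` merely stands in for the write threadpool. The sketch assumes a pool that rejects ordinary tasks once its queue is full but always enqueues tasks marked as forced, so the finish step still runs on the pool thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class ForceExecutionSketch {

    /** Hypothetical marker: tasks of this type may exceed the queue limit. */
    interface ForcedRunnable extends Runnable {}

    /** Queue that refuses offer() at a soft capacity but still allows forcePut(). */
    static final class SoftBoundedQueue extends LinkedBlockingQueue<Runnable> {
        private final int softCapacity;

        SoftBoundedQueue(int softCapacity) {
            this.softCapacity = softCapacity;
        }

        @Override
        public boolean offer(Runnable task) {
            return size() < softCapacity && super.offer(task);
        }

        void forcePut(Runnable task) {
            super.offer(task); // the backing queue is unbounded, so this always succeeds
        }
    }

    /** Returns "<ordinary task rejected?>:<thread that ran the forced task>". */
    static String runDemo() {
        ThreadPoolExecutor write = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new SoftBoundedQueue(1),
            r -> new Thread(r, "write-0"),
            (task, pool) -> {
                if (task instanceof ForcedRunnable) {
                    ((SoftBoundedQueue) pool.getQueue()).forcePut(task);
                } else {
                    throw new RejectedExecutionException("write queue is full");
                }
            });
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicReference<String> finishThread = new AtomicReference<>();
        try {
            // Occupy the single worker, then fill the one queue slot.
            write.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            write.execute(() -> {});

            boolean rejected = false;
            try {
                write.execute(() -> {}); // ordinary task: the pool is saturated
            } catch (RejectedExecutionException e) {
                rejected = true;
            }

            // The "finish" step is forced, so it is queued despite saturation.
            write.execute((ForcedRunnable) () -> {
                finishThread.set(Thread.currentThread().getName());
                finished.countDown();
            });

            release.countDown();
            finished.await(5, TimeUnit.SECONDS);
            return rejected + ":" + finishThread.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            write.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In this sketch the ordinary task is rejected while the forced one is accepted and later runs on `write-0`, which is the property the fix relies on: the finish step never falls back to the thread that delivered the mapping update.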

elasticmachine · 2020-02-05T17:52:07Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

Tim-Brooks · 2020-02-05T17:52:15Z

How far should we back port this?

jasontedor

I left a comment for consideration.

jasontedor · 2020-02-05T18:36:16Z

server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

+
+                    @Override
+                    protected void doRun() {
+                        finishRequest();


I was thinking that rather than scheduling only finishRequest on the write thread pool, we should execute the entirety of onRejection on the write thread pool (while still forcing execution, since rejecting execution here would be very bad):

elasticsearch/server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

Line 160 in 3ad8aa6

ActionListener.wrap(v -> executor.execute(this), this::onRejection)) == false) {

The reason I think this is because we're still doing work in onRejection that is linear in the number of documents in the bulk request. It seems we'd want to get that off of the networking/cluster state applier thread too given that we're going to fork anyway.
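To make the suggestion concrete, here is a rough JDK-only sketch, not the actual TransportShardBulkAction code. The class name, `runDemo`, and the thread names `transport-0` and `write-0` are assumptions for illustration, and the write executor here is unbounded, which stands in for forced execution. The point is that the callback thread only hands off, while the per-document failure loop and the finish step both run on the write thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class DispatchRejectionSketch {

    /** Returns a trace of which thread performed each step. */
    static List<String> runDemo(int docCount) {
        ExecutorService transport =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "transport-0"));
        ExecutorService write =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "write-0"));
        CompletableFuture<List<String>> done = new CompletableFuture<>();
        try {
            // The rejection callback arrives on a transport thread...
            transport.execute(() -> {
                String callbackThread = Thread.currentThread().getName();
                // ...and immediately forks the whole handler to the write pool.
                write.execute(() -> {
                    String worker = Thread.currentThread().getName();
                    List<String> trace = new ArrayList<>();
                    trace.add("callback:" + callbackThread);
                    int failed = 0;
                    for (int i = 0; i < docCount; i++) {
                        failed++; // stands in for failing one bulk item (linear work)
                    }
                    trace.add("failed " + failed + " items:" + worker);
                    trace.add("finish:" + worker); // stands in for finishRequest()
                    done.complete(trace);
                });
            });
            return done.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            transport.shutdown();
            write.shutdown();
        }
    }

    public static void main(String[] args) {
        runDemo(3).forEach(System.out::println);
    }
}
```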

henningandersen · 2020-02-05

I wonder if we should instead let onRejection force-push the onRejection handling onto the queue of the requested executor. The current "direct" handling kind of prioritizes the onRejection handling over everything else in the queue, which I think there is really no good reason for.

Looking at various onRejection handlers, some call listener.onFailure and verifying that none of those do bad things is tricky.

Also notice that onAfter is called in the caller thread when requests are rejected, which poses similar issues (not that I found a smoking gun there).

Finally, notice that AbstractRunnable.onRejection by default calls onFailure.

I doubt that we carefully consider that onAfter and onFailure might run in the current thread when using AbstractRunnable. Executing the onRejection handling on the target thread pool would fix this and make it easier to reason about.
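The concern can be illustrated with a small JDK-only sketch. `SketchRunnable` is a hypothetical, simplified imitation of the callbacks described above (onRejection delegating to onFailure, with onAfter called afterwards), and `execute` is an assumed wrapper, not the real executor code. A shut-down executor is used only because it rejects every task, standing in for a full queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class RejectionOnCallerThreadSketch {

    /** Hypothetical, simplified imitation of the AbstractRunnable callbacks. */
    abstract static class SketchRunnable implements Runnable {
        protected abstract void doRun() throws Exception;

        protected abstract void onFailure(Exception e);

        protected void onRejection(Exception e) {
            onFailure(e); // default: a rejection is reported as a failure
        }

        protected void onAfter() {}

        @Override
        public final void run() {
            try {
                doRun();
            } catch (Exception e) {
                onFailure(e);
            } finally {
                onAfter();
            }
        }
    }

    /** Assumed wrapper: on rejection, the callbacks run on the submitting thread. */
    static void execute(ExecutorService executor, SketchRunnable task) {
        try {
            executor.execute(task);
        } catch (RejectedExecutionException e) {
            try {
                task.onRejection(e);
            } finally {
                task.onAfter();
            }
        }
    }

    static List<String> runDemo() {
        ExecutorService write =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "write-0"));
        write.shutdown(); // a shut-down executor rejects all new tasks
        List<String> trace = new ArrayList<>();
        Thread caller = new Thread(() -> execute(write, new SketchRunnable() {
            @Override
            protected void doRun() {
                trace.add("doRun:" + Thread.currentThread().getName());
            }

            @Override
            protected void onFailure(Exception e) {
                trace.add("onFailure:" + Thread.currentThread().getName());
            }

            @Override
            protected void onAfter() {
                trace.add("onAfter:" + Thread.currentThread().getName());
            }
        }), "transport-0");
        caller.start();
        try {
            caller.join(); // join() makes the caller's writes to trace visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return trace;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

In this sketch doRun never executes, and both onFailure and onAfter are recorded on `transport-0` rather than `write-0`, which is the behavior that is easy to overlook when writing these handlers.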

That said, I think this PR is good and I am not objecting to it going in. Following my suggestion is likely to surface a few additional things to resolve.

Tim-Brooks · 2020-02-05

I made @jasontedor's change.

I think Henning's suggestion that onRejection should perhaps always be executed on the thread pool is beyond the scope of this PR, or at least warrants a larger discussion.

henningandersen · 2020-02-05

@tbrooks8, yes, that is fine, I just found it most natural to put it here. I will open a PR with that change so we can discuss based on that PR instead.

jasontedor

I left a comment, but this LGTM.

jasontedor · 2020-02-05T20:54:43Z

server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java

-                        context, null);
-                }
-                finishRequest();
+                // Force the execution to finish the request


It's probably worth a comment here explaining why we fork to the executor (since it's not obvious from the code).

jasontedor · 2020-02-05T22:15:52Z

How far should we back port this?

I think to the 7.6, 7.x, and master branches.

Tim-Brooks added the >bug, :Distributed Indexing/CRUD, v8.0.0, and v7.7.0 labels Feb 5, 2020

Tim-Brooks requested a review from jasontedor February 5, 2020 17:52

jasontedor reviewed Feb 5, 2020

Dispatch everything

4b0b0eb

Tim-Brooks requested a review from jasontedor February 5, 2020 19:51

jasontedor approved these changes Feb 5, 2020

Tim-Brooks added 3 commits February 11, 2020 09:54

Comment

ae403e7

Merge remote-tracking branch 'upstream/master' into force_shard_finish

07135b2

Merge remote-tracking branch 'upstream/master' into force_shard_finish

722236d

Tim-Brooks merged commit 0dd1ed8 into elastic:master Feb 15, 2020

Tim-Brooks added the backport pending label Feb 15, 2020

Tim-Brooks removed the backport pending label Feb 25, 2020

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

Tim-Brooks deleted the force_shard_finish branch April 30, 2020 18:27

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
