Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow asynchronous block operations to be delayed in IndexShardOperationPermits #35999

Closed
wants to merge 2 commits into from

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Nov 28, 2018

IndexShardOperationPermits provides two methods to execute an operation while holding all the index shard permits: blockOperations() and asyncBlockOperations(). When they are called concurrently, all synchronous and asynchronous operations are competing for the acquisition of all permits. The order of execution is defined by the fairness of the internal semaphore used to acquire/release permits, but since asynchronous block operations are executed on another thread on the generic thread pool (which can already be busy with other unrelated operations) the order of execution is actually undefined.

This behavior was OK until the merge of #35540, in which we allow transport replication write action to acquire all permits during the execution. This change caused tests failures like #35850 where the execution of the action is competing with a primary term bump for the acquisition of all permits (see #35862 (comment) for a more complete explanation of the issue). Depending of which takes the precedence, the test failed or succeed.

This pull request changes the IndexShardOperationPermits so that asynchronous block operations are delayed like regular operations if another block operation already requested the delay of operations. As soon as a blocking operation is terminated, IndexShardOperationPermits checks if one or more operations were delayed and should be released, and releases async block operations first - one after the other. This way IndexShardOperationPermits can guarantee that async block operations are executed in the order they were requested, while regular operation are still delayed until all sync/async block operations are terminated.

Closes #35850

@tlrx tlrx added >enhancement v7.0.0 :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. v6.6.0 labels Nov 28, 2018
@tlrx tlrx requested a review from ywelsch November 28, 2018 13:43
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

final Optional<DelayedOperation> nextAsyncOperation = delayedOperations.stream()
.filter(op -> op instanceof AsyncBlockDelayedOperation)
.findFirst();
if (nextAsyncOperation.isPresent()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was happy with the simplicity of this change but there is a pitfall here: if a block op sneaks in and terminates it "releases" another async block op.

@tlrx
Copy link
Member Author

tlrx commented Nov 30, 2018

Closed in favor of #36116

@tlrx tlrx closed this Nov 30, 2018
@tlrx tlrx deleted the async-block-ops-delayed branch November 30, 2018 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants