
Refactor TransportShardBulkAction to better support retries #31821

Merged: 38 commits merged into elastic:master on Aug 10, 2018

Conversation

@bleskes (Contributor) commented Jul 5, 2018

Processing a bulk request goes item by item. Sometimes during processing we need to stop execution and wait for a new mapping update to be processed by the node. This is currently achieved by throwing a RetryOnPrimaryException, which is caught higher up. When the exception is caught, we wait for the next cluster state to arrive and process the request again. Sadly, this is a problem because all operations that were already completed before the mapping change was required are applied again and get new sequence numbers. This in turn means that the previously issued sequence numbers are never replicated to the replicas, which causes the local checkpoint of those shards to be stuck, and with it all of the seq#-based infrastructure.

This PR refactors how we deal with retries, with the goal of removing RetryOnPrimaryException and RetryOnReplicaException (not done yet). It does so by introducing a class, PrimaryExecutionContext, that is used to capture the execution state and allows continuing from where execution stopped. The class also formalizes the steps each item has to go through (a sketch follows the list):

  1. A translation phase for updates
  2. Execution phase (always index/delete)
  3. Two kinds of retries
  4. A finalization phase which allows converting the index/delete result of an update back into an update result.

This PR is still rough around the edges. There are no proper unit tests yet, and the IT tests roughly pass. It is in good enough shape to get feedback from people to see if this is where we want things to go.

If we like it, the same approach can be applied to the replica execution.

@bleskes added the >enhancement, WIP, and :Distributed Indexing/CRUD (a catch-all label for issues around indexing, updating, and getting a doc by id; not search) labels on Jul 5, 2018
@elasticmachine (Collaborator) commented:
Pinging @elastic/es-distributed

@bleskes changed the title from "WIP: Refactor TransportShardBulkAction to be support retries" to "WIP: Refactor TransportShardBulkAction to better support retries" on Jul 5, 2018
@bleskes (Contributor Author) commented Aug 2, 2018

@ywelsch thanks for taking a look. This one is not easy to review. I addressed all your comments. Can you please take another look?

@ywelsch (Contributor) left a comment

Left some smaller comments, but LGTM otherwise. I could not find any unit test that checks the situation where waitForMappingUpdate fails; it may be good to look into that.

currentItemState = ItemProcessingState.EXECUTED;
final DocWriteRequest docWriteRequest = getCurrentItem().request();
markAsCompleted(new BulkItemResponse(getCurrentItem().id(), docWriteRequest.opType(),
// Make sure to use request.index() here, if you use docWriteRequest.index() it will use the
Contributor:

I was confused because request.index() does not exist here. There is getCurrentItem().index() though.

Contributor Author:

Fair point. Updated the comment.

* received from the user (specifically, an update request is translated to an indexing or delete request).
*/
public void setRequestToExecute(DocWriteRequest writeRequest) {
assert currentItemState != ItemProcessingState.TRANSLATED &&
Contributor:

Instead of ruling out the states in which this must not be called, I think it's easier to understand if we list the states in which we allow this to be called.

Contributor:

I think this should just be assert currentItemState == INITIAL

Contributor Author:

Agreed. This started before I had the explicit reset back to INITIAL, and it grew out of hand. I took your suggestion and added some :)


/** completes the operation without doing anything on the primary */
public void markOperationAsNoOp(DocWriteResponse response) {
assert currentItemState != ItemProcessingState.EXECUTED &&
Contributor:

I think this is only called in INITIAL state, so let's assert currentItemState == INITIAL

/** the current operation has been executed on the primary with the specified result */
public void markOperationAsExecuted(Engine.Result result) {
assert currentItemState == ItemProcessingState.TRANSLATED: currentItemState;
assert executionResult == null : executionResult;
Contributor:

I wonder if we can add this (and similar ones) as an invariant to the class (similar to what was done for ReplicationTracker); we would then call assert invariant() in each of these methods.
For example, one invariant might state that if we are in the TRANSLATED state, the executionResult is null.

return new BulkShardResponse(request.shardId(), responses);
}

private static boolean isAborted(BulkItemResponse response) {
Contributor:

Please move this method up, next to findNextNonAborted.

for (int i = 0; i < items.length; i++) {
responses[i] = items[i].getPrimaryResponse();
}
return new BulkShardResponse(request.shardId(), responses);
Contributor:

This whole method can be abbreviated to:

return new BulkShardResponse(request.shardId(),
    Arrays.stream(request.items()).map(BulkItemRequest::getPrimaryResponse).toArray(BulkItemResponse[]::new));

Contributor Author:

💯

.primaryTerm(0, 1).build();
}

private ClusterService clusterService;
Contributor:

Forgot to remove this in the commit I added; it is not needed by the tests anymore.

Contributor Author:

Removed.

@bleskes (Contributor Author) commented Aug 9, 2018

> I could not find any unit test that checks the situation where waitForMappingUpdate fails

See testExecuteBulkIndexRequestWithErrorWhileUpdatingMapping and other usages of ThrowingMappingUpdatePerformer.

@ywelsch Can you please take another look?

@ywelsch (Contributor) commented Aug 9, 2018

> See testExecuteBulkIndexRequestWithErrorWhileUpdatingMapping and other usages of ThrowingMappingUpdatePerformer.

Those are tests where the mapping update fails. I meant the situation where the subsequent waitForMappingUpdate fails (i.e. https://github.com/elastic/elasticsearch/pull/31821/files#diff-720a796f6beda1dfa6af60b45ffe1010R225).

@bleskes (Contributor Author) commented Aug 9, 2018

> Those are tests where the mapping update fails. I meant the situation where the subsequent waitForMappingUpdate fails

I see. Let me work something up.

@bleskes requested a review from ywelsch on August 9, 2018 at 15:43
@ywelsch (Contributor) left a comment

LGTM. Thanks for this PR and all the assertions!

/** returns a translog location that is needed to be synced in order to persist all operations executed so far */
public Translog.Location getLocationToSync() {
assert hasMoreOperationsToExecute() == false;
assert assertInvariants(ItemProcessingState.INITIAL, ItemProcessingState.COMPLETED);
Contributor:

I would have expected this to only be INITIAL?

Contributor Author:

If you have a bulk where all items are aborted, you can overflow in advance and have INITIAL here.

Contributor:

My comment was that we should always end up in INITIAL, because we always move to INITIAL after completing an item.

Contributor Author:

GRR. I misread your comment. That is correct, as far as I can tell. I pushed 5eeb932.

@bleskes merged commit f58ed21 into elastic:master on Aug 10, 2018
@bleskes deleted the bulk_retry branch on August 10, 2018 at 08:15
@bleskes (Contributor Author) commented Aug 10, 2018

Thanks @ywelsch for the review and the good suggestions.

bleskes added a commit that referenced this pull request Aug 10, 2018
Processing a bulk request goes item by item. Sometimes during processing we need to stop execution and wait for a new mapping update to be processed by the node. This is currently achieved by throwing a `RetryOnPrimaryException`, which is caught higher up. When the exception is caught, we wait for the next cluster state to arrive and process the request again. Sadly, this is a problem because all operations that were already completed before the mapping change was required are applied again and get new sequence numbers. This in turn means that the previously issued sequence numbers are never replicated to the replicas, which causes the local checkpoint of those shards to be stuck, and with it all of the seq#-based infrastructure.

This commit refactors how we deal with retries, with the goal of removing `RetryOnPrimaryException` and `RetryOnReplicaException` (not done yet). It does so by introducing a class, `BulkPrimaryExecutionContext`, that is used to capture the execution state and allows continuing from where execution stopped. The class also formalizes the steps each item has to go through:
1) A translation phase for updates
2) Execution phase (always index/delete)
3) Waiting for a mapping update to come in, if needed
4) Requires a retry (for updates, and for cases where the mappings are still not available after the put-mapping call returns)
5) A finalization phase which allows converting the index/delete result of an update back into an update result
@ywelsch mentioned this pull request Aug 13, 2018
ywelsch added a commit that referenced this pull request Aug 14, 2018
#31821 introduced an unreleased bug where NOOP updates were incorrectly mutating the bulk
shard request, inserting a null item to be replicated, which would result in NullPointerExceptions when
serializing the request to be shipped to the replicas.

Closes #32808
Labels

:Distributed Indexing/CRUD (a catch-all label for issues around indexing, updating, and getting a doc by id; not search), >enhancement, v6.5.0, v7.0.0-beta1