Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some abstractions from TransportReplicationAction #40706

Conversation

DaveCTurner
Copy link
Contributor

TransportReplicationAction is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.

This commit is a step towards separating the TransportReplicationAction into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.

@DaveCTurner DaveCTurner added >non-issue :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >refactoring v8.0.0 v7.2.0 labels Apr 1, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Copy link
Contributor Author

Note to reviewers: I have broken this PR up into a sequence of self-contained commits. It probably makes sense to review these commits one-by-one.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DaveCTurner , this looks like a solid improvement.
I have a few specific comments that I hope you will address.

}
setPhase(task, "finished");
try {
channel.sendResponse(replicaResponse);
listener.onResponse(replicaResponse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does change the failure handling slightly in that if sendResponse fails, ChannelActionListener.onFailure is now called instead of AsyncReplicaAction.onFailure. This in turn means that setPhase(task, "finished"); is not called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already called setPhase(task, "finished"); once, just above here, so I think this is ok. However, listener.onResponse cannot fail, it tries to send errors back and ultimately just logs them, so I removed this try block.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I added one additional minor comment. Thanks @DaveCTurner .

() -> new ConcreteReplicaRequest<>(replicaRequest),
executor, true, true,
new ReplicaOperationTransportHandler());
protected boolean forcePrimaryActionExecution() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would be a constructor parameter instead to avoid the dreaded relying on subclass uninitialized state problem.

However, the old registerRequestHandlers had the same issue, so this is still a good improvement. I would like to add a javadoc comment to it that overriding methods should only return either true and not rely on state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh darn it I didn't mean to commit that, it doesn't work. At least, I committed it, then reverted it (it doesn't work) but then reverted the reversion to investigate why. I have now reverted the reverted reversion. 🤣

@DaveCTurner DaveCTurner merged commit 9ad9f16 into elastic:master Apr 3, 2019
@DaveCTurner DaveCTurner deleted the 2019-04-01-refactor-transport-replication-action branch April 3, 2019 07:33

transportService.registerRequestHandler(actionName, request, ThreadPool.Names.SAME, this::handleOperationRequest);
transportService.registerRequestHandler(transportPrimaryAction, () -> new ConcreteShardRequest<>(request), executor, true,
forcePrimaryActionExecution(), this::handlePrimaryRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of forcePrimaryActionExecution() is passed to the wrong parameter (canTripCircuitBreaker). Swap this and the previous true and I think it will work.

Copy link
Contributor Author

@DaveCTurner DaveCTurner Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, well spotted. I opened #40762.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 3, 2019
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.

Relates elastic#40706
DaveCTurner added a commit that referenced this pull request Apr 3, 2019
`TransportReplicationAction` is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.

This commit is a step towards separating the `TransportReplicationAction` into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.
DaveCTurner added a commit that referenced this pull request Apr 3, 2019
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.

Relates #40706
DaveCTurner added a commit that referenced this pull request Apr 3, 2019
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.

Relates #40706
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2019
…leniency

* elastic/master:
  SQL: Fix deserialisation issue of TimeProcessor (elastic#40776)
  Improve GCS docs for using keystore (elastic#40605)
  Add Restore Operation to SnapshotResiliencyTests (elastic#40634)
  Small refactorings to analysis components (elastic#40745)
  SQL: Fix display size for DATE/DATETIME (elastic#40669)
  add HLRC protocol tests for transform state and stats (elastic#40766)
  Inline TransportReplAction#registerRequestHandlers (elastic#40762)
  remove experimental label from search_as_you_type documentation (elastic#40744)
  Remove some abstractions from `TransportReplicationAction` (elastic#40706)
  Upgrade to latest build scan plugin (elastic#40702)
  Use default memory lock setting in testing (elastic#40730)
  Add Bulk Delete Api to BlobStore (elastic#40322)
  Remove yaml skips older than 7.0 (elastic#40183)
  Docs: Move id in the java-api (elastic#40748)
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 4, 2019
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see elastic#40706).

This commit removes these customisation points and injects the test-only
behaviour using other techniques.
DaveCTurner added a commit that referenced this pull request Apr 5, 2019
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see #40706).

This commit removes these customisation points and injects the test-only
behaviour using other techniques.
DaveCTurner added a commit that referenced this pull request Apr 5, 2019
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see #40706).

This commit removes these customisation points and injects the test-only
behaviour using other techniques.
DaveCTurner added a commit that referenced this pull request Apr 11, 2019
A small refactoring that removes the primaryTerm field from ReplicasProxy and
instead passes it directly in to the methods that need it. Relates #40706.
DaveCTurner added a commit that referenced this pull request Apr 11, 2019
A small refactoring that removes the primaryTerm field from ReplicasProxy and
instead passes it directly in to the methods that need it. Relates #40706.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 15, 2019
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.

Relates elastic#40706.
DaveCTurner added a commit that referenced this pull request Apr 16, 2019
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.

Relates #40706.
DaveCTurner added a commit that referenced this pull request Apr 16, 2019
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.

Relates #40706.
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
…0706)

`TransportReplicationAction` is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.

This commit is a step towards separating the `TransportReplicationAction` into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.

Relates elastic#40706
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see elastic#40706).

This commit removes these customisation points and injects the test-only
behaviour using other techniques.
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
A small refactoring that removes the primaryTerm field from ReplicasProxy and
instead passes it directly in to the methods that need it. Relates elastic#40706.
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.

Relates elastic#40706.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >non-issue >refactoring v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants