-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove some abstractions from TransportReplicationAction
#40706
Remove some abstractions from TransportReplicationAction
#40706
Conversation
Today this requires overriding and reimplementing registerRequestHandlers() but it's clearer to highlight the difference like this
Pinging @elastic/es-distributed |
Note to reviewers: I have broken this PR up into a sequence of self-contained commits. It probably makes sense to review these commits one-by-one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DaveCTurner , this looks like a solid improvement.
I have a few specific comments that I hope you will address.
...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java
Outdated
Show resolved
Hide resolved
} | ||
setPhase(task, "finished"); | ||
try { | ||
channel.sendResponse(replicaResponse); | ||
listener.onResponse(replicaResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does change the failure handling slightly in that if sendResponse
fails, ChannelActionListener.onFailure
is now called instead of AsyncReplicaAction.onFailure
. This in turn means that setPhase(task, "finished");
is not called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already called setPhase(task, "finished");
once, just above here, so I think this is ok. However, listener.onResponse
cannot fail, it tries to send errors back and ultimately just logs them, so I removed this try
block.
...r/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I added one additional minor comment. Thanks @DaveCTurner .
() -> new ConcreteReplicaRequest<>(replicaRequest), | ||
executor, true, true, | ||
new ReplicaOperationTransportHandler()); | ||
protected boolean forcePrimaryActionExecution() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would be a constructor parameter instead to avoid the dreaded relying on subclass uninitialized state problem.
However, the old registerRequestHandlers had the same issue, so this is still a good improvement. I would like to add a javadoc comment to it that overriding methods should only return either true and not rely on state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh darn it I didn't mean to commit that, it doesn't work. At least, I committed it, then reverted it (it doesn't work) but then reverted the reversion to investigate why. I have now reverted the reverted reversion. 🤣
…ction is forced""" This reverts commit a5bfc84.
|
||
transportService.registerRequestHandler(actionName, request, ThreadPool.Names.SAME, this::handleOperationRequest); | ||
transportService.registerRequestHandler(transportPrimaryAction, () -> new ConcreteShardRequest<>(request), executor, true, | ||
forcePrimaryActionExecution(), this::handlePrimaryRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of forcePrimaryActionExecution()
is passed to the wrong parameter (canTripCircuitBreaker
). Swap this and the previous true
and I think it will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, well spotted. I opened #40762.
It is important that resync actions are not rejected on the primary even if its `write` threadpool is overloaded. Today we do this by exposing `registerRequestHandlers` to subclasses and overriding it in `TransportResyncReplicationAction`. This isn't ideal because it obscures the difference between this action and other replication actions, and also might allow subclasses to try and use some state before they are properly initialised. This change replaces this override with a constructor parameter to solve these issues. Relates elastic#40706
`TransportReplicationAction` is a rather complex beast, and some of its concrete implementations do not need all of its features. More specifically, it (a) chases a primary around the cluster until it manages to pin it down and then (b) executes an action on that primary and all its replicas. There are some actions that are coordinated by the primary itself, meaning that there is no need for the chase-the-primary phases, and in the case of peer recovery retention leases and primary/replica resync it is important to bypass these first phases. This commit is a step towards separating the `TransportReplicationAction` into these two parts. It is a mostly mechanical sequence of steps to remove some abstractions that are no longer in use.
It is important that resync actions are not rejected on the primary even if its `write` threadpool is overloaded. Today we do this by exposing `registerRequestHandlers` to subclasses and overriding it in `TransportResyncReplicationAction`. This isn't ideal because it obscures the difference between this action and other replication actions, and also might allow subclasses to try and use some state before they are properly initialised. This change replaces this override with a constructor parameter to solve these issues. Relates #40706
It is important that resync actions are not rejected on the primary even if its `write` threadpool is overloaded. Today we do this by exposing `registerRequestHandlers` to subclasses and overriding it in `TransportResyncReplicationAction`. This isn't ideal because it obscures the difference between this action and other replication actions, and also might allow subclasses to try and use some state before they are properly initialised. This change replaces this override with a constructor parameter to solve these issues. Relates #40706
…leniency * elastic/master: SQL: Fix deserialisation issue of TimeProcessor (elastic#40776) Improve GCS docs for using keystore (elastic#40605) Add Restore Operation to SnapshotResiliencyTests (elastic#40634) Small refactorings to analysis components (elastic#40745) SQL: Fix display size for DATE/DATETIME (elastic#40669) add HLRC protocol tests for transform state and stats (elastic#40766) Inline TransportReplAction#registerRequestHandlers (elastic#40762) remove experimental label from search_as_you_type documentation (elastic#40744) Remove some abstractions from `TransportReplicationAction` (elastic#40706) Upgrade to latest build scan plugin (elastic#40702) Use default memory lock setting in testing (elastic#40730) Add Bulk Delete Api to BlobStore (elastic#40322) Remove yaml skips older than 7.0 (elastic#40183) Docs: Move id in the java-api (elastic#40748)
The `getIndexShard()` and `sendReplicaRequest()` methods in TransportReplicationAction are effectively only used to customise some behaviour in tests. However there are other ways to do this that do not cause such an obstacle to separating the TransportReplicationAction into its two halves (see elastic#40706). This commit removes these customisation points and injects the test-only behaviour using other techniques.
The `getIndexShard()` and `sendReplicaRequest()` methods in TransportReplicationAction are effectively only used to customise some behaviour in tests. However there are other ways to do this that do not cause such an obstacle to separating the TransportReplicationAction into its two halves (see #40706). This commit removes these customisation points and injects the test-only behaviour using other techniques.
The `getIndexShard()` and `sendReplicaRequest()` methods in TransportReplicationAction are effectively only used to customise some behaviour in tests. However there are other ways to do this that do not cause such an obstacle to separating the TransportReplicationAction into its two halves (see #40706). This commit removes these customisation points and injects the test-only behaviour using other techniques.
A small refactoring that removes the primaryTerm field from ReplicasProxy and instead passes it directly in to the methods that need it. Relates #40706.
A small refactoring that removes the primaryTerm field from ReplicasProxy and instead passes it directly in to the methods that need it. Relates #40706.
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation` exists so it can be overridden in tests. This commit re-works these tests to use a real `ReplicationOperation` and inlines the now-unnecessary method. Relates elastic#40706.
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation` exists so it can be overridden in tests. This commit re-works these tests to use a real `ReplicationOperation` and inlines the now-unnecessary method. Relates #40706.
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation` exists so it can be overridden in tests. This commit re-works these tests to use a real `ReplicationOperation` and inlines the now-unnecessary method. Relates #40706.
…0706) `TransportReplicationAction` is a rather complex beast, and some of its concrete implementations do not need all of its features. More specifically, it (a) chases a primary around the cluster until it manages to pin it down and then (b) executes an action on that primary and all its replicas. There are some actions that are coordinated by the primary itself, meaning that there is no need for the chase-the-primary phases, and in the case of peer recovery retention leases and primary/replica resync it is important to bypass these first phases. This commit is a step towards separating the `TransportReplicationAction` into these two parts. It is a mostly mechanical sequence of steps to remove some abstractions that are no longer in use.
It is important that resync actions are not rejected on the primary even if its `write` threadpool is overloaded. Today we do this by exposing `registerRequestHandlers` to subclasses and overriding it in `TransportResyncReplicationAction`. This isn't ideal because it obscures the difference between this action and other replication actions, and also might allow subclasses to try and use some state before they are properly initialised. This change replaces this override with a constructor parameter to solve these issues. Relates elastic#40706
The `getIndexShard()` and `sendReplicaRequest()` methods in TransportReplicationAction are effectively only used to customise some behaviour in tests. However there are other ways to do this that do not cause such an obstacle to separating the TransportReplicationAction into its two halves (see elastic#40706). This commit removes these customisation points and injects the test-only behaviour using other techniques.
A small refactoring that removes the primaryTerm field from ReplicasProxy and instead passes it directly in to the methods that need it. Relates elastic#40706.
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation` exists so it can be overridden in tests. This commit re-works these tests to use a real `ReplicationOperation` and inlines the now-unnecessary method. Relates elastic#40706.
TransportReplicationAction
is a rather complex beast, and some of itsconcrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.
This commit is a step towards separating the
TransportReplicationAction
intothese two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.