-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make TransportLocalClusterStateAction wait for cluster to unblock #117230
base: main
Are you sure you want to change the base?
Conversation
This will make `TransportLocalClusterStateAction` wait for a new state that is not blocked. This means we need a timeout (again). For consistency's sake, we're reusing the REST param `master_timeout` for this timeout as well. The only class that was using `TransportLocalClusterStateAction` was `TransportGetAliasesAction`, so its request needed to accept a timeout again as well.
e6c4cfd
to
b4f4cf5
Compare
@@ -51,6 +51,8 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=cat-v] | |||
|
|||
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards] | |||
|
|||
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=master-timeout] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if these docs and rest-api-spec changes are out of scope or not for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok to me, tho note that we have never before supported ?master_timeout
on these APIs, this isn't reinstating a parameter that previously was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I realized that, but I figured it made sense to add them if they're parameters we're accepting.
@@ -146,6 +146,7 @@ | |||
exports org.elasticsearch.action.support.master; | |||
exports org.elasticsearch.action.support.master.info; | |||
exports org.elasticsearch.action.support.nodes; | |||
exports org.elasticsearch.action.support.local; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved TransportLocalClusterStateAction
to its own local
package - together with the new LocalClusterStateRequest
.
public GetAliasesRequest(String... aliases) { | ||
this.aliases = aliases; | ||
this.originalAliases = aliases; | ||
this(MasterNodeRequest.TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, aliases); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It felt out of scope for this PR to refactor all these constructor callers to provide an explicit master timeout - that will have to be done in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather we did those changes first to set things up for this PR to go through without having to add back this trappy parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we do those changes first if there is currently - on main
- no timeout to set in these constructors? Are you suggesting adding a constructor on main
that takes a timeout but doesn't actually do anything with it and swapping that constructor with the ones on this branch afterward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hm, good point. Ok, let's do it this way round.
server/src/main/java/org/elasticsearch/action/support/local/LocalClusterStateRequest.java
Outdated
Show resolved
Hide resolved
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
Hi @nielsbauman, I've created a changelog YAML for you. |
@elasticmachine update branch |
server/src/main/java/org/elasticsearch/action/support/local/LocalClusterStateRequest.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/elasticsearch/action/support/local/TransportLocalClusterStateAction.java
Outdated
Show resolved
Hide resolved
); | ||
listener.onFailure(new ElasticsearchTimeoutException("timed out while waiting for cluster to unblock", exception)); | ||
} | ||
}, clusterState -> isTaskCancelled(task) || checkBlock(request, clusterState) == null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that if the task is cancelled while waiting for an unblocking cluster state then it'll wait for the next cluster state update before completing, which could be arbitrarily far in the future. Could we use org.elasticsearch.tasks.CancellableTask#addListener
to react more promptly to cancellation instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 👍. It looks like we're not doing that in TransportMasterNodeAction
either. Am I missing something or should we add it there too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh we probably should. But that's a good point, let's leave this with the same behaviour as in TransportMasterNodeAction
and we can come back to this change later.
public GetAliasesRequest(String... aliases) { | ||
this.aliases = aliases; | ||
this.originalAliases = aliases; | ||
this(MasterNodeRequest.TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, aliases); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather we did those changes first to set things up for this PR to go through without having to add back this trappy parameter.
@@ -51,6 +51,8 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=cat-v] | |||
|
|||
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards] | |||
|
|||
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=master-timeout] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok to me, tho note that we have never before supported ?master_timeout
on these APIs, this isn't reinstating a parameter that previously was removed.
This will make
TransportLocalClusterStateAction
wait for a new statethat is not blocked. This means we need a timeout (again). For
consistency's sake, we're reusing the REST param
master_timeout
forthis timeout as well.
The only class that was using
TransportLocalClusterStateAction
wasTransportGetAliasesAction
, so its request needed to accept a timeoutagain as well.