Make TransportLocalClusterStateAction wait for cluster to unblock #117230

Open
wants to merge 6 commits into
base: main

nielsbauman
Contributor

This will make `TransportLocalClusterStateAction` wait for a new state
that is not blocked. This means we need a timeout (again). For
consistency's sake, we're reusing the REST param `master_timeout` for
this timeout as well.

The only class that was using `TransportLocalClusterStateAction` was
`TransportGetAliasesAction`, so its request needed to accept a timeout
again as well.
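
As a rough illustration of the behaviour described above (a local action that waits for an unblocked state, bounded by a timeout), here is a minimal, self-contained Java sketch. `ToyClusterService`, `waitForUnblocked`, `publish`, and the string-valued state are hypothetical stand-ins invented for this example; they are not Elasticsearch classes and do not reflect how the PR itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a node's local view of the cluster state (not an Elasticsearch class).
final class ToyClusterService {
    private final List<CompletableFuture<String>> waiters = new ArrayList<>();
    private String state;
    private boolean blocked;

    ToyClusterService(String state, boolean blocked) {
        this.state = state;
        this.blocked = blocked;
    }

    // Returns the current state if it is not blocked; otherwise waits for the
    // first unblocked state, failing with a TimeoutException after timeoutMillis.
    synchronized CompletableFuture<String> waitForUnblocked(long timeoutMillis) {
        if (!blocked) {
            return CompletableFuture.completedFuture(state);
        }
        CompletableFuture<String> future = new CompletableFuture<>();
        waiters.add(future);
        return future.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Publishes a new state; waiters are released only when the new state is unblocked.
    synchronized void publish(String newState, boolean newBlocked) {
        state = newState;
        blocked = newBlocked;
        if (!blocked) {
            // complete() is a no-op for waiters that have already timed out.
            waiters.forEach(w -> w.complete(newState));
            waiters.clear();
        }
    }
}

final class UnblockWaitDemo {
    // The block is lifted before the timeout, so the waiter sees the new state.
    static String unblockedInTime() {
        ToyClusterService service = new ToyClusterService("v1", true);
        CompletableFuture<String> future = service.waitForUnblocked(5_000);
        service.publish("v2", false);
        return future.join();
    }

    // The block is never lifted, so the waiter fails with a TimeoutException.
    static boolean timesOutWhileBlocked() {
        ToyClusterService service = new ToyClusterService("v1", true);
        CompletableFuture<String> future = service.waitForUnblocked(50);
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        System.out.println(unblockedInTime());
        System.out.println(timesOutWhileBlocked());
    }
}
```

In this toy model the timeout plays the role that `master_timeout` plays in the description: without it, a request issued while the state is blocked could wait indefinitely.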

elasticsearchmachine added the needs:triage and v9.0.0 labels on Nov 21, 2024
@@ -51,6 +51,8 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=cat-v]

include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards]

include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=master-timeout]
Contributor Author

Not sure if these docs and rest-api-spec changes are out of scope or not for this PR.

Contributor

Seems OK to me, though note that we have never before supported ?master_timeout on these APIs; this isn't reinstating a parameter that was previously removed.

Contributor Author

Yeah I realized that, but I figured it made sense to add them if they're parameters we're accepting.

@@ -146,6 +146,7 @@
exports org.elasticsearch.action.support.master;
exports org.elasticsearch.action.support.master.info;
exports org.elasticsearch.action.support.nodes;
exports org.elasticsearch.action.support.local;
Contributor Author

I moved TransportLocalClusterStateAction to its own local package - together with the new LocalClusterStateRequest.

 public GetAliasesRequest(String... aliases) {
-    this.aliases = aliases;
-    this.originalAliases = aliases;
+    this(MasterNodeRequest.TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT, aliases);
Contributor Author

It felt out of scope for this PR to refactor all these constructor callers to provide an explicit master timeout - that will have to be done in a follow-up PR.

Contributor

I would rather we did those changes first to set things up for this PR to go through without having to add back this trappy parameter.

Contributor Author

How would we do those changes first if there is currently - on main - no timeout to set in these constructors? Are you suggesting adding a constructor on main that takes a timeout but doesn't actually do anything with it and swapping that constructor with the ones on this branch afterward?

Contributor

Ah, hm, good point. Ok, let's do it this way round.

nielsbauman added the >enhancement, :Distributed Indexing/Distributed, and Team:Distributed Coordination labels and removed the needs:triage label on Nov 21, 2024
elasticsearchmachine added the Team:Distributed Indexing label on Nov 21, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine
Collaborator

Hi @nielsbauman, I've created a changelog YAML for you.

elasticsearchmachine removed the Team:Distributed Coordination label on Nov 21, 2024
@nielsbauman
Contributor Author

@elasticmachine update branch

);
listener.onFailure(new ElasticsearchTimeoutException("timed out while waiting for cluster to unblock", exception));
}
}, clusterState -> isTaskCancelled(task) || checkBlock(request, clusterState) == null);
Contributor

This means that if the task is cancelled while waiting for an unblocking cluster state then it'll wait for the next cluster state update before completing, which could be arbitrarily far in the future. Could we use org.elasticsearch.tasks.CancellableTask#addListener to react more promptly to cancellation instead?

Contributor Author

Good idea 👍. It looks like we're not doing that in TransportMasterNodeAction either. Am I missing something or should we add it there too?

Contributor

Eh we probably should. But that's a good point, let's leave this with the same behaviour as in TransportMasterNodeAction and we can come back to this change later.
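
To make the trade-off discussed in this thread concrete, here is a small, single-threaded Java sketch. It contrasts checking cancellation only inside the state predicate (re-evaluated when the next state is published) with registering a cancellation listener that completes the wait immediately. `ToyTask`, `ToyObserver`, and `CancellationDemo` are hypothetical stand-ins written for this example; they only loosely mirror the idea behind `CancellableTask#addListener` and are not the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

// Hypothetical task with cancellation listeners (a toy, not Elasticsearch's CancellableTask).
final class ToyTask {
    private final List<Runnable> listeners = new ArrayList<>();
    private boolean cancelled;

    boolean isCancelled() {
        return cancelled;
    }

    void addListener(Runnable listener) {
        if (cancelled) {
            listener.run();
        } else {
            listeners.add(listener);
        }
    }

    void cancel() {
        cancelled = true;
        listeners.forEach(Runnable::run);
        listeners.clear();
    }
}

// Hypothetical observer: predicates are re-evaluated only when a new state is published.
final class ToyObserver {
    private static final class Waiter {
        final Predicate<String> predicate;
        final CompletableFuture<String> future;

        Waiter(Predicate<String> predicate, CompletableFuture<String> future) {
            this.predicate = predicate;
            this.future = future;
        }
    }

    private final List<Waiter> waiters = new ArrayList<>();

    CompletableFuture<String> waitForNextState(Predicate<String> predicate) {
        CompletableFuture<String> future = new CompletableFuture<>();
        waiters.add(new Waiter(predicate, future));
        return future;
    }

    void publish(String state) {
        List<Waiter> matched = new ArrayList<>();
        for (Waiter w : waiters) {
            if (w.predicate.test(state)) {
                matched.add(w);
            }
        }
        waiters.removeAll(matched);
        matched.forEach(w -> w.future.complete(state));
    }
}

final class CancellationDemo {
    // Cancellation checked only in the predicate: nothing happens until the next publish.
    static boolean predicateOnlyDoneAfterCancel() {
        ToyObserver observer = new ToyObserver();
        ToyTask task = new ToyTask();
        CompletableFuture<String> future =
            observer.waitForNextState(s -> task.isCancelled() || !s.equals("blocked"));
        task.cancel();
        return future.isDone();
    }

    // Cancellation listener: the wait fails as soon as the task is cancelled.
    static boolean listenerDoneAfterCancel() {
        ToyObserver observer = new ToyObserver();
        ToyTask task = new ToyTask();
        CompletableFuture<String> future = observer.waitForNextState(s -> !s.equals("blocked"));
        task.addListener(() -> future.completeExceptionally(new CancellationException("task cancelled")));
        task.cancel();
        return future.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println(predicateOnlyDoneAfterCancel()); // false
        System.out.println(listenerDoneAfterCancel());      // true
    }
}
```

In the predicate-only variant the waiter stays pending after `cancel()` until some later `publish` call, which is the "arbitrarily far in the future" delay raised in the review comment.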

Labels
:Distributed Indexing/Distributed, >enhancement, Team:Distributed Indexing, v9.0.0
4 participants