Reduce usage of TransportMasterNodeReadAction #101805

Open
7 of 40 tasks
Tracked by #77466
DaveCTurner opened this issue Nov 4, 2023 · 6 comments
Labels
:Distributed Indexing/Distributed, Meta, Team:Distributed Indexing, Team:Distributed (Obsolete), >tech debt

Comments

@DaveCTurner
Contributor

DaveCTurner commented Nov 4, 2023

TransportMasterNodeReadAction is intended for cases where we need to collect some state held only on the elected master, for instance state related to shard allocation or data stream errors. However, many TransportMasterNodeReadAction implementations work as a pure function of the cluster state, which is held on every node, so there is no need to route these requests to the master for processing. Moreover, some of these requests may be quite expensive to process in large clusters, so routing them all to the master represents a scalability bottleneck. We should reconsider each usage of TransportMasterNodeReadAction and decide whether it needs to run on the master. If not, we should convert it to a regular local-only transport action (e.g. using TransportLocalClusterStateAction).
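
To illustrate what "a pure function of the cluster state" means here, the following is a minimal sketch using a simplified model. `ClusterStateSnapshot`, `Node`, `aliasesOf` and `handleGetAliases` are hypothetical names invented for this example and are not Elasticsearch classes or methods. The sketch assumes both nodes have applied the same state.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model. None of these types are Elasticsearch classes.
final class LocalReadSketch {

    /** Immutable snapshot of cluster metadata; every node holds a copy. */
    static final class ClusterStateSnapshot {
        final long version;
        final Map<String, List<String>> aliasesByIndex;

        ClusterStateSnapshot(long version, Map<String, List<String>> aliasesByIndex) {
            this.version = version;
            this.aliasesByIndex = aliasesByIndex;
        }
    }

    /** Pure function of the snapshot: the same state gives the same answer on any node. */
    static List<String> aliasesOf(ClusterStateSnapshot state, String index) {
        return state.aliasesByIndex.getOrDefault(index, List.of());
    }

    /** A node that answers reads from its own applied copy of the state. */
    static final class Node {
        final String name;
        final ClusterStateSnapshot appliedState;
        int requestsHandled = 0;

        Node(String name, ClusterStateSnapshot appliedState) {
            this.name = name;
            this.appliedState = appliedState;
        }

        List<String> handleGetAliases(String index) {
            requestsHandled++;
            return aliasesOf(appliedState, index);
        }
    }

    public static void main(String[] args) {
        ClusterStateSnapshot state =
            new ClusterStateSnapshot(7, Map.of("logs-1", List.of("logs", "logs-current")));
        Node master = new Node("master", state);
        Node dataNode = new Node("data-1", state);

        // The receiving node answers locally, so the master does no work for this read.
        System.out.println(dataNode.handleGetAliases("logs-1"));
        System.out.println("requests handled by master: " + master.requestsHandled);
    }
}
```

In this model the data node and the master would return identical results for the same snapshot, which is why forwarding the request to the master adds load without adding information.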

Additionally, many of these actions are not currently cancellable, but they (or at least the expensive ones) should be. Experience shows that we're not great at spotting the expensive ones ahead of time, so IMO we should err on the side of caution and make each one cancellable unless we have a good reason for not doing so.
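
As a rough illustration of what making an action cancellable involves, here is a minimal sketch of cooperative cancellation in plain Java. `TaskHandle`, `ensureNotCancelled` and `collectMatching` are hypothetical names for this example only; the real task classes in Elasticsearch differ, and the loop stands in for any expensive walk over the cluster state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of cooperative cancellation. Not Elasticsearch code.
final class CancellableScanSketch {

    /** Stand-in for a task whose flag is set when the client disconnects or times out. */
    static final class TaskHandle {
        private final AtomicBoolean cancelled = new AtomicBoolean(false);

        void cancel() {
            cancelled.set(true);
        }

        void ensureNotCancelled() {
            if (cancelled.get()) {
                throw new CancellationException("task cancelled");
            }
        }
    }

    /** Walks a potentially large list of index names, checking the flag before each item. */
    static List<String> collectMatching(List<String> indexNames, String prefix, TaskHandle task) {
        List<String> matches = new ArrayList<>();
        for (String name : indexNames) {
            task.ensureNotCancelled(); // stop early instead of finishing work nobody will read
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TaskHandle task = new TaskHandle();
        System.out.println(collectMatching(List.of("logs-1", "metrics-1", "logs-2"), "logs-", task));

        task.cancel();
        try {
            collectMatching(List.of("logs-1"), "logs-", task);
        } catch (CancellationException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```

The check is cheap relative to the per-item work, which is one argument for adding it by default rather than only to actions already known to be expensive.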

Note that attempting to route these requests to the current master does not give them any stronger consistency guarantees, because the node that does the work does not validate that it is the master before responding. It's possible that a new master has been elected, and the cluster state updated, without the responding node knowing about it.
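
The stale-master scenario can be sketched with a toy model. `NodeView` and its fields are hypothetical and only mimic the idea of election terms and cluster state versions; this is not how Elasticsearch represents them internally.

```java
// Hypothetical model of a stale master. None of these types are Elasticsearch classes.
final class StaleMasterSketch {

    /** What a single node currently believes about the cluster. */
    static final class NodeView {
        final String name;
        long term;              // highest election term this node has seen
        long stateVersion;      // latest cluster state version it has applied
        String believedMaster;  // the node it currently regards as master

        NodeView(String name, long term, long stateVersion, String believedMaster) {
            this.name = name;
            this.term = term;
            this.stateVersion = stateVersion;
            this.believedMaster = believedMaster;
        }

        boolean believesItIsMaster() {
            return name.equals(believedMaster);
        }

        /** Answers from local state without re-validating mastership. */
        long readStateVersion() {
            return stateVersion;
        }

        /** This node wins an election and publishes one new cluster state. */
        void winElectionAndPublish() {
            term++;
            stateVersion++;
            believedMaster = name;
        }
    }

    public static void main(String[] args) {
        NodeView nodeA = new NodeView("node-a", 1, 10, "node-a");
        NodeView nodeB = new NodeView("node-b", 1, 10, "node-a");

        // node-a is cut off from the cluster; node-b is elected and updates the state.
        nodeB.winElectionAndPublish();

        // A request routed to node-a is still answered, from an older state.
        System.out.println("node-a thinks it is master: " + nodeA.believesItIsMaster());
        System.out.println("node-a answers with version " + nodeA.readStateVersion()
            + ", current version is " + nodeB.readStateVersion());
    }
}
```

In this model a read served by `node-a` is no fresher than a read served by any other node that has not yet seen the new state, so routing to the presumed master buys nothing in terms of consistency.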

  • GetDataStreamsTransportAction (org.elasticsearch.datastreams.action)
  • GetPipelineTransportAction (org.elasticsearch.action.ingest)
  • TransportClusterGetSettingsAction (org.elasticsearch.action.admin.cluster.settings) Run TransportClusterGetSettingsAction on local node #119831
  • TransportClusterHealthAction (org.elasticsearch.action.admin.cluster.health)
  • TransportClusterInfoAction (org.elasticsearch.action.support.master.info)
  • TransportClusterSearchShardsAction (org.elasticsearch.action.admin.cluster.shards)
  • TransportClusterStateAction (org.elasticsearch.action.admin.cluster.state)
  • TransportDeprecationInfoAction (org.elasticsearch.xpack.deprecation)
  • TransportExplainDataStreamLifecycleAction (org.elasticsearch.datastreams.lifecycle.action)
  • TransportExplainLifecycleAction (org.elasticsearch.xpack.ilm.action)
  • TransportFollowInfoAction (org.elasticsearch.xpack.ccr.action)
  • TransportGetAliasesAction (org.elasticsearch.action.admin.indices.alias.get) Run TransportGetAliasesAction on local node #101815
  • TransportGetAnalyticsCollectionAction (org.elasticsearch.xpack.application.analytics.action)
  • TransportGetAutoFollowPatternAction (org.elasticsearch.xpack.ccr.action)
  • TransportGetBasicStatusAction (org.elasticsearch.license)
  • TransportGetComponentTemplateAction (org.elasticsearch.action.admin.indices.template.get) Run TransportGetComponentTemplateAction on local node #116868
  • TransportGetComposableIndexTemplateAction (org.elasticsearch.action.admin.indices.template.get) Run TransportGetComposableIndexTemplate on local node #119830
  • TransportGetDataStreamLifecycleAction (org.elasticsearch.datastreams.lifecycle.action)
  • TransportGetDatafeedsAction (org.elasticsearch.xpack.ml.action)
  • TransportGetDesiredBalanceAction (org.elasticsearch.action.admin.cluster.allocation), see Reduce usage of TransportMasterNodeReadAction #101805 (comment)
  • TransportGetDesiredNodesAction (org.elasticsearch.action.admin.cluster.desirednodes)
  • TransportGetEnrichPolicyAction (org.elasticsearch.xpack.enrich.action)
  • TransportGetIndexAction (org.elasticsearch.action.admin.indices.get)
  • TransportGetIndexTemplatesAction (org.elasticsearch.action.admin.indices.template.get) Run TransportGetIndexTemplatesAction on local node #119837
  • TransportGetJobModelSnapshotsUpgradeStatsAction (org.elasticsearch.xpack.ml.action)
  • TransportGetJobsAction (org.elasticsearch.xpack.ml.action)
  • TransportGetLicenseAction (org.elasticsearch.license)
  • TransportGetLifecycleAction (org.elasticsearch.xpack.ilm.action)
  • TransportGetMappingsAction (org.elasticsearch.action.admin.indices.mapping.get)
  • TransportGetRepositoriesAction (org.elasticsearch.action.admin.cluster.repositories.get)
  • TransportGetSettingsAction (org.elasticsearch.action.admin.indices.settings.get)
  • TransportGetStatusAction (org.elasticsearch.xpack.ilm.action)
  • TransportGetStoredScriptAction (org.elasticsearch.action.admin.cluster.storedscripts)
  • TransportGetTrialStatusAction (org.elasticsearch.license)
  • TransportIndicesShardStoresAction (org.elasticsearch.action.admin.indices.shards)
  • TransportPendingClusterTasksAction (org.elasticsearch.action.admin.cluster.tasks)
  • TransportPrevalidateNodeRemovalAction (org.elasticsearch.action.admin.cluster.node.shutdown)
  • TransportSimulateIndexTemplateAction (org.elasticsearch.action.admin.indices.template.post) Run template simulation actions on local node #120038
  • TransportSimulateTemplateAction (org.elasticsearch.action.admin.indices.template.post) Run template simulation actions on local node #120038
  • any other TransportMasterNodeReadAction implementations added since this list was created

Relates #77466

@DaveCTurner added the Meta, :Distributed Indexing/Distributed, and >tech debt labels Nov 4, 2023
@elasticsearchmachine added the Team:Distributed (Obsolete) label Nov 4, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine pushed a commit that referenced this issue Nov 6, 2023
This action is a pure function of the cluster state, so it can run on any
node. Moreover, it can be fairly expensive if there are a lot of aliases,
so running it on the master can be quite harmful to the cluster.
Finally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805
@idegtiarenko
Contributor

I believe TransportGetDesiredBalanceAction is required to run on the elected master, as its response contains DesiredBalanceStats that are computed in the allocator during execution. I think the same is true of ClusterInfo. Alternatively, we can update the action to run anywhere in the cluster and read only the stats and ClusterInfo from the elected master.
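
A minimal sketch of that alternative, assuming a hypothetical split between data derived from the local cluster state and data fetched from the master. `MasterOnlyStats`, `Response` and `handle` are invented for illustration, and the `Supplier` stands in for what would be a remote call to the elected master.

```java
import java.util.function.Supplier;

// Hypothetical sketch. These types only stand in for the real request and response classes.
final class SplitReadSketch {

    /** Stand-in for statistics that exist only on the elected master. */
    static final class MasterOnlyStats {
        final long computationsExecuted;

        MasterOnlyStats(long computationsExecuted) {
            this.computationsExecuted = computationsExecuted;
        }
    }

    /** Combined response: one part from the local cluster state, one part from the master. */
    static final class Response {
        final int assignedShards;
        final MasterOnlyStats stats;

        Response(int assignedShards, MasterOnlyStats stats) {
            this.assignedShards = assignedShards;
            this.stats = stats;
        }
    }

    /**
     * Runs on whichever node receives the request. Only fetchFromMaster involves the
     * master; everything else is computed from the node's own copy of the cluster state.
     */
    static Response handle(int assignedShardsFromLocalState, Supplier<MasterOnlyStats> fetchFromMaster) {
        return new Response(assignedShardsFromLocalState, fetchFromMaster.get());
    }

    public static void main(String[] args) {
        Response response = handle(12, () -> new MasterOnlyStats(3));
        System.out.println(response.assignedShards + " shards, "
            + response.stats.computationsExecuted + " computations");
    }
}
```

The trade-off in such a design is that the master still serves every request, but only for the small master-only portion of the response.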

@DaveCTurner
Contributor Author

I believe TransportGetDesiredBalanceAction is required to run on elected

Me too - please feel free to cross it off the list in the OP with a comment to that effect.

@nielsbauman
Contributor

nielsbauman commented Nov 22, 2024

I'm adding TransportGetStatusAction and TransportGetLifecycleAction (org.elasticsearch.xpack.ilm.action) to the list. They don't extend TransportMasterNodeReadAction (they extend TransportMasterNodeAction), but they're also pure functions of the cluster state.

@elasticsearchmachine added the Team:Distributed Indexing label Nov 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

nielsbauman added a commit that referenced this issue Dec 4, 2024
This action needs only the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

The `?local` and `?master_timeout` parameters become a no-op and are
marked as deprecated.

Relates #101805
Relates #107984
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

nielsbauman added a commit that referenced this issue Dec 23, 2024
This action needs only the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

The `?local` parameter becomes a no-op and is marked as deprecated.

Relates #101805
Relates #107984
nielsbauman added a commit that referenced this issue Jan 17, 2025
This action needs only the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805