Adding the ability to register a PeerFinderListener to Coordinator #88626

Merged (3 commits) on Jul 21, 2022

masseyke
Member

This came out of #88562. The problem there is that a non-master-eligible node should start polling a master-eligible node as soon as it realizes that there is no elected master. We are notified that there is no elected master in a ClusterChangedEvent, and we get the list of master-eligible nodes from PeerFinder#getFoundPeers. However, at the time we are notified of the ClusterChangedEvent, PeerFinder#getFoundPeers returns an empty Iterable. That collection is populated in another thread, and there is currently no way to be notified when it is populated. This PR adds the ability to register a PeerFinderListener with Coordinator. The listener has an onFoundPeersUpdated method that is called whenever the collection of peers changes (whether peers are added or removed).
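To illustrate the shape of the change, here is a minimal, self-contained sketch of the listener pattern described above. It is not the code in this PR: `SketchPeerFinder`, `addPeerFinderListener`, `addPeer`, and `clearPeers` are hypothetical stand-ins, and only the names `PeerFinderListener`, `onFoundPeersUpdated`, and `getFoundPeers` come from the description.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class PeerFinderListenerSketch {

    /** Callback shape from the PR description: no arguments, so listeners re-read the peers themselves. */
    interface PeerFinderListener {
        void onFoundPeersUpdated();
    }

    /** Hypothetical stand-in for PeerFinder plus the registration hook; not the real Elasticsearch class. */
    static final class SketchPeerFinder {
        private final List<String> foundPeers = new CopyOnWriteArrayList<>();
        private final List<PeerFinderListener> listeners = new CopyOnWriteArrayList<>();

        // Assumed registration method name, for illustration only.
        void addPeerFinderListener(PeerFinderListener listener) {
            listeners.add(listener);
        }

        // Returns an immutable snapshot of the peers found so far.
        List<String> getFoundPeers() {
            return List.copyOf(foundPeers);
        }

        // In this sketch, peers are added from a separate discovery thread.
        void addPeer(String nodeName) {
            foundPeers.add(nodeName);
            notifyListeners();
        }

        void clearPeers() {
            foundPeers.clear();
            notifyListeners();
        }

        private void notifyListeners() {
            for (PeerFinderListener listener : listeners) {
                listener.onFoundPeersUpdated();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SketchPeerFinder peerFinder = new SketchPeerFinder();

        // When the "no elected master" event arrives, the collection is still empty.
        System.out.println("peers at event time: " + peerFinder.getFoundPeers());

        // Registering a listener lets the caller react once the other thread populates it.
        peerFinder.addPeerFinderListener(
            () -> System.out.println("peers updated: " + peerFinder.getFoundPeers()));

        Thread discovery = new Thread(() -> peerFinder.addPeer("master-eligible-1"));
        discovery.start();
        discovery.join();
    }
}
```

The callback carries no arguments in this sketch, so a listener reads the current peers itself each time it is invoked.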

@masseyke
Member Author

One question I have is whether this is granular enough. For example, when a master node steps down and a new one is elected, onFoundPeersUpdated is called 3 times in rapid succession:

  1. When the master steps down, onLeaderFailure calls activate on the PeerFinder, putting the name(s) of the other nodes in there
  2. When it gets a response from each of these nodes after trying to connect to them, PeerFinder#onFoundPeersUpdated (and therefore this listener) is called again, even though the number of peers has not actually changed
  3. Once the new master is elected everything is removed from the collection of peers and this listener is called again.

For #88562 I only care about the first one. It's possible the others could have uses too.
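If a caller only cares about the first of those three notifications, one option is to filter inside the listener. The sketch below is an assumption about how that could look, not something this PR implements: `FirstPopulationFilter` is a hypothetical wrapper that runs an action only when the peers go from empty to non-empty, and ignores repeat notifications and the final clearing.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class FirstPopulationFilterSketch {

    /** Callback shape from the PR description. */
    interface PeerFinderListener {
        void onFoundPeersUpdated();
    }

    /** Hypothetical wrapper: fires only on an empty -> non-empty transition of the peers. */
    static final class FirstPopulationFilter implements PeerFinderListener {
        private final Supplier<Iterable<String>> foundPeers;
        private final Runnable onPeersFirstFound;
        private boolean lastSeenEmpty = true;

        FirstPopulationFilter(Supplier<Iterable<String>> foundPeers, Runnable onPeersFirstFound) {
            this.foundPeers = foundPeers;
            this.onPeersFirstFound = onPeersFirstFound;
        }

        @Override
        public synchronized void onFoundPeersUpdated() {
            boolean nowEmpty = foundPeers.get().iterator().hasNext() == false;
            if (lastSeenEmpty && nowEmpty == false) {
                onPeersFirstFound.run();
            }
            // Remember the state so repeated or clearing notifications are ignored.
            lastSeenEmpty = nowEmpty;
        }
    }

    public static void main(String[] args) {
        Set<String> peers = ConcurrentHashMap.newKeySet();
        FirstPopulationFilter filter = new FirstPopulationFilter(
            () -> peers,
            () -> System.out.println("start polling, peers: " + peers));

        // 1. Master steps down and the other nodes are put into the collection: action runs.
        peers.add("node-1");
        peers.add("node-2");
        filter.onFoundPeersUpdated();

        // 2. Responses arrive but the peers are unchanged: ignored.
        filter.onFoundPeersUpdated();

        // 3. New master elected and the collection is emptied: ignored, filter re-arms.
        peers.clear();
        filter.onFoundPeersUpdated();
    }
}
```

Tracking only emptiness keeps the filter cheap; a caller that needs to see membership changes would have to compare snapshots instead.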

@masseyke
Member Author

@DaveCTurner does this look like what you had in mind when we talked?

@masseyke masseyke added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. label Jul 20, 2022
Contributor

@DaveCTurner DaveCTurner left a comment

Sure, makes sense. LGTM (one nit)

@@ -147,7 +147,7 @@ void logBootstrapState(Metadata metadata) {
 }
 }

-void onFoundPeersUpdated() {
+public void onFoundPeersUpdated() {
Contributor

Suggested change
-public void onFoundPeersUpdated() {
+@Override
+public void onFoundPeersUpdated() {

@masseyke masseyke marked this pull request as ready for review July 21, 2022 14:19
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team (obsolete) label Jul 21, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke merged commit 7b8c2c7 into elastic:master Jul 21, 2022
@masseyke masseyke deleted the feature/PeerFinder-listener branch July 21, 2022 15:12
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 22, 2022
* upstream/master: (40 commits)
  Fix CI job naming
  [ML] disallow autoscaling downscaling in two trained model assignment scenarios (elastic#88623)
  Add "Vector Search" area to changelog schema
  [DOCS] Update API key API (elastic#88499)
  Enable the pipeline on the feature branch (elastic#88672)
  Adding the ability to register a PeerFinderListener to Coordinator (elastic#88626)
  [DOCS] Fix transform painless example syntax (elastic#88364)
  [ML] Muting InternalCategorizationAggregationTests testReduceRandom (elastic#88685)
  Fix double rounding errors for disk usage (elastic#88683)
  Replace health request with a state observer. (elastic#88641)
  [ML] Fail model deployment if all allocations cannot be provided (elastic#88656)
  Upgrade to OpenJDK 18.0.2+9 (elastic#88675)
  [ML] make bucket_correlation aggregation generally available (elastic#88655)
  Adding cardinality support for random_sampler agg (elastic#86838)
  Use custom task instead of generic AckedClusterStateUpdateTask (elastic#88643)
  Reinstate test cluster throttling behavior (elastic#88664)
  Mute testReadBlobWithPrematureConnectionClose
  Simplify plugin descriptor tests (elastic#88659)
  Add CI job for testing more job parallelism
  [ML] make deployment infer requests fully cancellable (elastic#88649)
  ...
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement Team:Distributed Meta label for distributed team (obsolete) v8.4.0

3 participants