Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix port assignment and discovery in tests #33675

Closed
6 tasks done
DaveCTurner opened this issue Sep 13, 2018 · 4 comments
Closed
6 tasks done

Fix port assignment and discovery in tests #33675

DaveCTurner opened this issue Sep 13, 2018 · 4 comments
Assignees
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. :Distributed Coordination/Network Http and internode communication implementations >test Issues or PRs that are addressing/adding tests v6.6.0 v7.0.0-beta1

Comments

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 13, 2018

Today when configuring nodes for tests we use a variety of mechanisms to ensure that they can find each other and form a cluster. At least one of these mechanisms is racy, leading to occasional test failures (#29244), and few of them implement our documented recommendation:

that the unicast hosts list be maintained as the list of master-eligible nodes in the cluster.

The main obstacle to fixing this was the difficulty of dynamically configuring the unicast hosts list in tests.

In the REST tests, we currently start up one seed node and then have this node as the single host in all other nodes' unicast lists, but this means that the cluster can sometimes form without the seed node knowing about it, and it then cannot ping any of the other nodes to join. Unfortunately these tests include BWC ones so we can only reasonably make this change in master once 6.5 is released, or else we will be trying to configure nodes that do not natively support file-based discovery.

The racy one is the AbstractDisruption test case, which attempts to find a free port by binding to it, then releasing it, and then telling the node to use that, and for some reason port 30210 is occasionally in use by something else. It really does seem to be just this port that causes issues, and we don't want to backport all this work to 5.6, but we do want to avoid test failures like this in 5.6, so let's just avoid this particular port in that branch.

@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Network Http and internode communication implementations v7.0.0 v6.5.0 labels Sep 13, 2018
@DaveCTurner DaveCTurner self-assigned this Sep 13, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@DaveCTurner DaveCTurner added the :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label Sep 13, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 15, 2018
original-brownbear added a commit that referenced this issue Oct 16, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 17, 2018
* For 7.0 use file based discovery in REST tests
* Relates elastic#33675
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 24, 2018
* Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before
* Manually set master nodes in the cases where we didn't use the default
* Remove use of the `NodeConfigurationSource` indirection
* Relates elastic#33675
original-brownbear added a commit that referenced this issue Oct 24, 2018
* For `6.5+` use file based discovery in REST tests
* Relates #33675
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 25, 2018
* With all 3 nodes starting in parallel 2 nodes can win a master election and
start waiting for nodes to join when started in parallel. We wait for 30s on the
rest tests for the cluster health endpoint to show 3 nodes which breaks if
one of the master nodes itself waits for 30s for joining nodes, waiting for 5 seconds
fixes the issue and allows us to run with `minimum master nodes < node count`
* relates elastic#33675
@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
original-brownbear added a commit that referenced this issue Oct 26, 2018
* DISCOVERY: Use Realistic Num. of Min Master Nodes

* With all 3 nodes starting in parallel 2 nodes can win a master election and
start waiting for nodes to join when started in parallel. We wait for 30s on the
rest tests for the cluster health endpoint to show 3 nodes which breaks if
one of the master nodes itself waits for 30s for joining nodes, waiting for 5 seconds
fixes the issue and allows us to run with `minimum master nodes < node count`
* relates #33675
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 29, 2018
* Exlude port 30210 from tests because it collides for unknown reasons
* Relates elastic#33675
kcm pushed a commit that referenced this issue Oct 30, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration
kcm pushed a commit that referenced this issue Oct 30, 2018
* For `6.5+` use file based discovery in REST tests
* Relates #33675
kcm pushed a commit that referenced this issue Oct 30, 2018
* DISCOVERY: Use Realistic Num. of Min Master Nodes

* With all 3 nodes starting in parallel 2 nodes can win a master election and
start waiting for nodes to join when started in parallel. We wait for 30s on the
rest tests for the cluster health endpoint to show 3 nodes which breaks if
one of the master nodes itself waits for 30s for joining nodes, waiting for 5 seconds
fixes the issue and allows us to run with `minimum master nodes < node count`
* relates #33675
original-brownbear added a commit that referenced this issue Oct 30, 2018
* Exlude port 30210 from tests because it collides for unknown reasons
* Relates #33675
@original-brownbear
Copy link
Member

@DaveCTurner all points handled I think :) Good to close here?

original-brownbear added a commit that referenced this issue Oct 31, 2018
* DISCOVERY: Cleanup AbstractDisruptionTestCase

* Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before
* Remove use of the `NodeConfigurationSource` indirection
* Relates #33675
@DaveCTurner
Copy link
Contributor Author

Yes, all done.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Nov 14, 2018
* For `6.5+` use file based discovery in REST tests
* Relates elastic#33675
original-brownbear added a commit that referenced this issue Nov 15, 2018
* For `6.5+` use file based discovery in REST tests
* Relates #33675
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Nov 16, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates elastic#33675
* Simplify away ClusterDiscoveryConfiguration
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Nov 16, 2018
* DISCOVERY: Cleanup AbstractDisruptionTestCase

* Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before
* Remove use of the `NodeConfigurationSource` indirection
* Relates elastic#33675
original-brownbear added a commit that referenced this issue Nov 16, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery.
* Relates #33675
* Simplify away ClusterDiscoveryConfiguration
original-brownbear added a commit that referenced this issue Nov 16, 2018
* DISCOVERY: Cleanup AbstractDisruptionTestCase

* Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before
* Remove use of the `NodeConfigurationSource` indirection
* Relates #33675
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. :Distributed Coordination/Network Http and internode communication implementations >test Issues or PRs that are addressing/adding tests v6.6.0 v7.0.0-beta1
Projects
None yet
Development

No branches or pull requests

4 participants