-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix port assignment and discovery in tests #33675
Labels
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
:Distributed Coordination/Network
Http and internode communication implementations
>test
Issues or PRs that are addressing/adding tests
v6.6.0
v7.0.0-beta1
Comments
DaveCTurner
added
>test
Issues or PRs that are addressing/adding tests
:Distributed Coordination/Network
Http and internode communication implementations
v7.0.0
v6.5.0
labels
Sep 13, 2018
Pinging @elastic/es-core-infra |
DaveCTurner
added
the
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
label
Sep 13, 2018
Pinging @elastic/es-distributed |
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Oct 15, 2018
original-brownbear
added a commit
that referenced
this issue
Oct 16, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery. * Relates #33675 * Simplify away ClusterDiscoveryConfiguration
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Oct 17, 2018
* For 7.0 use file based discovery in REST tests * Relates elastic#33675
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Oct 24, 2018
* Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before * Manually set master nodes in the cases where we didn't use the default * Remove use of the `NodeConfigurationSource` indirection * Relates elastic#33675
original-brownbear
added a commit
that referenced
this issue
Oct 24, 2018
* For `6.5+` use file based discovery in REST tests * Relates #33675
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Oct 25, 2018
* With all 3 nodes starting in parallel 2 nodes can win a master election and start waiting for nodes to join when started in parallel. We wait for 30s on the rest tests for the cluster health endpoint to show 3 nodes which breaks if one of the master nodes itself waits for 30s for joining nodes, waiting for 5 seconds fixes the issue and allows us to run with `minimum master nodes < node count` * relates elastic#33675
original-brownbear
added a commit
that referenced
this issue
Oct 26, 2018
* DISCOVERY: Use Realistic Num. of Min Master Nodes * With all 3 nodes starting in parallel 2 nodes can win a master election and start waiting for nodes to join when started in parallel. We wait for 30s on the rest tests for the cluster health endpoint to show 3 nodes which breaks if one of the master nodes itself waits for 30s for joining nodes, waiting for 5 seconds fixes the issue and allows us to run with `minimum master nodes < node count` * relates #33675
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Oct 29, 2018
* Exlude port 30210 from tests because it collides for unknown reasons * Relates elastic#33675
kcm
pushed a commit
that referenced
this issue
Oct 30, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery. * Relates #33675 * Simplify away ClusterDiscoveryConfiguration
kcm
pushed a commit
that referenced
this issue
Oct 30, 2018
* For `6.5+` use file based discovery in REST tests * Relates #33675
kcm
pushed a commit
that referenced
this issue
Oct 30, 2018
* DISCOVERY: Use Realistic Num. of Min Master Nodes * With all 3 nodes starting in parallel 2 nodes can win a master election and start waiting for nodes to join when started in parallel. We wait for 30s on the rest tests for the cluster health endpoint to show 3 nodes which breaks if one of the master nodes itself waits for 30s for joining nodes, waiting for 5 seconds fixes the issue and allows us to run with `minimum master nodes < node count` * relates #33675
original-brownbear
added a commit
that referenced
this issue
Oct 30, 2018
* Exlude port 30210 from tests because it collides for unknown reasons * Relates #33675
@DaveCTurner all points handled I think :) Good to close here? |
original-brownbear
added a commit
that referenced
this issue
Oct 31, 2018
* DISCOVERY: Cleanup AbstractDisruptionTestCase * Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before * Remove use of the `NodeConfigurationSource` indirection * Relates #33675
Yes, all done. |
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Nov 14, 2018
* For `6.5+` use file based discovery in REST tests * Relates elastic#33675
original-brownbear
added a commit
that referenced
this issue
Nov 15, 2018
* For `6.5+` use file based discovery in REST tests * Relates #33675
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Nov 16, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery. * Relates elastic#33675 * Simplify away ClusterDiscoveryConfiguration
original-brownbear
added a commit
to original-brownbear/elasticsearch
that referenced
this issue
Nov 16, 2018
* DISCOVERY: Cleanup AbstractDisruptionTestCase * Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before * Remove use of the `NodeConfigurationSource` indirection * Relates elastic#33675
original-brownbear
added a commit
that referenced
this issue
Nov 16, 2018
* Discovery: Move AbstractDisruptionTestCase to file-based discovery. * Relates #33675 * Simplify away ClusterDiscoveryConfiguration
original-brownbear
added a commit
that referenced
this issue
Nov 16, 2018
* DISCOVERY: Cleanup AbstractDisruptionTestCase * Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before * Remove use of the `NodeConfigurationSource` indirection * Relates #33675
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
:Distributed Coordination/Network
Http and internode communication implementations
>test
Issues or PRs that are addressing/adding tests
v6.6.0
v7.0.0-beta1
Today when configuring nodes for tests we use a variety of mechanisms to ensure that they can find each other and form a cluster. At least one of these mechanisms is racy, leading to occasional test failures (#29244), and few of them implement our documented recommendation:
The main obstacle to fixing this was the difficulty of dynamically configuring the unicast hosts list in tests.
MockUncasedHostProvider
in mostESIntegTestCase
s (Use file-based discovery not MockUncasedHostsProvider #33554, Use file-based discovery not MockUncasedHostsProvider (backport of #33554) #33658)AbstractDisruptionTestCase
(Disc: Move AbstractDisruptionTC to filebased D. #34461)AbstractDisruptionTestCase
, overridingnodeSettings
rather than making a separateNodeConfigurationSource
, and usingESIntegTestCase
to manageminimum_master_nodes
etc. (DISCOVERY: Cleanup AbstractDisruptionTestCase #34808)In the REST tests, we currently start up one seed node and then have this node as the single host in all other nodes' unicast lists, but this means that the cluster can sometimes form without the seed node knowing about it, and it then cannot ping any of the other nodes to join. Unfortunately these tests include BWC ones so we can only reasonably make this change in
master
once 6.5 is released, or else we will be trying to configure nodes that do not natively support file-based discovery.The racy one is the AbstractDisruption test case, which attempts to find a free port by binding to it, then releasing it, and then telling the node to use that, and for some reason port 30210 is occasionally in use by something else. It really does seem to be just this port that causes issues, and we don't want to backport all this work to 5.6, but we do want to avoid test failures like this in 5.6, so let's just avoid this particular port in that branch.
The text was updated successfully, but these errors were encountered: