[BUG] Cluster manager bootstrap takes time causing intermittent failures in integration tests (o.o.c.ClusterHealthIT.testHealthOnClusterManagerFailover) #1828

Closed
dreamer-89 opened this issue Dec 29, 2021 · 7 comments · Fixed by #13505
Labels: bug, Cluster Manager, flaky-test

@dreamer-89
Member

Describe the bug
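Master (cluster manager) node bootstrap is slow, which causes intermittent failures in integration tests such as o.o.cluster.ClusterHealthIT.testHealthOnMasterFailover.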

To Reproduce
Steps to reproduce the behavior:

  1. Run o.o.cluster.ClusterHealthIT.testHealthOnMasterFailover with index creation enabled.
  2. Reduce the master node timeout to under 10 seconds.
  3. Run the test multiple times. It fails more than 90% of the time when a 1-second timeout is used.
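
To illustrate why a short timeout makes the steps above fail so often, here is a minimal Java sketch of a poll-with-deadline wait. `TimeoutSketch` and `waitUntil` are hypothetical names used only for illustration, not part of the OpenSearch test framework, and the simulated 200 ms bootstrap delay is an assumed value.

```java
import java.util.function.BooleanSupplier;

public class TimeoutSketch {
    // Hypothetical helper: polls the condition until it holds or the timeout elapses.
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10); // back off briefly between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return condition.getAsBoolean(); // one final check at the deadline
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Simulated bootstrap that completes 200 ms after start (assumed value).
        BooleanSupplier bootstrapped = () -> System.nanoTime() - start >= 200_000_000L;

        // A timeout shorter than the bootstrap time gives up too early.
        System.out.println("50 ms timeout:   " + waitUntil(bootstrapped, 50));
        // A timeout longer than the bootstrap time succeeds.
        System.out.println("1000 ms timeout: " + waitUntil(bootstrapped, 1000));
    }
}
```

If the bootstrap time varies from run to run, any timeout close to its typical value will pass on some runs and fail on others, which matches the intermittent behavior described here.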

Expected behavior
Master node boot-up time should stay under 1 minute.

Host/Environment (please complete the following information):

  • OS: iOS
dreamer-89 added the bug and untriaged labels on Dec 29, 2021
anasalkouz added the flaky-test label and removed the untriaged label on Jan 4, 2022
@dblock
Member

dblock commented Jan 13, 2022

Failure in #1874 (comment) looks the same

> Task :server:internalClusterTest

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.ClusterHealthIT.testHealthOnMasterFailover" -Dtests.seed=2391EC7752804595 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-EC -Dtests.timezone=Etc/Greenwich -Druntime.java=17

org.opensearch.cluster.ClusterHealthIT > testHealthOnMasterFailover FAILED
    java.lang.AssertionError: expected same:<RED> was not:<GREEN>
        at __randomizedtesting.SeedInfo.seed([2391EC7752804595:BD7EC35A516A5973]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotSame(Assert.java:829)
        at org.junit.Assert.assertSame(Assert.java:772)
        at org.junit.Assert.assertSame(Assert.java:783)
        at org.opensearch.cluster.ClusterHealthIT.testHealthOnMasterFailover(ClusterHealthIT.java:393)

dblock changed the title from "[BUG] Master bootstrap takes time causing intermittent failures in integration tests" to "[BUG] Master bootstrap takes time causing intermittent failures in integration tests (o.o.c.ClusterHealthIT.testHealthOnMasterFailover)" on Jan 13, 2022
@saratvemulapalli
Member

saratvemulapalli commented Feb 2, 2022

Similar failures: #2037, #2047

@tlfeng
Collaborator

tlfeng commented Mar 15, 2022

For details, see issue #1693, which includes the MasterNotDiscoveredException error message.

Poojita-Raj changed the title from "[BUG] Master bootstrap takes time causing intermittent failures in integration tests (o.o.c.ClusterHealthIT.testHealthOnMasterFailover)" to "[BUG] Cluster manager bootstrap takes time causing intermittent failures in integration tests (o.o.c.ClusterHealthIT.testHealthOnClusterManagerFailover)" on Nov 15, 2022
@dblock
Member

dblock commented Nov 23, 2022

Another one in #5354 (comment)

@rahulkarajgikar
Contributor

checking

@rahulkarajgikar
Contributor

5k runs on a Linux machine with a 2-minute timeout: saw 5 failures.

5k runs on a Linux machine with a 3-minute timeout: did not see any failures.
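
As a rough aid for reading the two results above, the sketch below works out the observed failure rate and, for the zero-failure run, the "rule of three" approximation (3 / n) for a 95% upper bound on the true rate. It assumes "5k" means 5000 runs and that runs are independent. `FlakyRateSketch` and its methods are hypothetical names, not part of any OpenSearch tooling.

```java
public class FlakyRateSketch {
    // Observed failure rate as a fraction of runs.
    public static double failureRate(int failures, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        return (double) failures / runs;
    }

    // Rule of three: with zero failures in n independent runs, the approximate
    // 95% upper bound on the true failure rate is 3 / n.
    public static double ruleOfThreeUpperBound(int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        return 3.0 / runs;
    }

    public static void main(String[] args) {
        int runs = 5000; // assumption: "5k" is read as 5000
        System.out.println("Observed rate, 5 failures: " + failureRate(5, runs));
        System.out.println("Upper bound, 0 failures:   " + ruleOfThreeUpperBound(runs));
    }
}
```

Under these assumptions, 5 failures in 5000 runs is a 0.1% failure rate. Zero failures in 5000 runs does not prove the rate is zero, only that it is likely below about 0.06%.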

@rahulkarajgikar
Contributor

Raised PR to increase timeout: #13505
