
[CI] CcrRollingUpgradeIT testCannotFollowLeaderInUpgradedCluster fails in CI #39355

Closed
costin opened this issue Feb 25, 2019 · 11 comments
Labels: :Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features), >test-failure (Triaged test failures from CI)

Comments

costin (Member) commented Feb 25, 2019

See:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+corretto-periodic/ES_BUILD_JAVA=java11,label=amazon/38/console

costin added the >test-failure and :Distributed Indexing/CCR labels on Feb 25, 2019
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed

costin (Member, Author) commented Feb 25, 2019

This might be a side effect of memory locking (there are several warnings in the logs):

org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:43237], URI [/not_supported/_ccr/follow?wait_for_active_shards=1], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[upgraded-node-leader-0][127.0.0.1:33013][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"no index stats available for the leader index"},"status":400}
	at __randomizedtesting.SeedInfo.seed([5BD57230DCE13991:D58A82BC76C8228A]:0)
	at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212)
	at org.elasticsearch.upgrades.CcrRollingUpgradeIT.followIndex(CcrRollingUpgradeIT.java:311)
	at org.elasticsearch.upgrades.CcrRollingUpgradeIT.testCannotFollowLeaderInUpgradedCluster(CcrRollingUpgradeIT.java:227)
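
For reference, the failing request can be reconstructed from the status line above using the low-level REST client. A minimal sketch, assuming the remote cluster alias is "leader" and "client" is a RestClient bound to the follower cluster; only the index name and the wait_for_active_shards parameter are taken from the log, everything else is an assumption:

    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    // Issue the put-follow request the test sends. If no shard of the leader
    // index has started yet, the server answers 400 with "no index stats
    // available for the leader index" and performRequest throws a
    // ResponseException, as seen in the stack trace above.
    Request putFollow = new Request("PUT", "/not_supported/_ccr/follow");
    putFollow.addParameter("wait_for_active_shards", "1");
    putFollow.setJsonEntity("{ \"remote_cluster\": \"leader\", \"leader_index\": \"not_supported\" }");
    Response response = client.performRequest(putFollow);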

martijnvg (Member) commented

This is a test issue. The "no index stats available for the leader index" error is thrown when none of the leader index's shards have started. This is likely to happen in other tests too.

I can add an awaitYellow/Green statement before the put follow is executed in the CcrRollingUpgradeIT.testCannotFollowLeaderInUpgradedCluster() test, but I wonder whether we should change the put follow API (and resume follow API) to wait for at least all leader primary shards to be started before fetching the indices stats used to determine the history UUIDs in the CcrLicenseChecker#fetchLeaderHistoryUUIDs(...) method.

@dnhatn @jasontedor what do you think about this?
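
A minimal sketch of the test-side option, assuming the test holds a REST client pointed at the leader cluster (the awaitYellow helper and the leaderClient/leaderIndex names are illustrative, not the actual test code):

    import java.io.IOException;

    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.RestClient;

    // Block until at least every primary shard of the leader index has started
    // (cluster health >= yellow), so the indices stats call behind put-follow
    // can find the shards it needs for the history UUID check.
    static void awaitYellow(RestClient leaderClient, String leaderIndex) throws IOException {
        Request health = new Request("GET", "/_cluster/health/" + leaderIndex);
        health.addParameter("wait_for_status", "yellow");
        health.addParameter("timeout", "70s");
        leaderClient.performRequest(health);
    }

The API-side alternative would move an equivalent wait into the put-follow and resume-follow actions, before CcrLicenseChecker#fetchLeaderHistoryUUIDs(...) runs.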

martijnvg (Member) commented

The old failure no longer occurs. Unfortunately, the test has since failed several times with a different failure:

REPRODUCE WITH: ./gradlew :x-pack:qa:rolling-upgrade-multi-cluster:v8.0.0#follower#upgradedClusterTestRunner -Dtests.seed=1F0AD97401B9839A -Dtests.class=org.elasticsearch.upgrades.CcrRollingUpgradeIT -Dtests.method="testCannotFollowLeaderInUpgradedCluster" -Dtests.security.manager=true -Dtests.locale=en-AI -Dtests.timezone=Etc/GMT-12 -Dcompiler.java=11 -Druntime.java=11
FAILURE 3.45s | CcrRollingUpgradeIT.testCannotFollowLeaderInUpgradedCluster <<< FAILURES!
   > Throwable #1: junit.framework.AssertionFailedError: Expected exception ResponseException but no exception was thrown
   >    at __randomizedtesting.SeedInfo.seed([1F0AD97401B9839A:915529F8AB909881]:0)
   >    at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2685)
   >    at org.apache.lucene.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2670)
   >    at org.elasticsearch.upgrades.CcrRollingUpgradeIT.testCannotFollowLeaderInUpgradedCluster(CcrRollingUpgradeIT.java:219)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   >    at java.base/java.lang.Thread.run(Thread.java:834)

I guess it wasn't noticed because the bwc tests were inactive for a while.

This failure reproduces. I will mute the test and then work on a fix.
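
For context, a hypothetical reconstruction of the assertion that now fails, based only on the stack trace above (the followIndex arguments and surrounding names are assumptions):

    // The test expects the put-follow request for the "not_supported" index to
    // be rejected, so it wraps the call in expectThrows. In this failure the
    // request unexpectedly succeeds, and expectThrows reports "Expected
    // exception ResponseException but no exception was thrown".
    ResponseException e = expectThrows(ResponseException.class,
        () -> followIndex(client(), "leader", "not_supported", "not_supported"));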

martijnvg (Member) commented

Muted test on master, 7.x, 7.0 and 6.7.
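
Mutes in this repository are typically applied with Lucene's AwaitsFix annotation; a sketch of what the mute presumably looks like (not the actual commit):

    @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/39355")
    public void testCannotFollowLeaderInUpgradedCluster() throws Exception {
        // test body unchanged
    }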

martijnvg added a commit that referenced this issue Mar 7, 2019
martijnvg added a commit that referenced this issue Mar 7, 2019
martijnvg (Member) commented

The problem (the expected failure not occurring) only happens on the master branch, so I've unmuted the test on the other branches.

martijnvg added a commit that referenced this issue Mar 7, 2019
dakrone (Member) commented Jul 15, 2020

This failed again today, but I was unable to reproduce it: https://gradle-enterprise.elastic.co/s/bsxqgpj54davc

@dakrone dakrone reopened this Jul 15, 2020
dakrone (Member) commented Jul 15, 2020

Already muted as part of #59625

dnhatn (Member) commented Jul 16, 2020

Closing in favor of #59625.

@dnhatn dnhatn closed this as completed Jul 16, 2020
dakrone (Member) commented Jul 16, 2020

@dnhatn I'm a little confused by this, did you mean to close this issue in favor of itself?

dnhatn (Member) commented Jul 16, 2020

@dakrone Thanks for noticing that. I referred to the wrong issue; it should be #59625 instead.
