RareClusterStateIT#testDelayedMappingPropagationOnPrimary can fail with assertion error. #41030

Closed
jtibshirani opened this issue Apr 9, 2019 · 7 comments
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI

Comments

@jtibshirani
Contributor

I wasn't able to reproduce this error locally. Although it's only failed once, I opted to file an issue because there have been some recent changes to the test related to Zen2.

This looks related to #36813; feel free to close this issue if you'd prefer to track work there.


Link to the build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=zulu11,nodes=immutable&&linux&&docker/349/console

Command to reproduce:

./gradlew :server:integTest --tests "org.elasticsearch.cluster.coordination.RareClusterStateIT.testDelayedMappingPropagationOnPrimary" \
  -Dtests.seed=332AB212F4B698FB \
  -Dtests.security.manager=true \
  -Dtests.locale=kw-GB \
  -Dtests.timezone=America/North_Dakota/Center \
  -Dcompiler.java=12 \
  -Druntime.java=11

Relevant excerpt from the logs:

org.elasticsearch.cluster.coordination.RareClusterStateIT > testDelayedMappingPropagationOnPrimary FAILED
    java.util.concurrent.TimeoutException: Timeout waiting for task.
        at __randomizedtesting.SeedInfo.seed([332AB212F4B698FB:459F59389E75EA12]:0)
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:236)
        at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:65)
        at org.elasticsearch.cluster.coordination.RareClusterStateIT.lambda$testDelayedMappingPropagationOnPrimary$3(RareClusterStateIT.java:271)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:850)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
        at org.elasticsearch.cluster.coordination.RareClusterStateIT.testDelayedMappingPropagationOnPrimary(RareClusterStateIT.java:269)
@jtibshirani jtibshirani added >test-failure Triaged test failures from CI :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Apr 9, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

jtibshirani added a commit that referenced this issue Apr 9, 2019
@ywelsch
Contributor

ywelsch commented May 7, 2019

@original-brownbear this test has been muted for a while, there are no recent failures with logs, and I can't reproduce it locally. Should we re-enable the test on master to get a recent failure?

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue May 7, 2019
* Reenabling these to get fresh failure logs since they are not
reproducible locally
* Relates # elastic#36813, elastic#41030
@original-brownbear
Member

Sure, I opened #41884.

original-brownbear added a commit that referenced this issue May 7, 2019
* Reenabling these to get fresh failure logs since they are not
reproducible locally
* Relates # #36813, #41030
@ywelsch ywelsch self-assigned this May 22, 2019
@jaymode
Member

jaymode commented May 22, 2019

This popped up today in a PR build, the build scan is https://gradle.com/s/nczuoemcggsl2.

@ywelsch
Contributor

ywelsch commented May 24, 2019

> This popped up today in a PR build, the build scan is https://gradle.com/s/nczuoemcggsl2.

I think this failed in a different place, and I don't see how it could fail there. The failing assertion is an assertNotNull inside an assertBusy, yet the assertBusy does not seem to have retried: the test finishes in a few seconds, much faster than the roughly 10 seconds that assertBusy typically spends retrying. We will need another failure to be reported here.
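For context, the retry behaviour I'd expect from an assertBusy-style helper looks roughly like the sketch below. This is a simplified illustration, not the actual ESTestCase.assertBusy code: `AssertBusySketch`, `Check`, and `retryUntilPasses` are hypothetical names, and the backoff values are assumptions.

```java
import java.util.concurrent.TimeUnit;

public class AssertBusySketch {

    /** A check that may throw AssertionError (hypothetical helper type). */
    @FunctionalInterface
    interface Check {
        void run() throws Exception;
    }

    /**
     * Re-runs the check, sleeping between attempts, until it passes or
     * maxWait elapses. Returns the number of attempts made. On timeout the
     * last AssertionError is rethrown. In this sketch, any other exception
     * is not retried and propagates immediately.
     */
    static int retryUntilPasses(Check check, long maxWait, TimeUnit unit) throws Exception {
        long deadlineNanos = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1; // assumed starting backoff
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                check.run();
                return attempts; // check passed
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadlineNanos) {
                    throw e; // out of time: surface the last failure
                }
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100); // assumed backoff cap
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // The check fails twice, then passes on the third attempt.
        int attempts = retryUntilPasses(() -> {
            if (++calls[0] < 3) {
                throw new AssertionError("not ready yet");
            }
        }, 10, TimeUnit.SECONDS);
        System.out.println("passed after " + attempts + " attempts");
    }
}
```

With a loop like this, a failing assertNotNull should keep being retried until the deadline, so an assertion failure after only a few seconds is surprising.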

@original-brownbear
Member

@ywelsch see #42430. I think I fixed this one in there too (I had the replica version of the test assigned to myself).

@original-brownbear
Member

Fixed by #42430.
