[CI] InternalTestClusterIT testStoppingNodesOneByOne failing #89325

Closed
DaveCTurner opened this issue Aug 13, 2022 · 1 comment · Fixed by #89326
Labels: >bug, :Distributed Coordination/Cluster Coordination, Team:Distributed (Obsolete), >test-failure

Comments

@DaveCTurner
Contributor

The failing assertion was introduced recently in #89298 and indicates a bug.

Build scan:
https://gradle-enterprise.elastic.co/s/fjmlkdojglcho/tests/:test:framework:integTest/org.elasticsearch.test.test.InternalTestClusterIT/testStoppingNodesOneByOne

Reproduction line:
./gradlew ':test:framework:integTest' --tests "org.elasticsearch.test.test.InternalTestClusterIT.testStoppingNodesOneByOne" -Dtests.seed=870CFEA2FF2FF837 -Dtests.locale=de-LU -Dtests.timezone=Europe/Minsk -Druntime.java=17

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.test.test.InternalTestClusterIT&tests.test=testStoppingNodesOneByOne

Failure excerpt:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=101, name=elasticsearch[node_t2][cluster_coordination][T#1], state=RUNNABLE, group=TGRP-InternalTestClusterIT]

  at __randomizedtesting.SeedInfo.seed([870CFEA2FF2FF837:41AC62AC915F5692]:0)

  Caused by: java.lang.AssertionError: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of notification of leader failure: [node_t1][127.0.0.1:30491] Node not connected on EsThreadPoolExecutor[name = node_t2/cluster_coordination, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5b147e7d[Shutting down, pool size = 1, active threads = 1, queued tasks = 2, completed tasks = 79]] (shutdown)

    at __randomizedtesting.SeedInfo.seed([870CFEA2FF2FF837]:0)
    at org.elasticsearch.transport.TransportService.handleSendRequestException(TransportService.java:799)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:719)
    at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.handleWakeUp(LeaderChecker.java:229)
    at org.elasticsearch.cluster.coordination.LeaderChecker.updateLeader(LeaderChecker.java:152)
    at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:903)
    at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:370)
    at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$handleFollowerCheck$3(FollowersChecker.java:204)
    at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
    at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:887)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:833)

    Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of notification of leader failure: [node_t1][127.0.0.1:30491] Node not connected on EsThreadPoolExecutor[name = node_t2/cluster_coordination, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5b147e7d[Shutting down, pool size = 1, active threads = 1, queued tasks = 2, completed tasks = 79]] (shutdown)

      at org.elasticsearch.common.util.concurrent.EsRejectedExecutionHandler.newRejectedException(EsRejectedExecutionHandler.java:40)
      at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:34)
      at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:833)
      at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1365)
      at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:72)
      at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.leaderFailed(LeaderChecker.java:334)
      at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:257)
      at org.elasticsearch.transport.TransportService.handleSendRequestException(TransportService.java:794)
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:719)
      at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.handleWakeUp(LeaderChecker.java:229)
      at org.elasticsearch.cluster.coordination.LeaderChecker.updateLeader(LeaderChecker.java:152)
      at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:903)
      at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:370)
      at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$handleFollowerCheck$3(FollowersChecker.java:204)
      at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
      at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
      at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:887)
      at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      at java.lang.Thread.run(Thread.java:833)
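
The rejection at the bottom of this trace is ordinary JDK behaviour: once an executor has been shut down, `execute` hands new tasks to its rejection handler, and an abort-style handler throws. A minimal sketch of that behaviour follows, using only JDK classes. `ShutdownRejectionSketch` and `trySubmit` are hypothetical names chosen for illustration; this is not the Elasticsearch code and not the change made in #89326.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class ShutdownRejectionSketch {

    /** Returns true if the executor accepted the task, false if it rejected it. */
    static boolean trySubmit(ExecutorService executor, Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            // After shutdown() the JDK's default AbortPolicy throws here.
            // A caller that can race with shutdown has to treat this as expected.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Accepted: the executor is still running.
        System.out.println("before shutdown: accepted=" + trySubmit(executor, () -> {}));

        // Already-queued tasks still run, but new ones are rejected from now on.
        executor.shutdown();
        System.out.println("after shutdown: accepted=" + trySubmit(executor, () -> {}));

        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

In the trace, the equivalent of the second call happens while node_t2's cluster_coordination executor is "Shutting down", and the resulting exception reaches an assertion instead of being handled as an expected shutdown race.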

DaveCTurner added the :Distributed Coordination/Network, >bug, and >test-failure labels on Aug 13, 2022
elasticsearchmachine added the Team:Distributed (Obsolete) label on Aug 13, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner self-assigned this on Aug 13, 2022
DaveCTurner added the :Distributed Coordination/Cluster Coordination label and removed the :Distributed Coordination/Network label on Aug 13, 2022
elasticmachine pushed a commit to DaveCTurner/elasticsearch that referenced this issue Aug 13, 2022
bearer-pipeline-test pushed a commit to BearerPipelineTest/elasticsearch that referenced this issue Aug 15, 2022