[CI] InternalTestClusterIT testStoppingNodesOneByOne failing #89325

DaveCTurner · 2022-08-13T07:31:49Z

The failing assertion was introduced recently in #89298 and indicates a bug.

Build scan:
https://gradle-enterprise.elastic.co/s/fjmlkdojglcho/tests/:test:framework:integTest/org.elasticsearch.test.test.InternalTestClusterIT/testStoppingNodesOneByOne

Reproduction line:
./gradlew ':test:framework:integTest' --tests "org.elasticsearch.test.test.InternalTestClusterIT.testStoppingNodesOneByOne" -Dtests.seed=870CFEA2FF2FF837 -Dtests.locale=de-LU -Dtests.timezone=Europe/Minsk -Druntime.java=17

Applicable branches:
main

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.test.test.InternalTestClusterIT&tests.test=testStoppingNodesOneByOne

Failure excerpt:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=101, name=elasticsearch[node_t2][cluster_coordination][T#1], state=RUNNABLE, group=TGRP-InternalTestClusterIT]

  at __randomizedtesting.SeedInfo.seed([870CFEA2FF2FF837:41AC62AC915F5692]:0)

  Caused by: java.lang.AssertionError: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of notification of leader failure: [node_t1][127.0.0.1:30491] Node not connected on EsThreadPoolExecutor[name = node_t2/cluster_coordination, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5b147e7d[Shutting down, pool size = 1, active threads = 1, queued tasks = 2, completed tasks = 79]] (shutdown)

    at __randomizedtesting.SeedInfo.seed([870CFEA2FF2FF837]:0)
    at org.elasticsearch.transport.TransportService.handleSendRequestException(TransportService.java:799)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:719)
    at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.handleWakeUp(LeaderChecker.java:229)
    at org.elasticsearch.cluster.coordination.LeaderChecker.updateLeader(LeaderChecker.java:152)
    at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:903)
    at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:370)
    at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$handleFollowerCheck$3(FollowersChecker.java:204)
    at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
    at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:887)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:833)

    Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of notification of leader failure: [node_t1][127.0.0.1:30491] Node not connected on EsThreadPoolExecutor[name = node_t2/cluster_coordination, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5b147e7d[Shutting down, pool size = 1, active threads = 1, queued tasks = 2, completed tasks = 79]] (shutdown)

      at org.elasticsearch.common.util.concurrent.EsRejectedExecutionHandler.newRejectedException(EsRejectedExecutionHandler.java:40)
      at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:34)
      at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:833)
      at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1365)
      at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:72)
      at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.leaderFailed(LeaderChecker.java:334)
      at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler$1.handleException(LeaderChecker.java:257)
      at org.elasticsearch.transport.TransportService.handleSendRequestException(TransportService.java:794)
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:719)
      at org.elasticsearch.cluster.coordination.LeaderChecker$CheckScheduler.handleWakeUp(LeaderChecker.java:229)
      at org.elasticsearch.cluster.coordination.LeaderChecker.updateLeader(LeaderChecker.java:152)
      at org.elasticsearch.cluster.coordination.Coordinator.becomeFollower(Coordinator.java:903)
      at org.elasticsearch.cluster.coordination.Coordinator.onFollowerCheckRequest(Coordinator.java:370)
      at org.elasticsearch.cluster.coordination.FollowersChecker.lambda$handleFollowerCheck$3(FollowersChecker.java:204)
      at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
      at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
      at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:887)
      at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      at java.lang.Thread.run(Thread.java:833)
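
For readers unfamiliar with this failure mode, the trace above shows a task being submitted to an executor that is already shutting down; the executor rejects the task, and the assertion turns that rejection into a test failure. The following is a minimal sketch of the general pattern using only plain JDK classes. It is not the Elasticsearch code and not the change that resolved this issue; the class name `RejectionSketch` and the method `notifyOrDrop` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectionSketch {

    /**
     * Tries to hand a notification to the executor.
     * Returns true if the executor accepted the task, false if it rejected it
     * (for example, because the executor has been shut down).
     */
    static boolean notifyOrDrop(ExecutorService executor, Runnable notification) {
        try {
            executor.execute(notification);
            return true;
        } catch (RejectedExecutionException e) {
            // Assumption for this sketch: a rejection during shutdown is expected
            // and the notification can safely be dropped rather than treated as a bug.
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // While the executor is running, the task is accepted.
        System.out.println(notifyOrDrop(executor, () -> {}));

        // After shutdown, the default abort policy rejects new tasks.
        executor.shutdown();
        System.out.println(notifyOrDrop(executor, () -> {}));
    }
}
```

Whether dropping the notification is the right response depends on what the caller needs during shutdown; the sketch only shows that the rejection can be caught at the submission site instead of escaping as an uncaught exception.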

elasticsearchmachine · 2022-08-13T07:32:11Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner added the :Distributed Coordination/Network (Http and internode communication implementations), >bug, and >test-failure (Triaged test failures from CI) labels on Aug 13, 2022

elasticsearchmachine added the Team:Distributed (Obsolete) label (Meta label for distributed team, obsolete; replaced by Distributed Indexing/Coordination) on Aug 13, 2022

DaveCTurner self-assigned this Aug 13, 2022

DaveCTurner added the :Distributed Coordination/Cluster Coordination label (Cluster formation and cluster state publication, including cluster membership and fault detection) and removed the :Distributed Coordination/Network label (Http and internode communication implementations) on Aug 13, 2022

DaveCTurner mentioned this issue Aug 13, 2022

Handle rejection in LeaderChecker #89326

Merged

elasticmachine pushed a commit to DaveCTurner/elasticsearch that referenced this issue Aug 13, 2022

AwaitsFix for elastic#89325

dcc87dd

DaveCTurner mentioned this issue Aug 13, 2022

[CI] ConcurrentSnapshotsIT testMasterFailoverAndMultipleQueuedUpSnapshotsAcrossTwoRepos failing #89317

Closed

DaveCTurner closed this as completed in #89326 Aug 15, 2022

bearer-pipeline-test pushed a commit to BearerPipelineTest/elasticsearch that referenced this issue Aug 15, 2022

Handle rejection in LeaderChecker (elastic#89326)

51f89f4

Closes elastic#89325
