Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

org.elasticsearch.discovery.zen.UnicastZenPingTests testSimplePings failure #26701

Closed
nik9000 opened this issue Sep 18, 2017 · 7 comments · Fixed by #26764 or #38893
Closed

org.elasticsearch.discovery.zen.UnicastZenPingTests testSimplePings failure #26701

nik9000 opened this issue Sep 18, 2017 · 7 comments · Fixed by #26764 or #38893
Assignees
Labels
:Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >test Issues or PRs that are addressing/adding tests v7.2.0 v8.0.0-alpha1

Comments

@nik9000
Copy link
Member

nik9000 commented Sep 18, 2017

Doesn't reproduce locally but here is what CI said:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=ubuntu/1531/consoleFull

 Throwable #1: java.lang.AssertionError: node [UZP_C] ping count to [UZP_D] is unexpected
 Expected: <3>
      but: was <2>
 	at __randomizedtesting.SeedInfo.seed([7A6799622BA295CA:119B3346A64F64EA]:0)
 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
 	at org.elasticsearch.discovery.zen.UnicastZenPingTests.assertPingCount(UnicastZenPingTests.java:735)
 	at org.elasticsearch.discovery.zen.UnicastZenPingTests.testSimplePings(UnicastZenPingTests.java:253)
 	at java.lang.Thread.run(Thread.java:748)

The failure in CI has tons of other logging around the failure but that logging looks similar to what I get locally when I run the test.

@nik9000 nik9000 added :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >test Issues or PRs that are addressing/adding tests labels Sep 18, 2017
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 23, 2017
@original-brownbear
Copy link
Member

@nik9000 fixed this in #26764 I think. Reproducing this locally is easiest by simply running the test in a loop and it will fail after ~1k iterations (run two forks in parallel and it's a matter of seconds for this exact failure to occur, see PR for why parallel runs make this more likely).

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 25, 2017
original-brownbear added a commit that referenced this issue Sep 26, 2017
…ME_WAIT Sockets (#26764)

 #26701 Added option to RST instead of FIN to TcpTransport#closeChannels
@original-brownbear
Copy link
Member

Fixed in #26764 :)

s1monw pushed a commit that referenced this issue Sep 27, 2017
…ME_WAIT Sockets (#26764)

 #26701 Added option to RST instead of FIN to TcpTransport#closeChannels
@ywelsch
Copy link
Contributor

ywelsch commented Jan 17, 2019

I wonder if we're having the same issue again, see https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=debian/185/consoleText

We are supposed to connect to all 4 nodes, but we can't:

 1> [2019-01-17T08:20:17,402][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] before test
  1> [2019-01-17T08:20:17,421][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:59956}, bound_addresses {[::1]:44412}, {127.0.0.1:59956}
  1> [2019-01-17T08:20:17,431][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:52778}, bound_addresses {[::1]:50051}, {127.0.0.1:52778}
  1> [2019-01-17T08:20:17,442][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:38129}, bound_addresses {[::1]:50526}, {127.0.0.1:38129}
  1> [2019-01-17T08:20:17,442][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] UZP_D version set to [6.7.0]
  1> [2019-01-17T08:20:17,454][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:37553}, bound_addresses {[::1]:46058}, {127.0.0.1:37553}
  1> [2019-01-17T08:20:17,456][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] ping from UZP_A
  1> [2019-01-17T08:20:23,135][WARN ][o.e.d.z.UnicastZenPing   ] [testSimplePings] timed out after [5s] resolving host [127.0.0.1:59956]
  1> [2019-01-17T08:20:23,135][WARN ][o.e.d.z.UnicastZenPing   ] [testSimplePings] timed out after [5s] resolving host [127.0.0.1:52778]
  1> [2019-01-17T08:20:23,135][WARN ][o.e.d.z.UnicastZenPing   ] [testSimplePings] timed out after [5s] resolving host [127.0.0.1:38129]
  1> [2019-01-17T08:20:23,135][WARN ][o.e.d.z.UnicastZenPing   ] [testSimplePings] timed out after [5s] resolving host [127.0.0.1:37553]
  1> [2019-01-17T08:20:23,160][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] shutting down...
  1> [2019-01-17T08:20:23,184][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] after test
FAILURE 5.79s J1 | UnicastZenPingTests.testSimplePings <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: <1>
   >      but: was <0>
   > 	at __randomizedtesting.SeedInfo.seed([51A989765F1675F5:3A552352D2FB84D5]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.discovery.zen.UnicastZenPingTests.testSimplePings(UnicastZenPingTests.java:220)
   > 	at java.lang.Thread.run(Thread.java:748)

@ywelsch ywelsch reopened this Jan 17, 2019
@original-brownbear original-brownbear self-assigned this Jan 17, 2019
@jbaiera
Copy link
Member

jbaiera commented Jan 17, 2019

Can confirm that running this locally at about 1k iterations reproduces the error:

#> ./gradlew :server:unitTest   -Dtests.seed=51A989765F1675F5   -Dtests.class=org.elasticsearch.discovery.zen.UnicastZenPingTests   -Dtests.method="testSimplePings"   -Dtests.security.manager=true   -Dtests.locale=es-PY   -Dtests.timezone=Africa/Mbabane   -Dcompiler.java=11   -Druntime.java=8 -Dtests.iters=1000 -Dtests.failfast=true
[...snip...]
 1> [2019-01-17T21:38:39,987][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] [seed=[51A989765F1675F5:85F5AE96BA9F5083]] before test
  1> [2019-01-17T21:38:39,991][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:64368}, bound_addresses {[::1]:64367}, {127.0.0.1:64368}
  1> [2019-01-17T21:38:39,993][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:64370}, bound_addresses {[::1]:64369}, {127.0.0.1:64370}
  1> [2019-01-17T21:38:39,995][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:64372}, bound_addresses {[::1]:64371}, {127.0.0.1:64372}
  1> [2019-01-17T21:38:39,995][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] UZP_D version set to [6.7.0]
  1> [2019-01-17T21:38:39,998][INFO ][o.e.t.TransportService   ] [testSimplePings] publish_address {127.0.0.1:64374}, bound_addresses {[::1]:64373}, {127.0.0.1:64374}
  1> [2019-01-17T21:38:39,999][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] ping from UZP_A
  1> [2019-01-17T21:38:40,004][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_A] [1] failed send ping to {127.0.0.1:64374}{qlFUagb2RCKxs4jgikC7HA}{127.0.0.1}{127.0.0.1:64374}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64374}{qlFUagb2RCKxs4jgikC7HA}{127.0.0.1}{127.0.0.1:64374}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,005][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_A] [1] failed send ping to {127.0.0.1:64372}{2TUgFa0aTGWSNlVS7e3Afw}{127.0.0.1}{127.0.0.1:64372}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64372}{2TUgFa0aTGWSNlVS7e3Afw}{127.0.0.1}{127.0.0.1:64372}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,007][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_A] [1] failed send ping to {127.0.0.1:64374}{X2XEb2wHRyu-fVOCX2-VAg}{127.0.0.1}{127.0.0.1:64374}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64374}{X2XEb2wHRyu-fVOCX2-VAg}{127.0.0.1}{127.0.0.1:64374}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,008][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_A] [1] failed send ping to {127.0.0.1:64372}{bm_etgCATHGExb6rQSMF0g}{127.0.0.1}{127.0.0.1:64372}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64372}{bm_etgCATHGExb6rQSMF0g}{127.0.0.1}{127.0.0.1:64372}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,010][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_A] [1] failed send ping to {127.0.0.1:64374}{KEoACGfkQhGQJjqIDf_GyQ}{127.0.0.1}{127.0.0.1:64374}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64374}{KEoACGfkQhGQJjqIDf_GyQ}{127.0.0.1}{127.0.0.1:64374}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,010][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_A] [1] failed send ping to {127.0.0.1:64372}{Mk2b_UM4R9S2H7agBYDxWw}{127.0.0.1}{127.0.0.1:64372}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64372}{Mk2b_UM4R9S2H7agBYDxWw}{127.0.0.1}{127.0.0.1:64372}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,011][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] ping from UZP_B
  1> [2019-01-17T21:38:40,014][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_B] [1] failed send ping to {127.0.0.1:64372}{fcyElX0cSB-iujfunMyZ2A}{127.0.0.1}{127.0.0.1:64372}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64372}{fcyElX0cSB-iujfunMyZ2A}{127.0.0.1}{127.0.0.1:64372}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,014][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_B] [1] failed send ping to {127.0.0.1:64374}{6egowKtFSM2eofFgD-Qr7g}{127.0.0.1}{127.0.0.1:64374}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64374}{6egowKtFSM2eofFgD-Qr7g}{127.0.0.1}{127.0.0.1:64374}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,017][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_B] [1] failed send ping to {127.0.0.1:64372}{xBiCfKfzRqy4ttyR71rG-w}{127.0.0.1}{127.0.0.1:64372}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64372}{xBiCfKfzRqy4ttyR71rG-w}{127.0.0.1}{127.0.0.1:64372}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,017][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_B] [1] failed send ping to {127.0.0.1:64374}{zgetbFqTTJCkJMLm7DkvSA}{127.0.0.1}{127.0.0.1:64374}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64374}{zgetbFqTTJCkJMLm7DkvSA}{127.0.0.1}{127.0.0.1:64374}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,021][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_B] [1] failed send ping to {127.0.0.1:64372}{hThrcCbfSge_8xeF7CQDRA}{127.0.0.1}{127.0.0.1:64372}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64372}{hThrcCbfSge_8xeF7CQDRA}{127.0.0.1}{127.0.0.1:64372}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,021][WARN ][o.e.d.z.UnicastZenPing   ] [UZP_B] [1] failed send ping to {127.0.0.1:64374}{VHntvRJJTqeIkd-2kru41Q}{127.0.0.1}{127.0.0.1:64374}
  1> java.lang.IllegalStateException: handshake failed, mismatched cluster name [Cluster [mismatch]] - {127.0.0.1:64374}{VHntvRJJTqeIkd-2kru41Q}{127.0.0.1}{127.0.0.1:64374}
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:421) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportService.handshake(TransportService.java:385) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.discovery.zen.UnicastZenPingTests$TestUnicastZenPing$1.doRun(UnicastZenPingTests.java:881) [test/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
  1> [2019-01-17T21:38:40,027][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] shutting down...
  1> [2019-01-17T21:38:40,031][INFO ][o.e.d.z.UnicastZenPingTests] [testSimplePings] [seed=[51A989765F1675F5:85F5AE96BA9F5083]] after test
  2> REPRODUCE WITH: ./gradlew :server:unitTest -Dtests.seed=51A989765F1675F5 -Dtests.class=org.elasticsearch.discovery.zen.UnicastZenPingTests -Dtests.method="testSimplePings {seed=[51A989765F1675F5:85F5AE96BA9F5083]}" -Dtests.security.manager=true -Dtests.locale=es-PY -Dtests.timezone=Africa/Mbabane -Dcompiler.java=11 -Druntime.java=8
FAILURE 0.07s | UnicastZenPingTests.testSimplePings {seed=[51A989765F1675F5:85F5AE96BA9F5083]} <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: node [UZP_B] ping count to [UZP_A] is unexpected
   > Expected: <3>
   >      but: was <2>
   >    at __randomizedtesting.SeedInfo.seed([51A989765F1675F5:85F5AE96BA9F5083]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.discovery.zen.UnicastZenPingTests.assertPings(UnicastZenPingTests.java:748)
   >    at org.elasticsearch.discovery.zen.UnicastZenPingTests.testSimplePings(UnicastZenPingTests.java:235)
   >    at java.lang.Thread.run(Thread.java:748)
IGNOR/A 0.01s | UnicastZenPingTests.testSimplePings {seed=[51A989765F1675F5:794B4E6012FD68E3]}
   > Assumption #1: Ignored, failures limit reached (1 >= 1).
[...snip...]
IGNOR/A 0.00s | UnicastZenPingTests.testSimplePings {seed=[51A989765F1675F5:7C57D3D7A0FF224B]}
   > Assumption #1: Ignored, failures limit reached (1 >= 1).
  2> NOTE: leaving temporary files on disk at: /Users/james.baiera/Documents/source/es/elasticsearch-6.x/server/build/testrun/unitTest/J0/temp/org.elasticsearch.discovery.zen.UnicastZenPingTests_51A989765F1675F5-001
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=2010, maxMBSortInHeap=7.745145049049367, sim=RandomSimilarity(queryNorm=false): {}, locale=es-PY, timezone=Africa/Mbabane
  2> NOTE: Mac OS X 10.14.2 x86_64/Oracle Corporation 1.8.0_171 (64-bit)/cpus=8,threads=1,free=458943056,total=532676608
  2> NOTE: All tests run in this JVM: [UnicastZenPingTests]
Completed [1/1] in 29.00s, 1000 tests, 1 failure, 617 skipped <<< FAILURES!


> Task :server:unitTest FAILED
Tests with failures:
  - org.elasticsearch.discovery.zen.UnicastZenPingTests.testSimplePings {seed=[51A989765F1675F5:85F5AE96BA9F5083]}

   [junit4] JVM J0:     0.37 ..    33.25 =    32.88s
   [junit4] Execution time total: 33 seconds
   [junit4] Tests summary: 1 suite, 1000 tests, 1 failure, 617 ignored (617 assumptions)

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':server:unitTest'.
> There were test failures: 1 suite, 1000 tests, 1 failure, 617 ignored (617 assumptions) [seed: 51A989765F1675F5]

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/5.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 1m 14s
27 actionable tasks: 1 executed, 26 up-to-date

jbaiera added a commit that referenced this issue Jan 17, 2019
jbaiera added a commit that referenced this issue Jan 17, 2019
@original-brownbear
Copy link
Member

original-brownbear commented Jan 17, 2019

It's def. not the TIME_WAIT issue again here I think. Just ran a few k iterations (no luck reproducing this for me so far) and I have < 10 TIME_WAIT connections open afterward.
Looking into what's causing this now regardless :)

Turns out the above only holds true for master, in 6.x we're still producing thousands of lingering TIME_WAIT ... looking into that now.

Reenabled test in master in 62ddc8c

original-brownbear added a commit that referenced this issue Jan 18, 2019
* This was muted needlessly, the problem in #26701 only applies to `6.x`
* Relates #26701
@jimczi jimczi added the v8.0.0 label Feb 12, 2019
@jimczi
Copy link
Contributor

jimczi commented Feb 12, 2019

@original-brownbear it's also failing in master:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/1521/console

./gradlew :server:unitTest \
  -Dtests.seed=9889DED36477654D \
  -Dtests.class=org.elasticsearch.discovery.zen.UnicastZenPingTests \
  -Dtests.method="testSimplePings" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-VE \
  -Dtests.timezone=Africa/Mbabane \
  -Dcompiler.java=11 \
  -Druntime.java=8

Not reproducible locally for me.

@original-brownbear
Copy link
Member

original-brownbear commented Feb 12, 2019

20190124224140-D76BF75E-log-2658-3890858197344673937.txt

attaching the logs for the above failure (master build 1521) here. Seems like this may not have been a network issue to begin with. Will investigate soon.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 14, 2019
* Just like elastic#37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause
* Closes elastic#26701
original-brownbear added a commit that referenced this issue Feb 14, 2019
* Just like #37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause
* Closes #26701
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 27, 2019
* Just like elastic#37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause
* Closes elastic#26701
original-brownbear added a commit that referenced this issue Feb 27, 2019
* Just like #37268 removing another 1s timeout, those are dangerous since they're easily exceeded by an untimely gc pause
* Closes #26701
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >test Issues or PRs that are addressing/adding tests v7.2.0 v8.0.0-alpha1
Projects
None yet
7 participants