
[CI] FullClusterRestartIT.testSearch fails on master #25311

Closed

dimitris-athanasiou opened this issue Jun 20, 2017 · 23 comments

Labels: :Core/Infra/Core Core issues without another label, >test Issues or PRs that are addressing/adding tests, >test-failure Triaged test failures from CI, v6.0.0-beta1

@dimitris-athanasiou (Contributor)

dimitris-athanasiou commented Jun 20, 2017

Link to CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+bwc-tests/81/console

Reproduce with:

gradle :qa:full-cluster-restart:v5.0.2#upgradedClusterTestRunner -Dtests.seed=6FCA8286356192B5 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=id -Dtests.timezone=Etc/GMT+7

Could not reproduce it.

Full error:

Suite: org.elasticsearch.upgrades.FullClusterRestartIT
  1> [2017-06-20T08:46:31,294][WARN ][o.e.b.JNANatives         ] unable to install syscall filter: 
  1> java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
  1> 	at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:363) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:110) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:80) [framework-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:186) [framework-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_131]
  1> 	at java.lang.Class.forName(Class.java:348) [?:1.8.0_131]
  1> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:592) [randomizedtesting-runner-2.5.0.jar:?]
  1> [2017-06-20T08:46:31,313][WARN ][o.e.b.JNANatives         ] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
  1> [2017-06-20T08:46:31,313][WARN ][o.e.b.JNANatives         ] This can result in part of the JVM being swapped out.
  1> [2017-06-20T08:46:31,314][WARN ][o.e.b.JNANatives         ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
  1> [2017-06-20T08:46:31,314][WARN ][o.e.b.JNANatives         ] These can be adjusted by modifying /etc/security/limits.conf, for example: 
  1> 	# allow user 'jenkins' mlockall
  1> 	jenkins soft memlock unlimited
  1> 	jenkins hard memlock unlimited
  1> [2017-06-20T08:46:31,314][WARN ][o.e.b.JNANatives         ] If you are logged in interactively, you will have to re-login for the new limits to take effect.
  1> [2017-06-20T01:46:32,052][INFO ][o.e.u.FullClusterRestartIT] [testSingleDoc]: before test
  1> [2017-06-20T01:46:32,074][INFO ][o.e.u.FullClusterRestartIT] initializing REST clients against [http://[::1]:33081, http://[::1]:39135]
  1> [2017-06-20T01:46:32,826][INFO ][o.e.u.FullClusterRestartIT] [testSingleDoc]: after test
  1> [2017-06-20T01:46:32,847][INFO ][o.e.u.FullClusterRestartIT] [testNewReplicasWork]: before test
  1> [2017-06-20T01:46:33,367][INFO ][o.e.u.FullClusterRestartIT] [testNewReplicasWork]: after test
  1> [2017-06-20T01:46:33,372][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: before test
  1> [2017-06-20T01:46:33,384][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search
  1> [2017-06-20T01:46:33,393][INFO ][o.e.u.FullClusterRestartIT] Found 2482 in old index
  1> [2017-06-20T01:46:33,393][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search with sort
  1> [2017-06-20T01:46:33,422][INFO ][o.e.u.FullClusterRestartIT] --> testing exists filter
  1> [2017-06-20T01:46:33,443][INFO ][o.e.u.FullClusterRestartIT] --> testing _all search
  1> [2017-06-20T01:46:34,195][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: after test
  2> REPRODUCE WITH: gradle :qa:full-cluster-restart:v5.0.2#upgradedClusterTestRunner -Dtests.seed=6FCA8286356192B5 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=id -Dtests.timezone=Etc/GMT+7
FAILURE 0.84s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<0> but was:<519772>
   > 	at __randomizedtesting.SeedInfo.seed([6FCA8286356192B5:FD7317D37178CA1A]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:435)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:149)
   > 	at java.lang.Thread.run(Thread.java:748)
  1> [2017-06-20T01:46:34,229][INFO ][o.e.u.FullClusterRestartIT] [testSnapshotRestore]: before test
  1> [2017-06-20T01:46:34,908][INFO ][o.e.u.FullClusterRestartIT] There are still tasks running after this test that might break subsequent tests [internal:index/shard/recovery/file_chunk, internal:index/shard/recovery/start_recovery].
  1> [2017-06-20T01:46:34,909][INFO ][o.e.u.FullClusterRestartIT] [testSnapshotRestore]: after test
  1> [2017-06-20T01:46:34,917][INFO ][o.e.u.FullClusterRestartIT] [testRecovery]: before test
  1> [2017-06-20T01:46:35,010][INFO ][o.e.u.FullClusterRestartIT] There are still tasks running after this test that might break subsequent tests [internal:index/shard/recovery/file_chunk, internal:index/shard/recovery/start_recovery].
  1> [2017-06-20T01:46:35,010][INFO ][o.e.u.FullClusterRestartIT] [testRecovery]: after test
  1> [2017-06-20T01:46:35,026][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: before test
  1> [2017-06-20T01:46:35,052][ERROR][o.e.u.FullClusterRestartIT] clusterState={cluster_name=full-cluster-restart, compressed_size_in_bytes=3199, metadata={indices={restored_testsnapshotrestore={settings={index={number_of_shards=5, provided_name=testsnapshotrestore, creation_date=1497948353547, number_of_replicas=1, uuid=5YSqwvBaRSauR32PZ2lrzA, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[gCTVGM0IRMu6FiCW_wafEQ], 1=[-lf2VFE8Tve2nHYQKJ7cDg], 2=[vQV9eUwUS0KjWcuqWVXDPA], 3=[2Wc6p_sCRaaqSNkEknX1Uw], 4=[YJi_6tgCTvOKDs0u9AEGHA]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, testsearch={settings={index={refresh_interval=1s, number_of_shards=1, provided_name=testsearch, creation_date=1497948357550, number_of_replicas=0, uuid=0gFkIub6QdW3sFf6hUO25A, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={bool={type=boolean}, field={properties={with={properties={dots={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}}, string={type=text}, binary={store=true, type=binary}, foo={type=text, fields={keyword={ignore_above=256, type=keyword}}}, float={type=float}, dots_in_field_names={type=text}, int={type=long}}}}, aliases=[], state=open, in_sync_allocations={0=[aDrIAXM7RmGaGB7RPqslIw]}, primary_terms={0=2}}, testaliaswithbadname={settings={index={number_of_shards=1, provided_name=testaliaswithbadname, creation_date=1497948347228, number_of_replicas=0, uuid=lx09u25HQdyp_KLfb35neA, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={key={type=keyword}}}}, aliases=[#testaliaswithbadname], state=open, in_sync_allocations={0=[5t10L5PXQgCT43ZjRCMtQA]}, primary_terms={0=2}}, testsingledoc={settings={index={number_of_shards=5, provided_name=testsingledoc, creation_date=1497948330341, number_of_replicas=1, uuid=M71hphp7SR6nsXv8vQKccg, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={test={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[aBqjfD-QQaOl9ttQcAtB5Q, 6lc8KdrtSKqgThoU4xN-bw], 1=[bmfqVGrbT9OvoXwrKWesgQ, VY4FoetYQT-LA536Njtlgg], 2=[nW_fTTjIQbGI_uAj8N_d6Q, yyjITc0gT-qdawPDzmR5Uw], 3=[fT_K4f37Sma_TuWjChDCig, gSh_oykiRg-ufZKiVuRtQw], 4=[TzVLSWQ1TBq6_I4OSA-jEg, yxZyliQUSZGKE_NnNKAsfA]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, testnewreplicaswork={settings={index={number_of_shards=1, provided_name=testnewreplicaswork, creation_date=1497948331660, number_of_replicas=1, uuid=JMpUI6ExQXCQNseTRxMhVg, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text}}}}, aliases=[], state=open, in_sync_allocations={0=[RBqacL0QSGuGpcT3SnoqXA, Jq6cXpdrQ_Gyromx5iKhdQ]}, primary_terms={0=2}}, testsnapshotrestore={settings={index={number_of_shards=5, provided_name=testsnapshotrestore, creation_date=1497948353547, number_of_replicas=1, uuid=jKVtQRQeQtmcf4EAoFsMlQ, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[eCQpxsSrQb-mW3p55MC92Q, hIbMtnj2QvqHXnmavttFXA], 1=[NY5pk7fpSvK2jGXbO3hAsw, w5Qg9IfBQPCVG5IdIA1T0g], 2=[jlLuniShQxaREOf19wo7dQ, cIpyM7g8SmSG8nZnh1vhPA], 3=[USGzVDKtRvS_1STKFqMpnw, d6Ckq7XYQJKwn1WMz6CGuA], 4=[j6OBWcWJT7S2FmXgaDDFUg, 
kIPn-9otTMuNHb-zSUhcAQ]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, testrecovery={settings={index={number_of_shards=5, provided_name=testrecovery, creation_date=1497948348582, number_of_replicas=1, uuid=glGyYBQQR7SWlFOrQjS3HQ, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[qBeilrG6SbKZgEyDFGqnTg, TeiSagr4So-Unx5-d2pd6w], 1=[syrkQ478RUmAabauMhVA1g, vNjfVU5LRC2KEBAKnXaGlQ], 2=[_p4VoqjISbW46-Jpy8CmPQ, nlmREv_MTSKQ0SNXG_d4Gw], 3=[NamlkbNNRSS77J_nv7kWdA, S2EVbn48Tt2y9Zl3yAzmjA], 4=[w677kVvFRdmcFNfdbZyO1A, Jx6nu2-RRPmx_XX1jcFzIQ]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, info={settings={index={number_of_shards=5, provided_name=info, creation_date=1497948348164, number_of_replicas=1, uuid=8vGlHvyKSUyDRGpJvM_0Pg, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={value={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[CBfSbTVESnKzkYC6NHYG0A, -4S7fFFaSuqnie_qXw51iA], 1=[MsfoHJI1SmyaAEfL3q8a-A, m9eTciEGQtKmTPxjl0_PnA], 2=[CPhCk50tQV2f_Eo99iASDw, daNPgIFsRKeEHuodR_opUQ], 3=[SeKnywUyRi2uVE_8CeGV7A, nI_vmrLaRZSZUTDX84Tgpg], 4=[6u6Y9mqwQ52EZrIU7NJc0Q, NsT677n7Sd6t0sPsHNCFyA]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}}, cluster_uuid=pb6OYrMmT6mBUSbmB3LAfw, repositories={repo={settings={compress=true, location=/var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/cluster/shared/repo}, type=fs}}, templates={}, index-graveyard={tombstones=[{index={index_uuid=lWxKhL6DSyilySChFRrWnQ, index_name=restored_testsnapshotrestore}, delete_date_in_millis=1497948394266}]}}}
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/testrun/v5.0.2#upgradedClusterTestRunner/J0/temp/org.elasticsearch.upgrades.FullClusterRestartIT_6FCA8286356192B5-001
  1> [2017-06-20T01:46:35,185][INFO ][o.e.u.FullClusterRestartIT] There are still tasks running after this test that might break subsequent tests [internal:index/shard/recovery/file_chunk, internal:index/shard/recovery/start_recovery].
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=1823, maxMBSortInHeap=5.933591739645208, sim=RandomSimilarity(queryNorm=true): {}, locale=id, timezone=Etc/GMT+7
  1> [2017-06-20T01:46:35,185][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: after test
  2> NOTE: Linux 2.6.32-642.15.1.el6.x86_64 amd64/Oracle Corporation 1.8.0_131 (64-bit)/cpus=4,threads=1,free=466418616,total=514850816
  2> NOTE: All tests run in this JVM: [FullClusterRestartIT]
Completed [1/1] in 4.79s, 6 tests, 1 failure <<< FAILURES!
dimitris-athanasiou added the >test-failure Triaged test failures from CI, >test Issues or PRs that are addressing/adding tests, v6.0.0 labels Jun 20, 2017
@danielmitterdorfer (Member)

We have another test failure in build 1172:

ERROR   0.77s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.NullPointerException
   > 	at __randomizedtesting.SeedInfo.seed([EC77F0A9E127324E:7ECE65FCA53E6AE1]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:482)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:696)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:479)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:156)
   > 	at java.lang.Thread.run(Thread.java:748)

It is not reproducible locally, so I conclude it is timing-related: the API endpoint does not always return data for the index that is checked in the predicate.
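For context, the busy-wait check amounts to something like the sketch below. This is a minimal sketch, assuming an `entityAsMap`-style helper and guessing at the response keys; it is not the test's actual code:

```java
// Poll the upgrade-status endpoint until the index reports nothing left to
// upgrade. If the endpoint returns no entry for the index yet, the nested
// map lookup yields null and the predicate dies with an NPE, matching the
// stack trace above.
assertBusy(() -> {
    Map<String, Object> rsp = entityAsMap(client().performRequest("GET", "/testsearch/_upgrade"));
    Map<?, ?> indices = (Map<?, ?>) rsp.get("indices"); // assumed response shape
    Map<?, ?> status = (Map<?, ?>) indices.get("testsearch"); // may be absent if stats lag behind
    assertEquals(0, ((Number) status.get("size_to_upgrade_in_bytes")).intValue());
});
```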

@martijnvg (Member)

The upgrade status API should always return data for the index in that test, regardless of whether an upgrade actually ran. I'll add some more logging.

martijnvg added a commit that referenced this issue Jun 27, 2017
@martijnvg (Member)

These failures didn't yet have the extra logging I added.

martijnvg added a commit that referenced this issue Jun 27, 2017
martijnvg added a commit that referenced this issue Jun 29, 2017
@martijnvg (Member)

Failures occurred over the weekend.

I think the upgrade status API was reporting incorrect stats because a refresh may not have occurred yet. I'll make sure a refresh is run after the upgrade.
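Something like the following should do it (a minimal sketch, assuming the test's low-level REST client):

```java
// Force a refresh after the _upgrade call so the upgrade stats reflect the
// rewritten segments before the assertion starts polling them.
client().performRequest("POST", "/testsearch/_refresh");
```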

@javanna (Member)

javanna commented Jul 4, 2017

Another failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+bwc-tests/122 .
Seems like the additional refresh call didn't help.

2> REPRODUCE WITH: gradle :qa:full-cluster-restart:v5.1.1#upgradedClusterTestRunner -Dtests.seed=51FA4D6D953D9A54 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=zh-HK -Dtests.timezone=Indian/Mayotte
FAILURE 10.9s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<0> but was:<489468>
   > 	at __randomizedtesting.SeedInfo.seed([51FA4D6D953D9A54:C343D838D124C2FB]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:722)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:696)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:485)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:156)
   > 	at java.lang.Thread.run(Thread.java:748)
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
  1> [2017-07-04T03:47:54,954][INFO ][o.e.u.FullClusterRestartIT] [testClusterState]: before test
  1> [2017-07-04T03:47:54,983][INFO ][o.e.u.FullClusterRestartIT] [testClusterState]: after test
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/testrun/v5.1.1#upgradedClusterTestRunner/J0/temp/org.elasticsearch.upgrades.FullClusterRestartIT_51FA4D6D953D9A54-001
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=1982, maxMBSortInHeap=5.75625762753509, sim=RandomSimilarity(queryNorm=false): {}, locale=zh-HK, timezone=Indian/Mayotte
  2> NOTE: Linux 4.4.62-18.6-default amd64/Oracle Corporation 1.8.0_131 (64-bit)/cpus=4,threads=1,free=460630976,total=514850816
  2> NOTE: All tests run in this JVM: [FullClusterRestartIT]
Completed [1/1] in 15.81s, 7 tests, 1 failure, 1 skipped <<< FAILURES!

[ant:junit4] JVM J0: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/testrun/v5.1.1#upgradedClusterTestRunner/temp/junit4-J0-20170704_004738_1841494010593132791203.syserr
Tests with failures:
  - org.elasticsearch.upgrades.FullClusterRestartIT.testSearch

martijnvg added a commit that referenced this issue Jul 4, 2017
@martijnvg (Member)

This is just weird. I pushed another commit that inspects the upgrade API response to verify whether the index really was upgraded.
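Roughly like this (a sketch of the idea, reusing the assumed `entityAsMap` helper; not necessarily the committed code):

```java
// Run the upgrade and assert on the _shards header of the response, so a
// silent no-op upgrade fails fast instead of tripping a later stats check.
Map<String, Object> rsp = entityAsMap(client().performRequest("POST", "/testsearch/_upgrade"));
Map<?, ?> shards = (Map<?, ?>) rsp.get("_shards");
assertEquals(shards.get("total"), shards.get("successful"));
```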

@martijnvg (Member)

Last failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos/1202.

That failure didn't yet have the commit that verifies the upgrade response. I'm interested in seeing if that fails too.

@martijnvg (Member)

Another two failures.

The upgrade API returns:

{_shards={total=1, failed=0, successful=0}, upgraded_indices={}}

This indicates that the upgrade has not been performed. So I ssh-ed into the build slave used for the last build, and the upgrade does seem to have been invoked:

[2017-07-04T17:41:09,589][INFO ][o.e.i.e.Engine           ] [node-0] [testsearch][0] starting segment upgrade upgradeOnlyAncientSegments=false

However, the finished segment upgrade log line is missing here, indicating that something went wrong during the upgrade. There is no trace of any error in the log. I'll increase the log level of TransportUpgradeAction to see what actually goes wrong here.
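For reference, bumping that logger amounts to a transient cluster-settings update like the sketch below (the exact logger name for TransportUpgradeAction is an assumption):

```java
// StringEntity/ContentType come from Apache HttpClient, which the low-level
// REST client already depends on. Raise the log level for the upgrade action
// so the next CI failure records what the shard-level upgrade actually did.
HttpEntity body = new StringEntity(
    "{\"transient\":{\"logger.org.elasticsearch.action.admin.indices.upgrade\":\"TRACE\"}}",
    ContentType.APPLICATION_JSON);
client().performRequest("PUT", "/_cluster/settings", Collections.emptyMap(), body);
```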

@martijnvg (Member)

This one failed in the same way: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+bwc-tests/129

However, the log file does contain the finished segment upgrade log line:

[2017-07-06T09:00:27,778][INFO ][o.e.i.e.Engine           ] [node-0] [testsearch][0] starting segment upgrade upgradeOnlyAncientSegments=false
[2017-07-06T09:00:28,188][INFO ][o.e.i.e.Engine           ] [node-0] [testsearch][0] finished segment upgrade

This is weird, because the upgrade API returned:

{_shards={total=1, failed=0, successful=0}, upgraded_indices={}}

This indicates that no upgrade ran (which then failed the test).

The log file also contained many restore failures, which the previous failures didn't have (not sure if it is related):

[2017-07-06T09:00:26,304][WARN ][o.e.i.c.IndicesClusterStateService] [node-0] [[restored_testsnapshotrestore][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [restored_testsnapshotrestore][2]: Recovery failed from {node-1}{bz86vogHQX2P5MZCat7_Kw}{H1NJ5aCmQMqsneFDSR1OMw}{127.0.0.1}{127.0.0.1:41051}{testattr=test} into {node-0}{B_toSUhESeOkFQbbAZqE3g}{9DTa331OTBe8sF7tcDNIMQ}{127.0.0.1}{127.0.0.1:34998}{testattr=test}
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:75) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:617) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][127.0.0.1:41051][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] prepare target for translog failed
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:177) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
Caused by: org.elasticsearch.common.util.CancellableThreads$ExecutionCancelledException: recovery was canceled reason [shard is closed]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler$1.onCancel(RecoverySourceHandler.java:106) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:129) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
	Suppressed: java.lang.IllegalStateException: Future got interrupted
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.prepareForTranslogOperations(RemoteRecoveryTargetHandler.java:83) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$prepareTargetForTranslog$6(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
	Caused by: java.lang.InterruptedException
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_131]
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_131]
		at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:94) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:42) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		... 18 more
[2017-07-06T09:00:26,311][WARN ][o.e.c.a.s.ShardStateAction] [node-0] [restored_testsnapshotrestore][2] received shard failed for shard id [[restored_testsnapshotrestore][2]], allocation id [047LokEjR7uqcnxlWWhagw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[restored_testsnapshotrestore][2]: Recovery failed from {node-1}{bz86vogHQX2P5MZCat7_Kw}{H1NJ5aCmQMqsneFDSR1OMw}{127.0.0.1}{127.0.0.1:41051}{testattr=test} into {node-0}{B_toSUhESeOkFQbbAZqE3g}{9DTa331OTBe8sF7tcDNIMQ}{127.0.0.1}{127.0.0.1:34998}{testattr=test}]; nested: RemoteTransportException[[node-1][127.0.0.1:41051][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ExecutionCancelledException[recovery was canceled reason [shard is closed]]; ]
org.elasticsearch.indices.recovery.RecoveryFailedException: [restored_testsnapshotrestore][2]: Recovery failed from {node-1}{bz86vogHQX2P5MZCat7_Kw}{H1NJ5aCmQMqsneFDSR1OMw}{127.0.0.1}{127.0.0.1:41051}{testattr=test} into {node-0}{B_toSUhESeOkFQbbAZqE3g}{9DTa331OTBe8sF7tcDNIMQ}{127.0.0.1}{127.0.0.1:34998}{testattr=test}
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:75) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:617) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][127.0.0.1:41051][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] prepare target for translog failed
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:177) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
Caused by: org.elasticsearch.common.util.CancellableThreads$ExecutionCancelledException: recovery was canceled reason [shard is closed]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler$1.onCancel(RecoverySourceHandler.java:106) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:129) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
	Suppressed: java.lang.IllegalStateException: Future got interrupted
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.prepareForTranslogOperations(RemoteRecoveryTargetHandler.java:83) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$prepareTargetForTranslog$6(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
	Caused by: java.lang.InterruptedException
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_131]
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_131]
		at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:94) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:42) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		... 18 more

@martijnvg (Member)

This test hasn't failed since 9040f44 got pushed. This commit changes the test to wait for a green cluster state before running the upgrade (see the sketch below). I noticed in one of these build failures that the shard header reported that the upgrade command was sent to 0 shards (total was 1 and successful was 0). If this test doesn't fail over the weekend then I will close this issue.
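For reference, the wait-for-green step amounts to something like this (a sketch; the exact parameters used in 9040f44 may differ):

```java
// Block until the cluster reports green for the index before calling _upgrade,
// so the upgrade request cannot race against ongoing shard recoveries.
Map<String, String> params = Collections.singletonMap("wait_for_status", "green");
client().performRequest("GET", "/_cluster/health/testsearch", params);
```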

colings86 added the :Core/Infra/Core Core issues without another label label Jul 25, 2017
@talevy (Contributor)

talevy commented Aug 9, 2017

Happened again: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+bwc-tests/42/console

gradle :qa:full-cluster-restart:v5.4.3#upgradedClusterTestRunner -Dtests.seed=C9DE4FDD8C148B00 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=sr-Latn-BA -Dtests.timezone=BST
  1> [2017-08-09T16:47:20,196][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: before test
  1> [2017-08-09T16:47:20,222][INFO ][o.e.u.FullClusterRestartIT] health api response: {number_of_pending_tasks=1, cluster_name=full-cluster-restart, active_shards=1, active_primary_shards=1, unassigned_shards=0, delayed_unassigned_shards=0, timed_out=false, relocating_shards=0, initializing_shards=0, task_max_waiting_in_queue_millis=0, number_of_data_nodes=2, number_of_in_flight_fetch=0, active_shards_percent_as_number=100.0, status=green, number_of_nodes=2}
  1> [2017-08-09T16:47:20,223][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search
  1> [2017-08-09T16:47:20,258][INFO ][o.e.u.FullClusterRestartIT] Found 2491 in old index
  1> [2017-08-09T16:47:20,258][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search with sort
  1> [2017-08-09T16:47:20,291][INFO ][o.e.u.FullClusterRestartIT] --> testing exists filter
  1> [2017-08-09T16:47:20,321][INFO ][o.e.u.FullClusterRestartIT] --> testing _all search
  1> [2017-08-09T16:47:21,035][INFO ][o.e.u.FullClusterRestartIT] upgrade api response: {_shards={total=1, failed=0, successful=0}, upgraded_indices={}}
  1> [2017-08-09T16:47:21,090][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: after test
FAILURE 0.91s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError
   > 	at __randomizedtesting.SeedInfo.seed([C9DE4FDD8C148B00:5B67DA88C80DD3AF]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:559)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:167)
   > 	at java.lang.Thread.run(Thread.java:748)
  1> [2017-08-09T16:47:21,106][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: before test
  1> [2017-08-09T16:47:21,124][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: after test

@danielmitterdorfer (Member)

And again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+bwc-tests/45/console

FAILURE 1.00s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError
   > 	at __randomizedtesting.SeedInfo.seed([F32820552B31394A:6191B5006F2861E5]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:559)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:167)
   > 	at java.lang.Thread.run(Thread.java:745)

The failing assertion checks whether the versions of the upgraded indices are present (i.e. not null). Attached are the cluster logs from the 5.3.3 cluster on which the test failed.

cluster-logs.zip

@martijnvg
Copy link
Member

I see two things in the cluster logs:

  • A lot of snapshot restore errors on node0
  • On node1 I see: [2017-08-10T11:48:56,881][INFO ][o.e.i.e.Engine ] [node-1] [testsearch][0] starting segment upgrade upgradeOnlyAncientSegments=false, but not the finished segment upgrade line, which should be there if I read the code in InternalEngine#forceMerge(...) (line 1459) correctly.

I think either observation could be the reason why this test fails.

nik9000 assigned nik9000 and unassigned martijnvg Aug 15, 2017
@nik9000 (Member)

nik9000 commented Aug 15, 2017

So these tests are against the _upgrade API, which we document as no longer being useful. I think they were just ported from the old tests. I don't think we need to keep these tests because we don't advise folks to use the API anymore anyway.

@nik9000 (Member)

nik9000 commented Aug 15, 2017

I'll open a PR to remove the failing test in a bit. If we end up with some other mystery error in the bwc tests we should keep this issue in mind, but I don't think it is worth spending a ton of time hunting this issue given that the API should go anyway.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Aug 15, 2017
Our documentation for the API is:
```
The _upgrade API is no longer useful and will be removed.
Instead, see Reindex to upgrade.
```

Given that, I don't think we need to test the API anymore.

Closes elastic#25311
nik9000 added a commit that referenced this issue Aug 15, 2017
nik9000 added a commit that referenced this issue Aug 15, 2017
nik9000 added a commit that referenced this issue Aug 15, 2017