
[CI] FullClusterRestartIT.testSearch fails on master #25311

Closed

dimitris-athanasiou opened this issue Jun 20, 2017 · 23 comments

Labels: :Core/Infra/Core Core issues without another label, >test Issues or PRs that are addressing/adding tests, >test-failure Triaged test failures from CI, v6.0.0-beta1

@dimitris-athanasiou (Contributor)

dimitris-athanasiou commented Jun 20, 2017

Link to CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+bwc-tests/81/console

Reproduce with:

gradle :qa:full-cluster-restart:v5.0.2#upgradedClusterTestRunner -Dtests.seed=6FCA8286356192B5 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=id -Dtests.timezone=Etc/GMT+7

Could not reproduce it.

Full error:

Suite: org.elasticsearch.upgrades.FullClusterRestartIT
  1> [2017-06-20T08:46:31,294][WARN ][o.e.b.JNANatives         ] unable to install syscall filter: 
  1> java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
  1> 	at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:363) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:110) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:80) [framework-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:186) [framework-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
  1> 	at java.lang.Class.forName0(Native Method) ~[?:1.8.0_131]
  1> 	at java.lang.Class.forName(Class.java:348) [?:1.8.0_131]
  1> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:592) [randomizedtesting-runner-2.5.0.jar:?]
  1> [2017-06-20T08:46:31,313][WARN ][o.e.b.JNANatives         ] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
  1> [2017-06-20T08:46:31,313][WARN ][o.e.b.JNANatives         ] This can result in part of the JVM being swapped out.
  1> [2017-06-20T08:46:31,314][WARN ][o.e.b.JNANatives         ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
  1> [2017-06-20T08:46:31,314][WARN ][o.e.b.JNANatives         ] These can be adjusted by modifying /etc/security/limits.conf, for example: 
  1> 	# allow user 'jenkins' mlockall
  1> 	jenkins soft memlock unlimited
  1> 	jenkins hard memlock unlimited
  1> [2017-06-20T08:46:31,314][WARN ][o.e.b.JNANatives         ] If you are logged in interactively, you will have to re-login for the new limits to take effect.
  1> [2017-06-20T01:46:32,052][INFO ][o.e.u.FullClusterRestartIT] [testSingleDoc]: before test
  1> [2017-06-20T01:46:32,074][INFO ][o.e.u.FullClusterRestartIT] initializing REST clients against [http://[::1]:33081, http://[::1]:39135]
  1> [2017-06-20T01:46:32,826][INFO ][o.e.u.FullClusterRestartIT] [testSingleDoc]: after test
  1> [2017-06-20T01:46:32,847][INFO ][o.e.u.FullClusterRestartIT] [testNewReplicasWork]: before test
  1> [2017-06-20T01:46:33,367][INFO ][o.e.u.FullClusterRestartIT] [testNewReplicasWork]: after test
  1> [2017-06-20T01:46:33,372][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: before test
  1> [2017-06-20T01:46:33,384][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search
  1> [2017-06-20T01:46:33,393][INFO ][o.e.u.FullClusterRestartIT] Found 2482 in old index
  1> [2017-06-20T01:46:33,393][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search with sort
  1> [2017-06-20T01:46:33,422][INFO ][o.e.u.FullClusterRestartIT] --> testing exists filter
  1> [2017-06-20T01:46:33,443][INFO ][o.e.u.FullClusterRestartIT] --> testing _all search
  1> [2017-06-20T01:46:34,195][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: after test
  2> REPRODUCE WITH: gradle :qa:full-cluster-restart:v5.0.2#upgradedClusterTestRunner -Dtests.seed=6FCA8286356192B5 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=id -Dtests.timezone=Etc/GMT+7
FAILURE 0.84s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<0> but was:<519772>
   > 	at __randomizedtesting.SeedInfo.seed([6FCA8286356192B5:FD7317D37178CA1A]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:435)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:149)
   > 	at java.lang.Thread.run(Thread.java:748)
  1> [2017-06-20T01:46:34,229][INFO ][o.e.u.FullClusterRestartIT] [testSnapshotRestore]: before test
  1> [2017-06-20T01:46:34,908][INFO ][o.e.u.FullClusterRestartIT] There are still tasks running after this test that might break subsequent tests [internal:index/shard/recovery/file_chunk, internal:index/shard/recovery/start_recovery].
  1> [2017-06-20T01:46:34,909][INFO ][o.e.u.FullClusterRestartIT] [testSnapshotRestore]: after test
  1> [2017-06-20T01:46:34,917][INFO ][o.e.u.FullClusterRestartIT] [testRecovery]: before test
  1> [2017-06-20T01:46:35,010][INFO ][o.e.u.FullClusterRestartIT] There are still tasks running after this test that might break subsequent tests [internal:index/shard/recovery/file_chunk, internal:index/shard/recovery/start_recovery].
  1> [2017-06-20T01:46:35,010][INFO ][o.e.u.FullClusterRestartIT] [testRecovery]: after test
  1> [2017-06-20T01:46:35,026][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: before test
  1> [2017-06-20T01:46:35,052][ERROR][o.e.u.FullClusterRestartIT] clusterState={cluster_name=full-cluster-restart, compressed_size_in_bytes=3199, metadata={indices={restored_testsnapshotrestore={settings={index={number_of_shards=5, provided_name=testsnapshotrestore, creation_date=1497948353547, number_of_replicas=1, uuid=5YSqwvBaRSauR32PZ2lrzA, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[gCTVGM0IRMu6FiCW_wafEQ], 1=[-lf2VFE8Tve2nHYQKJ7cDg], 2=[vQV9eUwUS0KjWcuqWVXDPA], 3=[2Wc6p_sCRaaqSNkEknX1Uw], 4=[YJi_6tgCTvOKDs0u9AEGHA]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, testsearch={settings={index={refresh_interval=1s, number_of_shards=1, provided_name=testsearch, creation_date=1497948357550, number_of_replicas=0, uuid=0gFkIub6QdW3sFf6hUO25A, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={bool={type=boolean}, field={properties={with={properties={dots={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}}, string={type=text}, binary={store=true, type=binary}, foo={type=text, fields={keyword={ignore_above=256, type=keyword}}}, float={type=float}, dots_in_field_names={type=text}, int={type=long}}}}, aliases=[], state=open, in_sync_allocations={0=[aDrIAXM7RmGaGB7RPqslIw]}, primary_terms={0=2}}, testaliaswithbadname={settings={index={number_of_shards=1, provided_name=testaliaswithbadname, creation_date=1497948347228, number_of_replicas=0, uuid=lx09u25HQdyp_KLfb35neA, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={key={type=keyword}}}}, aliases=[#testaliaswithbadname], state=open, in_sync_allocations={0=[5t10L5PXQgCT43ZjRCMtQA]}, primary_terms={0=2}}, testsingledoc={settings={index={number_of_shards=5, provided_name=testsingledoc, creation_date=1497948330341, number_of_replicas=1, uuid=M71hphp7SR6nsXv8vQKccg, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={test={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[aBqjfD-QQaOl9ttQcAtB5Q, 6lc8KdrtSKqgThoU4xN-bw], 1=[bmfqVGrbT9OvoXwrKWesgQ, VY4FoetYQT-LA536Njtlgg], 2=[nW_fTTjIQbGI_uAj8N_d6Q, yyjITc0gT-qdawPDzmR5Uw], 3=[fT_K4f37Sma_TuWjChDCig, gSh_oykiRg-ufZKiVuRtQw], 4=[TzVLSWQ1TBq6_I4OSA-jEg, yxZyliQUSZGKE_NnNKAsfA]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, testnewreplicaswork={settings={index={number_of_shards=1, provided_name=testnewreplicaswork, creation_date=1497948331660, number_of_replicas=1, uuid=JMpUI6ExQXCQNseTRxMhVg, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text}}}}, aliases=[], state=open, in_sync_allocations={0=[RBqacL0QSGuGpcT3SnoqXA, Jq6cXpdrQ_Gyromx5iKhdQ]}, primary_terms={0=2}}, testsnapshotrestore={settings={index={number_of_shards=5, provided_name=testsnapshotrestore, creation_date=1497948353547, number_of_replicas=1, uuid=jKVtQRQeQtmcf4EAoFsMlQ, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[eCQpxsSrQb-mW3p55MC92Q, hIbMtnj2QvqHXnmavttFXA], 1=[NY5pk7fpSvK2jGXbO3hAsw, w5Qg9IfBQPCVG5IdIA1T0g], 2=[jlLuniShQxaREOf19wo7dQ, cIpyM7g8SmSG8nZnh1vhPA], 3=[USGzVDKtRvS_1STKFqMpnw, d6Ckq7XYQJKwn1WMz6CGuA], 4=[j6OBWcWJT7S2FmXgaDDFUg, 
kIPn-9otTMuNHb-zSUhcAQ]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, testrecovery={settings={index={number_of_shards=5, provided_name=testrecovery, creation_date=1497948348582, number_of_replicas=1, uuid=glGyYBQQR7SWlFOrQjS3HQ, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={field={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[qBeilrG6SbKZgEyDFGqnTg, TeiSagr4So-Unx5-d2pd6w], 1=[syrkQ478RUmAabauMhVA1g, vNjfVU5LRC2KEBAKnXaGlQ], 2=[_p4VoqjISbW46-Jpy8CmPQ, nlmREv_MTSKQ0SNXG_d4Gw], 3=[NamlkbNNRSS77J_nv7kWdA, S2EVbn48Tt2y9Zl3yAzmjA], 4=[w677kVvFRdmcFNfdbZyO1A, Jx6nu2-RRPmx_XX1jcFzIQ]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}, info={settings={index={number_of_shards=5, provided_name=info, creation_date=1497948348164, number_of_replicas=1, uuid=8vGlHvyKSUyDRGpJvM_0Pg, version={upgraded=6000003, created=5000299}}}, mappings={doc={_all={enabled=true}, properties={value={type=text, fields={keyword={ignore_above=256, type=keyword}}}}}}, aliases=[], state=open, in_sync_allocations={0=[CBfSbTVESnKzkYC6NHYG0A, -4S7fFFaSuqnie_qXw51iA], 1=[MsfoHJI1SmyaAEfL3q8a-A, m9eTciEGQtKmTPxjl0_PnA], 2=[CPhCk50tQV2f_Eo99iASDw, daNPgIFsRKeEHuodR_opUQ], 3=[SeKnywUyRi2uVE_8CeGV7A, nI_vmrLaRZSZUTDX84Tgpg], 4=[6u6Y9mqwQ52EZrIU7NJc0Q, NsT677n7Sd6t0sPsHNCFyA]}, primary_terms={0=2, 1=2, 2=2, 3=2, 4=2}}}, cluster_uuid=pb6OYrMmT6mBUSbmB3LAfw, repositories={repo={settings={compress=true, location=/var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/cluster/shared/repo}, type=fs}}, templates={}, index-graveyard={tombstones=[{index={index_uuid=lWxKhL6DSyilySChFRrWnQ, index_name=restored_testsnapshotrestore}, delete_date_in_millis=1497948394266}]}}}
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/testrun/v5.0.2#upgradedClusterTestRunner/J0/temp/org.elasticsearch.upgrades.FullClusterRestartIT_6FCA8286356192B5-001
  1> [2017-06-20T01:46:35,185][INFO ][o.e.u.FullClusterRestartIT] There are still tasks running after this test that might break subsequent tests [internal:index/shard/recovery/file_chunk, internal:index/shard/recovery/start_recovery].
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=1823, maxMBSortInHeap=5.933591739645208, sim=RandomSimilarity(queryNorm=true): {}, locale=id, timezone=Etc/GMT+7
  1> [2017-06-20T01:46:35,185][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: after test
  2> NOTE: Linux 2.6.32-642.15.1.el6.x86_64 amd64/Oracle Corporation 1.8.0_131 (64-bit)/cpus=4,threads=1,free=466418616,total=514850816
  2> NOTE: All tests run in this JVM: [FullClusterRestartIT]
Completed [1/1] in 4.79s, 6 tests, 1 failure <<< FAILURES!
dimitris-athanasiou added the >test-failure Triaged test failures from CI, >test Issues or PRs that are addressing/adding tests, v6.0.0 labels Jun 20, 2017
@danielmitterdorfer (Member)

We have another test failure in build 1172:

ERROR   0.77s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.NullPointerException
   > 	at __randomizedtesting.SeedInfo.seed([EC77F0A9E127324E:7ECE65FCA53E6AE1]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:482)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:696)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:479)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:156)
   > 	at java.lang.Thread.run(Thread.java:748)

It is not reproducible locally, so I conclude it is timing-related: the API endpoint does not always return data for the index that is checked in the predicate.
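For context, the busy-wait check amounts to something like the sketch below. This is a minimal sketch, assuming an `entityAsMap`-style helper and guessing at the response keys; it is not the test's actual code:

```java
// Poll the upgrade-status endpoint until the index reports nothing left to
// upgrade. If the endpoint returns no entry for the index yet, the nested
// map lookup yields null and the predicate dies with an NPE, matching the
// stack trace above.
assertBusy(() -> {
    Map<String, Object> rsp = entityAsMap(client().performRequest("GET", "/testsearch/_upgrade"));
    Map<?, ?> indices = (Map<?, ?>) rsp.get("indices"); // assumed response shape
    Map<?, ?> status = (Map<?, ?>) indices.get("testsearch"); // may be absent if stats lag behind
    assertEquals(0, ((Number) status.get("size_to_upgrade_in_bytes")).intValue());
});
```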

@martijnvg (Member)

The upgrade status API should always return data for the index in that test, regardless of whether an upgrade actually ran. I'll add some more logging.

martijnvg added a commit that referenced this issue Jun 27, 2017
@martijnvg (Member)

These failures didn't yet have the extra logging I added.

martijnvg added a commit that referenced this issue Jun 27, 2017
martijnvg added a commit that referenced this issue Jun 29, 2017
@martijnvg (Member)

Failures occurred over the weekend.

I think the upgrade status API was reporting incorrect stats because a refresh may not have occurred yet. I'll make sure a refresh is run after the upgrade.
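Something like the following should do it (a minimal sketch, assuming the test's low-level REST client):

```java
// Force a refresh after the _upgrade call so the upgrade stats reflect the
// rewritten segments before the assertion starts polling them.
client().performRequest("POST", "/testsearch/_refresh");
```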

@javanna (Member)

javanna commented Jul 4, 2017

Another failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+bwc-tests/122 .
Seems like the additional refresh call didn't help.

2> REPRODUCE WITH: gradle :qa:full-cluster-restart:v5.1.1#upgradedClusterTestRunner -Dtests.seed=51FA4D6D953D9A54 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=zh-HK -Dtests.timezone=Indian/Mayotte
FAILURE 10.9s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<0> but was:<489468>
   > 	at __randomizedtesting.SeedInfo.seed([51FA4D6D953D9A54:C343D838D124C2FB]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:722)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:696)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:485)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:156)
   > 	at java.lang.Thread.run(Thread.java:748)
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
   > 	Suppressed: java.lang.AssertionError: expected:<0> but was:<489468>
   > 		at org.elasticsearch.upgrades.FullClusterRestartIT.lambda$assertUpgradeWorks$3(FullClusterRestartIT.java:493)
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:710)
   > 		... 40 more
  1> [2017-07-04T03:47:54,954][INFO ][o.e.u.FullClusterRestartIT] [testClusterState]: before test
  1> [2017-07-04T03:47:54,983][INFO ][o.e.u.FullClusterRestartIT] [testClusterState]: after test
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/testrun/v5.1.1#upgradedClusterTestRunner/J0/temp/org.elasticsearch.upgrades.FullClusterRestartIT_51FA4D6D953D9A54-001
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=1982, maxMBSortInHeap=5.75625762753509, sim=RandomSimilarity(queryNorm=false): {}, locale=zh-HK, timezone=Indian/Mayotte
  2> NOTE: Linux 4.4.62-18.6-default amd64/Oracle Corporation 1.8.0_131 (64-bit)/cpus=4,threads=1,free=460630976,total=514850816
  2> NOTE: All tests run in this JVM: [FullClusterRestartIT]
Completed [1/1] in 15.81s, 7 tests, 1 failure, 1 skipped <<< FAILURES!

[ant:junit4] JVM J0: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+master+bwc-tests/qa/full-cluster-restart/build/testrun/v5.1.1#upgradedClusterTestRunner/temp/junit4-J0-20170704_004738_1841494010593132791203.syserr
Tests with failures:
  - org.elasticsearch.upgrades.FullClusterRestartIT.testSearch

martijnvg added a commit that referenced this issue Jul 4, 2017
@martijnvg (Member)

This is just weird. I pushed another commit that inspects the upgrade API response to verify whether the index really was upgraded.
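Roughly like this (a sketch of the idea, reusing the assumed `entityAsMap` helper; not necessarily the committed code):

```java
// Run the upgrade and assert on the _shards header of the response, so a
// silent no-op upgrade fails fast instead of tripping a later stats check.
Map<String, Object> rsp = entityAsMap(client().performRequest("POST", "/testsearch/_upgrade"));
Map<?, ?> shards = (Map<?, ?>) rsp.get("_shards");
assertEquals(shards.get("total"), shards.get("successful"));
```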

@martijnvg (Member)

Last failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos/1202.

That failure didn't yet have the commit that verifies the upgrade response. I'm interested in seeing if that fails too.

@martijnvg (Member)

Another two failures.

The upgrade API returns:

{_shards={total=1, failed=0, successful=0}, upgraded_indices={}}

This indicates that the upgrade has not been performed. So I ssh-ed into the build slave used for the last build, and the upgrade does seem to have been invoked:

[2017-07-04T17:41:09,589][INFO ][o.e.i.e.Engine           ] [node-0] [testsearch][0] starting segment upgrade upgradeOnlyAncientSegments=false

However, the finished segment upgrade log line is missing here, indicating that something went wrong during the upgrade. There is no trace of any error in the log. I'll increase the log level of TransportUpgradeAction to see what actually goes wrong here.
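For reference, bumping that logger amounts to a transient cluster-settings update like the sketch below (the exact logger name for TransportUpgradeAction is an assumption):

```java
// StringEntity/ContentType come from Apache HttpClient, which the low-level
// REST client already depends on. Raise the log level for the upgrade action
// so the next CI failure records what the shard-level upgrade actually did.
HttpEntity body = new StringEntity(
    "{\"transient\":{\"logger.org.elasticsearch.action.admin.indices.upgrade\":\"TRACE\"}}",
    ContentType.APPLICATION_JSON);
client().performRequest("PUT", "/_cluster/settings", Collections.emptyMap(), body);
```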

@martijnvg (Member)

This one failed in the same way: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+bwc-tests/129

However, the log file does contain the finished segment upgrade log line:

[2017-07-06T09:00:27,778][INFO ][o.e.i.e.Engine           ] [node-0] [testsearch][0] starting segment upgrade upgradeOnlyAncientSegments=false
[2017-07-06T09:00:28,188][INFO ][o.e.i.e.Engine           ] [node-0] [testsearch][0] finished segment upgrade

This is weird, because the upgrade API returned:

{_shards={total=1, failed=0, successful=0}, upgraded_indices={}}

This indicates that no upgrade ran (which then failed the test).

The log file also contained many restore failures, which the previous failures didn't have (not sure if it is related):

[2017-07-06T09:00:26,304][WARN ][o.e.i.c.IndicesClusterStateService] [node-0] [[restored_testsnapshotrestore][2]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [restored_testsnapshotrestore][2]: Recovery failed from {node-1}{bz86vogHQX2P5MZCat7_Kw}{H1NJ5aCmQMqsneFDSR1OMw}{127.0.0.1}{127.0.0.1:41051}{testattr=test} into {node-0}{B_toSUhESeOkFQbbAZqE3g}{9DTa331OTBe8sF7tcDNIMQ}{127.0.0.1}{127.0.0.1:34998}{testattr=test}
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:75) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:617) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][127.0.0.1:41051][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] prepare target for translog failed
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:177) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
Caused by: org.elasticsearch.common.util.CancellableThreads$ExecutionCancelledException: recovery was canceled reason [shard is closed]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler$1.onCancel(RecoverySourceHandler.java:106) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:129) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
	Suppressed: java.lang.IllegalStateException: Future got interrupted
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.prepareForTranslogOperations(RemoteRecoveryTargetHandler.java:83) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$prepareTargetForTranslog$6(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
	Caused by: java.lang.InterruptedException
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_131]
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_131]
		at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:94) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:42) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		... 18 more
[2017-07-06T09:00:26,311][WARN ][o.e.c.a.s.ShardStateAction] [node-0] [restored_testsnapshotrestore][2] received shard failed for shard id [[restored_testsnapshotrestore][2]], allocation id [047LokEjR7uqcnxlWWhagw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[restored_testsnapshotrestore][2]: Recovery failed from {node-1}{bz86vogHQX2P5MZCat7_Kw}{H1NJ5aCmQMqsneFDSR1OMw}{127.0.0.1}{127.0.0.1:41051}{testattr=test} into {node-0}{B_toSUhESeOkFQbbAZqE3g}{9DTa331OTBe8sF7tcDNIMQ}{127.0.0.1}{127.0.0.1:34998}{testattr=test}]; nested: RemoteTransportException[[node-1][127.0.0.1:41051][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: ExecutionCancelledException[recovery was canceled reason [shard is closed]]; ]
org.elasticsearch.indices.recovery.RecoveryFailedException: [restored_testsnapshotrestore][2]: Recovery failed from {node-1}{bz86vogHQX2P5MZCat7_Kw}{H1NJ5aCmQMqsneFDSR1OMw}{127.0.0.1}{127.0.0.1:41051}{testattr=test} into {node-0}{B_toSUhESeOkFQbbAZqE3g}{9DTa331OTBe8sF7tcDNIMQ}{127.0.0.1}{127.0.0.1:34998}{testattr=test}
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:75) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:617) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][127.0.0.1:41051][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] prepare target for translog failed
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:177) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
Caused by: org.elasticsearch.common.util.CancellableThreads$ExecutionCancelledException: recovery was canceled reason [shard is closed]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler$1.onCancel(RecoverySourceHandler.java:106) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:129) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
	... 5 more
	Suppressed: java.lang.IllegalStateException: Future got interrupted
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.prepareForTranslogOperations(RemoteRecoveryTargetHandler.java:83) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$prepareTargetForTranslog$6(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:105) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.prepareTargetForTranslog(RecoverySourceHandler.java:411) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:126) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:54) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:135) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:132) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1545) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
	Caused by: java.lang.InterruptedException
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_131]
		at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_131]
		at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:94) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:42) ~[elasticsearch-6.0.0-alpha3-SNAPSHOT.jar:6.0.0-alpha3-SNAPSHOT]
		... 18 more

@martijnvg (Member)

This test hasn't failed since 9040f44 got pushed. This commit changes the test to wait for a green cluster state before running the upgrade (see the sketch below). I noticed in one of these build failures that the shard header reported that the upgrade command was sent to 0 shards (total was 1 and successful was 0). If this test doesn't fail over the weekend then I will close this issue.
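For reference, the wait-for-green step amounts to something like this (a sketch; the exact parameters used in 9040f44 may differ):

```java
// Block until the cluster reports green for the index before calling _upgrade,
// so the upgrade request cannot race against ongoing shard recoveries.
Map<String, String> params = Collections.singletonMap("wait_for_status", "green");
client().performRequest("GET", "/_cluster/health/testsearch", params);
```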

colings86 added the :Core/Infra/Core Core issues without another label label Jul 25, 2017
@talevy (Contributor)

talevy commented Aug 9, 2017

Happened again: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+bwc-tests/42/console

gradle :qa:full-cluster-restart:v5.4.3#upgradedClusterTestRunner -Dtests.seed=C9DE4FDD8C148B00 -Dtests.class=org.elasticsearch.upgrades.FullClusterRestartIT -Dtests.method="testSearch" -Dtests.security.manager=true -Dtests.locale=sr-Latn-BA -Dtests.timezone=BST
  1> [2017-08-09T16:47:20,196][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: before test
  1> [2017-08-09T16:47:20,222][INFO ][o.e.u.FullClusterRestartIT] health api response: {number_of_pending_tasks=1, cluster_name=full-cluster-restart, active_shards=1, active_primary_shards=1, unassigned_shards=0, delayed_unassigned_shards=0, timed_out=false, relocating_shards=0, initializing_shards=0, task_max_waiting_in_queue_millis=0, number_of_data_nodes=2, number_of_in_flight_fetch=0, active_shards_percent_as_number=100.0, status=green, number_of_nodes=2}
  1> [2017-08-09T16:47:20,223][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search
  1> [2017-08-09T16:47:20,258][INFO ][o.e.u.FullClusterRestartIT] Found 2491 in old index
  1> [2017-08-09T16:47:20,258][INFO ][o.e.u.FullClusterRestartIT] --> testing basic search with sort
  1> [2017-08-09T16:47:20,291][INFO ][o.e.u.FullClusterRestartIT] --> testing exists filter
  1> [2017-08-09T16:47:20,321][INFO ][o.e.u.FullClusterRestartIT] --> testing _all search
  1> [2017-08-09T16:47:21,035][INFO ][o.e.u.FullClusterRestartIT] upgrade api response: {_shards={total=1, failed=0, successful=0}, upgraded_indices={}}
  1> [2017-08-09T16:47:21,090][INFO ][o.e.u.FullClusterRestartIT] [testSearch]: after test
FAILURE 0.91s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError
   > 	at __randomizedtesting.SeedInfo.seed([C9DE4FDD8C148B00:5B67DA88C80DD3AF]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:559)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:167)
   > 	at java.lang.Thread.run(Thread.java:748)
  1> [2017-08-09T16:47:21,106][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: before test
  1> [2017-08-09T16:47:21,124][INFO ][o.e.u.FullClusterRestartIT] [testAliasWithBadName]: after test

@danielmitterdorfer (Member)

And again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+bwc-tests/45/console

FAILURE 1.00s | FullClusterRestartIT.testSearch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError
   > 	at __randomizedtesting.SeedInfo.seed([F32820552B31394A:6191B5006F2861E5]:0)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.assertUpgradeWorks(FullClusterRestartIT.java:559)
   > 	at org.elasticsearch.upgrades.FullClusterRestartIT.testSearch(FullClusterRestartIT.java:167)
   > 	at java.lang.Thread.run(Thread.java:745)

The failing assertion checks whether the versions of the upgraded indices are present (i.e. not null). Attached are the cluster logs from the 5.3.3 cluster on which the test failed.

cluster-logs.zip

@martijnvg
Copy link
Member

I see two things in the cluster logs:

  • A lot of snapshot restore errors on node0
  • On node1 I see: [2017-08-10T11:48:56,881][INFO ][o.e.i.e.Engine ] [node-1] [testsearch][0] starting segment upgrade upgradeOnlyAncientSegments=false, but not the finished segment upgrade line, which should be there if I read the code in InternalEngine#forceMerge(...) (line 1459) correctly.

I think either observation could be the reason why this test fails.

nik9000 assigned nik9000 and unassigned martijnvg Aug 15, 2017
@nik9000 (Member)

nik9000 commented Aug 15, 2017

So these tests are against the _upgrade API, which we document as no longer being useful. I think they were just ported from the old tests. I don't think we need to keep these tests because we don't advise folks to use the API anymore anyway.

@nik9000 (Member)

nik9000 commented Aug 15, 2017

I'll open a PR to remove the failing test in a bit. If we end up with some other mystery error in the bwc tests we should keep this issue in mind, but I don't think it is worth spending a ton of time hunting this issue given that the API should go anyway.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Aug 15, 2017
Our documentation for the API is:
```
The _upgrade API is no longer useful and will be removed.
Instead, see Reindex to upgrade.
```

Given that, I don't think we need to test the API anymore.

Closes elastic#25311
nik9000 added a commit that referenced this issue Aug 15, 2017
nik9000 added a commit that referenced this issue Aug 15, 2017
nik9000 added a commit that referenced this issue Aug 15, 2017