[CI] IndexShardIT » testMaybeFlush #52223

hendrikmuhs · 2020-02-11T17:25:55Z

failed on master:

https://gradle-enterprise.elastic.co/s/izpnat3y374v4

expected:<0> but was:<3>
java.lang.AssertionError
: 
expected:<0> but was:<3>
Close stacktrace
at __randomizedtesting.SeedInfo.seed([ACB0D4A252C38F40:2DB52A569EBDD90C]:0)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at org.elasticsearch.index.shard.IndexShardIT.testMaybeFlush(IndexShardIT.java:352)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:566)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
at java.lang.Thread.run(Thread.java:834)
[2020-02-12T03:56:52,902][INFO ][o.e.i.s.IndexShardIT     ] [testMaybeFlush] before test
[2020-02-12T03:56:52,908][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [test] creating index, cause [api], templates [one_shard_index_template, random-soft-deletes-template], shards [1]/[0], mappings []
[2020-02-12T03:56:53,059][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[test][0]]]).
[2020-02-12T03:56:53,133][INFO ][o.e.c.s.IndexScopedSettings] [node_s_0] [test] updating [index.translog.flush_threshold_size] from [512mb] to [135b]
[2020-02-12T03:56:53,181][INFO ][o.e.c.s.IndexScopedSettings] [node_s_0] [test] updating [index.translog.flush_threshold_size] from [512mb] to [135b]
[2020-02-12T03:56:53,188][INFO ][o.e.c.m.MetaDataMappingService] [node_s_0] [test/1CLpUW3xTsuEeLFtC4nV7g] create_mapping
[2020-02-12T03:56:53,329][INFO ][o.e.i.s.IndexShardIT     ] [testMaybeFlush] after test
[2020-02-12T03:56:53,329][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [test/1CLpUW3xTsuEeLFtC4nV7g] deleting index

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-02-11T17:25:58Z

Pinging @elastic/es-distributed (:Distributed/Engine)

dnhatn · 2020-02-11T23:23:55Z

Relates to #51905.

Since #51905, we use the local checkpoint of the safe commit to calculate the number of uncommitted operations of a translog stats. If a periodic flush triggered by afterWriteOperation completes before we sync translog, then the last commit is not safe. We also need to sync translog from Engine instead of the translog so that we can advance the safe commit. Relates #51905 Closes #52223

astefan · 2020-02-17T14:06:18Z

This one failed again today:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_RUNTIME_JAVA=openjdk13,nodes=general-purpose/516/console
https://gradle-enterprise.elastic.co/s/do3zhzqosbde4

and the fix referenced in this issue seems to be in.

Doesn't reproduce for me though.

We aren't able to reproduce or figure out the reason that failed this test. This commit adds more assertions so we can narrow the scope. Relates #52223

Since elastic#51905, we use the local checkpoint of the safe commit to calculate the number of uncommitted operations of a translog stats. If a periodic flush triggered by afterWriteOperation completes before we sync translog, then the last commit is not safe. We also need to sync translog from Engine instead of the translog so that we can advance the safe commit. Relates elastic#51905 Closes elastic#52223

We aren't able to reproduce or figure out the reason that failed this test. This commit adds more assertions so we can narrow the scope. Relates elastic#52223

Since #51905, we use the local checkpoint of the safe commit to calculate the number of uncommitted operations of a translog stats. If a periodic flush triggered by afterWriteOperation completes before we sync translog, then the last commit is not safe. We also need to sync translog from Engine instead of the translog so that we can advance the safe commit. Relates #51905 Closes #52223

We aren't able to reproduce or figure out the reason that failed this test. This commit adds more assertions so we can narrow the scope. Relates #52223

ywelsch · 2020-03-03T08:16:44Z

@dnhatn has added more assertions here, but there haven't been any recent failures. I'll close this for now, but please report and reopen issue on any future failures.

tlrx · 2020-03-25T11:56:39Z

This test failed today on 7.x with a similar assertion error:

java.lang.AssertionError: last commit [{translog_uuid=-eXKsvspTWe4_IpuWE2ZJA, history_uuid=Q7T02qpRT3-59zTo-2JICw, local_checkpoint=2, max_seq_no=2, max_unsafe_auto_id_timestamp=-1}]
Expected: <0>
     but: was <3>
	at __randomizedtesting.SeedInfo.seed([1D69D57C6EE18EC5:9C6C2B88A29FD889]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.elasticsearch.index.shard.IndexShardIT.testMaybeFlush(IndexShardIT.java:371)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:834)

Build scan: https://gradle-enterprise.elastic.co/s/i7xtizngehx4u

It does not reproduce locally:

 ./gradlew ':server:integTest' --tests "org.elasticsearch.index.shard.IndexShardIT.testMaybeFlush" \
  -Dtests.seed=1D69D57C6EE18EC5 \
  -Dtests.security.manager=true \
  -Dtests.locale=es-US \
  -Dtests.timezone=America/Marigot \
  -Dcompiler.java=13

Relates #52223

dnhatn · 2020-04-01T03:43:55Z

We still do not have enough info for this test failure. I've enabled the debug logs of the engine in this test. I am closing this issue again. Please reopen when it fails.

gwbrown · 2020-04-01T20:30:44Z

This failed again in 7.x. Build scan, Jenkins

williamrandolph · 2020-04-02T18:38:31Z

And again, in 7.x: Build scan, Jenkins

mayya-sharipova · 2020-04-03T20:06:51Z

And again, in 7.x: Build scan, Jenkins

dimitris-athanasiou · 2020-04-06T07:49:37Z

Ang again, in 7.x: Build scan, Jenkins

dnhatn · 2020-04-06T16:57:42Z

I am looking at this issue now.

dnhatn · 2020-04-06T19:24:25Z

I've opened #54839.

IndexShardIT#testMaybeFlush relies on the assumption that the safe commit and translog deletion policy have advanced after IndexShard#sync returns . This assumption does not hold if there's a race with the global checkpoint sync. Closes #52223

hendrikmuhs added >test-failure Triaged test failures from CI :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Feb 11, 2020

dnhatn self-assigned this Feb 11, 2020

jasontedor mentioned this issue Feb 11, 2020

Add unit tests for reading JVM options files #52176

Merged

dnhatn mentioned this issue Feb 12, 2020

Fix IndexShardIT#testMaybeFlush #52247

Merged

dnhatn closed this as completed in #52247 Feb 12, 2020

astefan reopened this Feb 17, 2020

dnhatn mentioned this issue Feb 18, 2020

Adjust global checkpoint background sync condition #52444

Closed

ywelsch self-assigned this Feb 19, 2020

dnhatn mentioned this issue Feb 25, 2020

Add more assertions to testMaybeFlush #52792

Merged

dnhatn added a commit that referenced this issue Feb 25, 2020

Add more assertions to testMaybeFlush (#52792)

c02b47c

We aren't able to reproduce or figure out the reason that failed this test. This commit adds more assertions so we can narrow the scope. Relates #52223

dnhatn added a commit that referenced this issue Feb 26, 2020

Add more assertions to testMaybeFlush (#52792)

814c275

We aren't able to reproduce or figure out the reason that failed this test. This commit adds more assertions so we can narrow the scope. Relates #52223

ywelsch closed this as completed Mar 3, 2020

tlrx reopened this Mar 25, 2020

dnhatn added a commit that referenced this issue Apr 1, 2020

Enable engine debug log for testMaybeFlush

6365f71

Relates #52223

dnhatn added a commit that referenced this issue Apr 1, 2020

Enable engine debug log for testMaybeFlush

c2506af

Relates #52223

dnhatn closed this as completed Apr 1, 2020

gwbrown reopened this Apr 1, 2020

dnhatn mentioned this issue Apr 6, 2020

Update translog policy before the next safe commit #54839

Merged

dnhatn closed this as completed in #54839 Apr 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] IndexShardIT » testMaybeFlush #52223

[CI] IndexShardIT » testMaybeFlush #52223

hendrikmuhs commented Feb 11, 2020

elasticmachine commented Feb 11, 2020

dnhatn commented Feb 11, 2020

astefan commented Feb 17, 2020

ywelsch commented Mar 3, 2020

tlrx commented Mar 25, 2020 •

edited

Loading

dnhatn commented Apr 1, 2020

gwbrown commented Apr 1, 2020

williamrandolph commented Apr 2, 2020

mayya-sharipova commented Apr 3, 2020

dimitris-athanasiou commented Apr 6, 2020

dnhatn commented Apr 6, 2020

dnhatn commented Apr 6, 2020

[CI] IndexShardIT » testMaybeFlush #52223

[CI] IndexShardIT » testMaybeFlush #52223

Comments

hendrikmuhs commented Feb 11, 2020

elasticmachine commented Feb 11, 2020

dnhatn commented Feb 11, 2020

astefan commented Feb 17, 2020

ywelsch commented Mar 3, 2020

tlrx commented Mar 25, 2020 • edited Loading

dnhatn commented Apr 1, 2020

gwbrown commented Apr 1, 2020

williamrandolph commented Apr 2, 2020

mayya-sharipova commented Apr 3, 2020

dimitris-athanasiou commented Apr 6, 2020

dnhatn commented Apr 6, 2020

dnhatn commented Apr 6, 2020

tlrx commented Mar 25, 2020 •

edited

Loading