
IndexShardIT#testPendingRefreshWithIntervalChange fails #39565

Closed
javanna opened this issue Mar 1, 2019 · 6 comments · Fixed by #45025 or #40387
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI

Comments

@javanna
Member

javanna commented Mar 1, 2019

IndexShardIT#testPendingRefreshWithIntervalChange fails on both master and 7.x. Failure does not seem to reproduce with the same seed.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/365/console

./gradlew :server:integTest -Dtests.seed=170A6D9A6824A5B4 -Dtests.class=org.elasticsearch.index.shard.IndexShardIT -Dtests.method="testPendingRefreshWithIntervalChange" -Dtests.security.manager=true -Dtests.locale=ar-TN -Dtests.timezone=Africa/Libreville -Dcompiler.java=11 -Druntime.java=8
05:20:11 FAILURE 0.17s J3 | IndexShardIT.testPendingRefreshWithIntervalChange <<< FAILURES!
05:20:11    > Throwable #1: java.lang.AssertionError
05:20:11    > 	at __randomizedtesting.SeedInfo.seed([DF23BE067FF20969:44A0A86926C59400]:0)
05:20:11    > 	at org.elasticsearch.index.shard.IndexShardIT.testPendingRefreshWithIntervalChange(IndexShardIT.java:771)
05:20:11    > 	at java.lang.Thread.run(Thread.java:748)
@javanna javanna added the >test-failure Triaged test failures from CI label Mar 1, 2019
@javanna
Member Author

javanna commented Mar 1, 2019

@s1monw I wonder if #39462 has something to do with this failure, which first appeared yesterday, roughly around the time #39462 was merged.

@javanna javanna added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label Mar 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@javanna
Member Author

javanna commented Mar 1, 2019

I muted this test on both master and 7.x.

@dnhatn dnhatn assigned dnhatn and unassigned s1monw Mar 18, 2019
@dnhatn
Member

dnhatn commented Mar 18, 2019

I am able to reproduce this. I will work on the fix.

dnhatn added a commit that referenced this issue Mar 19, 2019
dnhatn added a commit that referenced this issue Mar 25, 2019
dnhatn added a commit that referenced this issue Mar 27, 2019
If a refresh scheduled by the setting change executes after
the index-2 operation and wins the refresh race (i.e., maybeRefresh) against
the scheduledRefresh that we are going to check, then the latter will
return false.

Closes #39565
Relates #39462

PR #40387
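To make the race in this commit message concrete, here is a minimal Java sketch. It is a hypothetical, simplified model, not Elasticsearch's engine code: the class `RefreshRaceSketch`, its two sequence-number counters, and the rule that `maybeRefresh()` returns `false` when nothing new is pending are assumptions made for illustration. It shows that if the refresh scheduled by the setting change runs first, the later `scheduledRefresh()` finds nothing left to refresh and returns `false`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified model of the refresh race (not Elasticsearch's engine code). */
public class RefreshRaceSketch {
    private final AtomicLong indexedSeqNo = new AtomicLong(0);   // last operation written
    private final AtomicLong refreshedSeqNo = new AtomicLong(0); // last operation made visible

    /** Simulates indexing one document (e.g. the index-2 operation). */
    void index() {
        indexedSeqNo.incrementAndGet();
    }

    /** Returns true only if this call made new operations visible (assumed semantics). */
    synchronized boolean maybeRefresh() {
        long target = indexedSeqNo.get();
        if (target == refreshedSeqNo.get()) {
            return false; // nothing pending: another refresh already won the race
        }
        refreshedSeqNo.set(target);
        return true;
    }

    /** The refresh whose result the test asserts on. */
    boolean scheduledRefresh() {
        return maybeRefresh();
    }

    public static void main(String[] args) {
        RefreshRaceSketch shard = new RefreshRaceSketch();
        shard.index();                                    // index-2
        boolean settingTriggered = shard.maybeRefresh();  // refresh scheduled by the setting change runs first
        boolean checked = shard.scheduledRefresh();       // finds nothing pending
        System.out.println(settingTriggered + " " + checked); // prints "true false"
    }
}
```

In this model the assertion "scheduledRefresh returned true" only holds if no other refresh ran between the index operation and the check, which is exactly the ordering the test could not guarantee.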
dnhatn added a commit that referenced this issue Apr 4, 2019
@andrershov
Contributor

The test failed again: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+part1/332/console. Since this is only one new failure, I'm not muting the test so that we can gather more stats. @dnhatn, can you look at the failure, please?

@andrershov andrershov reopened this Jul 19, 2019
@dnhatn
Member

dnhatn commented Jul 29, 2019

dnhatn added a commit that referenced this issue Aug 1, 2019
Previously, we used ThreadPoolStats to ensure that the scheduledRefresh
triggered by the internal refresh setting update was executed before we
indexed a new document. With that change (#40387), this test has not failed in
the last 3 months. However, using ThreadPoolStats is not entirely watertight,
as both the "active" and "queue" counts can be 0 for a very small interval
when ThreadPoolExecutor has pulled a task from the queue but has not yet marked
the corresponding worker as active (i.e., locked it).

Closes #39565
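The gap described above follows from how `java.util.concurrent.ThreadPoolExecutor` works: a worker removes a task from the queue and only then locks itself, and `getActiveCount()` counts only locked workers. The Java sketch below illustrates the point under stated assumptions; it is not the change made in this commit, and the class name `WaitForTaskSketch`, the helper `runAndAwait`, and the latch-based approach are hypothetical. It contrasts the stats-based idleness check with waiting on a `CountDownLatch` that the task counts down itself, which has no such gap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class WaitForTaskSketch {
    // Racy idleness check: after a worker has taken a task from the queue but before it
    // locks itself, the queue is empty AND getActiveCount() is 0, so this can return true
    // even though the task has not run yet.
    static boolean looksIdle(ThreadPoolExecutor executor) {
        return executor.getActiveCount() == 0 && executor.getQueue().isEmpty();
    }

    // Hypothetical alternative: the task signals its own completion, so there is no gap to observe.
    static boolean runAndAwait(Runnable refresh) throws InterruptedException {
        ThreadPoolExecutor executor =
            new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch done = new CountDownLatch(1);
        try {
            executor.execute(() -> {
                try {
                    refresh.run();     // stand-in for the scheduled refresh
                } finally {
                    done.countDown();  // signal only after the work has finished
                }
            });
            return done.await(10, TimeUnit.SECONDS);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean refreshed = new AtomicBoolean(false);
        boolean finished = runAndAwait(() -> refreshed.set(true));
        System.out.println(finished + " " + refreshed.get()); // prints "true true"
    }
}
```

Because `countDown()` happens-before the matching `await()` returns, the caller is guaranteed to observe everything the task did, whereas `looksIdle` can report an idle pool while the task is still in flight.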
dnhatn added a commit that referenced this issue Aug 1, 2019
dnhatn added a commit that referenced this issue Aug 20, 2019