[CI] QueueResizingEsThreadPoolExecutorTests testAutoQueueSizingWithMin #68063

Closed
albertzaharovits opened this issue Jan 27, 2021 · 8 comments
Labels
:Core/Infra/Core (Core issues without another label), Team:Core/Infra (Meta label for core/infra team), >test-failure (Triaged test failures from CI)

Comments

@albertzaharovits
Contributor

Fails rarely (around twice a month), also on 6.8.

Build scan: 6.8 https://gradle-enterprise.elastic.co/s/tkp5mb4wbn34m

Repro line: RUNTIME_JAVA_HOME=$JAVA11_HOME JAVA_HOME=$JAVA11_HOME ./gradlew --no-daemon ':server:unitTest' -Dtests.class=org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutorTests -Dtests.method="testAutoQueueSizingWithMin" -Dcompiler.java=11 -Druntime.java=8

Reproduces locally?: No

Applicable branches: 6.8, 7.x

Failure history:
https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now%2Fy,mode:quick,to:now%2Fy))&_a=(columns:!(_source),index:b646ed00-7efc-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:'QueueResizingEsThreadPoolExecutorTests%20testAutoQueueSizingWithMin'),sort:!(process.time-start,desc))
Failure excerpt:

        QueueResizingEsThreadPoolExecutorTests.testAutoQueueSizingWithMin <<< FAILURES!	
   > Throwable #1: java.lang.AssertionError: 	
   > Expected: <4998>	
   >      but: was <5048>	
   > 	at __randomizedtesting.SeedInfo.seed([61AE4B8BE950603C:221EDAD98BB25B82]:0)	
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)	
   > 	at org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutorTests.lambda$testAutoQueueSizingWithMin$6(QueueResizingEsThreadPoolExecutorTests.java:150)	
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:906)	
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:880)	
   > 	at org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutorTests.testAutoQueueSizingWithMin(QueueResizingEsThreadPoolExecutorTests.java:149)	
   > 	at java.lang.Thread.run(Thread.java:748)	
   > 	Suppressed: java.lang.AssertionError: 	
   > Expected: <4998>	
   >      but: was <5048>	
   > 		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)	
   > 		at org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutorTests.lambda$testAutoQueueSizingWithMin$6(QueueResizingEsThreadPoolExecutorTests.java:150)	
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:894)	
   > 		... 39 more	
   > 	Suppressed: java.lang.AssertionError: 	
   > Expected: <4998>	
   >      but: was <5048>	
   > 		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)	
   > 		at org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutorTests.lambda$testAutoQueueSizingWithMin$6(QueueResizingEsThreadPoolExecutorTests.java:150)	
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:894)	
   > 		... 39 more	
   > 	Suppressed: java.lang.AssertionError: 	
   > Expected: <4998>	
   >      but: was <5048>	
   > 		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)	
   > 		at org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutorTests.lambda$testAutoQueueSizingWithMin$6(QueueResizingEsThreadPoolExecutorTests.java:150)	
   > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:894)	
   > 		... 39 more	
   > 	Suppressed: java.lang.AssertionError: 
...
@elasticmachine added the Team:Data Management (Meta label for data/management team) label Jan 27, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@jakelandis added the :Core/Infra/Core (Core issues without another label) label and removed the :Core/Features/Features and Team:Data Management (Meta label for data/management team) labels Jan 28, 2021
@elasticmachine added the Team:Core/Infra (Meta label for core/infra team) label Jan 28, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@jaymode self-assigned this Feb 10, 2021
@jaymode
Member

jaymode commented Mar 16, 2021

This has surfaced a race condition in the production code. It is reproducible by running the test many times with a small measure window value. The bug relates to how the values are reset in the finally block of QueueResizingEsThreadPoolExecutor#afterExecute. It does not reproduce easily with debug logging enabled, since it is a race condition (maybe an async log4j2 appender would help here?). From my brief investigation, I see that the queue does indeed have its size reduced, then has its size increased by 50, and is never decreased again. @dakrone curious if you have any thoughts or insights?
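
For readers following along, here is a minimal sketch of one way a "count tasks per frame, then reset in a finally block" scheme can stop adjusting. It is an illustration only, not the Elasticsearch source and not a confirmed root cause: the class `FrameCounterRaceSketch`, its methods, and the frame size of 2 are all hypothetical. The interleaving is replayed on a single thread so the outcome is deterministic. The idea is that if the trigger is an equality check on a shared counter, and enough other tasks complete while one thread is still adjusting, the reset can leave the counter at or above the threshold, so the equality check never fires again and the queue capacity stays wherever the last adjustment left it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a frame-based counter; not the Elasticsearch implementation.
class FrameCounterRaceSketch {
    private final AtomicInteger taskCount = new AtomicInteger();
    private final int tasksPerFrame;

    FrameCounterRaceSketch(int tasksPerFrame) {
        this.tasksPerFrame = tasksPerFrame;
    }

    // Called when a task completes; returns true if this call should run the adjustment.
    boolean taskCompleted() {
        return taskCount.incrementAndGet() == tasksPerFrame;
    }

    // Models the reset performed in a finally block once the adjustment is done.
    void finishAdjustment() {
        taskCount.addAndGet(-tasksPerFrame);
    }

    // Replays one unlucky interleaving and returns how many adjustments were triggered.
    static int replayUnluckyInterleaving(int laterTasks) {
        FrameCounterRaceSketch counter = new FrameCounterRaceSketch(2);
        int adjustments = 0;
        counter.taskCompleted();                      // task A: count = 1
        if (counter.taskCompleted()) {                // task B: count = 2, B starts adjusting
            adjustments++;
        }
        counter.taskCompleted();                      // task C completes meanwhile: count = 3
        counter.taskCompleted();                      // task D completes meanwhile: count = 4
        counter.finishAdjustment();                   // B's finally block: count = 2
        for (int i = 0; i < laterTasks; i++) {
            if (counter.taskCompleted()) {            // count is 3, 4, 5, ... and never equals 2
                adjustments++;
            }
        }
        return adjustments;
    }

    public static void main(String[] args) {
        System.out.println(replayUnluckyInterleaving(1000)); // prints 1
    }
}
```

In this model a small frame size makes the bad interleaving more likely, because fewer concurrent completions are needed to overshoot the threshold during an adjustment. That would be consistent with the failure being easier to reproduce with a small measure window, and with the 50 difference between the expected 4998 and the observed 5048 in the excerpt above.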

@dakrone
Member

dakrone commented Mar 24, 2021

I think we should actually deprecate and remove the queue resizing portion of this. It turned out to never really be used in production, and was a precursor to adaptive replica selection.

@rjernst
Member

rjernst commented Mar 25, 2021

@dakrone Do you mean remove the fixed_auto_queue_size threadpool type?

@dakrone
Member

dakrone commented Mar 25, 2021

@rjernst yes, I think we still can use the actual class (it captures timings necessary for adaptive replica selection), but remove the automatic queue resizing parts and configuration for that aspect of it.
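
As a rough illustration of what "keep the timings, drop the resizing" could look like, here is a minimal sketch of an executor that only measures task execution time through the standard `beforeExecute`/`afterExecute` hooks of `java.util.concurrent.ThreadPoolExecutor` and never touches its queue capacity. The class `TimingOnlyExecutorSketch` and its methods are hypothetical, and the plain mean used here is a stand-in for whatever statistic the real class feeds into adaptive replica selection.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; not the Elasticsearch class.
class TimingOnlyExecutorSketch extends ThreadPoolExecutor {
    // Both hooks run on the worker thread that executes the task, so a ThreadLocal is enough.
    private final ThreadLocal<Long> startNanos = new ThreadLocal<>();
    private final LongAdder totalTaskNanos = new LongAdder();
    private final LongAdder timedTasks = new LongAdder();

    TimingOnlyExecutorSketch(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        startNanos.set(System.nanoTime());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Record the elapsed time only; no queue capacity is read or changed here.
        totalTaskNanos.add(System.nanoTime() - startNanos.get());
        timedTasks.increment();
        startNanos.remove();
    }

    long timedTaskCount() {
        return timedTasks.sum();
    }

    double meanTaskNanos() {
        long n = timedTasks.sum();
        return n == 0 ? 0.0 : (double) totalTaskNanos.sum() / n;
    }

    // Runs a few trivial tasks and returns how many were timed.
    static long runDemo(int tasks) {
        TimingOnlyExecutorSketch executor = new TimingOnlyExecutorSketch(4);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> Math.sqrt(System.nanoTime()));
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executor.timedTaskCount();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(20)); // prints 20
    }
}
```

Because nothing in this sketch depends on a per-frame counter being reset, the kind of reset race discussed above has nowhere to occur; the trade-off is that the queue size becomes a fixed setting again.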

@dakrone
Member

dakrone commented May 5, 2021

Relates to #72779

@rjernst
Member

rjernst commented Sep 8, 2021

Closing as a duplicate of #71476

@rjernst closed this as completed Sep 8, 2021