
[CI] Test Failure in org.elasticsearch.client.BulkProcessorRetryIT#testBulkRejectionLoadWithBackoff #41324

Closed
original-brownbear opened this issue Apr 18, 2019 · 4 comments · Fixed by #41338 or #41700
Assignees
original-brownbear
Labels
:Distributed Coordination/Network (Http and internode communication implementations), >test-failure (Triaged test failures from CI)

Comments

@original-brownbear
Member

This is a result of #40866.

We're running into it in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/1107/console:

  2> REPRODUCE WITH: ./gradlew :server:integTest --tests "org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff" -Dtests.seed=A9EF087F2D4CFAB -Dtests.security.manager=true -Dtests.locale=sr-Latn-RS -Dtests.timezone=America/Martinique -Dcompiler.java=12 -Druntime.java=8
  2> java.lang.AssertionError: Unexpected failure
        at __randomizedtesting.SeedInfo.seed([A9EF087F2D4CFAB:5EF9783BACF11D86]:0)
        at org.elasticsearch.action.bulk.BulkProcessorRetryIT.executeBulkRejectionLoad(BulkProcessorRetryIT.java:143)
        at org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff(BulkProcessorRetryIT.java:69)

        Caused by:
        RemoteTransportException[[node_s0][127.0.0.1:40957][indices:data/write/bulk]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@61f4536b on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1494a6e[Running, pool size = 1, active threads = 1, queued tasks = 30, completed tasks = 525]]];

            Caused by:
            org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@61f4536b on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, 

The problem here is that by moving the bulk requests' initial handling entirely to the WRITE pool, we lost the bulk processor's retry functionality for the case where the initial request is rejected outright.
We will need to discuss whether to find a way to retry this after all, or whether to adjust the test instead.
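
For context, here is a minimal, hedged sketch (not the BulkProcessor's actual code; the class and method names are illustrative) of the distinction at play: item-level rejections come back inside a BulkResponse and hit the existing retry path, while an outright rejection of the initial request surfaces as a top-level exception that the retry handler was not inspecting.

import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

// Sketch only: recognizing a top-level bulk failure as a thread-pool rejection.
// The rejection may arrive wrapped in a RemoteTransportException (as in the stack
// trace above), so the whole cause chain is walked, not just the outer exception.
final class RejectionCheck {
    static boolean wholeRequestWasRejected(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof EsRejectedExecutionException) {
                return true;
            }
        }
        return false;
    }
}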

original-brownbear added the :Distributed Coordination/Network and >test-failure labels Apr 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Member Author

I will mute this for now until we're clear on how to solve it.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 18, 2019
original-brownbear self-assigned this Apr 18, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 18, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 18, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for elastic#40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes elastic#41324
original-brownbear added a commit that referenced this issue Apr 18, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes #41324
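
To illustrate the shape of the fix described in the commit message above, here is a hedged, dependency-free sketch (RetryingBulkCall, BulkCall and the scheduler wiring are made-up names, not the actual Retry/BulkRequestHandler code; it reuses the RejectionCheck sketch from the first comment): when the whole request is rejected and backoff delays remain, the same request is rescheduled instead of being reported as a hard failure.

import java.util.Iterator;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative sketch only: retry the whole bulk call on rejection, mirroring what
// was already done for individually rejected items.
final class RetryingBulkCall {

    // Stand-in for "send this bulk request and report success or failure".
    interface BulkCall {
        void execute(Runnable onSuccess, Consumer<Throwable> onFailure);
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Iterator<Long> backoffDelaysMillis; // e.g. an exponential backoff sequence

    RetryingBulkCall(Iterator<Long> backoffDelaysMillis) {
        this.backoffDelaysMillis = backoffDelaysMillis;
    }

    void run(BulkCall call, Consumer<Throwable> onGiveUp) {
        call.execute(
            () -> { /* request accepted; per-item retries are handled separately */ },
            failure -> {
                if (RejectionCheck.wholeRequestWasRejected(failure) && backoffDelaysMillis.hasNext()) {
                    // Re-schedule the same request after the next backoff delay.
                    scheduler.schedule(() -> run(call, onGiveUp), backoffDelaysMillis.next(), TimeUnit.MILLISECONDS);
                } else {
                    // Non-retryable failure, or backoff exhausted: surface it to the caller.
                    onGiveUp.accept(failure);
                }
            });
    }
}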
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 24, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for elastic#40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes elastic#41324
original-brownbear added a commit that referenced this issue Apr 24, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes #41324
@davidkyle
Member

This test failed again on master

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+artifactory/610/console

java.lang.AssertionError: Unexpected failure
	at __randomizedtesting.SeedInfo.seed([809C23D4D17E7B70:D4FBAB688F5BA95D]:0)
	at org.elasticsearch.action.bulk.BulkProcessorRetryIT.executeBulkRejectionLoad(BulkProcessorRetryIT.java:143)
	at org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff(BulkProcessorRetryIT.java:69)

...

Caused by: RemoteTransportException[[node_s0][127.0.0.1:44805][indices:data/write/bulk]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@16966335 on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4990ff71[Running, pool size = 1, active threads = 1, queued tasks = 30, completed tasks = 1268]]];
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@16966335 on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4990ff71[Running, pool size = 1, active threads = 1, queued tasks = 30, completed tasks = 1268]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:84)

Does not reproduce:

./gradlew :server:integTest --tests "org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff" \
  -Dtests.seed=809C23D4D17E7B70 \
  -Dtests.security.manager=true \
  -Dtests.locale=be-BY \
  -Dtests.timezone=America/Matamoros \
  -Dcompiler.java=12 \
  -Druntime.java=11

davidkyle reopened this Apr 30, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 30, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
@original-brownbear
Member Author

Fix incoming in #41700 (reproducible by simply lowering the queue size in this test's settings to 1; with queue size 30 the failure was very unlikely).
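
For reference, a sketch of the kind of node-settings override that makes this easy to reproduce (the class name and exact override are illustrative, not the actual test code; thread_pool.write.queue_size is the setting behind the "queue capacity" figure in the stack traces above):

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.test.ESIntegTestCase;

// Illustrative only: with a write queue of 1 instead of 30, the WRITE pool rejects
// whole bulk requests almost immediately under concurrent load, so the missing
// top-level retry shows up reliably instead of once in a while.
public class BulkRejectionReproIT extends ESIntegTestCase {
    @Override
    protected Settings nodeSettings(int nodeOrdinal) {
        return Settings.builder()
            .put(super.nodeSettings(nodeOrdinal))
            .put("thread_pool.write.queue_size", 1)
            .build();
    }
}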

original-brownbear added a commit that referenced this issue May 1, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
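
A hedged sketch of the unified retry decision that commit describes (the RetryDecision helper and its method names are ours, not the actual BulkRequestHandler/Retry code): the 429-style status check already applied to failed items is applied to the top-level failure as well, so a rejected request keeps retrying while backoff lasts and only fails once retries are exhausted.

import org.elasticsearch.ExceptionsHelper;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.rest.RestStatus;

// Illustrative sketch: both paths reduce to "is this a TOO_MANY_REQUESTS-style
// rejection that the backoff policy should absorb?".
final class RetryDecision {

    // Item path: the bulk response came back, but some items were rejected.
    static boolean anyItemIsRetryable(BulkResponse response) {
        for (BulkItemResponse item : response.getItems()) {
            if (item.isFailed() && item.getFailure().getStatus() == RestStatus.TOO_MANY_REQUESTS) {
                return true;
            }
        }
        return false;
    }

    // Top-level path: the whole request failed; apply the same status test to its
    // unwrapped cause (a thread-pool rejection maps to TOO_MANY_REQUESTS).
    static boolean wholeRequestIsRetryable(Throwable failure) {
        return ExceptionsHelper.status(ExceptionsHelper.unwrapCause(failure)) == RestStatus.TOO_MANY_REQUESTS;
    }
}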
akhil10x5 pushed a commit to akhil10x5/elasticsearch that referenced this issue May 2, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for elastic#40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes elastic#41324
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue May 28, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
original-brownbear added a commit that referenced this issue May 28, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well