
[CI] Test Failure in org.elasticsearch.client.BulkProcessorRetryIT#testBulkRejectionLoadWithBackoff #41324

Closed
original-brownbear opened this issue Apr 18, 2019 · 4 comments · Fixed by #41338 or #41700
Assignees
original-brownbear
Labels
:Distributed Coordination/Network (Http and internode communication implementations), >test-failure (Triaged test failures from CI)

Comments

@original-brownbear
Member

This is a result of #40866.

We're running into it in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/1107/console:

  2> REPRODUCE WITH: ./gradlew :server:integTest --tests "org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff" -Dtests.seed=A9EF087F2D4CFAB -Dtests.security.manager=true -Dtests.locale=sr-Latn-RS -Dtests.timezone=America/Martinique -Dcompiler.java=12 -Druntime.java=8
  2> java.lang.AssertionError: Unexpected failure
        at __randomizedtesting.SeedInfo.seed([A9EF087F2D4CFAB:5EF9783BACF11D86]:0)
        at org.elasticsearch.action.bulk.BulkProcessorRetryIT.executeBulkRejectionLoad(BulkProcessorRetryIT.java:143)
        at org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff(BulkProcessorRetryIT.java:69)

        Caused by:
        RemoteTransportException[[node_s0][127.0.0.1:40957][indices:data/write/bulk]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@61f4536b on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1494a6e[Running, pool size = 1, active threads = 1, queued tasks = 30, completed tasks = 525]]];

            Caused by:
            org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@61f4536b on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, 

The problem here is that by moving the bulk requests' initial handling entirely to the WRITE pool, we lost the bulk processor's retry functionality for the case where the initial request is rejected outright.
We will need to discuss whether to find a way to retry this after all, or whether to adjust the test instead.
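
For context, here is a minimal, hedged sketch (not the BulkProcessor's actual code; the class and method names are illustrative) of the distinction at play: item-level rejections come back inside a BulkResponse and hit the existing retry path, while an outright rejection of the initial request surfaces as a top-level exception that the retry handler was not inspecting.

import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

// Sketch only: recognizing a top-level bulk failure as a thread-pool rejection.
// The rejection may arrive wrapped in a RemoteTransportException (as in the stack
// trace above), so the whole cause chain is walked, not just the outer exception.
final class RejectionCheck {
    static boolean wholeRequestWasRejected(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof EsRejectedExecutionException) {
                return true;
            }
        }
        return false;
    }
}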

original-brownbear added the :Distributed Coordination/Network and >test-failure labels Apr 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Member Author

I will mute this for now until we're clear on how to solve it.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 18, 2019
original-brownbear self-assigned this Apr 18, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 18, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 18, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for elastic#40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes elastic#41324
original-brownbear added a commit that referenced this issue Apr 18, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes #41324
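
To illustrate the shape of the fix described in the commit message above, here is a hedged, dependency-free sketch (RetryingBulkCall, BulkCall and the scheduler wiring are made-up names, not the actual Retry/BulkRequestHandler code; it reuses the RejectionCheck sketch from the first comment): when the whole request is rejected and backoff delays remain, the same request is rescheduled instead of being reported as a hard failure.

import java.util.Iterator;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative sketch only: retry the whole bulk call on rejection, mirroring what
// was already done for individually rejected items.
final class RetryingBulkCall {

    // Stand-in for "send this bulk request and report success or failure".
    interface BulkCall {
        void execute(Runnable onSuccess, Consumer<Throwable> onFailure);
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Iterator<Long> backoffDelaysMillis; // e.g. an exponential backoff sequence

    RetryingBulkCall(Iterator<Long> backoffDelaysMillis) {
        this.backoffDelaysMillis = backoffDelaysMillis;
    }

    void run(BulkCall call, Consumer<Throwable> onGiveUp) {
        call.execute(
            () -> { /* request accepted; per-item retries are handled separately */ },
            failure -> {
                if (RejectionCheck.wholeRequestWasRejected(failure) && backoffDelaysMillis.hasNext()) {
                    // Re-schedule the same request after the next backoff delay.
                    scheduler.schedule(() -> run(call, onGiveUp), backoffDelaysMillis.next(), TimeUnit.MILLISECONDS);
                } else {
                    // Non-retryable failure, or backoff exhausted: surface it to the caller.
                    onGiveUp.accept(failure);
                }
            });
    }
}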
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 24, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for elastic#40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes elastic#41324
original-brownbear added a commit that referenced this issue Apr 24, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes #41324
@davidkyle
Member

This test failed again on master

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+artifactory/610/console

java.lang.AssertionError: Unexpected failure
	at __randomizedtesting.SeedInfo.seed([809C23D4D17E7B70:D4FBAB688F5BA95D]:0)
	at org.elasticsearch.action.bulk.BulkProcessorRetryIT.executeBulkRejectionLoad(BulkProcessorRetryIT.java:143)
	at org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff(BulkProcessorRetryIT.java:69)

...

Caused by: RemoteTransportException[[node_s0][127.0.0.1:44805][indices:data/write/bulk]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@16966335 on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4990ff71[Running, pool size = 1, active threads = 1, queued tasks = 30, completed tasks = 1268]]];
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.InboundHandler$RequestHandler@16966335 on EsThreadPoolExecutor[name = node_s0/write, queue capacity = 30, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@4990ff71[Running, pool size = 1, active threads = 1, queued tasks = 30, completed tasks = 1268]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:84)

Does not reproduce:

./gradlew :server:integTest --tests "org.elasticsearch.action.bulk.BulkProcessorRetryIT.testBulkRejectionLoadWithBackoff" \
  -Dtests.seed=809C23D4D17E7B70 \
  -Dtests.security.manager=true \
  -Dtests.locale=be-BY \
  -Dtests.timezone=America/Matamoros \
  -Dcompiler.java=12 \
  -Druntime.java=11

davidkyle reopened this Apr 30, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Apr 30, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
@original-brownbear
Member Author

Fix incoming in #41700 (reproducible by simply lowering the queue size in this test's settings to 1; with queue size 30 the failure was very unlikely).
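
For reference, a sketch of the kind of node-settings override that makes this easy to reproduce (the class name and exact override are illustrative, not the actual test code; thread_pool.write.queue_size is the setting behind the "queue capacity" figure in the stack traces above):

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.test.ESIntegTestCase;

// Illustrative only: with a write queue of 1 instead of 30, the WRITE pool rejects
// whole bulk requests almost immediately under concurrent load, so the missing
// top-level retry shows up reliably instead of once in a while.
public class BulkRejectionReproIT extends ESIntegTestCase {
    @Override
    protected Settings nodeSettings(int nodeOrdinal) {
        return Settings.builder()
            .put(super.nodeSettings(nodeOrdinal))
            .put("thread_pool.write.queue_size", 1)
            .build();
    }
}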

original-brownbear added a commit that referenced this issue May 1, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
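
A hedged sketch of the unified retry decision that commit describes (the RetryDecision helper and its method names are ours, not the actual BulkRequestHandler/Retry code): the 429-style status check already applied to failed items is applied to the top-level failure as well, so a rejected request keeps retrying while backoff lasts and only fails once retries are exhausted.

import org.elasticsearch.ExceptionsHelper;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.rest.RestStatus;

// Illustrative sketch: both paths reduce to "is this a TOO_MANY_REQUESTS-style
// rejection that the backoff policy should absorb?".
final class RetryDecision {

    // Item path: the bulk response came back, but some items were rejected.
    static boolean anyItemIsRetryable(BulkResponse response) {
        for (BulkItemResponse item : response.getItems()) {
            if (item.isFailed() && item.getFailure().getStatus() == RestStatus.TOO_MANY_REQUESTS) {
                return true;
            }
        }
        return false;
    }

    // Top-level path: the whole request failed; apply the same status test to its
    // unwrapped cause (a thread-pool rejection maps to TOO_MANY_REQUESTS).
    static boolean wholeRequestIsRetryable(Throwable failure) {
        return ExceptionsHelper.status(ExceptionsHelper.unwrapCause(failure)) == RestStatus.TOO_MANY_REQUESTS;
    }
}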
akhil10x5 pushed a commit to akhil10x5/elasticsearch that referenced this issue May 2, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for elastic#40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes elastic#41324
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue May 28, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes elastic#41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
original-brownbear added a commit that referenced this issue May 28, 2019
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well