[CI] TransformRobustnessIT.testTaskRemovalAfterInternalIndexGotDeleted fails rarely #51347

Closed
henningandersen opened this issue Jan 23, 2020 · 5 comments · Fixed by #51406, #51523 or #52360
Labels: :ml/Transform, >test-failure

henningandersen commented Jan 23, 2020

Looks like this test was introduced on Jan 17th and has since failed in two 7.6 builds and a number of PR builds.

I investigated the build failure here (which is a PR build), but did not come to a conclusion on why this failed.

The failure happens in the @After method waitForDataFrame, which calls wipeTransforms. That method searches .transform-internal-004 to check that it is empty, but gets an "all shards failed" error:

18:13:33   2> org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:34165], URI [.transform-internal-004/_search], status line [HTTP/1.1 503 Service Unavailable]
18:13:33     {"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]},"status":503}
18:13:33         at __randomizedtesting.SeedInfo.seed([7FED7A091BC346F9:E0CF7958A02C5009]:0)
18:13:33         at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
18:13:33         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
18:13:33         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:267)
18:13:33         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
18:13:33         at org.elasticsearch.xpack.transform.integration.TransformRestTestCase.wipeTransforms(TransformRestTestCase.java:407)
18:13:33         at org.elasticsearch.xpack.transform.integration.TransformRestTestCase.waitForDataFrame(TransformRestTestCase.java:367)

The reproduction line did not reproduce locally:

./gradlew ':x-pack:plugin:transform:qa:single-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.transform.integration.TransformRobustnessIT.testTaskRemovalAfterInternalIndexGotDeleted" -Dtests.seed=7FED7A091BC346F9 -Dtests.security.manager=true -Dtests.locale=ar-SA -Dtests.timezone=Etc/GMT+6 -Dcompiler.java=13
@henningandersen henningandersen added >test-failure Triaged test failures from CI :ml/Transform Transform labels Jan 23, 2020
@elasticmachine

Pinging @elastic/ml-core (:ml/Transform)

@hendrikmuhs

My best guess at the moment:

  1. The test removes the internal index (this is what the test is about).
  2. The index gets re-created automatically via a cluster state listener.
  3. At the time we fire the search, the index is still being re-created, meaning its shards are not initialized yet.
  4. We get a search phase execution failure with shard failures.

Solution: call ensureNoInitializingShards() after step 1 or after the test.
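
A minimal sketch of what such a wait could look like at the REST level, assuming the cluster health endpoint is called with `wait_for_no_initializing_shards=true`. The class and method names (`ClusterHealthWait`, `healthUri`, `waitForNoInitializingShards`) are hypothetical and only for illustration. This is not the test framework's `ensureNoInitializingShards()` implementation, and it uses the JDK HTTP client rather than the Elasticsearch `RestClient`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper for illustration; not part of the Elasticsearch test framework. */
public class ClusterHealthWait {

    /** Builds a cluster health URI that blocks until no shard is initializing, or the timeout expires. */
    static URI healthUri(String baseUrl, int timeoutSeconds) {
        return URI.create(baseUrl + "/_cluster/health"
            + "?wait_for_no_initializing_shards=true"
            + "&timeout=" + timeoutSeconds + "s");
    }

    /** Sends the request; any status other than 200 is treated as "not ready". */
    static boolean waitForNoInitializingShards(String baseUrl, int timeoutSeconds) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(healthUri(baseUrl, timeoutSeconds)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200;
    }

    public static void main(String[] args) {
        // Only prints the request that would be sent; no cluster is contacted here.
        System.out.println(healthUri("http://localhost:9200", 70));
    }
}
```

Calling something like `waitForNoInitializingShards(...)` before the cleanup search would keep the search from hitting an index whose shards are still starting, which is the situation described in step 3.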

@hendrikmuhs hendrikmuhs self-assigned this Jan 23, 2020
hendrikmuhs pushed a commit that referenced this issue Jan 24, 2020
ensure the cluster is not in some intermediate state when cleaning up.

fixes #51347
@hendrikmuhs

Still an issue; #51406 did not fix it: https://gradle-enterprise.elastic.co/s/jwniyzbhw4nje

@hendrikmuhs hendrikmuhs reopened this Jan 28, 2020
hendrikmuhs pushed a commit that referenced this issue Jan 28, 2020
… 2 (#51523)

add wait for completion in transform robustness test to avoid occasional test failures during cleanup

fixes #51347
dnhatn commented Feb 13, 2020

@dnhatn dnhatn reopened this Feb 13, 2020
@hendrikmuhs

I investigated this failure. It is not the same problem as before. This time it fails in the cleanup of TransformRestTestCase: the cleanup expects an empty index but finds a state doc. The state doc was probably written by the task after the index got deleted.

hendrikmuhs pushed a commit that referenced this issue Feb 16, 2020
… by the (#52360)

delete the transform to delete any docs which might have been written by the task after deleting
the index

fixes #51347