Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] improve TransformRestTestCase robustness #55786

Merged
merged 1 commit into from
Apr 27, 2020

Conversation

hendrikmuhs
Copy link

handles/retries temporary SearchPhaseExecutionErrors

fixes #54810

@hendrikmuhs hendrikmuhs added >test Issues or PRs that are addressing/adding tests v8.0.0 :ml/Transform Transform v7.8.0 labels Apr 27, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@droberts195
Copy link
Contributor

I wonder if the root cause of the failure is that

doesn't wait for hidden indices? The cluster health check in
Request request = new Request("GET", "/_cluster/health");
doesn't specify any indices options. This might explain why the failures started when hidden indices were introduced.

@hendrikmuhs
Copy link
Author

hendrikmuhs commented Apr 27, 2020

I wonder if the root cause of the failure is that


doesn't wait for hidden indices? The cluster health check in

I found this:

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/admin/cluster/health/ClusterHealthRequest.java#L42

Looks like hidden indices are part of the default.
@gwbrown can you confirm?

Regarding ensureNoInitializingShards: I moved this in the last attempt to fix this bug:

df44a82

@hendrikmuhs
Copy link
Author

The test started failing march 26th, this is 6 days after the notification index was made hidden.

(The audit logging was added february 18th.)

The logs unfortunately do not show a full picture, because we lack when ensureNoInitializingShards has been called. We could add logging.

The cluster health reported in the logs, turns green after the search phase execution exception:

     [2020-04-24T09:29:35,602][INFO ][o.e.c.r.a.AllocationService] [integTest-0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.transform-notifications-000002][0]]])." previous.health="YELLOW" reason="shards started [[.transform-notifications-000002][0]]"

This indicates that the cluster health indeed might have been green at the time ensureNoInitializingShards has been called.

LBNL, the failure only happens for testDeleteConfigurationLeftOver, but there are many more tests using TransformRestTestCase. The test is special in the sense that it does not interact much with the transform API but simulates a corner case situation: It indexes a document and checks whether transform DELETE is able to delete the doc. I think there is only 1 audit message in total, send late by the delete call. The audit message is indexed async, so maybe it is indeed a timing issue of parallel threads on a slow host (I was never able to reproduce this issue on my laptop).

Instead of the assertBusy loop I added in the log forwarder, I could add a busy loop to check, that the notification index exists, which is a valid assumption for TransformRestTestCase.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like hidden indices are part of the default

In that case I am happy to work around this in the test code, so this PR LGTM

@hendrikmuhs hendrikmuhs merged commit 029a925 into elastic:master Apr 27, 2020
@hendrikmuhs hendrikmuhs deleted the ci-#54810 branch April 27, 2020 15:16
hendrikmuhs pushed a commit that referenced this pull request Apr 27, 2020
handles/retries temporary SearchPhaseExecutionErrors

fixes #54810
hendrikmuhs pushed a commit that referenced this pull request May 14, 2020
handles/retries temporary SearchPhaseExecutionErrors

fixes #54810
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml/Transform Transform >test Issues or PRs that are addressing/adding tests v7.7.1 v7.8.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] TransformConfigurationIndexIT.testDeleteConfigurationLeftOver failures
4 participants