Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to stabilise tests by checking for cluster health early #7184

Merged
merged 6 commits into from
Sep 27, 2023

Conversation

pebrc
Copy link
Collaborator

@pebrc pebrc commented Sep 26, 2023

Attempt to fix #7172

@pebrc pebrc added :ci Things related to Continuous Integration, automation and releases >test Related to unit/integration/e2e tests labels Sep 26, 2023
@pebrc
Copy link
Collaborator Author

pebrc commented Sep 26, 2023

buildkite test this -f p=kind,DEPLOYER_KIND_NODE_IMAGE=kindest/node:v1.28.0@sha256:b7a4cad12c197af3ba43202d3efe03246b3f0793f162afb40a33c923952d5b31,t=TestSystemIntegrationRecipe

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 35 to 37
// check cluster health here already to make sure any dependent builders can rely on a ready ES
// this is special casing Elasticsearch a little bit in the interest of test stability see https://github.com/elastic/cloud-on-k8s/issues/7172
return clusterHealthIs(b, k, esv1.ElasticsearchGreenHealth)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we already have a dedicated step, no real advantage compared to your proposal except that we would have a dedicated entry in the logs:

=== RUN   TestAgentConfigRef/Elasticsearch_cluster_should_be_created
Retries (5m0s timeout): .
    --- PASS: TestAgentConfigRef/Elasticsearch_cluster_should_be_created (0.57s)
=== RUN   TestAgentConfigRef/ES_cluster_health_should_eventually_be_green
Retries (5m0s timeout): ......
    --- PASS: TestAgentConfigRef/ES_cluster_health_should_eventually_be_green (41.40s)
=== RUN   TestAgentConfigRef/Creating_an_Agent_should_succeed
    --- PASS: TestAgentConfigRef/Creating_an_Agent_should_succeed (1.48s)
Suggested change
// check cluster health here already to make sure any dependent builders can rely on a ready ES
// this is special casing Elasticsearch a little bit in the interest of test stability see https://github.com/elastic/cloud-on-k8s/issues/7172
return clusterHealthIs(b, k, esv1.ElasticsearchGreenHealth)
return nil
}),
},
// check cluster health here already to make sure any dependent builders can rely on a ready ES
// this is special casing Elasticsearch a little bit in the interest of test stability see https://github.com/elastic/cloud-on-k8s/issues/7172
CheckClusterHealth(b, k),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my motivation was to have this check in the create steps. While the existing check runs in the check steps. By having it in the create steps I want to introduce a delay so that the Agent creation (and other resource creations) does not run before ES is ready. I could remove the dedicated step from the check steps to avoid the duplication.

Copy link
Collaborator Author

@pebrc pebrc Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I misunderstood your proposal! I think that makes sense actually.

@pebrc
Copy link
Collaborator Author

pebrc commented Sep 27, 2023

buildkite test this -f E2E_TAGS=es,t=*

@pebrc
Copy link
Collaborator Author

pebrc commented Sep 27, 2023

buildkite test this -f E2E_TAGS=es,p=gke

@thbkrkr
Copy link
Contributor

thbkrkr commented Sep 27, 2023

We have e2e tests where ES does not start voluntarily (see TestForceUpgrade*). Perhaps a boolean SkipInitialCheckClusterHealth in the ES builder to conditionally add the check is enough to cover this.

@pebrc
Copy link
Collaborator Author

pebrc commented Sep 27, 2023

buildkite test this -f t=TestForceUpgrade,p=gke
Screenshot 2023-09-27 at 16 53 35

@pebrc
Copy link
Collaborator Author

pebrc commented Sep 27, 2023

buildkite test this -f t=TestRemoteCluster,p=gke

@pebrc pebrc requested a review from thbkrkr September 27, 2023 14:52
@pebrc
Copy link
Collaborator Author

pebrc commented Sep 27, 2023

buildkite test this -f t=TestRemoteCluster,p=gke

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ci Things related to Continuous Integration, automation and releases >test Related to unit/integration/e2e tests v2.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES_data_should_pass_validations is (once again) flaky
3 participants