ES_data_should_pass_validations is (once again) flaky #7172
Comments
Trying to get some interesting logs:
{
"log.level": "info",
"@timestamp": "2023-09-18T23:17:40.892Z",
"log.origin": {
"file.name": "coordinator/coordinator.go",
"file.line": 989
},
"message": "Unit state changed log-default-logfile-system-835850a7-3f6b-4ae9-8d02-a9a15767cf39 (STARTING->HEALTHY): Healthy",
"log": {
"source": "elastic-agent"
},
"component": {
"id": "log-default",
"state": "HEALTHY"
},
"unit": {
"id": "log-default-logfile-system-835850a7-3f6b-4ae9-8d02-a9a15767cf39",
"type": "input",
"state": "HEALTHY",
"old_state": "STARTING"
},
"ecs.version": "1.6.0"
}
Looks like the Agent was happy at that point?
Logs from the ES clusters that show that
I ran the test locally on Kind, and it always passes.
If I ran The questions then are:
Could this be an initialisation problem, where the data stream is not created before the first data arrives and ES auto-creates a regular index instead? Our e2e test checks for the data stream, so it would fail in that case.
On a successful local run, the ES log shows: creating index, cause [initialize_data_stream], templates [logs], shards [1]/[1]
On a failed e2e run, the ES log shows: creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
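To make the difference between the two log lines above easier to spot in test diagnostics, here is a minimal Go sketch that parses the "creating index" message and flags the failing pattern (auto-created, no templates applied). The names `ParseIndexCreation`, `IndexCreation` and `AutoCreatedWithoutTemplate` are made up for illustration and are not part of the test suite; the regular expression only assumes the message shape quoted above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// creationRe matches the part of the Elasticsearch log message quoted in
// this thread: "creating index, cause [<cause>], templates [<list>]".
var creationRe = regexp.MustCompile(`creating index, cause \[(.+?)\], templates \[(.*?)\]`)

// IndexCreation holds the fields extracted from one log message.
// Hypothetical type, for illustration only.
type IndexCreation struct {
	Cause     string
	Templates []string
}

// ParseIndexCreation extracts cause and templates from a log message.
// The second return value is false if the message has a different shape.
func ParseIndexCreation(msg string) (IndexCreation, bool) {
	m := creationRe.FindStringSubmatch(msg)
	if m == nil {
		return IndexCreation{}, false
	}
	var templates []string
	if strings.TrimSpace(m[2]) != "" {
		for _, t := range strings.Split(m[2], ",") {
			templates = append(templates, strings.TrimSpace(t))
		}
	}
	return IndexCreation{Cause: m[1], Templates: templates}, true
}

// AutoCreatedWithoutTemplate reports the pattern seen in the failed e2e run:
// the index was auto-created and no index template was applied.
func (c IndexCreation) AutoCreatedWithoutTemplate() bool {
	return strings.HasPrefix(c.Cause, "auto(") && len(c.Templates) == 0
}

func main() {
	msgs := []string{
		"creating index, cause [initialize_data_stream], templates [logs], shards [1]/[1]",
		"creating index, cause [auto(bulk api)], templates [], shards [1]/[1]",
	}
	for _, msg := range msgs {
		c, ok := ParseIndexCreation(msg)
		fmt.Println(ok, c.Cause, c.Templates, c.AutoCreatedWithoutTemplate())
	}
}
```

Something like this could run over the ES logs collected in the diagnostics to tell this failure mode apart from other causes of the flaky step.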
Elastic Agent is using the default logs index template which should be installed by default through x-pack:
What I don't understand is why Elasticsearch accepts the bulk request before it has initialised the default index templates. We are still using 8.9.0 for these tests (pending #7166).
Copying info from Slack: it's important for the process doing the indexing to include the Although Elasticsearch could in theory implement some special-case handling for the handful of templates that it installs shortly after startup, it's still important to set Relates https://github.com/elastic/cloud/issues/106249.
Bah, sorry, it turns out that the
@DaveCTurner thanks for your help (as always!) I think in the short term we could look into minimising the chance of this happening in the tests. It is tricky because we treat the Agent test builder and the ES test builder as completely independent. One idea I have is to sort the test builders so that the ES one always runs first, and maybe to add an additional check to the ES one for the default index templates.
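A rough Go sketch of the "sort the builders so ES runs first" idea, assuming a simplified stand-in for the real test builders. The `Builder` struct, its `Kind` field and `SortElasticsearchFirst` are hypothetical and only illustrate the ordering; a stable sort is used so the relative order of everything else is left alone.

```go
package main

import (
	"fmt"
	"sort"
)

// Builder is a hypothetical, simplified stand-in for a test builder.
// The real builders in the e2e framework look different.
type Builder struct {
	Kind string // e.g. "elasticsearch", "agent", "beat"
	Name string
}

// SortElasticsearchFirst returns a copy of builders with all Elasticsearch
// builders moved to the front. The sort is stable, so the original relative
// order is preserved within the ES group and within the remaining builders.
func SortElasticsearchFirst(builders []Builder) []Builder {
	sorted := make([]Builder, len(builders))
	copy(sorted, builders)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Kind == "elasticsearch" && sorted[j].Kind != "elasticsearch"
	})
	return sorted
}

func main() {
	builders := []Builder{
		{Kind: "agent", Name: "agent-1"},
		{Kind: "elasticsearch", Name: "es-1"},
		{Kind: "beat", Name: "beat-1"},
		{Kind: "elasticsearch", Name: "es-2"},
	}
	for _, b := range SortElasticsearchFirst(builders) {
		fmt.Println(b.Kind, b.Name)
	}
}
```

Ordering alone only reduces the window for the race; it still needs a readiness check on the ES side before the Agent resources are created.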
Yeah, that's the best idea I can think of right now. The additional check in ES should be something like
Add an immediate health check after creation to delay subsequent k8s resource creations until ES is available. The main motivation is test stability in cases where Beats would send data immediately without waiting for proper ES initialisation; see #7172.
Add an opt-out mechanism for tests where we know the health check cannot succeed.
TODO: it is unclear at this point why the remote cluster test fails if we wait for cluster 1 to come up first. I will create a follow-up issue.
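The health check with opt-out described in this commit message could look roughly like the following Go sketch. This is not the code from the change itself: `WaitUntilHealthy`, `Options` and its fields are hypothetical names, and the actual probe is injected as a plain function so the sketch makes no claim about how the real check talks to ES.

```go
package main

import (
	"fmt"
	"time"
)

// Options controls the wait. All names here are hypothetical.
type Options struct {
	SkipHealthCheck bool          // opt-out for tests where the check cannot succeed
	MaxAttempts     int           // how many times to probe before giving up
	Interval        time.Duration // pause between two probes
}

// WaitUntilHealthy calls check until it reports true or MaxAttempts is
// reached. With SkipHealthCheck set it returns immediately without probing.
func WaitUntilHealthy(check func() bool, opts Options) error {
	if opts.SkipHealthCheck {
		return nil
	}
	for attempt := 1; attempt <= opts.MaxAttempts; attempt++ {
		if check() {
			return nil
		}
		if attempt < opts.MaxAttempts {
			time.Sleep(opts.Interval)
		}
	}
	return fmt.Errorf("elasticsearch not healthy after %d attempts", opts.MaxAttempts)
}

func main() {
	// Simulated probe that only succeeds on the third call.
	calls := 0
	probe := func() bool {
		calls++
		return calls >= 3
	}
	err := WaitUntilHealthy(probe, Options{MaxAttempts: 5, Interval: 10 * time.Millisecond})
	fmt.Println(err, calls)
}
```

In a real test the injected probe would be whatever readiness signal is chosen for ES; a check that the default index templates exist, as discussed earlier in the thread, would be one candidate.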
This is to improve test stability; see #7172.
ES_data_should_pass_validations is failing in several tests:
🐞 TestMultipleOutputConfig ocp
🐞 TestAgentConfigRef kind-1-27-1
🐞 TestMultiOutputRecipe kind-1-27-1
🐞 TestMultiOutputRecipe eks-arm
🐞 TestAgentConfigRef kind-1-26-4
🐞 TestAgentConfigRef aks
I did not investigate all the cases, but in at least 2 of them logs-elastic_agent-default is reported in the ES diagnostic: