[Security Solution][Entity Analytics]WIP: determining cypress test flake #169714

Draft
wants to merge 9 commits into
base: main
from

Conversation

rylnd
Contributor

@rylnd rylnd commented Oct 24, 2023

Seeing if this is a timing issue, or whether data from another test is to blame.

Relates to #169154.

@rylnd
Contributor Author

rylnd commented Oct 24, 2023

This should reduce the time and noise in the flaky test runner; with no other tests running, these should definitely pass.
@rylnd rylnd changed the title from "[Security Solution][Entity Analytics]WIP: giving load/unload a little more time, and run only this test" to "[Security Solution][Entity Analytics]WIP: determining cypress test flake" on Oct 25, 2023
@rylnd
Contributor Author

rylnd commented Oct 25, 2023

https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3712 was all green, but upon closer inspection of @jpdjere's flaky run, it looks like the tests only failed legitimately 2/150 times.

I'm going to run this one more time (well, 150 more times) to see if I can't reproduce the failure in isolation like this: follow along here

@rylnd
Contributor Author

rylnd commented Oct 25, 2023

Previous test run succeeded (with one random failure unrelated to the above issue). HOWEVER, taking an even closer look at @jpdjere's flaky run, it appears that the failing test there is NOT the one that had been skipped 🤷‍♂️.

I think this invalidates the above run. I'm going to run both tests in this file, and see how the 150 runs behave: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3734

@rylnd
Contributor Author

rylnd commented Oct 25, 2023

No (legit/expected) failures on the isolated EA FTR run; running again with all risk engine cypress tests to see if we can't get a failure: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3744

@rylnd
Contributor Author

rylnd commented Oct 26, 2023

We had another 2/150 legit failures on the "run all EA cypress tests" build.

I'm now adding some data guards to the failing tests and rerunning them. If these pass, it will confirm that it's data from other tests causing the issue. At that point, we'll either just keep the guards (good) or try to track down the contaminating tests (better).
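A minimal sketch of what such a data guard could look like, not the code actually added in this PR. It assumes the leftover data lives in Elasticsearch indices that can be emptied with `_delete_by_query`; the helper name, the index patterns, and the `ELASTICSEARCH_URL` env variable are all hypothetical placeholders.

```typescript
// Hypothetical sketch: none of these names are taken from the PR itself.
interface CleanupRequest {
  method: 'POST';
  url: string;
  body: { query: { match_all: Record<string, never> } };
  failOnStatusCode: boolean;
}

// ASSUMED index patterns for risk score data; replace with the real ones.
const ASSUMED_RISK_SCORE_PATTERNS: string[] = [
  'risk-score.risk-score-*',
  'ml_host_risk_score_*',
  'ml_user_risk_score_*',
];

// Builds one delete-by-query request per index pattern.
// `refresh=true` makes the deletions visible to searches right away;
// `conflicts=proceed` keeps going if documents change mid-delete.
function buildCleanupRequests(esUrl: string, patterns: string[]): CleanupRequest[] {
  const base = esUrl.replace(/\/+$/, ''); // drop trailing slashes
  return patterns.map((pattern): CleanupRequest => ({
    method: 'POST',
    url: `${base}/${pattern}/_delete_by_query?conflicts=proceed&refresh=true`,
    body: { query: { match_all: {} } },
    // Do not fail the test just because an index does not exist yet.
    failOnStatusCode: false,
  }));
}

// In a Cypress spec this could be wired up roughly as follows
// (auth omitted; the env variable name is an assumption):
//
// beforeEach(() => {
//   buildCleanupRequests(Cypress.env('ELASTICSEARCH_URL'), ASSUMED_RISK_SCORE_PATTERNS)
//     .forEach((req) => cy.request(req));
// });
```

Keeping the request construction separate from `cy.request` means the list of patterns can be extended later (for example, to cover other indices suspected of contamination) without touching the spec's `beforeEach`.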

@rylnd
Contributor Author

rylnd commented Oct 26, 2023

The above tests did not fail, which is a good sign. Since the failure rate is so low, though (1/75), I'm running them another 200 times to try and surface an error: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3768
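For a rough sense of how much a clean 200-run build would prove, here is a back-of-the-envelope sketch. It assumes runs are independent and that the true failure rate is about 1/75; both are assumptions, and the rate is estimated from only two observed failures. The function names are made up for illustration.

```typescript
// Probability of seeing at least one failure in `runs` independent runs,
// given a per-run failure probability `p`.
function probAtLeastOneFailure(p: number, runs: number): number {
  return 1 - Math.pow(1 - p, runs);
}

// Smallest number of runs for which the chance of seeing at least one
// failure reaches `confidence` (e.g. 0.95).
function runsNeeded(p: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

const observedRate = 1 / 75; // ASSUMED: taken from the 2/150 legit failures

console.log(
  `P(at least one failure in 200 runs): ${probAtLeastOneFailure(observedRate, 200).toFixed(3)}`
);
console.log(`Runs needed for 95% confidence: ${runsNeeded(observedRate, 0.95)}`);
```

Under these assumptions, 200 runs would surface at least one failure a little over 90% of the time, so an all-green build is suggestive but not conclusive.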

rylnd added 2 commits November 7, 2023 15:39
 Conflicts:
	x-pack/test/security_solution_cypress/cypress/e2e/entity_analytics/enrichments.cy.ts
	x-pack/test/security_solution_cypress/package.json
@rylnd
Contributor Author

rylnd commented Nov 7, 2023

Tests failed above, so we're not quite there. It occurred to me in the interim, however, that this behavior we're seeing may not just be due to old risk scores, but also due to alerts containing risk enrichments. Based on that theory, I'm going to try another run that additionally deletes alerts. If those pass, I'll probably keep the potentially-unnecessary data guards prior to this as "just in case" test setup.

New run: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/3952

It was removed in elastic#170636, and appears not to have been replaced.
@kibana-ci
Collaborator

kibana-ci commented Nov 8, 2023

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Serverless Security Cypress Tests #1 / Enrichment Custom query rule from legacy risk scores Should has enrichment fields from legacy risk Should has enrichment fields from legacy risk
  • [job] [logs] Serverless Security Cypress Tests #1 / Enrichment Custom query rule from legacy risk scores Should has enrichment fields from legacy risk Should has enrichment fields from legacy risk
  • [job] [logs] FTR Configs #68 / EPM Endpoints Install endpoint package install should have installed the [endpoint.metadata_current-default] transform

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@rylnd
Contributor Author

rylnd commented Nov 22, 2023

Tests continue to fail, seemingly due to the presence of "old" risk score data on alerts. However, even after deleting all alerts AND all risk score data before each test, they continue to fail. I'm stumped as to what's going on here, so I'm going to have to rope in @nkhristinin for help as the original author.
