Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unskip enrichments tests #171983

Merged
merged 13 commits into from
Jan 12, 2024
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,7 @@ import { enableRiskEngine } from '../../tasks/entity_analytics';
const CURRENT_HOST_RISK_LEVEL = 'Current host risk level';
const ORIGINAL_HOST_RISK_LEVEL = 'Original host risk level';

// FLAKY: https://github.com/elastic/kibana/issues/169154
describe.skip('Enrichment', { tags: ['@ess', '@serverless'] }, () => {
describe('Enrichment', { tags: ['@ess', '@serverless'] }, () => {
before(() => {
cy.task('esArchiverUnload', 'risk_scores_new');
cy.task('esArchiverUnload', 'risk_scores_new_updated');
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,10 +31,6 @@
},
"settings": {
"index": {
"lifecycle": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide an explanation for the issue that caused the test failures, and why removing these ILM settings fixes them? Other developers (myself included; this one was tricky) would benefit greatly from your findings here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding, is that ILM is not supported in serverless, which mean they not supported in cypress tests, I put some related PR/error in description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea, that maybe it was partially fixed by this PR, when EA tests were separated to a different execution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkhristinin it looks like link you provided to the error has expired. Can you please post something more permanent here for posterity? That way if this test fails again, we can validate whether the hypothesized ILM error is to blame.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE the ILM hypothesis: in my investigation I saw failures in ESS as well, so I think that invalidates ILM as the underlying issue.

I also don't think ILM explains the behavior I was observing: the test fails due to the presence of unexpected data (specifically, historical risk scores).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, what was with the link, but I changed it to the link of failing build in this pr.

The serverless tests are failing because of ILM, I am not sure if it's related to your initial investigation, or if it's a new problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not encountered that ILM error during my investigation, and since it's specific to serverless it cannot be the root cause of #169154.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not the root cause of those issues. After more deep look into that, this is my understanding of the timeline:

We did have some flaky tests, which you investigate in this PR

After some time, EA tests were moved to separate executions

When I started to work on those tests, there were constant test failures in serverless, not flaky tests, which this PR if fixed.

Then I ran several flaky test runs, which were successful for enrichments, but had failures not related to this test case:

x100 - errors related to dashboard

x200 - errors related to dashboard

And then I saw that there was one more x200 test run, which had 1 error (test 175) for enrichment tests, but this error also not related to original issues, but something that happened in before all hook

Original error:

AssertionError: Timed out retrying after 150000ms: Expected not to find content: 'Original host risk level' within the element: <div.euiFlexGroup.StyledEuiFlexGroup-sc-1az7ye2-2.chTqHU.css-1a15qtm-euiFlexGroup-responsive-none-flexStart-center-row> but continuously found it.
at Context.eval (webpack:///./e2e/entity_analytics/enrichments.cy.ts:129:69)

New error:

CypressError: cy.task('esArchiverUnload') timed out after waiting 60000ms.
https://on.cypress.io/api/task
Because this error occurred during a before all hook we are skipping the remaining tests in the current suite: Enrichment

Conclusion:
We have not yet seen that original issue, but there were other flaky tests, so I will continue to run the flaky test runner, and we should have more details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the last test run confirmed that it still flaky -

"name": "ml_host_risk_score_latest_default",
"rollover_alias": "ml_host_risk_score_latest_default"
},
"mapping": {
"total_fields": {
"limit": "10000"
Expand Down Expand Up @@ -83,10 +79,6 @@
},
"settings": {
"index": {
"lifecycle": {
"name": "ml_host_risk_score_default",
"rollover_alias": "ml_host_risk_score_default"
},
"mapping": {
"total_fields": {
"limit": "10000"
Expand Down