[Security Solution][Detections] Fix EQL cypress tests #80440

Merged: 8 commits, Oct 20, 2020

@rylnd rylnd commented Oct 14, 2020

Summary

This EQL suite was previously skipped. While the tests were skipped, a bug introduced in Elasticsearch broke EQL rules. That bug should be fixed by elastic/elasticsearch#63573, which should in turn fix these tests, but let's see if CI disagrees.

@rylnd rylnd added labels Team:SIEM, v8.0.0, release_note:skip, Team:Detections and Resp on Oct 14, 2020
@rylnd rylnd self-assigned this Oct 14, 2020
@rylnd rylnd marked this pull request as ready for review October 14, 2020 21:36
@rylnd rylnd requested review from a team as code owners October 14, 2020 21:36
@elasticmachine

Pinging @elastic/siem (Team:SIEM)

@rylnd rylnd marked this pull request as draft October 14, 2020 21:45

rylnd commented Oct 14, 2020

Moving back to Draft, as there were some suspicious failures on the 7.x branch. Going to try to reproduce and fix them locally.

rylnd added 5 commits October 16, 2020 19:42
Unskip EQL tests

These _should_ be fixed with the latest ES on master, but let's see if
CI disagrees.

Wait until alerts have populated on Rule Details

Occasionally our tests hit a scenario where the rule has executed (its
status is "succeeded"), but the generated alerts have not populated in
the same time frame. In this case the test fails oddly, saying that the
"alert count" element is not there when it is.

I attempted to improve the error message by using a .should() with a
callback, but that led to even stranger behavior as the .should() would
fail once (expected), and then not be able to find the element a second
time. :(

So we instead focus on fixing the real problem here: wait until alerts
populate (have a non-zero count) before performing the assertion.
Because the page will not update automatically, we can't rely on
cypress' retryability and must instead assert, click Refresh, and assert
again, much like we're doing while waiting for the rule to execute. And
like `waitForTheRuleToBeExecuted`, we're using a while loop that has no
guarantee of ever exiting :(

More robust cypress assertions

* Uses should with a text matcher instead of using invoke('text')
* Use of not.equal between a string and an element may have been a false
  positive
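
To illustrate the second bullet: a `not.equal` comparison between a string and an element object can never fail, because the two are never strictly equal. The sketch below is plain TypeScript rather than Cypress code; `FakeElement` and `notEqual` are hypothetical stand-ins for a wrapped DOM element and a strict inequality assertion.

```typescript
// Hypothetical stand-in for a wrapped DOM element; not a Cypress type.
interface FakeElement {
  text: string;
}

// Minimal strict-inequality check, similar in spirit to a `not.equal` assertion.
function notEqual(actual: unknown, expected: unknown): boolean {
  return actual !== expected;
}

const alertCount: FakeElement = { text: '0' };

// String compared against the element itself: always true, so the
// "assertion" passes even though the displayed count is still '0'.
const vacuous = notEqual('0', alertCount);

// String compared against the element's text: correctly reports that
// the count has not changed yet.
const meaningful = notEqual('0', alertCount.text);
```

Asserting on the element's text directly, for example with a `have.text`-style matcher, keeps the comparison between two strings.
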
Perform cypress loops in a manner guaranteed to exit

We have a few tasks that require polling for some background work to be
completed. The basic form is: assert the byproduct, or refresh the page
and try again.

We were previously doing this with a while loop, which was not
guaranteed to ever complete, leading to cryptic failures if the process
ever hung.

Instead, this implements a safer polling mechanism with a definite
termination similar to the cypress-wait-until plugin.
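
The "assert, or refresh and try again" pattern with a definite termination can be sketched roughly as follows. This is a synchronous, framework-agnostic illustration, not the helper added in this PR: `pollWithRefresh`, `PollOptions`, and the simulated page are all hypothetical, and in a real Cypress test the check, the refresh, and the delay between attempts would be expressed as chained commands.

```typescript
interface PollOptions {
  maxAttempts: number;
}

// Runs `check`; if it fails, calls `refresh` and tries again, up to a fixed
// number of attempts. Returns the attempt number on which `check` succeeded,
// or throws once the attempts are exhausted.
function pollWithRefresh(
  check: () => boolean,
  refresh: () => void,
  options: PollOptions = { maxAttempts: 10 }
): number {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    if (check()) {
      return attempt;
    }
    refresh();
  }
  throw new Error(`Condition not met after ${options.maxAttempts} attempts`);
}

// Simulated page (an assumption for the example): alerts only show up
// after the third refresh.
let refreshCount = 0;
let alertsOnPage = 0;
const refreshPage = (): void => {
  refreshCount += 1;
  if (refreshCount >= 3) {
    alertsOnPage = 5;
  }
};

const attemptsUsed = pollWithRefresh(() => alertsOnPage > 0, refreshPage);
```

Unlike an open-ended `while` loop, a hung background process here surfaces as an explicit error after `maxAttempts` checks.
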
Update other specs that are asserting on alerts

* Do not automatically refresh the page
  * This is only necessary if we're not in the state we need. The
    `waitFor` helper functions automatically reload whatever needs to be
    reloaded, so we're delegating this task to them.
* Ensure we wait for alerts to be nonzero before our assertion
  * Otherwise we get some strange behavior around this field's
    availability; see previous commits
@rylnd rylnd marked this pull request as ready for review October 19, 2020 17:31

rylnd commented Oct 19, 2020

@elasticmachine merge upstream


rylnd commented Oct 19, 2020

@MadameSheema I ended up implementing a waitUntil function in order to get rid of those unbounded while loops. This may have made some other (non-EQL) tests less flaky as well, but I haven't been able to verify that.


@MadameSheema MadameSheema left a comment

Many thanks for this fix @rylnd! Great work :)

rylnd added 2 commits October 19, 2020 13:28
Remove unused import

Fix false positive in Rule Creation specs

Threat Match Rules introduced an additional query input, causing our
CUSTOM_QUERY_INPUT to be ambiguous.

However, instead of failing due to the ambiguity, the behavior of
cypress seems to be to pass! While I haven't yet tracked down the cause
of these false positives, disambiguating these selectors is the
immediate fix.
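
As a rough illustration of what disambiguating a selector means here, the sketch below models a page with two inputs that share one test subject and a lookup that insists on exactly one match. Everything in it is hypothetical (`FakeInput`, `findUnique`, and the `queryInput`, `customQuery`, and `threatMatch` names); it is not the selector change made in this PR.

```typescript
// Hypothetical model of rendered inputs; `testSubj` mimics a test-subject
// attribute and `section` the part of the form the input lives in.
interface FakeInput {
  testSubj: string;
  section: string;
}

// Returns the single matching input, or throws if the lookup is ambiguous
// (or matches nothing) instead of silently picking one.
function findUnique(inputs: FakeInput[], testSubj: string, section?: string): FakeInput {
  const matches = inputs.filter(
    (input) =>
      input.testSubj === testSubj && (section === undefined || input.section === section)
  );
  const first = matches[0];
  if (first === undefined || matches.length !== 1) {
    throw new Error(`Expected exactly one "${testSubj}" input, found ${matches.length}`);
  }
  return first;
}

const page: FakeInput[] = [
  { testSubj: 'queryInput', section: 'customQuery' },
  { testSubj: 'queryInput', section: 'threatMatch' },
];

// Scoping the lookup to a section resolves the ambiguity.
const scoped = findUnique(page, 'queryInput', 'customQuery');
```

Calling `findUnique(page, 'queryInput')` without a section throws, which is the loud failure one would want from an ambiguous selector.
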

rylnd commented Oct 19, 2020

@MadameSheema I think that the behavior causing the above test failures is also present on master; however on master it leads to a passing test! I believe I've fixed the immediate problem in 24748c9, but we need to diagnose and prevent whatever is causing these false positives in the abstract.


rylnd commented Oct 19, 2020

How the false positive appears in cypress:open: [screenshot: security_solution]

In cypress:run: [screenshot]


rylnd commented Oct 20, 2020

Ok, I figured out the cause of the false positive: native promises. I found this issue that seemed to describe the behavior we were seeing, and sure enough, commenting out our use of waitForTheRuleToBeExecuted caused the error to propagate into a failure.

Because waitForTheRuleToBeExecuted was an async function, those native promises were causing all this weird behavior. Since I've updated that function on this branch, we saw the expected failure.
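
A generic illustration of how a native promise can hide a failure: an error thrown inside an `async` function becomes a promise rejection, so a caller that only watches for synchronous exceptions never sees it. This is plain TypeScript and makes no claim about Cypress internals; `runSynchronously`, `asyncAssertion`, and `syncAssertion` are hypothetical names.

```typescript
type Outcome = 'passed' | 'failed';

// A toy "test runner" that only notices errors thrown synchronously.
function runSynchronously(body: () => void): Outcome {
  try {
    body();
    return 'passed';
  } catch {
    return 'failed';
  }
}

// Errors that surface only later, as promise rejections.
const lateErrors: string[] = [];

// Throwing inside an async function rejects the returned promise
// instead of throwing to the caller.
async function asyncAssertion(): Promise<void> {
  throw new Error('alert count was 0');
}

// The same failing assertion, without the async wrapper.
function syncAssertion(): void {
  throw new Error('alert count was 0');
}

// The rejection is recorded later, after the runner has already reported.
const withAsyncHelper = runSynchronously(() => {
  asyncAssertion().catch((e: Error) => lateErrors.push(e.message));
});

// The synchronous version fails the run immediately.
const withSyncHelper = runSynchronously(() => syncAssertion());
```

Here `withAsyncHelper` is `'passed'` even though the assertion failed, while `withSyncHelper` is `'failed'`.
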

@kibanamachine

💚 Build Succeeded

@rylnd rylnd merged commit b7ffefb into elastic:master Oct 20, 2020
@rylnd rylnd deleted the fix_eql_cypress branch October 20, 2020 16:44
rylnd added a commit to rylnd/kibana that referenced this pull request Oct 20, 2020
* Unskip EQL tests

These _should_ be fixed with the latest ES on master, but let's see if
CI disagrees.

* Wait until alerts have populated on Rule Details

Occasionally our tests hit a scenario where the rule has executed (its
status is "succeeded"), but the generated alerts have not populated in
the same time frame. In this case the test fails oddly, saying that the
"alert count" element is not there when it is.

I attempted to improve the error message by using a .should() with a
callback, but that led to even stranger behavior as the .should() would
fail once (expected), and then not be able to find the element a second
time. :(

So we instead focus on fixing the real problem here: wait until alerts
populate (have a non-zero count) before performing the assertion.
Because the page will not update automatically, we can't rely on
cypress' retryability and must instead assert, click Refresh, and assert
again, much like we're doing while waiting for the rule to execute. And
like `waitForTheRuleToBeExecuted`, we're using a while loop that has no
guarantee of ever exiting :(

* More robust cypress assertions

* Uses should with a text matcher instead of using invoke('text')
* Use of not.equal between a string and an element may have been a false
  positive

* Perform cypress loops in a manner guaranteed to exit

We have a few tasks that require polling for some background work to be
completed. The basic form is: assert the byproduct, or refresh the page
and try again.

We were previously doing this with a while loop, which was not
guaranteed to ever complete, leading to cryptic failures if the process
ever hung.

Instead, this implements a safer polling mechanism with a definite
termination similar to the cypress-wait-until plugin.

* Update other specs that are asserting on alerts

* Do not automatically refresh the page
  * This is only necessary if we're not in the state we need. The
    `waitFor` helper functions automatically reload whatever needs to be
    reloaded, so we're delegating this task to them.
* Ensure we wait for alerts to be nonzero before our assertion
  * Otherwise we get some strange behavior around this field's
    availability; see previous commits

* Remove unused import

* Fix false positive in Rule Creation specs

Threat Match Rules introduced an additional query input, causing our
CUSTOM_QUERY_INPUT to be ambiguous.

However, instead of failing due to the ambiguity, the behavior of
cypress seems to be to pass! While I haven't yet tracked down the cause
of these false positives, disambiguating these selectors is the
immediate fix.

Co-authored-by: Kibana Machine <[email protected]>
# Conflicts:
#	x-pack/plugins/security_solution/cypress/integration/alerts_detection_rules_eql.spec.ts
rylnd added a commit that referenced this pull request Oct 20, 2020
rylnd added a commit that referenced this pull request Oct 20, 2020
spalger pushed a commit that referenced this pull request Oct 20, 2020
(cherry picked from commit 3fc1f8c)
@MindyRS MindyRS added the Team: SecuritySolution label Sep 22, 2021
@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

Labels
release_note:skip, Team:Detections and Resp, Team: SecuritySolution, Team:SIEM, v7.10.0, v7.11.0, v8.0.0