-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution] Collect ideas for reducing flakiness and increasing speed of Cypress tests #153655
Comments
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
Pinging @elastic/security-solution (Team: SecuritySolution) |
There were some ideas posted in #141069, some related to flakiness and performance, and some to maintainability:
|
on the same note, I would add that leveraging test/isolation when possible would drastically lower test run time. This requires to make sure each test cleans after itself to go back to the start that it was in when it started. This guarantees that each test can be run individually. |
We could also add that running the each test multiple times locally can bring to light flakiness. We can leverage Cypress'n times method to do that. In the past I've used this and ran tests 25, 50 or even 100 times (which can be very time consuming for developers..., but the higher the more confident we are about non-flakiness) |
Here some thoughts about the presented ideas:
For the cases where we need to have alerts generated, we should check if is faster to create the rule enabled by default or to create the rule disabled and then enable it.
100% agree with this since this is making some tests painfully slow.
100% agree
100% agree
100% agree. I would also add: To minimize the use of es_archiver. Try to use the default data (the one that is loaded at the very beginning), if this set of data is not enough, create a really small archive, the smaller the better (1 doc or 2 if possible).
100% agree.
100% agree. Following a bit above @PhilippeOberti's comment. Since the Cypress upgrade version done in x-mas, Cypress by default cleans the browser state forcing us to re-visit the page. Revisiting the page each time is a huge cost from the execution perspective. There are some specific situations/casuistics where we don't need to clean the state of the browser each time, and the test will still be isolated/independent without having to reload the page again and again. Happy to discuss this point if you don't agree :)
Yup! yup!! :) @PhilippeOberti regarding flakiness, I believe it is better if we use the flaky test runner. @banderror this is what I wrote on the Cypress readme about these topics:
To all the above I would also add:
|
I would add a third idea following these 2, that we should prefer to load an alerts index in most cases where we want to test alerts functionality, such as opening/closing alerts, viewing the details, etc, instead of creating and running a real rule to generate the alerts. There are a couple benefits to this approach: (1) running the rules is slow, so we'd speed things up, (2) reducing the number of steps in each test also makes the tests easier to read/maintain/debug when something goes wrong, and (3) the UI has to support alerts from previous versions anyway, so having a static archive of alerts from some old 8.x version could help verify our UI supports old alerts. We have some ES Archives of alerts from legacy version already, and somewhere (I'll have to search for it later) there's a tool Madi created to generate more of these fixtures for different Kibana versions. Separately, I would like to reduce the usage of |
I would first check which way is faster, the If we want to test legacy versions, absolutely that we need an archive here, in this case, as I said previously, ideally with just one doc to minimize the loading time.
I agree with this. The use of |
@PhilippeOberti Could you expand on this a bit? Are you saying we should disable test isolation ( |
@PhilippeOberti Thanks, that's an interesting suggestion. I guess one would primarily use it locally when working on a test to make sure it's stable? Do you mind sharing a code example? How the development workflow would look like compared to using the flaky test runner? |
@MadameSheema Definitely can't say I agree haha, but curious to learn more about your thoughts on that. |
@MadameSheema Yeah, and creating an enabled rule right away would be faster. What I meant is that the test should explicitly specify that this rule is enabled before calling the create endpoint. Pseudo-code: createCustomQueryRule({
name: 'Will run right away',
enabled: true,
}); So most of the tests would not specify it thus getting a disabled rule by default: createCustomQueryRule({
name: 'Will be disabled',
}); |
@MadameSheema +++, I'll add it to the list. There's a good old issue related to that, search for "archives" and "Large input data sets".
Agree 👍 Will add it to the list.
I'd measure the impact of
90% agree with that, except I'd say "most of the behavior should be covered by integration tests" 😬 Having 10000 unit tests won't make your app stable if it's not well architected and actual business logic in integration is not covered by tests. Some thoughts on that from the guy who created React Testing Library: https://kentcdodds.com/blog/the-testing-trophy-and-testing-classifications |
@marshallmain ++++, fully agree if we're talking about the features you mentioned. I'd add one more benefit: if rule execution logic breaks, these tests won't fail with a false negative result, and will still be testing the alerts management functionality. I think this approach is quite universal: if you need alert documents to test the alerts table, generate fake alerts instead of creating and running rules that would generate real alerts; if you need to test the rules table, create fake prebuilt rules instead of installing a Fleet package with real prebuilt rules; especially don't use UI for that, because that's how you start testing areas of the app completely irrelevant for the given test. Which increases flake and false negatives. Regarding ES archiver, agree with @MadameSheema that by default we should avoid it. In the case of alerts generation for data setup, we could have a bunch of fake alerts in the code, they would be strongly typed because we have the schema, and would be indexed from the code. In some cases like testing the app with historical alerts data, we could leverage ES archives. Don't personally have a strong opinion on Thanks all for your suggestions ❤️ |
you are absolutely correct, in terms of reducing flakiness, using Just want to make sure we're taking that into account :) |
@PhilippeOberti Oh, I didn't realize it's possible to override it for individual tests! Is this how you do it? And yeah, this optimization could make sense for some of the tests -- could you share a few examples where it reduced the test execution time? |
I've only used it at the top level |
This isn't necessarily ideal but it's a quick way to check if a test is flaky or not. In the past job we were requiring a screenshot in the description or each PR that was purely focusing on Cypress tests, to show that the tests have successfully ran a bunch of times with no failure... a bit like I did for this PR. To use it it's pretty simple, just wrap a specific test as follows:
this would run the test 50 times. Not saying we should use this, it's just a nice, quick and easy way to verify things. |
Thank you for your suggestions @PhilippeOberti, I think it all makes sense 👍 I'll add both to the list. |
Just adding a note that testing on our M1 Mac's could provide alternate results to what CI does, so it might be worth providing guidance around using |
Additional thoughts:
|
@banderror Perhaps refactoring some code to make it easier to test with unit tests could be considered. |
When working with Prebuilt Rules on Cypress tests, avoid installing the whole Instead:
All of this logic has been encapsulated into a single utility function that can be used: |
Avoid usage of While its usage should be safe to use, it has caused flakyness issues in unrelated test suites which need data loaded into indeces through the |
Pinging @elastic/security-detection-engine (Team:Detection Engine) |
Epic: #153633
Summary
Our E2E Cypress tests continue to be flaky and are slow both locally and on CI. Let's approach this problem by first creating a list of ideas for what we can do to fix this.
Please add your ideas in the comments. I'll be adding them to the list below.
I'll start with a list of ideas we wrote down while planning the 8.8 release cycle.
Ideas
data-test-subj
attributes instead of by class names, ids, and tags.before
/beforeEach
instead ofafter
/afterEach
blocks.The text was updated successfully, but these errors were encountered: