Fixed parallel script for cypress tests in QA and buildkite #169311

Merged: 49 commits merged on Nov 6, 2023

@dkirchan (Contributor) commented on Oct 18, 2023

Summary

A new parallel script is introduced, specifically for running the Security Solution Cypress tests against the QA Serverless environment.
It is to be extended later for:

  • Prod
  • Dev-Test
  • Potentially, FTR tests

A new target is created in the package.json of security_solution_cypress to run the tests. During the test run, the introduced parallel script handles the following steps:

  • Create Environment
  • Reset Credentials
  • Delete Environment
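The three steps above form a create / use / delete lifecycle. The sketch below is a minimal illustration under assumptions: `CloudClient`, `runWithEnvironment` and the in-memory fake client are hypothetical names, not the actual API of the parallel script, which talks to the real QA cloud API rather than a fake.

```typescript
// Hypothetical types and client: the real script's API is not shown in this PR.
interface Environment {
  id: string;
  name: string;
}

interface Credentials {
  username: string;
  password: string;
}

interface CloudClient {
  createEnvironment(name: string): Promise<Environment>;
  resetCredentials(env: Environment): Promise<Credentials>;
  deleteEnvironment(env: Environment): Promise<void>;
}

// Create -> reset credentials -> run tests -> delete.
// The `finally` block makes sure the environment is deleted even when
// resetting the credentials or running the tests throws.
async function runWithEnvironment<T>(
  client: CloudClient,
  name: string,
  runTests: (env: Environment, creds: Credentials) => Promise<T>
): Promise<T> {
  const env = await client.createEnvironment(name);
  try {
    const creds = await client.resetCredentials(env);
    return await runTests(env, creds);
  } finally {
    await client.deleteEnvironment(env);
  }
}

// In-memory fake client that records each call, so the flow can be
// exercised without any cloud access.
const calls: string[] = [];
const fakeClient: CloudClient = {
  async createEnvironment(name) {
    calls.push(`create:${name}`);
    return { id: "env-1", name };
  },
  async resetCredentials(env) {
    calls.push(`reset:${env.id}`);
    return { username: "testing-user", password: "not-a-real-secret" };
  },
  async deleteEnvironment(env) {
    calls.push(`delete:${env.id}`);
  },
};
```

Putting the deletion in `finally` is the key design choice: a failing test run must not leak a paid environment. One caveat of this sketch is that an exception thrown by `deleteEnvironment` would mask the original test failure; a real implementation might log a failed deletion instead.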

TEST RUNTIME
With this change, newly developed tests can be run directly in the Kibana serverless pipeline by providing the branch or fork name, plus the commit hash when a fork is under test.

FOR LOCAL RUN
The developer needs an API key configured for the QA environment. It can either be stored in the ~/.elastic/cloud.json file or be provided as an environment variable:
API_KEY=... yarn run cypress:run:qa:serverless:parallel
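A minimal sketch of how the API key lookup could work. Two things here are assumptions, not taken from the PR: that the environment variable takes precedence over the file, and that the file stores the key under `api_key.qa`. `resolveApiKey` is a hypothetical name, and reading the file is left to the caller so the function stays pure and testable.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: resolve the QA API key from the environment first,
// then from the contents of ~/.elastic/cloud.json.
// The JSON shape { "api_key": { "qa": "..." } } is an assumption.
function resolveApiKey(env: Env, cloudJson: string | undefined): string | undefined {
  // 1. An explicit, non-blank API_KEY variable wins (assumed precedence).
  const fromEnv = env.API_KEY?.trim();
  if (fromEnv) {
    return fromEnv;
  }
  // 2. Fall back to the file contents, if the caller managed to read them.
  if (cloudJson === undefined) {
    return undefined;
  }
  try {
    const parsed = JSON.parse(cloudJson) as { api_key?: { qa?: unknown } };
    const key = parsed.api_key?.qa;
    return typeof key === "string" && key.length > 0 ? key : undefined;
  } catch {
    // Malformed JSON is treated the same as a missing key.
    return undefined;
  }
}
```

In Node, the caller could pass `process.env` and the contents of `path.join(os.homedir(), ".elastic", "cloud.json")` read with `fs.readFileSync(filePath, "utf8")`, wrapping the read in a try/catch so that a missing file yields `undefined`.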

To cross-check the credentials of the required environment, run the yarn target with the DEBUG environment variable set:
DEBUG=1 yarn run cypress:run:qa:serverless:parallel

As mentioned above, QA is currently the only environment where we run the suites and this script.

Successful Buildkite run for the serverless tests and the specific test functionality

@apmmachine
Contributor

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@dkirchan dkirchan requested a review from a team October 18, 2023 20:27
@dkirchan dkirchan added release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.11.0 v8.12.0 labels Oct 18, 2023
@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@dkirchan dkirchan self-assigned this Oct 18, 2023
@dkirchan dkirchan force-pushed the security/dkirchan-create-envs branch from a59a310 to 3a1327e Compare October 18, 2023 20:33
@dkirchan dkirchan force-pushed the security/dkirchan-create-envs branch 7 times, most recently from 1431a92 to 9dc4e91 Compare October 19, 2023 15:43
Member

@jbudz jbudz left a comment

.buildkite LGTM

@dkirchan dkirchan force-pushed the security/dkirchan-create-envs branch from 244f289 to a53ab3a Compare October 22, 2023 15:26
@dkirchan dkirchan requested a review from a team as a code owner October 23, 2023 09:08
Contributor

@maximpn maximpn left a comment

@dkirchan thank you for addressing my comments and making the script better 👍

I tested it locally and it works as expected. The only problem is that the tests take a long time to run. On top of that, I left some extra comments.

Overall, the PR looks almost finalized. I'm approving it in advance to unblock it.

product: response.data.type,
};
} catch (error) {
log.error(`${error}`);
Contributor

Can you log error.message instead of relying on the implicit error.toString()? It's not clear what error.toString() outputs.

Contributor Author

Fixed with d25ab4e
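The concern about `${error}` can be made concrete: for an `Error` it yields `"Error: <message>"`, and for a plain object it yields `"[object Object]"`. Below is a minimal sketch of a logging helper for catch blocks; `describeError` is a hypothetical name, not the helper that landed in d25ab4e.

```typescript
// Hypothetical helper: turn whatever was caught into a readable message.
// With `useUnknownInCatchVariables` (enabled by `strict`), the catch variable
// is `unknown`, so it must be narrowed before `.message` can be read.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    // Only the message, without the "Error: " prefix added by toString().
    return error.message;
  }
  if (typeof error === "string") {
    return error;
  }
  try {
    // Plain objects (e.g. parsed API error bodies) are shown as JSON
    // instead of "[object Object]".
    const json = JSON.stringify(error) as string | undefined;
    return json !== undefined ? json : String(error);
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return String(error);
  }
}

// Usage inside the script's catch blocks:
// } catch (error) {
//   log.error(describeError(error));
// }
```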

});
log.info(`Environment ${projectName} was successfully deleted!`);
} catch (error) {
log.error(`${error}`);
Contributor

Can you log error.message instead of relying on the implicit error.toString()? It's not clear what error.toString() outputs.

Contributor Author

Fixed with d25ab4e

username: response.data.username,
};
} catch (error) {
throw new Error(`${error}`);
Contributor

Can you throw new Error(error.message) instead of relying on the implicit error.toString()? It's not clear what error.toString() outputs.

Contributor Author

Fixed with d25ab4e
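For rethrowing, `new Error(`${error}`)` on an `Error` produces a message that itself starts with `"Error: "`, so the printed result is doubly prefixed. The sketch below rethrows with just the original message plus optional context such as a runner id. `toMessage` and `wrapError` are hypothetical names; on ES2022 runtimes, `new Error(message, { cause: error })` is a built-in alternative to appending the original stack by hand.

```typescript
// Hypothetical helpers; the script in this PR may structure this differently.
function toMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Builds a new Error whose message is the original message, optionally
// prefixed with context (e.g. a runner id), and whose stack also carries
// the original stack for debugging.
function wrapError(error: unknown, context?: string): Error {
  const message = toMessage(error);
  const wrapped = new Error(context ? `${context} - ${message}` : message);
  if (error instanceof Error && error.stack) {
    wrapped.stack = `${wrapped.stack ?? ""}\nCaused by: ${error.stack}`;
  }
  return wrapped;
}

// Usage inside a catch block:
// } catch (error) {
//   throw wrapError(error, runnerId);
// }
```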

throw new Error(`${runnerId} - ${error}`);
},
retries: 50,
factor: 2,
Contributor

It looks like we could add a delay before attempting to fetch the status. As far as I know, an MKI environment takes around 6 minutes to be up and running, so the delay could be a minute or two, or whatever amount of time we are certain about.

Additionally, we can "play" with parameters like factor, minTimeout and maxTimeout to find an optimal approach. Most probably, a linear or fixed retry interval preceded by an initial delay will work better.

It doesn't have to be addressed in this PR, just a general thought.
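To make the suggestion concrete, the sketch below only computes delay schedules: a fixed interval after an initial delay versus exponential growth. `retryDelays` and `BackoffOptions` are hypothetical; the option names mirror the factor/minTimeout/maxTimeout parameters mentioned above, but this is not any particular retry library's API. With `factor: 2` and no cap, each wait doubles, so 50 retries would produce absurdly long late delays.

```typescript
interface BackoffOptions {
  initialDelayMs: number; // wait before the first status check
  minTimeoutMs: number;   // delay before the first retry
  maxTimeoutMs: number;   // upper bound for any single retry delay
  factor: number;         // 1 = fixed interval, > 1 = exponential growth
}

// Returns the sequence of waits: the initial delay followed by one delay
// per retry, each capped at maxTimeoutMs.
function retryDelays(retries: number, opts: BackoffOptions): number[] {
  const delays: number[] = [opts.initialDelayMs];
  for (let attempt = 0; attempt < retries; attempt++) {
    const delay = opts.minTimeoutMs * Math.pow(opts.factor, attempt);
    delays.push(Math.min(delay, opts.maxTimeoutMs));
  }
  return delays;
}

const totalMs = (delays: number[]): number => delays.reduce((a, b) => a + b, 0);
```

For example, a 2-minute initial delay followed by 30 retries every 20 seconds waits 720,000 ms (12 minutes) in total, comfortably beyond the roughly 6 minutes an MKI environment needs, while keeping the polling granularity predictable.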

@@ -14,18 +14,17 @@ export default defineCypressConfig({
reporterOptions: {
configFile: './cypress/reporter_config.json',
},
- defaultCommandTimeout: 150000,
+ defaultCommandTimeout: 300000,
Contributor

Why was the timeout increased?

Contributor Author

@MadameSheema can you respond to this?

@@ -27,6 +27,7 @@
"cypress:investigations:run:serverless": "yarn cypress:serverless --spec './cypress/e2e/investigations/**/*.cy.ts'",
"cypress:explore:run:serverless": "yarn cypress:serverless --spec './cypress/e2e/explore/**/*.cy.ts'",
"cypress:changed-specs-only:serverless": "yarn cypress:serverless --changed-specs-only --env burn=5",
- "cypress:burn:serverless": "yarn cypress:serverless --env burn=5"
+ "cypress:burn:serverless": "yarn cypress:serverless --env burn=5",
Contributor

Let's set burn to 2 instead of 5. That should be enough to verify that the tests don't fail because of leftover artifacts.

Contributor Author

Fixed with d25ab4e

@@ -0,0 +1,11 @@
steps:
Contributor

How often do we plan to run it?

Contributor Author

On demand, when triggered by the quality gate, maybe in different PRs. It is not strictly defined yet.

agents:
queue: n2-4-spot
timeout_in_minutes: 300
parallelism: 6
Member

Why 6 as parallelism?

Contributor Author

I split the test suites into the suites outside explore/investigations, plus explore and investigations each on their own.
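For comparison, one generic way to spread spec files across a step with `parallelism: 6` is a deterministic round-robin split. Buildkite sets `BUILDKITE_PARALLEL_JOB` (0-based index) and `BUILDKITE_PARALLEL_JOB_COUNT` for parallel jobs; in this sketch they are passed in as plain numbers. Round-robin assignment and the `specsForJob` name are assumptions for illustration; per the reply above, the PR groups suites by category instead.

```typescript
// Hypothetical helper: pick the specs that parallel job `jobIndex` should run.
function specsForJob(specs: string[], jobIndex: number, jobCount: number): string[] {
  if (!Number.isInteger(jobCount) || jobCount < 1) {
    throw new Error(`invalid job count: ${jobCount}`);
  }
  if (!Number.isInteger(jobIndex) || jobIndex < 0 || jobIndex >= jobCount) {
    throw new Error(`invalid job index: ${jobIndex}`);
  }
  // Sort first so every job sees the same ordering regardless of glob order;
  // otherwise two jobs could run the same spec or skip one.
  const sorted = [...specs].sort();
  return sorted.filter((_, i) => i % jobCount === jobIndex);
}
```

Every spec lands in exactly one job, so the union of all jobs equals the full spec list. A category-based split like the one described above trades this even distribution for clearer ownership of failures per area.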

@MadameSheema
Member

Note that once this PR is merged, there is more work we need to do. We want to merge this PR as it is because it does not break any existing or new flow, and if we keep working on it, it will become a huge PR with a high risk of merge conflicts. In follow-up PRs:

  • We'll continue stabilizing our Cypress tests on MKI to make them more robust and reliable.
  • We'll continue improving our pipeline.
  • We'll work on the requirements we need to meet in order to have the quality gate in the release pipeline.
  • Once everything is stabilized and we feel ready, we'll integrate our quality gate with the release pipeline.

@MadameSheema MadameSheema enabled auto-merge (squash) November 6, 2023 16:47
@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #48 / EPM Endpoints EPM - list list api tests lists all limited packages from the registry

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 472 478 +6

Total ESLint disabled count

id before after diff
securitySolution 540 546 +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @dkirchan

@MadameSheema MadameSheema merged commit ed4ef2a into main Nov 6, 2023
@MadameSheema MadameSheema deleted the security/dkirchan-create-envs branch November 6, 2023 17:21
@kibanamachine
Contributor

💔 All backports failed

Status Branch Result
8.11 Backport failed because of merge conflicts

You might need to backport the following PRs to 8.11:
- [Security Solution] Unskip and enable for Serverless shared_exception_lists_management Cypress tests (#169182)
- [Security Solution] fix cypress config to run all tests (#169942)
- [Security Solution] Adding serverlessQA tag (#167494)

Manual backport

To create the backport manually run:

node scripts/backport --pr 169311

Questions?

Please refer to the Backport tool documentation

Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.12.0