Flakiness quality gate for new tests #24009

Closed
9 tasks
hjetpoluru opened this issue Apr 12, 2024 · 0 comments · Fixed by #24556
Assignees
Labels
release-12.2.0 (Issue or pull request that will be included in release 12.2.0), team-extension-platform

Comments

@hjetpoluru
Contributor

hjetpoluru commented Apr 12, 2024

What is this about?

  • Newly created tests may initially be flaky and require fine-tuning. Could we introduce a “check-new-flaky” CLI tool designed specifically to execute these tests multiple times? (See the sketch after the scenario below.)

Scenario

Feature: Check New Flaky CLI Tool
In order to reduce the flakiness of newly created tests
As a developer
I want a CLI tool that can execute these tests multiple times to identify flakiness

Scenario: Identifying flaky tests with the CLI tool

Given a set of newly created tests
When I execute these tests using the "check-new-flaky" CLI tool
Then the tool should run the tests multiple times
And identify any tests that exhibit flakiness
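
A minimal sketch of what such a tool could look like, written as a Node wrapper around the existing `test/e2e/run-e2e-test.js` runner. The file name `check-new-flaky.js`, the `RUNS` count, and the exact runner invocation are illustrative assumptions, not an actual implementation:

```javascript
// check-new-flaky.js — hypothetical sketch, not the actual tool.
// Runs each given spec several times via the existing e2e runner and
// reports any spec that fails at least once.
const { spawnSync } = require('node:child_process');

const RUNS = Number(process.env.RUNS || 5); // arbitrary repeat count
const specs = process.argv.slice(2);

const flaky = [];
for (const spec of specs) {
  let failures = 0;
  for (let i = 0; i < RUNS; i += 1) {
    const result = spawnSync('node', ['test/e2e/run-e2e-test.js', spec], {
      stdio: 'inherit',
    });
    if (result.status !== 0) {
      failures += 1;
    }
  }
  if (failures > 0) {
    flaky.push({ spec, failures });
  }
}

if (flaky.length > 0) {
  console.error('Flaky specs detected:');
  for (const { spec, failures } of flaky) {
    console.error(`  ${spec}: failed ${failures}/${RUNS} runs`);
  }
  process.exit(1);
}
console.log(`All specs passed ${RUNS}/${RUNS} runs.`);
```

Usage would be something like `node check-new-flaky.js test/e2e/tests/my-new-test.spec.js` (path hypothetical); the process exits non-zero if any run fails, which makes it usable as a quality gate in CI.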

Design

No response

Technical Details

@pedronfigueiredo provided the plan below to achieve this task:

  1. We currently have a script from Pedro that runs multiple tests locally.
  2. We need to convert it into a TypeScript script that modifies run-e2e-test.js to accept multiple test runs:
  • Collect test outputs as a text file
  • Optional: include a notification nudge at the end of the test runs (needs to be modified to be cross-platform)
  • Ability to run headless
  • Ideally we can reuse/expand test/e2e/run-e2e-test.js
  3. Create a GitHub Action that takes a code diff and extracts all e2e test cases that were added or modified:
  • Run these cases X times using the script from step 2 (a sketch of the diff-extraction step follows this list)
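
A rough sketch of the diff-extraction part of step 3, assuming the GitHub Action shells out to `git diff` from a checkout. The file name, function name, and base branch are illustrative assumptions:

```javascript
// get-changed-e2e-specs.js — illustrative sketch, not the actual
// implementation. Lists e2e spec files added (A) or modified (M)
// relative to a base branch.
const { execSync } = require('node:child_process');

function getChangedE2eSpecs(baseBranch = 'origin/develop') {
  const diff = execSync(
    `git diff --name-only --diff-filter=AM ${baseBranch}...HEAD`,
    { encoding: 'utf8' },
  );
  return diff
    .split('\n')
    .filter(
      (file) =>
        file.startsWith('test/e2e/') &&
        (file.endsWith('.spec.js') || file.endsWith('.spec.ts')),
    );
}

console.log(getChangedE2eSpecs().join('\n'));
```

The Action could then feed this list to the runner script from step 2 and execute each spec X times.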

Threat Modeling Framework

No response

Acceptance Criteria

No response

Stakeholder review needed before the work gets merged

  • Engineering (needed in most cases)
  • Design
  • Product
  • QA (automation tests are required to pass before merging PRs but not all changes are covered by automation tests - please review if QA is needed beyond automation tests)
  • Security
  • Legal
  • Marketing
  • Management (please specify)
  • Other (please specify)

References

No response

@gauthierpetetin changed the title from “Flakiness check for new test” to “Flakiness quality gate for new tests” on Apr 24, 2024
seaona added a commit that referenced this issue on Jul 1, 2024: … have been modified (#24556)


## **Description**
This PR adds a quality gate for new or modified e2e spec files. Whenever
a PR adds or modifies a test, that test is run additional times, in
order to prevent accidentally introducing flakiness. It works as
follows:

- Identifies any new or modified e2e file inside the test/ folder using
`git diff` and these two filters:
  - `file.filename.startsWith('test/e2e/')`
  - `file.filename.endsWith('.spec.js') || file.filename.endsWith('.spec.ts')`
- Copies each matching spec five times into the list of test paths to
execute. This number is arbitrary, and we could change it to any value
we want. The reason for duplicating entries instead of raising the
retry count is to benefit from parallelization, as @HowardBraham
pointed out in a comment.
- Since we already had a flag that supports re-running successful
tests, `--retry-until-failure`, I leveraged it inside the `for` loop
for each test: if a test case was identified as new/modified, the flag
is added so new tests fail fast without retries (see the sketch after
this list).
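
A minimal sketch of the duplication and fail-fast behavior described above, assuming the runner already knows the set of changed specs. Names such as `buildTestPaths`, `changedSpecs`, and the `REPEAT` constant are illustrative, not the PR's actual code:

```javascript
// Illustrative sketch of the quality-gate behavior, not the actual PR
// code. `allSpecs` is the normal list of spec paths for a CI bucket;
// `changedSpecs` is the set produced by the git-diff filters above.
const REPEAT = 5; // arbitrary; duplicates spread across parallel buckets

function buildTestPaths(allSpecs, changedSpecs) {
  const paths = [];
  for (const spec of allSpecs) {
    paths.push(spec);
    if (changedSpecs.has(spec)) {
      // Duplicate the changed spec instead of raising the retry count,
      // so the extra runs benefit from parallelization.
      for (let i = 0; i < REPEAT; i += 1) {
        paths.push(spec);
      }
    }
  }
  return paths;
}

function buildRunnerArgs(spec, changedSpecs) {
  const args = ['test/e2e/run-e2e-test.js', spec];
  if (changedSpecs.has(spec)) {
    // New/modified specs should fail fast: the pre-existing flag keeps
    // re-running a passing test and stops at the first failure.
    args.push('--retry-until-failure');
  }
  return args;
}
```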

### Incremental git fetch depth within shallow clone
We use git fetch with an incremental depth, as @danjm suggested. The CI
environment uses a shallow clone, so a plain git diff won't succeed:
it can't find the merge base. To fix that, we start with a git fetch
depth of 1 and keep increasing the depth (1, 10, 100) while the error
is `no merge base`, up to 100. If the git diff still fails, we then do
a full git fetch with the `--unshallow` flag.
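
A condensed sketch of that deepening strategy, written as a Node helper for illustration (not the PR's exact code; the error check is simplified):

```javascript
// Illustrative sketch of the incremental-deepening strategy, not the
// exact PR code. Deepens the shallow clone until `git diff` can find
// a merge base, then falls back to a full fetch.
const { execSync } = require('node:child_process');

function gitDiffWithDeepening(baseBranch = 'origin/develop') {
  for (const depth of [1, 10, 100]) {
    execSync(`git fetch --depth ${depth} origin`);
    try {
      return execSync(`git diff --name-only ${baseBranch}...HEAD`, {
        encoding: 'utf8',
      });
    } catch (error) {
      // The failure here is expected to be `no merge base` while the
      // shallow history is too short; retry with a larger depth.
    }
  }
  // Still no merge base after depth 100: fetch the full history.
  execSync('git fetch --unshallow origin');
  return execSync(`git diff --name-only ${baseBranch}...HEAD`, {
    encoding: 'utf8',
  });
}
```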

- [Update] This is the working result with the last commit:
https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/89269/workflows/103b78a8-8f0d-4323-96b0-9e235c4cbc81/jobs/3296802
![Screenshot from 2024-06-26 11-39-19](https://github.com/MetaMask/metamask-extension/assets/54408225/a2a89d6a-3a73-48ba-91a3-20aeadc38573)


### New CI job
The git diff is done in a new CI job which runs at the beginning, in
parallel with prep-deps.







## **Related issues**

Fixes: #24009

## **Manual testing steps**

1. Check the CI runs (note that previous runs intentionally included
failing and changed tests, in order to exercise the different scenarios
described below)

## **Screenshots/Recordings**


[UPDATE with the new code changes]

- 🟢 Case 1: A test has changed -> it is rerun 1+5 times and succeeds
(the runs are distributed across different buckets)




https://github.com/MetaMask/metamask-extension/assets/54408225/c1456104-1f5f-4ef3-9364-4e435f8797f4



https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/89277/workflows/7fce0a2e-773f-46da-8ab9-1dbec7992b58/jobs/3297267/parallel-runs/10?filterBy=ALL


- 🟢 Case 2: A test has changed, but it has a mistake in the
code (intentional, to simulate a flaky test) -> it fails immediately
and there are no further retries. The rest of the tests are retried
on failure as usual


- Case for the main test build:
https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/89277/workflows/7fce0a2e-773f-46da-8ab9-1dbec7992b58/jobs/3297267/artifacts
- Case for the Flask-specific test:
https://app.circleci.com/pipelines/github/MetaMask/metamask-extension/89277/workflows/7fce0a2e-773f-46da-8ab9-1dbec7992b58/jobs/3297277/artifacts



- 🟢 Case 3: A PR has no test spec files changed -> nothing
different happens
- CI run: see the current CI run


## **Pre-merge author checklist**

- [ ] I’ve followed [MetaMask Coding
Standards](https://github.com/MetaMask/metamask-extension/blob/develop/.github/guidelines/CODING_GUIDELINES.md).
- [ ] I've completed the PR template to the best of my ability
- [ ] I’ve included tests if applicable
- [ ] I’ve documented my code using [JSDoc](https://jsdoc.app/) format
if applicable
- [ ] I’ve applied the right labels on the PR (see [labeling
guidelines](https://github.com/MetaMask/metamask-extension/blob/develop/.github/guidelines/LABELING_GUIDELINES.md)).
Not required for external contributors.

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR (e.g. pull and build branch, run the
app, test code being changed).
- [ ] I confirm that this PR addresses all acceptance criteria described
in the ticket it closes and includes the necessary testing evidence,
such as recordings and/or screenshots.

---------

Co-authored-by: Mark Stacey <[email protected]>
Co-authored-by: Howard Braham <[email protected]>
@metamaskbot added the release-12.2.0 (Issue or pull request that will be included in release 12.2.0) label on Jul 1, 2024