Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of Skipped Tests in Helix framework #7286

Closed
miniksa opened this issue Aug 13, 2020 · 2 comments · Fixed by #15831
Closed

Handling of Skipped Tests in Helix framework #7286

miniksa opened this issue Aug 13, 2020 · 2 comments · Fixed by #15831
Labels
Area-Build Issues pertaining to the build system, CI, infrastructure, meta Issue-Task It's a feature request, but it doesn't really need a major design. Needs-Tag-Fix Doesn't match tag requirements Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.

Comments

@miniksa
Copy link
Member

miniksa commented Aug 13, 2020

We have been using the Skipped state for two situations in our code:

  1. The test is dynamically deciding it cannot run because of situations only detectible at runtime in the operating system
  2. The test is broken or flaky for some reason and we want to turn it off temporarily until we can fix it in a work item.

Unfortunately, with the adoption of the Helix scripts from Microsoft-UI-XAML, this isn't fully compatible.
This is because the Helix scripts, when run on the Helix machines, have some retry logic in them. They take anything that is NOT a pass and they re-run it up to 10 times. They store all the subresults and bucket tests into 3 categories:

  1. Pass - Just flat out passed
  2. Skipped - Flaky/unreliable, had some mix of pass/fail on the runs
  3. Fail - Just flat out failed.

Then in the final post-processing steps for test results, things that are Skipped are evaluated for the pass/fail threshold and converted to either Pass or Fail based on whether they indicated "Pass" above or below the threshold count.

This makes our usage of Skipped do the following:

  1. The scripts see the 'Skipped' out of TAEF as !Passed and mark them for rerun.
  2. They rerun the 'Skipped' test up to 10 times and see !Passed and mark each run as a Fail.
  3. The whole flakiness is reported as 'Skipped' temporarily on the build machine
  4. In the post processing steps, the subruns of 'Skipped' are evaluated, every one of them is !Passed (and thus Fail) and the test is marked as Fail.

So in summary, the scripts right now only understand two things: Pass and Fail. Also they reuse Skipped as an intermediate counter state only to further evaluate it into Pass/Fail later.

That doesn't work for us.

To complete the PR where I brought Helix online, I used the TestAttribute "Ignore" to "True" instead on tests that need to be disabled temporarily for some reason. I did this only on the UIA/LocalTests because they're the only ones affected by running inside Helix with the retry scripts. The Unit/Feature tests that run on the build machine (and use a small subset of the scripts just to report their test results to AzDO) are fine reporting Skipped as they are.

On talking with the Microsoft UI XAML team, for handling tests that dynamically decide they're OK, they instead chose to report "Pass" and exit early as necessary.

Overall, I think this is what we need to do going forward:

  1. For tests dynamically deciding we need to either
    • Convert them to report Pass early and accept that.
    • Teach HelixTestHelpers.cs and the retry logic how to handle either the Skipped or NotRun state (and update the relevant tests as appropriate) to pass that all the way through and report up to AzDO without rerunning them when it's intentionally skipped (and without post-processing them into a binary Pass/Fail state.)
  2. For tests that are broken or flaky and need turning off with a filed work item, just set Ignore=True as the test attribute. (This also allows the te.exe command-line argument /runIgnoredTests to be used to run them when attempting to fix them up.)
@miniksa miniksa added Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty Area-Build Issues pertaining to the build system, CI, infrastructure, meta Product-Terminal The new Windows Terminal. Issue-Task It's a feature request, but it doesn't really need a major design. labels Aug 13, 2020
@ghost ghost added the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Aug 13, 2020
@miniksa
Copy link
Member Author

miniksa commented Aug 13, 2020

See also #7281 and #7282 for disabled tests as well as #6992 for the enablement of Helix

@miniksa miniksa removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Aug 13, 2020
@miniksa miniksa added this to the Terminal Backlog milestone Aug 13, 2020
This was referenced Aug 13, 2020
ghost pushed a commit that referenced this issue Aug 18, 2020
Use the Helix testing orchestration framework to run our Terminal LocalTests and Console Host UIA tests.

## References
#### Creates the following new issues:
- #7281 - re-enable local tests that were disabled to turn on Helix
- #7282 - re-enable UIA tests that were disabled to turn on Helix
- #7286 - investigate and implement appropriate compromise solution to how Skipped is handled by MUX Helix scripts

#### Consumes from:
- #7164 - The update to TAEF includes wttlog.dll. The WTT logs are what MUX's Helix scripts use to track the run state, convert to XUnit format, and notify both Helix and AzDO of what's going on.

#### Produces for:
- #671 - Making Terminal UIA tests is now possible
- #6963 - MUX's Helix scripts are already ready to capture PGO data on the Helix machines as certain tests run. Presuming we can author some reasonable scenarios, turning on the Helix environment gets us a good way toward automated PGO.

#### Related:
- #4490 - We lost the AzDO integration of our test data when I moved from the TAEF/VSTest adapter directly back to TE. Thanks to the WTTLog + Helix conversion scripts to XUnit + new upload phase, we have it back!

## PR Checklist
* [x] Closes #3838
* [x] I work here.
* [x] Literally adds tests.
* [ ] Should I update a testing doc in this repo?
* [x] Am core contributor. Hear me roar.
* [ ] Correct spell-checking the right way before merge.

## Detailed Description of the Pull Request / Additional comments
We have had two classes of tests that don't work in our usual build-machine testing environment:
1. Tests that require interactive UI automation or input injection (a.k.a. require a logged in user)
2. Tests that require the entire Windows Terminal to stand up (because our Xaml Islands dependency requires 1903 or later and the Windows Server instance for the build is based on 1809.)

The Helix testing environment solves both of these and is brought to us by our friends over in https://github.com/microsoft/microsoft-ui-xaml.

This PR takes a large portion of scripts and pipeline configuration steps from the Microsoft-UI-XAML repository and adjusts them for Terminal needs.
You can see the source of most of the files in either https://github.com/microsoft/microsoft-ui-xaml/tree/master/build/Helix or https://github.com/microsoft/microsoft-ui-xaml/tree/master/build/AzurePipelinesTemplates

Some of the modifications in the files include (but are not limited to) reasons like:
- Our test binaries are named differently than MUX's test binaries
- We don't need certain types of testing that MUX does.
- We use C++ and C# tests while MUX was using only C# tests (so the naming pattern and some of the parsing of those names is different e.g. :: separators in C++ and . separators in C#)
- Our pipeline phases work a bit differently than MUX and/or we need significantly fewer pieces to the testing matrix (like we don't test a wide variety of OS versions).

The build now runs in a few stages:
1. The usual build and run of unit tests/feature tests, packaging verification, and whatnot. This phase now also picks up and packs anything required for running tests in Helix into an artifact. (It also unifies the artifact name between the things Helix needs and the existing build outputs into the single `drop` artifact to make life a little easier.)
2. The Helix preparation build runs that picks up those artifacts, generates all the scripts required for Helix to understand the test modules/functions from our existing TAEF tests, packs it all up, and queues it on the Helix pool.
3. Helix generates a VM for our testing environment and runs all the TAEF tests that require it. The orchestrator at helix.dot.net watches over this and tracks the success/fail and progress of each module and function. The scripts from our MUX friends handle installing dependencies, making the system quiet for better reliability, detecting flaky tests and rerunning them, and coordinating all the log uploads (including for the subruns of tests that are re-run.)
4. A final build phase is run to look through the results with the Helix API and clean up the marking of tests that are flaky, link all the screenshots and console output logs into the AzDO tests panel, and other such niceities.

We are set to run Helix tests on the Feature test policy of only x64 for now. 

Additionally, because the set up of the Helix VMs takes so long, we are *NOT* running these in PR trigger right now as I believe we all very much value our 15ish minute PR turnaround (and the VM takes another 15 minutes to just get going for whatever reason.) For now, they will only run as a rolling build on master after PRs are merged. We should still know when there's an issue within about an hour of something merging and multiple PRs merging fast will be done on the rolling build as a batch run (not one per).

In addition to setting up the entire Helix testing pipeline for the tests that require it, I've preserved our classic way of running unit and feature tests (that don't require an elaborate environment) directly on the build machines. But with one bonus feature... They now use some of the scripts from MUX to transform their log data and report it to AzDO so it shows up beautifully in the build report. (We used to have this before I removed the MStest/VStest wrapper for performance reasons, but now we can have reporting AND performance!) See https://dev.azure.com/ms/terminal/_build/results?buildId=101654&view=ms.vss-test-web.build-test-results-tab for an example. 

I explored running all of the tests on Helix but.... the Helix setup time is long and the resources are more expensive. I felt it was better to preserve the "quick signal" by continuing to run these directly on the build machine (and skipping the more expensive/slow Helix setup if they fail.) It also works well with the split between PR builds not running Helix and the rolling build running Helix. PR builds will get a good chunk of tests for a quick turn around and the rolling build will finish the more thorough job a bit more slowly.

## Validation Steps Performed
- [x] Ran the updated pipelines with Pull Request configuration ensuring that Helix tests don't run in the usual CI
- [x] Ran with simulation of the rolling build to ensure that the tests now running in Helix will pass. All failures marked for follow on in reference issues.
@zadjii-msft
Copy link
Member

Oh hey look at that, Niksa knew this was a problem. I think I did this back in #15831, more or less. We don't do any of the XAML retrying, but we also do report these as skipped now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Build Issues pertaining to the build system, CI, infrastructure, meta Issue-Task It's a feature request, but it doesn't really need a major design. Needs-Tag-Fix Doesn't match tag requirements Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants