
Check precise Windows GIX_TEST_IGNORE_ARCHIVES expectations on CI #1663

Merged

Conversation


@EliahKagan EliahKagan commented Nov 9, 2024

This modifies the CI `test-fixtures-windows` job introduced in #1657 so that, instead of checking that no more than the expected number of test cases fail, it checks that exactly the test cases expected to fail are the ones that do fail. (The other check previously done as part of this job, that no tests report errors, is still performed.) The approach taken is:

  1. To collect actual failures, parse the XML output saved by cargo nextest to get the package name and test name of each failing test and save them, one per line, to a file.

    Getting this information is pretty straightforward in PowerShell. When iterating through `testcase` nodes, which are filtered by the presence of a `failure` subnode, `$_.name` gives the name of the test case, including `::` qualifications within the containing package, and `$_.classname` gives the name of the containing package. Thus `"$($_.classname) $($_.name)"` identifies a test in the same way it is identified in human-readable `cargo nextest` output.

    The unintuitive attribute name `classname` exists because `cargo nextest` XML is in the JUnit format, which originated for Java tests where test cases are methods in classes. An alternative approach is to use `$_.ParentNode.name` instead of `$_.classname`. I've verified that this works too, but I don't know of any strong reason to prefer one over the other, so I went with the more compact `$_.classname`.

  2. To collect expected failures, use gh (the GitHub CLI) to download the text of #1358, and parse the first ```text code block with regex, extracting the list of failing tests and saving them, one per line, to a file.

    One of the tests currently listed there, gix-ref-tests::refs packed::iter::performance, is a performance test that seems not to fail on CI (anymore?), though I still find it to fail locally. If it were to start failing on CI, we would want to know about it. So, before these changes, it was not included in the count. I've carried that over to the precise matching done here: tests that appear to be performance tests due to having performance in their names (not as part of any longer \w+ word) are not regarded as expected to fail.

    It is not obvious that the general approach of downloading the information from an issue is the best one. Although I like this approach and I think it's not too cumbersome, this is admittedly the less elegant step. Other approaches, including hard-coding the expected failures in the workflow or another file, could be used if this approach is not wanted.

  3. To compare them, use git diff --no-index with various other options, printing the entire list of failures as context if they differ. The job fails if there are any differences.

    Because this is neither a required check for PR auto-merge nor a dependency of one, I think failing not only when tests unexpectedly fail but also when any unexpectedly pass is unlikely to cause any significant problems, and that knowing whenever the set of failing tests changes in any way is worthwhile.
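
The shape of step 1 can be sketched outside PowerShell as well. Below is a rough Python equivalent for illustration only (the `testcase`/`failure`/`classname` names are fixed by the JUnit format; the sample XML is made up, not real `cargo nextest` output):

```python
# Sketch of step 1: extract "<package> <test>" lines for failing tests
# from JUnit-format XML, mirroring the PowerShell filter on the
# presence of a failure subnode.
import xml.etree.ElementTree as ET

SAMPLE = """\
<testsuites>
  <testsuite name="gix-ref-tests::refs">
    <testcase classname="gix-ref-tests::refs" name="packed::iter::performance">
      <failure>too slow</failure>
    </testcase>
    <testcase classname="gix-ref-tests::refs" name="file::store::ok"/>
  </testsuite>
</testsuites>
"""

def failing_tests(xml_text):
    root = ET.fromstring(xml_text)
    # Keep only testcase nodes that have a <failure> child, and
    # identify each test the way nextest's human-readable output does.
    return [
        f"{tc.get('classname')} {tc.get('name')}"
        for tc in root.iter("testcase")
        if tc.find("failure") is not None
    ]

print(failing_tests(SAMPLE))
```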

I intentionally excluded performance tests late in the process, so that there would be a commit whose CI results could be inspected that would verify that the check is capable of failing when the diff is nonempty (and that the output would display in a useful way, with effective colorization, etc., in this situation). That can be observed in this run output and compared to passing runs, such as the runs in this PR.
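Steps 2 and 3 can also be sketched in Python for illustration: pull the first fenced text code block out of an issue body with a regex, then diff expected against actual failures. Here a hard-coded string and `difflib` stand in for `gh` and `git diff --no-index`, and the issue body is a made-up miniature, not the real #1358 text. (Code-fence markers are built by repetition to keep the example self-contained.)

```python
# Sketch of steps 2 and 3: extract the expected-failures list from an
# issue body, then compare it to the actual failures.
import difflib
import re

FENCE = "`" * 3
issue_body = "\n".join([
    "Some prose before the list.",
    "",
    FENCE + "text",
    "gix-ref-tests::refs packed::iter::performance",
    "gix-odb-tests::integrate store::dynamic::write",
    FENCE,
    "",
    "Some prose after the list.",
])

# Extract the body of the first fenced "text" block.
pattern = re.compile("`{3}text\n(.*?)`{3}", re.DOTALL)
expected = pattern.search(issue_body).group(1).splitlines()

# Pretend these are the failures parsed from the nextest XML output.
actual = ["gix-odb-tests::integrate store::dynamic::write"]

# A nonempty diff corresponds to the CI job failing.
diff = list(difflib.unified_diff(expected, actual, "expected", "actual", lineterm=""))
print("\n".join(diff))
```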

This modifies the `test-fixtures-windows` job that tests on Windows
with `GIX_TEST_IGNORE_ARCHIVES=1` so that, instead of checking that
no more than 14 failures occur, it checks that the failing tests
are exactly those that are documented in GitoxideLabs#1358 as expected to fail.

The initial check that no tests have *error* status is preserved,
with only stylistic changes, and kept separate from the subsequent
logic so that the output is clearer.

The new steps are no longer conditional on `nextest` having exited
with a failure status, since (a) that was probably unnecessary
before and is definitely unnecessary now, (b) at least for now, the
comparison is precise, so it would be strange to pass if the diff
were to have changes on *all* lines, and (c) this makes it slightly
less likely that GitoxideLabs#1358 will accidentally stay open even once fixed.

The current approach is to actually retrieve the list of tests
expected to fail on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` from
the GitoxideLabs#1358 issue body. This has the advantage that it automatically
keeps up to date with changes made to that issue description, but
this is of course not the only possible approach for populating the
expected value.

Two changes should be made before this is ready:

- As noted in the "FIXME" comment, the job should currently fail
  because the performance test reported to fail in GitoxideLabs#1358 is not
  being filtered out from the expected failures list. It's left in
  as of this commit, to verify that the job is capable of failing.

  (After that, the performance test should either be filtered out
  or removed from the list in GitoxideLabs#1358, but the former approach is
  currently preferable because I have not done diverse enough
  testing to check if the failure on my main Windows system is due
  to that system being too slow rather than a performance bug.)

- The scratchwork file should be removed once no longer needed.

This is to fix the error:

    gh: To use GitHub CLI in a GitHub Actions workflow, set the GH_TOKEN environment variable. Example:
      env:
        GH_TOKEN: ${{ github.token }}
    InvalidOperation: D:\a\_temp\ba9d92b7-3a94-4f7c-b541-d19003f40e19.ps1:9
    Line |
       9 |  $expected_failures = $match_info.Matches.Groups[1].Value -split "`n"  …
         |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         | Cannot index into a null array.
    Error: Process completed with exit code 1.

Including color in the diff of expected vs. actual failed tests
makes the output easier to see (much as color helps in `nextest`).

This commit also makes some stylistic changes to the command so it
is easier to read.

This omits tests containing `performance` (and not as part of a
larger "word", not even with `_`) from being expected to fail on CI
with `GIX_TEST_IGNORE_ARCHIVES=1` on Windows.

Currently there is one such test listed in GitoxideLabs#1358,
`gix-ref-tests::refs packed::iter::performance`.

The relevant code is in the `test-fixtures-windows` CI job and it
is working (both to fail the job when there is a mismatch and to
have the job succeed when there is agreement).
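
The performance filter described above can be sketched in Python for illustration (the actual job does this in PowerShell): a test counts as a performance test, and is omitted from the expected failures, when `performance` occurs in its name as a whole `\w+` word, not embedded in any longer word, not even via `_`.

```python
# Sketch of the performance-test filter. Since `_` counts as a \w
# character, \b already excludes names like x_performance_test.
import re

PERF = re.compile(r"\bperformance\b")

tests = [
    "gix-ref-tests::refs packed::iter::performance",  # filtered out
    "gix-ref-tests::refs file::store::ok",            # kept
    "some-crate::x my_performance_test",              # kept: part of a longer word
]

expected_failures = [t for t in tests if not PERF.search(t)]
print(expected_failures)
```

The test names here other than `gix-ref-tests::refs packed::iter::performance` are invented for the example.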

@Byron Byron left a comment


Thanks a lot for making this happen!

I am very impressed by what can be done with powershell, it clearly is .NET in a shell :).

And indeed, as predicted, reading the text block from the issue breaks isolation too much, which is why I'd love to see these extracted into a file, for example, if that's OK with you.

Thanks again.

(Two review comments on .github/workflows/ci.yml, both resolved.)
@EliahKagan EliahKagan requested a review from Byron November 9, 2024 08:32

@Byron Byron left a comment


Wonderful, thanks so much!

@Byron Byron merged commit 1df68e4 into GitoxideLabs:main Nov 9, 2024
17 checks passed
@EliahKagan EliahKagan deleted the run-ci/test-fixtures-windows-precise branch November 9, 2024 09:34