Add initial TAP test support #4958
Conversation
Flake8 detected 7 issues on 4aef0f5 posted by Sider |
Fixes #2923. |
Conceptually seems like a nice idea. Do we have someone with TAP usage experience (I have zero) to review the usability and integration parts? |
This pull request does not really improve the usability and integration; all it does is remove the need for a wrapper parser such as the one mentioned in issue #2923, see here. Integration improvements could come from printing failing subtests directly in the test list (like this), but that's beyond the scope of this initial work. The important part is to decide on the syntax, so that users get future improvements for free. |
Trunk has been fixed, please rebase so CI gets rerun. Thanks. |
Force-pushed from 28593d3 to 0d35be7
There was a different issue in trunk. It has now been fixed. A rebase should finally get things green. |
This is the first step towards adding support for TAP.
Parse the error code outside SingleTestRunner's run() method. This will let us add TAP support without complicating that long method further.
--print-errorlogs is using the test's return code to look for failed tests, instead of just looking at the TestResult. Simplify the code and make it work for TAP too.
Hard errors also come from the GNU Automake test protocol. They happen when e.g., the set-up of a test case scenario fails, or when some other unexpected or highly undesirable condition is encountered. TAP will use them for parse errors too. Add them to the exitcode protocol first.
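The exit-code mapping described in this commit can be sketched as follows. This is an illustrative sketch, not Meson's actual code: the statuses 77 for "skipped" and 99 for "hard error" follow the GNU Automake test-protocol convention, and the enum and function names here are made up.

```python
from enum import Enum

class TestResult(Enum):
    OK = 'OK'
    FAIL = 'FAIL'
    SKIP = 'SKIP'
    ERROR = 'ERROR'   # hard error: broken test set-up, not a plain failure

def result_from_exitcode(returncode: int) -> TestResult:
    """Map a test process exit status to a result, Automake-style."""
    if returncode == 0:
        return TestResult.OK
    if returncode == 77:   # conventional "skipped" status
        return TestResult.SKIP
    if returncode == 99:   # conventional "hard error" status
        return TestResult.ERROR
    return TestResult.FAIL
```

Factoring this mapping out of `run()` is what lets a TAP parser later produce the same `TestResult` values from the output stream instead of the exit status.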
This adds initial support for parsing TAP output. It detects failures and skipped tests without relying on the exit code, as well as early termination of the test due to an error or a crash. For now, subtests are not recorded in the TestRun object. However, because the TAP output goes on stdout, it is printed by --print-errorlogs when a test does not behave as expected. Handling subtests as TestRuns, and serializing them to JSON, can be added later. The parser was written specifically for Meson, and comes with its own test suite. Fixes mesonbuild#2923.
This is rather confusing, since a TAP parser that doesn't actually do TAP wastes our time thinking we should implement it, then completely fails to work as advertised, causes a critical blocker in deprecating autotools, and necessitates re-adding complicated meson-specific test harness logic that we'd previously deleted because we thought "oh, meson supports TAP now".
I think maybe my project does, in fact, need to re-add our wrapper parser if we want to correctly handle XFAIL? Not really sure what the story is here. A frustrating time-waster. At the least, I think the responsible thing to do would be to document in the reference manual, "note: |
I don't understand. A TAP parser parses TAP, it doesn't "do" TAP (whatever that means). If there's any bad result (
What do you mean by "correctly handle XFAIL"? Meson's TAP parser supports
Perhaps it doesn't do what you think it does, because I'm using it just fine in QEMU's port to Meson... |
TODO is reported as "okay" instead of "expected fail":
In order to mark a test as xfail, you need to mark it as failed in TAP, then mark it as xfail in meson. |
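As a hypothetical meson.build fragment illustrating that combination (the test name and sources are made up): the program reports its failure in the TAP stream, and `should_fail : true` tells Meson that this overall failure is expected.

```meson
# The test binary emits "not ok ..." in its TAP output; at the meson
# level the whole invocation is declared as expected to fail.
test('known-bug', executable('t_known_bug', 't_known_bug.c'),
     protocol : 'tap',
     should_fail : true)
```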
The … and you'll get …

Most of the time you should not use … |
I will thank you to kindly not put words into my mouth! I am aware meson unittests are inaccurate things that only exist in order to generate subprocess return codes indicating whether anything exploded. But this test case is something trivially accessible. Since we are all intelligent people here, we can look up its definition and see that the test I highlighted is defined as such:
It emits a TAP stream with the thing I am complaining about, that is to say, a "not ok # todo", with the protocol: 'tap' kwarg and no should_fail kwarg -- that is to say, it is a bog-standard TAP test which uses the TAP protocol to communicate the message "this has xfailed". I then posted output showing that meson parses this as "TAP status: ok". But you know what -- fine. Ignore the meson test, ignore my attempt to be helpful, we'll do this the hard way. I have a project: https://git.archlinux.org/pacman.git/
Then build it under meson, observe that the tests which are defined as protocol: 'tap' and rely on the tap protocol to communicate their pass/fail/xfail/xpass/skip... do not, in fact, report the correct output:
This violates the TAP protocol. As I said above, meson is a TAP parser that doesn't actually do TAP -- it is not, in fact, a TAP parser, it is a "looks similar to TAP but isn't actually" parser. It lies and says the EXPECTEDFAIL test cases are full passes which are specifically distinct from the row which states for fact that "Expected Fail: 0", which means IT IS LYING TO ME and hiding the fact that there are known bugs in the application. This is completely unacceptable, and violates TAP. The project lead has stated we will NOT switch from autotools to meson until the test harness correctly reports EXPECTEDFAIL results as EXPECTEDFAIL.
.... or wait, wtf. You actually agree with me that everything I said is correct, but you consider this to be valid behavior for meson to do? WTF. This is not "indeed different from Automake". It is "indeed different from TAP consumers", of which Automake happens to be... one of. TAP says an xfail test is EXPECTEDFAIL, but the overall test harness should return success. meson's pretty-print TAP consumer provides an EXPECTEDFAIL category, which granted, not all TAP pretty-printers do, then fails to categorize EXPECTEDFAIL correctly, instead inserting them into the OK row. This would only be acceptable if meson's pretty-printer was a rudimentary one like …

It's extremely disconcerting to see you call this "how it should be" instead of "something which should be implemented to enhance meson". And no one other than QEMU cares what QEMU does. If you're that uninterested in summaries, perhaps you could implement a --quiet mode which causes mtest to emit zero lines of output on success, not even a count of the number of "Ok" tests.

You are NOT PERMITTED to command other people what is fine for them. You can offer a recommendation "we don't care about XXXX and suggest that you shouldn't care either", but even if you convince me of that, you'd need to convince everyone of that, each and every time, in the face of no documentation.
Who on earth is using --verbose in their CI pipelines, then manually grepping for keywords in a giant stream of text, in order to double-check whether or not meson is lying to them when it says "332 expected passes (OK), 0 expected failures (XFAIL)"? Answer: the handful of people who are in the know that meson lies. But why not make those show up in the summary, just like they do for non-TAP tests? Why are you suggesting that that's undesirable behavior? |
"Doesn't do TAP" is quite an exaggeration. "meson test" returns exit code 0 if autotools "make check" returns exit code 0, and exit code 1 if autotools "make check" returns exit code 1, and that's enough for me to say that Meson "does TAP". It's certainly correct that it "doesn't report results at the TAP test level" and requires you to look at the logs to see that. Meson indeed only reports the result of tests at the program invocation level rather than at the testcase level. This also explains why you see OK for the
would you qualify it as a PASS, SKIP or XFAIL? Meson says it's a pass because overall the executable has matched its expectations.
It's a decision. Making them show up in the summary would mean that you get
and the difference between the 332 lines before and the 508+6=514 total after is just as puzzling. So Meson is not lying to you; it's saying that all the executables matched their expectations. "meson test" didn't have the concept of testcases within a program invocation, so the granularity you get from it is a program invocation. You're welcome to fix that and open a pull request. I promise you that any such change won't require a change to |
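The per-invocation aggregation described in this exchange can be sketched as follows. This is a hedged illustration of the semantics being argued about, not Meson's code: per-testcase TAP outcomes collapse into a single result for the whole program invocation, so TODO/SKIP testcases do not make the invocation fail.

```python
def aggregate(outcomes: list[str]) -> str:
    """Collapse per-testcase TAP outcomes into one result for the
    whole program invocation."""
    if 'ERROR' in outcomes:      # parse error, bail out, or crash
        return 'ERROR'
    if 'FAIL' in outcomes:       # at least one unexpected failure
        return 'FAIL'
    if outcomes and all(o == 'SKIP' for o in outcomes):
        return 'SKIP'            # every testcase was skipped
    return 'OK'                  # passes, xfails, and partial skips
```

Under this rule, an invocation containing only passes and expected failures reports OK, which is exactly the behavior the summary table reflects.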
This series lets Meson handle the TAP protocol for tests. TAP output is parsed and converted into a pass/fail/skip outcome which is then checked against `should_fail`. In addition, errors in the test itself, such as TAP parsing errors, are treated as a hard error outcome that cannot be overridden by `should_fail`.

For now, it only supports outputting the results on a per-`test()` level; the outcomes of each TAP-level testcase are recorded in the standard output but not e.g. in the JSON test results. However, this pull request introduces the required language changes and parsing support, so that further improvements can go in on top.