Add initial TAP test support #4958

bonzini · 2019-02-21T17:28:18Z

This series lets Meson handle the TAP protocol for tests. TAP output is parsed and converted into a pass/fail/skip outcome which is then checked against should_fail. In addition, errors in the test itself, such as TAP parsing errors, are treated as a hard error outcome that cannot be overridden by should_fail.

For now, it only supports outputting the results on a per-test() level; the outcomes of each TAP-level testcase are recorded in the standard output but not e.g. in the JSON test results. However, this pull request introduces the required language changes and parsing support, so that further improvements can go in on top.

ignatenkobrain · 2019-02-21T17:29:08Z

Flake8 detected 7 issues on 4aef0f5
Visit https://sider.review/gh/repos/19784232/pulls/4958 to review the issues.

_{posted by Sider}

bonzini · 2019-02-21T17:30:58Z

Fixes #2923.

jpakkane · 2019-02-21T21:05:26Z

Conceptually seems like a nice idea. Do we have someone with TAP usage experience (I have zero) to review the usability and integration parts?

bonzini · 2019-02-21T21:43:03Z

This pull request does not really improve the usability and integration; all it does is remove the need for a wrapper parser such as the one mentioned in issue #2923, see here.

Integration improvements could come from printing failing subtests directly in the test list (like this), but that's beyond the scope of this initial work. The important part is to decide on the syntax, so that users get future improvements for free.

jpakkane · 2019-02-26T00:12:16Z

Trunk has been fixed, please rebase so CI gets rerun. Thanks.

jpakkane · 2019-03-01T22:23:32Z

There was a different issue in trunk. It has now been fixed. A rebase should finally get things green.

This is the first step towards adding support for TAP.

Parse the error code outside SingleTestRunner's run() method. This will let us add TAP support without complicating that long method further.

--print-errorlogs is using the test's return code to look for failed tests, instead of just looking at the TestResult. Simplify the code and make it work for TAP too.

Hard errors also come from the GNU Automake test protocol. They happen when e.g., the set-up of a test case scenario fails, or when some other unexpected or highly undesirable condition is encountered. TAP will use them for parse errors too. Add them to the exitcode protocol first.

This provides an initial support for parsing TAP output. It detects failures and skipped tests without relying on exit code, as well as early termination of the test due to an error or a crash. For now, subtests are not recorded in the TestRun object. However, because the TAP output goes on stdout, it is printed by --print-errorlogs when a test does not behave as expected. Handling subtests as TestRuns, and serializing them to JSON, can be added later. The parser was written specifically for Meson, and comes with its own test suite. Fixes mesonbuild#2923.

eli-schwartz · 2020-04-20T12:49:23Z

TAP output is parsed and converted into a pass/fail/skip outcome which is then checked against should_fail

This is rather confusing, since a TAP parser that doesn't actually do TAP wastes our time thinking we should implement it, then completely fails to work as advertised, causes a critical blocker in deprecating autotools, and necessitates re-adding complicated meson-specific test harness logic that we'd previously deleted because we thought "oh, meson supports TAP now".

This pull request does not really improve the usability and integration; all it does is remove the need for a wrapper parser such as the one mentioned in issue #2923, see here.

I think maybe my project does, in fact, need to re-add our wrapper parser if we want to correctly handle XFAIL? Not really sure what the story is here.

A frustrating time-waster. At the least, I think the responsible thing to do would be to document in the reference manual, "note: protocol: 'tap' support is incomplete and doesn't do what you think it does".

bonzini · 2020-04-20T12:56:32Z

This is rather confusing, since a TAP parser that doesn't actually do TAP

I don't understand. A TAP parser parses TAP, it doesn't "do" TAP (whatever that means). If there's any bad result (not ok, ok # TODO aka XPASS, Bail out!, or non-zero status code) in the test's TAP output, the test is overall a failure. If all of them are ok (or not ok # TODO, i.e. XFAIL), the test is overall a pass.

I think maybe my project does, in fact, need to re-add our wrapper parser if we want to correctly handle XFAIL? Not really sure what the story is here.

What do you mean by "correctly handle XFAIL"? Meson's TAP parser supports TODO directives correctly.

note: protocol: 'tap' support is incomplete and doesn't do what you think it does

Perhaps it doesn't do what you think it does, because I'm using it just fine in QEMU's port to Meson...

eli-schwartz · 2020-04-20T13:11:26Z

TODO is reported as "okay" instead of "expected fail":

[eschwartz@arch ~/git/meson/test cases/common/213 tap tests]$ meson test -C builddir
ninja: Entering directory `/home/eschwartz/git/meson/test cases/common/213 tap tests/builddir'
ninja: no work to do.
1/7 pass                                    OK      0.01997089385986328 s 
2/7 fail                                    EXPECTEDFAIL0.01022648811340332 s 
3/7 xfail                                   OK      0.01635885238647461 s

In order to mark a test as xfail, you need to mark it as failed in TAP, then mark it as xfail in meson.

bonzini · 2020-04-20T13:34:10Z

The 213 tap tests is an integration test, it's not an example of how things work in reality. In practice you'll have a single test that produces output like

ok 1
ok 2
ok 3
not ok 4 # TODO
ok 5

and you'll get OK in meson test, meaning that all results are as expected (either PASS or XFAIL). It's indeed different from Automake but we've used this reporting style in QEMU for a long time and it is just fine, especially if you use --verbose in your CI pipelines.

Most of the time you should not use should_fail at all if you use TAP.

eli-schwartz · 2020-04-20T15:52:33Z

213 tap tests is an integration test, it's not an example of how things work in reality.

I will thank you to kindly not put words into my mouth! I am aware meson unittests are inaccurate things that only exist in order to generate subprocess return codes indicating whether anything exploded. But this test case is something trivially accessible. Since we are all intelligent people here, we can look up its definition and see that the test I highlighted is defined as such:

meson/test cases/common/213 tap tests/meson.build

Line 6 in f798207

test('xfail', tester, args : ['not ok # todo'], protocol: 'tap')

It emits a TAP stream with the thing I am complaining about, that is to say, a "not ok # todo", with the protocol: 'tap' kwarg and no should_fail kwarg -- that is to say, it is a bog-standard TAP test which uses the TAP protocol to communicate the message "this has xfailed". I then posted output showing that meson parses this as "TAP status: ok".

But you know what -- fine. Ignore the meson test, ignore my attempt to be helpful, we'll do this the hard way.

I have a project: https://git.archlinux.org/pacman.git/
Please clone it, build it under autotools, run the unittests under autotools, observe that the TAP compliant tests emit:

Testsuite summary for pacman 5.2.1
============================================================================
# TOTAL: 514
# PASS:  508
# SKIP:  0
# XFAIL: 6
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

Then build it under meson, observe that the tests which are defined as protocol: 'tap' and rely on the tap protocol to communicate their pass/fail/xfail/xpass/skip... do not, in fact, report the correct output:

Ok:                 332 
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0

This violates the TAP protocol. As I said above, meson is a TAP parser that doesn't actually do TAP -- it is not, in fact, a TAP parser, it is a "looks similar to TAP but isn't actually" parser. It lies and says the EXPECTEDFAIL test cases are full passes which are specifically distinct from the row which states for fact that "Expected Fail: 0", which means IT IS LYING TO ME and hiding the fact that there are known bugs in the application. This is completely unacceptable, and violates TAP.

The project lead has stated we will NOT switch from autotools to meson until the test harness correctly reports EXPECTEDFAIL results as EXPECTEDFAIL.

and you'll get OK in meson test, meaning that all results are as expected. It's indeed different from Automake but we've used this reporting style in QEMU for a long time and it is just fine, especially if you use --verbose in your CI pipelines.

.... or wait, wtf. You actually agree with me that everything I said is correct, but you consider this to be valid behavior for meson to do?

WTF.

This is not "indeed different from Automake".

It is "indeed different from TAP consumers", of which Automake happens to be... one of. TAP says an xfail test is EXPECTEDFAIL, but the overall test harness should return success. meson's pretty-print TAP consumer provides an EXPECTEDFAIL category, which granted, not all TAP pretty-printers do, then fails to categorize EXPECTEDFAIL correctly, instead inserting them into the OK row. This would only be acceptable if meson's pretty-printer was a rudimentary one like /usr/bin/prove which only reported Ok/Fail, and the reader knew that in order to get a better breakdown they must painfully grep through logs. But this is not the case. meson is a TAP pretty-printer which explicitly provides an EXPECTEDFAIL category, then refuses to use it.

It's extremely disconcerting to see you call this "how it should be" instead of "something which should be implemented to enhance meson". And no one other than QEMU cares what QEMU does. If you're that uninterested in summaries, perhaps you could implement a --quiet mode which causes mtest to emit zero lines of output on success, not even a count of the number of "Ok" tests.

You are NOT PERMITTED to command other people what is fine for them. You can offer a recommendation "we don't care about XXXX and suggest that you shouldn't care either", but even if you convince me of that, you'd need to convince everyone of that, each and every time, in the face of no documentation.

especially if you use --verbose in your CI pipelines.

Who on earth is using --verbose in their CI pipelines, then manually grepping for keywords in a giant stream of text, in order double-check whether or not meson is lying to them when it says "332 expected passes (OK), 0 expected failures (XFAIL)"?

Answer: the handful of people who are in the know that meson lies.

But why not make those show up in the summary, just like they do for non-TAP tests? Why are you suggesting that that's undesirable behavior?

bonzini · 2020-04-20T16:45:16Z

As I said above, meson is a TAP parser that doesn't actually do TAP -- it is not, in fact, a TAP parser

"Doesn't do TAP" is quite an exaggeration. "meson test" returns exit code 0 if autotools "make check" returns exit code 0, and exit code 1 if autotools "make check" returns exit code 1, and that's enough for me to say that Meson "does TAP".

It's certainly correct that it "doesn't report results at the TAP test level" and requires you to look at the logs to see that. Meson indeed only reports the result of tests at the program invocation level rather than at the testcase level.

This also explains why you see OK for the xfail subtest. If a program printed

ok 1
ok 1 # SKIP
not ok 1 # TODO

would you qualify it as a PASS, SKIP or XFAIL? Meson says it's a pass because overall the executable has matched its expectations.

why not make those show up in the summary, just like they do for non-TAP tests? Why are you suggesting that that's undesirable behavior?

It's a decision. Making them show up the summary would mean that you get

(1)      ...                  OK
(2)      ...                  OK
...
(332)    ...                  OK

Ok:                 508
Expected Fail:      6   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0

and the difference between the 332 lines before and the 508+6=514 total after is just as puzzling.

So Meson is not lying to you, it's saying that all the executables matched their expectations. "meson test" didn't have the concept of testcases within a program invocation, so the granularity you get from it is a program invocation. You're welcome to fix that and open a pull request for that. I promise you that any such change won't require a change to meson.build, because (even though I didn't implement it) I did take that into account in the design.

bonzini force-pushed the tap branch from 4aef0f5 to 5415b0c Compare February 21, 2019 17:30

bonzini force-pushed the tap branch 2 times, most recently from 28593d3 to 0d35be7 Compare February 27, 2019 07:58

bonzini added 5 commits March 2, 2019 09:07

interpreter: add "protocol" kwarg to test

4b5cf3f

This is the first step towards adding support for TAP.

mtest: refactor TestRun creation

1264f03

Parse the error code outside SingleTestRunner's run() method. This will let us add TAP support without complicating that long method further.

mtest: do not use return code to look for failed tests

d830945

--print-errorlogs is using the test's return code to look for failed tests, instead of just looking at the TestResult. Simplify the code and make it work for TAP too.

bonzini force-pushed the tap branch from 0d35be7 to 91f847d Compare March 2, 2019 08:08

jpakkane merged commit 1997ac2 into mesonbuild:master Mar 3, 2019

bonzini deleted the tap branch March 28, 2019 07:48

ePirat mentioned this pull request Apr 16, 2020

Add TAP tests support #2923

Closed

v1993 mentioned this pull request Apr 11, 2023

Add meson as consumer TestAnything/testanything.github.io#176

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add initial TAP test support #4958

Add initial TAP test support #4958

bonzini commented Feb 21, 2019

ignatenkobrain commented Feb 21, 2019

bonzini commented Feb 21, 2019

jpakkane commented Feb 21, 2019

bonzini commented Feb 21, 2019 •

edited

Loading

jpakkane commented Feb 26, 2019

jpakkane commented Mar 1, 2019

eli-schwartz commented Apr 20, 2020

bonzini commented Apr 20, 2020

eli-schwartz commented Apr 20, 2020 •

edited

Loading

bonzini commented Apr 20, 2020 •

edited

Loading

eli-schwartz commented Apr 20, 2020

bonzini commented Apr 20, 2020

Add initial TAP test support #4958

Add initial TAP test support #4958

Conversation

bonzini commented Feb 21, 2019

ignatenkobrain commented Feb 21, 2019

bonzini commented Feb 21, 2019

jpakkane commented Feb 21, 2019

bonzini commented Feb 21, 2019 • edited Loading

jpakkane commented Feb 26, 2019

jpakkane commented Mar 1, 2019

eli-schwartz commented Apr 20, 2020

bonzini commented Apr 20, 2020

eli-schwartz commented Apr 20, 2020 • edited Loading

bonzini commented Apr 20, 2020 • edited Loading

eli-schwartz commented Apr 20, 2020

bonzini commented Apr 20, 2020

bonzini commented Feb 21, 2019 •

edited

Loading

eli-schwartz commented Apr 20, 2020 •

edited

Loading

bonzini commented Apr 20, 2020 •

edited

Loading