Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial TAP test support #4958

Merged
merged 5 commits into from
Mar 3, 2019
Merged

Add initial TAP test support #4958

merged 5 commits into from
Mar 3, 2019

Conversation

bonzini
Copy link
Contributor

@bonzini bonzini commented Feb 21, 2019

This series lets Meson handle the TAP protocol for tests. TAP output is parsed and converted into a pass/fail/skip outcome which is then checked against should_fail. In addition, errors in the test itself, such as TAP parsing errors, are treated as a hard error outcome that cannot be overridden by should_fail.

For now, it only supports outputting the results on a per-test() level; the outcomes of each TAP-level testcase are recorded in the standard output but not e.g. in the JSON test results. However, this pull request introduces the required language changes and parsing support, so that further improvements can go in on top.

@ignatenkobrain
Copy link
Member

Flake8 detected 7 issues on 4aef0f5
Visit https://sider.review/gh/repos/19784232/pulls/4958 to review the issues.

posted by Sider

@bonzini
Copy link
Contributor Author

bonzini commented Feb 21, 2019

Fixes #2923.

@jpakkane
Copy link
Member

Conceptually seems like a nice idea. Do we have someone with TAP usage experience (I have zero) to review the usability and integration parts?

@bonzini
Copy link
Contributor Author

bonzini commented Feb 21, 2019

This pull request does not really improve the usability and integration; all it does is remove the need for a wrapper parser such as the one mentioned in issue #2923, see here.

Integration improvements could come from printing failing subtests directly in the test list (like this), but that's beyond the scope of this initial work. The important part is to decide on the syntax, so that users get future improvements for free.

@jpakkane
Copy link
Member

Trunk has been fixed, please rebase so CI gets rerun. Thanks.

@bonzini bonzini force-pushed the tap branch 2 times, most recently from 28593d3 to 0d35be7 Compare February 27, 2019 07:58
@jpakkane
Copy link
Member

jpakkane commented Mar 1, 2019

There was a different issue in trunk. It has now been fixed. A rebase should finally get things green.

bonzini added 5 commits March 2, 2019 09:07
This is the first step towards adding support for TAP.
Parse the error code outside SingleTestRunner's run() method.  This
will let us add TAP support without complicating that long method
further.
--print-errorlogs is using the test's return code to look for failed
tests, instead of just looking at the TestResult.  Simplify the code and
make it work for TAP too.
Hard errors also come from the GNU Automake test protocol.  They happen when
e.g., the set-up of a test case scenario fails, or when some
other unexpected or highly undesirable condition is encountered.

TAP will use them for parse errors too.  Add them to the exitcode protocol
first.
This provides an initial support for parsing TAP output.  It detects failures
and skipped tests without relying on exit code, as well as early termination
of the test due to an error or a crash.

For now, subtests are not recorded in the TestRun object.  However, because the
TAP output goes on stdout, it is printed by --print-errorlogs when a test does
not behave as expected.  Handling subtests as TestRuns, and serializing them
to JSON, can be added later.

The parser was written specifically for Meson, and comes with its own
test suite.

Fixes mesonbuild#2923.
@jpakkane jpakkane merged commit 1997ac2 into mesonbuild:master Mar 3, 2019
@bonzini bonzini deleted the tap branch March 28, 2019 07:48
@ePirat ePirat mentioned this pull request Apr 16, 2020
@eli-schwartz
Copy link
Member

TAP output is parsed and converted into a pass/fail/skip outcome which is then checked against should_fail

This is rather confusing, since a TAP parser that doesn't actually do TAP wastes our time thinking we should implement it, then completely fails to work as advertised, causes a critical blocker in deprecating autotools, and necessitates re-adding complicated meson-specific test harness logic that we'd previously deleted because we thought "oh, meson supports TAP now".

This pull request does not really improve the usability and integration; all it does is remove the need for a wrapper parser such as the one mentioned in issue #2923, see here.

I think maybe my project does, in fact, need to re-add our wrapper parser if we want to correctly handle XFAIL? Not really sure what the story is here.

A frustrating time-waster. At the least, I think the responsible thing to do would be to document in the reference manual, "note: protocol: 'tap' support is incomplete and doesn't do what you think it does".

@bonzini
Copy link
Contributor Author

bonzini commented Apr 20, 2020

This is rather confusing, since a TAP parser that doesn't actually do TAP

I don't understand. A TAP parser parses TAP, it doesn't "do" TAP (whatever that means). If there's any bad result (not ok, ok # TODO aka XPASS, Bail out!, or non-zero status code) in the test's TAP output, the test is overall a failure. If all of them are ok (or not ok # TODO, i.e. XFAIL), the test is overall a pass.

I think maybe my project does, in fact, need to re-add our wrapper parser if we want to correctly handle XFAIL? Not really sure what the story is here.

What do you mean by "correctly handle XFAIL"? Meson's TAP parser supports TODO directives correctly.

note: protocol: 'tap' support is incomplete and doesn't do what you think it does

Perhaps it doesn't do what you think it does, because I'm using it just fine in QEMU's port to Meson...

@eli-schwartz
Copy link
Member

eli-schwartz commented Apr 20, 2020

TODO is reported as "okay" instead of "expected fail":

[eschwartz@arch ~/git/meson/test cases/common/213 tap tests]$ meson test -C builddir
ninja: Entering directory `/home/eschwartz/git/meson/test cases/common/213 tap tests/builddir'
ninja: no work to do.
1/7 pass                                    OK      0.01997089385986328 s 
2/7 fail                                    EXPECTEDFAIL0.01022648811340332 s 
3/7 xfail                                   OK      0.01635885238647461 s 

In order to mark a test as xfail, you need to mark it as failed in TAP, then mark it as xfail in meson.

@bonzini
Copy link
Contributor Author

bonzini commented Apr 20, 2020

The 213 tap tests is an integration test, it's not an example of how things work in reality. In practice you'll have a single test that produces output like

ok 1
ok 2
ok 3
not ok 4 # TODO
ok 5

and you'll get OK in meson test, meaning that all results are as expected (either PASS or XFAIL). It's indeed different from Automake but we've used this reporting style in QEMU for a long time and it is just fine, especially if you use --verbose in your CI pipelines.

Most of the time you should not use should_fail at all if you use TAP.

@eli-schwartz
Copy link
Member

213 tap tests is an integration test, it's not an example of how things work in reality.

I will thank you to kindly not put words into my mouth! I am aware meson unittests are inaccurate things that only exist in order to generate subprocess return codes indicating whether anything exploded. But this test case is something trivially accessible. Since we are all intelligent people here, we can look up its definition and see that the test I highlighted is defined as such:

test('xfail', tester, args : ['not ok # todo'], protocol: 'tap')

It emits a TAP stream with the thing I am complaining about, that is to say, a "not ok # todo", with the protocol: 'tap' kwarg and no should_fail kwarg -- that is to say, it is a bog-standard TAP test which uses the TAP protocol to communicate the message "this has xfailed". I then posted output showing that meson parses this as "TAP status: ok".

But you know what -- fine. Ignore the meson test, ignore my attempt to be helpful, we'll do this the hard way.

I have a project: https://git.archlinux.org/pacman.git/
Please clone it, build it under autotools, run the unittests under autotools, observe that the TAP compliant tests emit:

Testsuite summary for pacman 5.2.1
============================================================================
# TOTAL: 514
# PASS:  508
# SKIP:  0
# XFAIL: 6
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

Then build it under meson, observe that the tests which are defined as protocol: 'tap' and rely on the tap protocol to communicate their pass/fail/xfail/xpass/skip... do not, in fact, report the correct output:

Ok:                 332 
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0   

This violates the TAP protocol. As I said above, meson is a TAP parser that doesn't actually do TAP -- it is not, in fact, a TAP parser, it is a "looks similar to TAP but isn't actually" parser. It lies and says the EXPECTEDFAIL test cases are full passes which are specifically distinct from the row which states for fact that "Expected Fail: 0", which means IT IS LYING TO ME and hiding the fact that there are known bugs in the application. This is completely unacceptable, and violates TAP.

The project lead has stated we will NOT switch from autotools to meson until the test harness correctly reports EXPECTEDFAIL results as EXPECTEDFAIL.

and you'll get OK in meson test, meaning that all results are as expected. It's indeed different from Automake but we've used this reporting style in QEMU for a long time and it is just fine, especially if you use --verbose in your CI pipelines.

.... or wait, wtf. You actually agree with me that everything I said is correct, but you consider this to be valid behavior for meson to do?

WTF.

This is not "indeed different from Automake".

It is "indeed different from TAP consumers", of which Automake happens to be... one of. TAP says an xfail test is EXPECTEDFAIL, but the overall test harness should return success. meson's pretty-print TAP consumer provides an EXPECTEDFAIL category, which granted, not all TAP pretty-printers do, then fails to categorize EXPECTEDFAIL correctly, instead inserting them into the OK row. This would only be acceptable if meson's pretty-printer was a rudimentary one like /usr/bin/prove which only reported Ok/Fail, and the reader knew that in order to get a better breakdown they must painfully grep through logs. But this is not the case. meson is a TAP pretty-printer which explicitly provides an EXPECTEDFAIL category, then refuses to use it.

It's extremely disconcerting to see you call this "how it should be" instead of "something which should be implemented to enhance meson". And no one other than QEMU cares what QEMU does. If you're that uninterested in summaries, perhaps you could implement a --quiet mode which causes mtest to emit zero lines of output on success, not even a count of the number of "Ok" tests.

You are NOT PERMITTED to command other people what is fine for them. You can offer a recommendation "we don't care about XXXX and suggest that you shouldn't care either", but even if you convince me of that, you'd need to convince everyone of that, each and every time, in the face of no documentation.

especially if you use --verbose in your CI pipelines.

Who on earth is using --verbose in their CI pipelines, then manually grepping for keywords in a giant stream of text, in order double-check whether or not meson is lying to them when it says "332 expected passes (OK), 0 expected failures (XFAIL)"?

Answer: the handful of people who are in the know that meson lies.

But why not make those show up in the summary, just like they do for non-TAP tests? Why are you suggesting that that's undesirable behavior?

@bonzini
Copy link
Contributor Author

bonzini commented Apr 20, 2020

As I said above, meson is a TAP parser that doesn't actually do TAP -- it is not, in fact, a TAP parser

"Doesn't do TAP" is quite an exaggeration. "meson test" returns exit code 0 if autotools "make check" returns exit code 0, and exit code 1 if autotools "make check" returns exit code 1, and that's enough for me to say that Meson "does TAP".

It's certainly correct that it "doesn't report results at the TAP test level" and requires you to look at the logs to see that. Meson indeed only reports the result of tests at the program invocation level rather than at the testcase level.

This also explains why you see OK for the xfail subtest. If a program printed

ok 1
ok 1 # SKIP
not ok 1 # TODO

would you qualify it as a PASS, SKIP or XFAIL? Meson says it's a pass because overall the executable has matched its expectations.

why not make those show up in the summary, just like they do for non-TAP tests? Why are you suggesting that that's undesirable behavior?

It's a decision. Making them show up the summary would mean that you get

(1)      ...                  OK
(2)      ...                  OK
...
(332)    ...                  OK

Ok:                 508
Expected Fail:      6   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0   

and the difference between the 332 lines before and the 508+6=514 total after is just as puzzling.

So Meson is not lying to you, it's saying that all the executables matched their expectations. "meson test" didn't have the concept of testcases within a program invocation, so the granularity you get from it is a program invocation. You're welcome to fix that and open a pull request for that. I promise you that any such change won't require a change to meson.build, because (even though I didn't implement it) I did take that into account in the design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants