
Pendings tests #787

Closed
jzaefferer opened this issue Mar 17, 2015 · 30 comments
Labels
Category: API · Component: HTML Reporter · Status: Ready (a "Meta" type issue that has reached consensus) · Type: Enhancement (new idea or feature request)

Comments

@jzaefferer
Member

Something that came out of a discussion with Yehuda: During a big refactoring they have hundreds of failing tests. They turned them off using QUnit.skip, but that doesn't tell them which of the skipped tests started passing again at some point. To quickly test changes in passing tests, they ended up with this:

QUnit.skip = QUnit.test;

There's probably a better way to deal with that situation. One idea is to have something like a QUnit.pending method that has the same signature as QUnit.test(), but reverses the result: if all assertions pass, the test fails; if at least one fails, the test passes.

The HTML reporter could use a similar marker as QUnit.skip.

Thoughts?


Update 2015-04-15:

After some discussion, we came up with this:

// This test will show up with a "todo" badge, with a similar style as the "skipped" badge
// It will show as passed when at least one assertion fails, or fail when all assertions pass
QUnit.todo( "this isn't implemented, yet", function( assert ) {
  assert.ok( featureTest(), "waiting for feature" );
} );

QUnit.testDone( function( details ) {
  console.log( details.name, details.todo );
  //> "this isn't implemented, yet" true
} );

QUnit.log( function( details ) {
  console.log( details.message, details.todo );
  //> "waiting for feature" true
} );
@wycats

wycats commented Mar 17, 2015

That sounds great to me. We'd use this!

@wycats

wycats commented Mar 17, 2015

It could also be useful for tests that we expect to fail in all browsers, but which mark code that could be removed once browsers started implementing a feature. If we start to see the pending test "fail" in Saucelabs, it's time to investigate using the browser feature.

@jzaefferer
Member Author

That's an interesting idea. In both cases this would be considered a "temporary" test, so I still think some visual highlight makes sense.

@gibson042
Member

This was touched on in the conversation that led to QUnit.skip.

Relevant excerpts:
#637 (comment)

[TAP] TODO: Expects failure (i.e., a reversal of normal operation—passes are interesting instead of failures, and failures do not effect pass→fail state changes in the containing run/module/suite/test). Reporters must take care to separate TODO failures from non-TODO failures, and should highlight TODO passes.

#637 (comment)

todo means "run, but don't let failures fail the containing group"

EDIT: removed unrelated comment and wholesale endorsement.

@leobalter
Member

qunit-pending from @JamesMGreene works differently from this proposal: there, a test without a function argument, like QUnit.test("works");, becomes a QUnit.skip.

This issue proposes a test that contains at least one failing assertion. Rather than skipping such tests, QUnit would run all of their assertions, and the test would FAIL when all of them pass. This lets the developer make progressive work on a big refactoring, converting the pending tests into actual tests as they become ready and working.

@gibson042
Member

qunit-pending from @JamesMGreene works differently from this proposal

Noted and updated.

I also want to add more nuance to this:

the same signature as QUnit.test(), but will reverse the result. So if all assertions pass, the test fails, if at least one fails, the test passes.

A blind reversal like that is insufficient for general use, because it doesn't highlight what work still needs to happen. TODO failures are still failures, they're just expected. This could manifest in a few different ways, but I think minimal changes require at least something like failedAsExpected in callback details (either double-counted as failed or understood as a distinct category) and distinct HTML reporting.

On the other hand, converting unexpected successes to unexpected failures as you propose makes sense to me, but is against the spirit of TAP ("Should a todo test point begin succeeding, the harness should report it as a bonus")... if we follow those footsteps instead, it'll mean something like passedUnexpectedly (again, either double-counted as passed or understood as a fifth partition) and possibly also distinct HTML reporting (or possibly reuse of e.g. "failed" styling).

No matter what, though, this promotes "expectations" as a QUnit concept, which is worth keeping in mind.

@JamesMGreene
Member

I'd say definitely don't call it QUnit.pending(). I'd be behind some name more like QUnit.todo (based on the TAP concept) as mentioned by @gibson042.

It's also worth noting that no other JS unit testing framework we've talked to has this functionality. It's marked as "Status Type" of "Inconclusive" in the table here: qunitjs/js-reporters#4

Whether that is good or bad remains to be seen....

  • Pros:
    • We could be the first and only JS unit testing framework to offer it
  • Cons:
    • We would be adding a new status type to the list that would likely not be compatible/translatable into other testing frameworks (e.g. a concern for the JS Reporters project)

@leobalter
Member

todo sounds better than pending.

As for it being new functionality, I believe the use case is argument enough to sustain this new API feature.

@jzaefferer
Member Author

A blind reversal like that is insufficient for general use, because it doesn't highlight what work still needs to happen. TODO failures are still failures, they're just expected. This could manifest in a few different ways, but I think minimal changes require at least something like failedAsExpected in callback details (either double-counted as failed or understood as a distinct category) and distinct HTML reporting.

On the other hand, converting unexpected successes to unexpected failures as you propose makes sense to me, but is against the spirit of TAP ("Should a todo test point begin succeeding, the harness should report it as a bonus")... if we follow those footsteps instead, it'll mean something like passedUnexpectedly (again, either double-counted as passed or understood as a fifth partition) and possibly also distinct HTML reporting (or possibly reuse of e.g. "failed" styling).

No matter what, though, this promotes "expectations" as a QUnit concept, which is worth keeping in mind.

I've read this a few times, but still fail to grasp the essence. Could you or someone else provide a summary beyond "let's add QUnit.todo()"? That is, what exactly would we implement in QUnit?

@gibson042
Member

I'll give it a shot. Recall that the QUnit callbacks provide "failed" and "passed" assertion counts, with an implicit assumption that every assertion is expected to pass. Introducing TODO means violating that assumption, because a run with failed assertions is still a success if they are all associated with TODO tests. We need to capture them in a new "failed as expected" count so consumers can behave correctly.

We should also add a "passed unexpectedly" count so consumers can identify potentially-complete TODOs, e.g.:

QUnit.testDone(function( testResults ) {
    toDoNoMore = testResults.passedUnexpectedly > 0 && !testResults.failedAsExpected;
    …
});

Note that both of these categories can be considered distinct from passed/failed, or can be considered subsets, with the decision probably controlled by exploring effects on backwards compatibility (e.g., whether it's better for a TODO test to report zero or nonzero "failed").

Having done that, the HTML reporter will need to style expectedly failing and unexpectedly passing tests.

@jzaefferer
Member Author

That sounds like overkill. Adding a "todo" detail to the log and a badge to the reporter (styled like the one for skip) and reversing the result of all assertions inside a QUnit.todo() callback should be enough to make this useful. It's then up to the developer to decide what to do with a todo test once it fails, but that should be enough notification.
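The reversal described here can be sketched as a small standalone function (a hypothetical illustration of the proposal, not QUnit's actual implementation):

```javascript
// Sketch of the proposed todo reversal: after a todo test's assertions
// have run, the test as a whole passes if at least one assertion failed,
// and fails if every assertion passed.
// `tally` mimics the per-test assertion counts QUnit tracks internally.
function reverseTodoOutcome(tally) {
  return {
    todo: true,               // exposed to reporters via callback details
    passed: tally.failed > 0  // reversed: an expected failure is a pass
  };
}

// A todo test whose assertion still fails shows up as passing...
console.log(reverseTodoOutcome({ passed: 0, failed: 1 }).passed); // true
// ...and "fails" once all of its assertions pass, signalling it is done.
console.log(reverseTodoOutcome({ passed: 2, failed: 0 }).passed); // false
```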

@gibson042
Member

How is a developer to identify a todo test, especially if they're not using the HTML reporter?

@jzaefferer
Member Author

HTML reporter: "todo" badge, similar to "skip" badge
stdout reporter: Put "todo" in color in front of test name, or whatever works.

As long as the details for the test indicate the "todo" state, both can be implemented to output whatever we need.

@gibson042
Member

I'm specifically thinking of Javascript reporters. The current callback arguments don't provide that information. Are you proposing a new property in the testDone details and requiring consumers to register at that level if they care about the distinction (e.g., for reporting at suite completion)?

var failedAsExpected = 0,
    toDoNoMore = [];

QUnit.testDone(function( details ) {
    if ( details.todo ) {
        failedAsExpected += details.failed;
        if ( !details.failed ) {
            toDoNoMore.push( details );
        }
    }
});

I think I can get on board with that.

@jzaefferer
Member Author

I'm not sure what you refer to as "Javascript reporters". I was talking about #790.

Are you proposing a new property in the testDone details and requiring consumers to register at that level if they care about the distinction (e.g., for reporting at suite completion)?

Yeah, that.

@gibson042
Member

I'm not sure what you refer to as "Javascript reporters"

A poor choice of words. What matters is the API provided by QUnit—data passed to callbacks should be complete enough to implement any reporter (hence the need for an additional property here). I was just blinded by the presence of failed/passed/total assertion counts in moduleDone and done details, where they are entirely redundant with testDone and will be misleading when including "todo" tests.

Also, note that there will be edge cases with the interaction between "todo reversal" and automatically-added assertions (uncaught exceptions, timeouts, globals pollution, expected count mismatch, etc.).

@jzaefferer
Copy link
Member Author

@gibson042 @wycats I've updated the ticket above with a summary, can you verify that captures everything we discussed?

@gibson042
Member

Looks good except for this:

This test will pass when this assertion fails, along with all other assertions, or "fail" when at least one passes

I expect "todo" tests to pass when at least one assertion fails, or "fail" when all assertions pass.

@leobalter
Member

I expect "todo" tests to pass when at least one assertion fails, or "fail" when all assertions pass.

Agreed. It might point to bad practice, but some assertions may pass on todo tests due to unrelated changes, as a side effect. It's better to flag only when the entire test passes rather than on single assertions.

@jzaefferer
Member Author

Makes sense. Updated the code block above, please take another look.

@gibson042
Member

LGTM 👍

@leobalter
Member

LGTM as well. Let's do it.

@leobalter added the "Status: Ready" label Oct 20, 2015
@trentmwillis
Member

Is this feature still wanted? If so, I volunteer to do the work. I'd very much like to see this as my primary project at work would use it.

@leobalter
Member

yes, it would be great to have this feature.

@trentmwillis self-assigned this May 26, 2016
@trentmwillis
Member

I began attempting to implement this and ran into some issues that I think need discussing.

Adding details.todo makes sense at the testDone / log reporting level, but beyond that it starts to be a little problematic. For instance, consider a module that contains a normal test and a todo test. When moduleDone is logged, it will have failing assertions with no way to distinguish legitimate failures from expected failures in the todo. This is further amplified during done callbacks. This is similar to the initial concern @gibson042 had above.

Now, this can technically be worked around by having reporters only consider values at the testDone / log level, but that feels a bit wonky, as it invalidates much of the value of the aggregated values we report in moduleDone / done (e.g., having "failures" may not actually mean the run is failing).

That said, maybe the "details" we report in moduleDone / done need to be reworked. I tend to think those hooks should serve more as "callback" points than "reporting hooks". Thoughts?

@gibson042
Member

I don't think implementing TODO should require breaking changes to the module callback data, even though those breaking changes may be independently valuable. I still think framing this in the context of expectations is the way to go, and would imagine a module with two tests (one normal with two passing assertions and one TODO with a passing assertion and a failing assertion) to report something like

{
  "total": 4,
  "passed": 3,
  "failed": 1,

  // new properties
  // names and meaning subject to bikeshedding
  // …but values should always be less than or equal to unqualified analogs
  "passedUnexpectedly": 1,
  "failedAsExpected": 1
}

As you can see, the new properties are adding context to the existing data (even though passedUnexpectedly is of negligible value at the module level) without altering the present meaning. A consumer that wasn't yet aware of TODO would show expected failures as unexpected (prompting bug reports to update their code), and would fail to highlight expected failures that start passing (a problem so minor that we needn't worry about it IMO).
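The counts proposed here can be consumed along these lines (a sketch using the property names under discussion in this thread, which are not a shipped QUnit API):

```javascript
// Sketch: deriving a run's real status from the proposed counts.
// "failedAsExpected" and "passedUnexpectedly" are the names proposed
// above; they are subsets of "failed" and "passed" respectively.
function summarize(details) {
  // Failures not accounted for by todo tests are genuine problems.
  const realFailures = details.failed - details.failedAsExpected;
  return {
    green: realFailures === 0,
    // Unexpected passes flag todo tests that may be ready to graduate.
    bonuses: details.passedUnexpectedly
  };
}

// The example module above: 4 assertions, 3 passed, 1 expected failure.
console.log(summarize({
  total: 4, passed: 3, failed: 1,
  passedUnexpectedly: 1, failedAsExpected: 1
}));
// → { green: true, bonuses: 1 }
```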

@trentmwillis
Member

I think this approach makes sense. It solves the problems I mention above and maintains backwards compatibility across the board.

One thought (largely bikeshedding though) would be to frame the numbers we report both as "unexpected". My thinking on this is two-fold: (1) it keeps the naming (and mental model) consistent across the two new variables, and, (2) failedAsExpected only carries real meaning when combined with failed so that you can determine if anything failed that was not supposed to fail.

{
  "total": 4,
  "passed": 3,
  "failed": 1,
  "unexpected": {
    "passed": 2, // bonuses
    "failed": 2  // problems
  }
}

@gibson042
Member

I'm fine with both new fields being "unexpected", or even "expected" (e.g., under normal circumstances, and always until this is implemented, "expected failures" is zero/undefined). But I do think consumer code will be simpler if we don't wrap them in a sub-object.

@trentmwillis
Member

I had considered using "expected", but that actually doesn't provide enough contextual information. For example, if you have two expected failures from a todo but wind up with an unexpected failure, it would still be reported as:

{
  "failed": 1,
  "expectedFailures": 2
}

That gives you no way of knowing whether that failure was one of the expected ones or not. Since we're primarily concerned with the "unexpected" events, reporting those doesn't have that issue.

I do think consumer code will be simpler if we don't wrap them in a sub-object.

That's fine with me; I'm particularly concerned with naming consistency over the exact shape. I'm going to pursue an implementation like so:

{
  "total": 4,
  "passed": 3,
  "failed": 1,
  "passedUnexpectedly": 1,
  "failedUnexpectedly": 0
}
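With "unexpected" counts reported directly in this shape, a done consumer needs no arithmetic (again a sketch against the proposed, not shipped, property names):

```javascript
// Sketch: with "failedUnexpectedly" reported directly, deciding whether
// a run is actually green requires no subtraction on the consumer side.
function runIsGreen(details) {
  return details.failedUnexpectedly === 0;
}

// The same example module: one failure, but it was an expected one.
console.log(runIsGreen({
  total: 4, passed: 3, failed: 1,
  passedUnexpectedly: 1, failedUnexpectedly: 0
})); // true
```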

@wycats

wycats commented Apr 29, 2017

Incidentally, my first comment here was that if this landed we'd use it. Glimmer is already using it :)

Thanks for keeping on top of this over a very long development cycle. It's why I love qunit.

slow and steady wins the race
