
Pendings tests #787

Closed
jzaefferer opened this issue Mar 17, 2015 · 30 comments
Labels
Category: API · Component: HTML Reporter · Status: Ready (a "Meta" type issue that has reached consensus) · Type: Enhancement (new idea or feature request)

Comments

@jzaefferer
Member

Something that came out of a discussion with Yehuda: During a big refactoring they have hundreds of failing tests. They turned them off using QUnit.skip, but that doesn't tell them which of the skipped tests started passing again at some point. To quickly test changes in passing tests, they ended up with this:

QUnit.skip = QUnit.test;

There's probably a better way to deal with that situation. One idea is to have something like a QUnit.pending method that has the same signature as QUnit.test(), but reverses the result: if all assertions pass, the test fails; if at least one fails, the test passes.

The HTML reporter could use a similar marker as QUnit.skip.

Thoughts?


Update 2015-04-15:

After some discussion, we came up with this:

// This test will show up with a "todo" badge, with a similar style as the "skipped" badge
// It will show as passed when at least one assertion fails, or fail when all assertions pass
QUnit.todo( "this isn't implemented, yet", function( assert ) {
  assert.ok( featureTest(), "waiting for feature" );
} );

QUnit.testDone( function( details ) {
  console.log( details.name, details.todo );
  //> "this isn't implemented, yet" true
} );

QUnit.log( function( details ) {
  console.log( details.message, details.todo );
  //> "waiting for feature" true
} );
@wycats

wycats commented Mar 17, 2015

That sounds great to me. We'd use this!

@wycats

wycats commented Mar 17, 2015

It could also be useful for tests that we expect to fail in all browsers, but which mark code that could be removed once browsers started implementing a feature. If we start to see the pending test "fail" in Saucelabs, it's time to investigate using the browser feature.

@jzaefferer
Member Author

That's an interesting idea. In both cases this would be considered a "temporary" test, so I still think some visual highlight makes sense.

@gibson042
Member

This was touched on in the conversation that led to QUnit.skip.

Relevant excerpts:
#637 (comment)

[TAP] TODO: Expects failure (i.e., a reversal of normal operation—passes are interesting instead of failures, and failures do not effect pass→fail state changes in the containing run/module/suite/test). Reporters must take care to separate TODO failures from non-TODO failures, and should highlight TODO passes.

#637 (comment)

todo means "run, but don't let failures fail the containing group"

EDIT: removed unrelated comment and wholesale endorsement.

@leobalter
Member

qunit-pending from @JamesMGreene works differently from this proposal: there, a test without a function argument, like QUnit.test("works");, becomes a QUnit.skip.

This issue proposes a test that contains at least one failing assertion. Rather than skipping such tests, QUnit would run all of their assertions, and the test would FAIL when all of them pass. This lets the developer make progressive work on a big refactoring, converting the pending tests into actual tests as they become ready and working.

@gibson042
Member

qunit-pending from @JamesMGreene works differently from this proposal

Noted and updated.

I also want to add more nuance to this:

the same signature as QUnit.test(), but will reverse the result. So if all assertions pass, the test fails, if at least one fails, the test passes.

A blind reversal like that is insufficient for general use, because it doesn't highlight what work still needs to happen. TODO failures are still failures, they're just expected. This could manifest in a few different ways, but I think minimal changes require at least something like failedAsExpected in callback details (either double-counted as failed or understood as a distinct category) and distinct HTML reporting.

On the other hand, converting unexpected successes to unexpected failures as you propose makes sense to me, but is against the spirit of TAP ("Should a todo test point begin succeeding, the harness should report it as a bonus")... if we follow those footsteps instead, it'll mean something like passedUnexpectedly (again, either double-counted as passed or understood as a fifth partition) and possibly also distinct HTML reporting (or possibly reuse of e.g. "failed" styling).

No matter what, though, this promotes "expectations" as a QUnit concept, which is worth keeping in mind.

@JamesMGreene
Member

I'd say definitely don't call it QUnit.pending(). I'd be behind some name more like QUnit.todo (based on the TAP concept) as mentioned by @gibson042.

It's also worth noting that no other JS unit testing framework we've talked to has this functionality. It's marked as "Status Type" of "Inconclusive" in the table here: qunitjs/js-reporters#4

Whether that is good or bad remains to be seen....

  • Pros:
    • We could be the first and only JS unit testing framework to offer it
  • Cons:
    • We would be adding a new status type to the list that would likely not be compatible/translatable into other testing frameworks (e.g. a concern for the JS Reporters project)

@leobalter
Member

todo sounds better than pending.

As for it being new functionality, I believe the use case is argument enough to sustain this new API feature.

@jzaefferer
Member Author

A blind reversal like that is insufficient for general use, because it doesn't highlight what work still needs to happen. TODO failures are still failures, they're just expected. This could manifest in a few different ways, but I think minimal changes require at least something like failedAsExpected in callback details (either double-counted as failed or understood as a distinct category) and distinct HTML reporting.

On the other hand, converting unexpected successes to unexpected failures as you propose makes sense to me, but is against the spirit of TAP ("Should a todo test point begin succeeding, the harness should report it as a bonus")... if we follow those footsteps instead, it'll mean something like passedUnexpectedly (again, either double-counted as passed or understood as a fifth partition) and possibly also distinct HTML reporting (or possibly reuse of e.g. "failed" styling).

No matter what, though, this promotes "expectations" as a QUnit concept, which is worth keeping in mind.

I've read this a few times, but still fail to grasp the essence. Could you or someone else provide a summary beyond "let's add QUnit.todo()"? That is, what exactly would we implement in QUnit?

@gibson042
Member

I'll give it a shot. Recall that the QUnit callbacks provide "failed" and "passed" assertion counts, with an implicit assumption that every assertion is expected to pass. Introducing TODO means violating that assumption, because a run with failed assertions is still a success if they are all associated with TODO tests. We need to capture them in a new "failed as expected" count so consumers can behave correctly.

We should also add a "passed unexpectedly" count so consumers can identify potentially-complete TODOs, e.g.:

QUnit.testDone(function( testResults ) {
    toDoNoMore = testResults.passedUnexpectedly > 0 && !testResults.failedAsExpected;
    …
});

Note that both of these categories can be considered distinct from passed/failed, or can be considered subsets, with the decision probably controlled by exploring effects on backwards compatibility (e.g., whether it's better for a TODO test to report zero or nonzero "failed").

Having done that, the HTML reporter will need to style expectedly failing and unexpectedly passing tests.

@jzaefferer
Member Author

That sounds like overkill. Adding a "todo" detail to the log and a badge to the reporter (styled like the one for skip) and reversing the result of all assertions inside a QUnit.todo() callback should be enough to make this useful. It's then up to the developer to decide what to do with a todo test once it fails, but that should be enough notification.
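The reversal described here can be sketched as a small standalone function (a hypothetical illustration of the proposal, not QUnit's actual implementation):

```javascript
// Sketch of the proposed todo reversal: after a todo test's assertions
// have run, the test as a whole passes if at least one assertion failed,
// and fails if every assertion passed.
// `tally` mimics the per-test assertion counts QUnit tracks internally.
function reverseTodoOutcome(tally) {
  return {
    todo: true,               // exposed to reporters via callback details
    passed: tally.failed > 0  // reversed: an expected failure is a pass
  };
}

// A todo test whose assertion still fails shows up as passing...
console.log(reverseTodoOutcome({ passed: 0, failed: 1 }).passed); // true
// ...and "fails" once all of its assertions pass, signalling it is done.
console.log(reverseTodoOutcome({ passed: 2, failed: 0 }).passed); // false
```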

@gibson042
Member

How is a developer to identify a todo test, especially if they're not using the HTML reporter?

@jzaefferer
Member Author

HTML reporter: "todo" badge, similar to "skip" badge
stdout reporter: Put "todo" in color in front of test name, or whatever works.

As long as the details for the test indicate the "todo" state, both can be implemented to output whatever we need.

@gibson042
Member

I'm specifically thinking of Javascript reporters. The current callback arguments don't provide that information. Are you proposing a new property in the testDone details and requiring consumers to register at that level if they care about the distinction (e.g., for reporting at suite completion)?

var failedAsExpected = 0,
    toDoNoMore = [];

QUnit.testDone(function( details ) {
    if ( details.todo ) {
        failedAsExpected += details.failed;
        if ( !details.failed ) {
            toDoNoMore.push( details );
        }
    }
});

I think I can get on board with that.

@jzaefferer
Member Author

I'm not sure what you refer to as "Javascript reporters". I was talking about #790.

Are you proposing a new property in the testDone details and requiring consumers to register at that level if they care about the distinction (e.g., for reporting at suite completion)?

Yeah, that.

@gibson042
Member

I'm not sure what you refer to as "Javascript reporters"

A poor choice of words. What matters is the API provided by QUnit—data passed to callbacks should be complete enough to implement any reporter (hence the need for an additional property here). I was just blinded by the presence of failed/passed/total assertion counts in moduleDone and done details, where they are entirely redundant with testDone and will be misleading when including "todo" tests.

Also, note that there will be edge cases with the interaction between "todo reversal" and automatically-added assertions (uncaught exceptions, timeouts, globals pollution, expected count mismatch, etc.).

@jzaefferer
Copy link
Member Author

@gibson042 @wycats I've updated the ticket above with a summary, can you verify that captures everything we discussed?

@gibson042
Member

Looks good except for this:

This test will pass when this assertion fails, along with all other assertions, or "fail" when at least one passes

I expect "todo" tests to pass when at least one assertion fails, or "fail" when all assertions pass.

@leobalter
Member

I expect "todo" tests to pass when at least one assertion fails, or "fail" when all assertions pass.

Agreed. It might point to bad practice, but some assertions may pass on todo tests due to unrelated changes, as a side effect. It's better to flag only when the entire test passes rather than on single assertions.

@jzaefferer
Member Author

Makes sense. Updated the code block above, please take another look.

@gibson042
Member

LGTM 👍

@leobalter
Member

LGTM as well. Let's do it.

@leobalter added the "Status: Ready" label Oct 20, 2015
@trentmwillis
Member

Is this feature still wanted? If so, I volunteer to do the work. I'd very much like to see this as my primary project at work would use it.

@leobalter
Member

yes, it would be great to have this feature.

@trentmwillis self-assigned this May 26, 2016
@trentmwillis
Member

I began attempting to implement this and ran into some issues that I think need discussing.

Adding details.todo makes sense at the testDone / log reporting level, but beyond that it starts to be a little problematic. For instance, consider a module that contains a normal test and a todo test. When moduleDone is logged, it will have failing assertions with no way to distinguish legitimate failures from expected failures in the todo. This is further amplified during done callbacks. This is similar to the initial concern @gibson042 had above.

Now, this can technically be worked around by having reporters only consider values at the testDone / log level, but that feels a bit wonky, as it invalidates much of the value of the aggregated values we report in moduleDone / done (e.g., having "failures" may not actually mean the run is failing).

That said, maybe the "details" we report in moduleDone / done need to be reworked. I tend to think those hooks should serve more as "callback" points than "reporting hooks". Thoughts?

@gibson042
Member

I don't think implementing TODO should require breaking changes to the module callback data, even though those breaking changes may be independently valuable. I still think framing this in the context of expectations is the way to go, and would imagine a module with two tests (one normal with two passing assertions and one TODO with a passing assertion and a failing assertion) to report something like

{
  "total": 4,
  "passed": 3,
  "failed": 1,

  // new properties
  // names and meaning subject to bikeshedding
  // …but values should always be less than or equal to unqualified analogs
  "passedUnexpectedly": 1,
  "failedAsExpected": 1
}

As you can see, the new properties are adding context to the existing data (even though passedUnexpectedly is of negligible value at the module level) without altering the present meaning. A consumer that wasn't yet aware of TODO would show expected failures as unexpected (prompting bug reports to update their code), and would fail to highlight expected failures that start passing (a problem so minor that we needn't worry about it IMO).
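The counts proposed here can be consumed along these lines (a sketch using the property names under discussion in this thread, which are not a shipped QUnit API):

```javascript
// Sketch: deriving a run's real status from the proposed counts.
// "failedAsExpected" and "passedUnexpectedly" are the names proposed
// above; they are subsets of "failed" and "passed" respectively.
function summarize(details) {
  // Failures not accounted for by todo tests are genuine problems.
  const realFailures = details.failed - details.failedAsExpected;
  return {
    green: realFailures === 0,
    // Unexpected passes flag todo tests that may be ready to graduate.
    bonuses: details.passedUnexpectedly
  };
}

// The example module above: 4 assertions, 3 passed, 1 expected failure.
console.log(summarize({
  total: 4, passed: 3, failed: 1,
  passedUnexpectedly: 1, failedAsExpected: 1
}));
// → { green: true, bonuses: 1 }
```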

@trentmwillis
Member

I think this approach makes sense. It solves the problems I mention above and maintains backwards compatibility across the board.

One thought (largely bikeshedding though) would be to frame the numbers we report both as "unexpected". My thinking on this is two-fold: (1) it keeps the naming (and mental model) consistent across the two new variables, and, (2) failedAsExpected only carries real meaning when combined with failed so that you can determine if anything failed that was not supposed to fail.

{
  "total": 4,
  "passed": 3,
  "failed": 1,
  "unexpected": {
    "passed": 2, // bonuses
    "failed": 2  // problems
  }
}

@gibson042
Member

I'm fine with both new fields being "unexpected", or even "expected" (e.g., under normal circumstances, and always until this is implemented, "expected failures" is zero/undefined). But I do think consumer code will be simpler if we don't wrap them in a sub-object.

@trentmwillis
Member

I had considered using "expected", but that actually doesn't provide enough contextual information. For example, if you have two expected failures from a todo but wind up with an unexpected failure, it would still be reported as:

{
  "failed": 1,
  "expectedFailures": 2
}

That gives you no way of knowing whether that failure was one of the expected ones or not. Since we're primarily concerned with the "unexpected" events, reporting those doesn't have that issue.

I do think consumer code will be simpler if we don't wrap them in a sub-object.

That's fine with me; I'm particularly concerned with naming consistency over the exact shape. I'm going to pursue an implementation like so:

{
  "total": 4,
  "passed": 3,
  "failed": 1,
  "passedUnexpectedly": 1,
  "failedUnexpectedly": 0
}
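With "unexpected" counts reported directly in this shape, a done consumer needs no arithmetic (again a sketch against the proposed, not shipped, property names):

```javascript
// Sketch: with "failedUnexpectedly" reported directly, deciding whether
// a run is actually green requires no subtraction on the consumer side.
function runIsGreen(details) {
  return details.failedUnexpectedly === 0;
}

// The same example module: one failure, but it was an expected one.
console.log(runIsGreen({
  total: 4, passed: 3, failed: 1,
  passedUnexpectedly: 1, failedUnexpectedly: 0
})); // true
```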

@wycats

wycats commented Apr 29, 2017

Incidentally, my first comment here was that if this landed we'd use it. Glimmer is already using it :)

Thanks for keeping on top of this over a very long development cycle. It's why I love qunit.

slow and steady wins the race
