
Make harness errors block PRs (with override) #10877

Open
zcorpan opened this issue May 7, 2018 · 24 comments
zcorpan commented May 7, 2018

@jugglinmike made a list of tests producing harness errors in web-platform-tests/results-collection#478 (comment)

@foolip do we want to make harness errors fail in Travis?

foolip commented May 7, 2018

I think we should try to make all harness errors fail the Travis jobs, unless, while trying to do that, we discover some reason why harness errors really should be allowed. I imagine that @jgraham has thoughts on this.

jgraham commented May 9, 2018

What do you mean "harness errors"? Do you mean errors in testharness.js or in wptrunner?

Assuming you mean the former, there is an already-discussed ambiguity between single-page tests, which are likely to produce ERROR if the feature in question isn't implemented, and tests that unintentionally error due to a bug. One could certainly look for cases where multiple browsers produce ERROR. This was the problem that the pulls dashboard was supposed to solve, but that seems to have stopped working.
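
A minimal sketch of that ambiguity (editorial; testharness.js is assumed to be loaded, and navigator.fancyFeature is a hypothetical API):

```js
// Single-page style: the page body is the test. If navigator.fancyFeature is
// unimplemented, the property access throws at the top level and the harness
// reports ERROR, indistinguishable from a genuinely buggy test:
//
//   assert_true(navigator.fancyFeature.enabled);
//   done();
//
// The same check wrapped in an explicit test() fails cleanly instead, and the
// overall harness status stays OK:
test(() => {
  assert_true(navigator.fancyFeature.enabled);
}, "fancyFeature is enabled");
```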

foolip commented May 9, 2018

We're talking about testharness.js harness errors. We can make progress on this even if #10853 remains unresolved; it just means that if #10853 is fixed, we might need to fix a bunch of new harness errors at the same time. I would slightly prefer the other order of fixing, though.

zcorpan commented May 31, 2018

It may also be good to prevent TIMEOUT.

jgraham commented May 31, 2018

TIMEOUT is fine in the case where you're waiting for an event that never happens. The alternative is that authors add some shorter timeout, making the test more prone to intermittent failures.

Even for ERROR, where a top-level error status might well indicate a bug, I'd be interested to see evidence for or against the theory that an ERROR in a subset of implementations is usually indicative of a test bug and so should block landing. For cases where no browser has a non-ERROR, non-TIMEOUT status, the case for blocking landing is more obviously strong, but it does disadvantage tests written against implementations we can't run in CI (notably Safari/WebKit and Edge).
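
A sketch of the TIMEOUT trade-off described above, assuming testharness.js is loaded; the "fancyready" event is hypothetical:

```js
// Waiting for an event that never fires: the harness eventually reports
// TIMEOUT, which is the expected signal for an unimplemented feature.
async_test(t => {
  document.addEventListener("fancyready", t.step_func_done());
}, "fancyready event fires");

// The flakier alternative warned about above: racing a hand-rolled, shorter
// deadline against the event turns TIMEOUT into FAIL, at the cost of
// intermittent failures on slow machines.
async_test(t => {
  document.addEventListener("fancyready", t.step_func_done());
  t.step_timeout(() => assert_unreached("fancyready never fired"), 1000);
}, "fancyready event fires within 1s");
```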

foolip commented May 31, 2018

@zcorpan @jugglinmike, in your triage of tests that produce harness errors, have you come across any that you don't believe are test bugs? Failing to find such hard cases is IMHO a reason to prevent them from being introduced as suggested in this issue, but I think we should still have an escape hatch in lint.whitelist for cases when browser bugs are the cause.

zcorpan commented May 31, 2018

I found #11269 (comment) today.

Maybe we shouldn't block landing of TIMEOUT or even ERROR, but making it visible somehow when it happens could be useful; in most cases it's a bug in the test.

jgraham commented May 31, 2018

That was one of the things that I thought was planned for the PR dashboard before it got abandoned. I certainly think that requiring people to think before landing a PR that introduces ERROR everywhere is a good plan.

foolip commented May 31, 2018

Indeed, there are patterns of changes that we want to make sure do not happen accidentally, but which are still sometimes correct. For things like that, I think the options are to either post a comment about it, which is possible to overlook, or to have a failing status check that leads to a page describing a problem, along with a "make the status green" checkbox for when it's actually appropriate. A bit complex, I know. cc @lukebjerring

foolip commented Aug 3, 2018

As a first step towards this, when implementing #7475, we can treat new harness errors as more serious and a different category of problem than new failing tests.

@jugglinmike, do you have a somewhat up-to-date link to the report of harness errors, so we can see just how long the path to eliminating harness errors might be?

jugglinmike commented Aug 3, 2018

Not offhand, but it's quick to generate one. Here are the failures for WPT revision 4a15e98, dated Thu Aug 2 11:45:23 2018 +0200:

357 tests producing errors in 1 browser(s)
137 tests producing errors in 2 browser(s)
65 tests producing errors in 3 browser(s)
30 tests producing errors in 4 browser(s)
2 tests producing errors in 5 browser(s)
0 tests producing errors in 6 browser(s)
Script used to collect these results
#!/usr/bin/env python3
"""Count testharness.js ERROR results across a set of wptreport files.

Usage: count_errors.py report1.json.gz report2.json.gz ...
(one gzipped wptreport JSON file per browser, all from the same WPT revision)
"""

import gzip
import json
import sys

def count_errors(filenames):
    by_title = {}
    revision = None

    # First pass: check that every report comes from the same WPT revision.
    for filename in filenames:
        with gzip.open(filename, 'rt') as handle:
            results = json.load(handle)

        if revision is None:
            revision = results['run_info']['revision']

        if revision != results['run_info']['revision']:
            raise Exception('revision mismatch: %s and %s'
                            % (revision, results['run_info']['revision']))

    # Second pass: record which reports produced ERROR for each test.
    for filename in filenames:
        with gzip.open(filename, 'rt') as handle:
            results = json.load(handle)

        for result in results['results']:
            if result['status'] != 'ERROR':
                continue

            by_title.setdefault(result['test'], []).append(filename)

    # Bucket tests by the number of browsers that produced an ERROR.
    by_count = [[] for _ in range(len(filenames))]

    for title, error_files in by_title.items():
        by_count[len(error_files) - 1].append(title)

    for titles in by_count:
        titles.sort()

    return revision, by_count

if __name__ == '__main__':
    revision, by_count = count_errors(sys.argv[1:])

    for index, titles in enumerate(by_count):
        browser_count = index + 1
        failure_count = len(titles)
        title_listing = '\n'.join(
            '- [{path}](https://wpt.fyi/results{path}?sha={revision})'.format(
                revision=revision, path=title)
            for title in titles
        )
        print('''<details>
<summary>
  {failure_count} tests producing errors in {browser_count} browser(s)
</summary>

{title_listing}
</details>'''.format(
            failure_count=failure_count, browser_count=browser_count,
            title_listing=title_listing
        ))

@jugglinmike

Ugh, by "failing" I really mean "producing errors".

foolip commented Aug 3, 2018

Alright, that isn't a huge number, 591 in total. I know that @zcorpan has already fixed a bunch. @jgraham, if we're able to fix >80% of these as test bugs, and not many remain because of implementation bugs, would that be a strong enough case to turn harness errors into something that by default fails PRs unless the test is whitelisted?

foolip commented Aug 3, 2018

@jugglinmike, the list in #10877 (comment) is great! I sent PRs to fix the "producing errors in 5 browser(s)" tests, and I imagine many of the rest are similar oversights. Preventing this with tooling would be fabufantastic.

foolip added a commit that referenced this issue Aug 3, 2018
Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via #10877 (comment).
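
The general shape of fixes like the one above (a sketch; the event and assertion are illustrative, not the actual CSP test): an assertion inside a bare event handler throws outside any test context and surfaces as a harness error, whereas t.step_func / t.step_func_done ties the handler back to the test, so a failing assertion just fails that test.

```js
// Before: assert_equals directly inside the handler => harness ERROR on failure.
// After: wrapping with t.step_func_done => an ordinary test FAIL instead.
async_test(t => {
  document.addEventListener("securitypolicyviolation", t.step_func_done(e => {
    assert_equals(e.violatedDirective, "img-src");
  }));
}, "violation event reports the blocked directive");
```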
jgraham commented Aug 4, 2018

I certainly agree that some mechanism to require manual approval if there are errors in multiple browsers seems reasonable, independent of how hard it is to fix. For tests that only error in a single browser, I guess it depends on what the fix looks like; in general I guess these additional checks aren't going to get run on the Chrome/Gecko/other upstream side, so it seems like triggering this failure too often might be problematic for developers primarily contributing upstream. Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad).

All that said, I don't know what the mechanism for allowing tests that produce multiple errors through should be; putting it in the source seems wrong because it's not a property of a specific git revision, but a property of running that revision in specific revisions of a set of browsers. Ideally I would like some kind of checkbox in the PR like

[x] Test that ERRORs is not buggy.

But enforcing that could be tricky without building a landing bot.

foolip commented Aug 4, 2018

Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging

I'm unable to guess what this is a typo of...

them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad).

Sure, harness errors aren't the worst kind of problem a test can have, but it is one that's pretty easy to create tooling around, which would help catch some smallish mistakes. (See my 3 PRs above.)

Ideally I would like some kind of checkbox in the PR

With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example.

It would require us to create/host some infrastructure for such status checks, though, or perhaps wait for Taskcluster to get it.
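
A sketch of what such a check could look like against the Checks API linked above (the token, head SHA, and check name are placeholders; the app hosting this is exactly the infrastructure that would need to be created):

```js
// Post a failing check run carrying an "Ignore this" action. Clicking the
// action sends a check_run "requested_action" webhook, which the app could
// answer by re-creating the run with conclusion "neutral".
async function postHarnessErrorCheck(appToken, headSha) {
  return fetch("https://api.github.com/repos/web-platform-tests/wpt/check-runs", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${appToken}`,
      // The Checks API was a preview at the time, hence this media type.
      "Accept": "application/vnd.github.antiope-preview+json",
    },
    body: JSON.stringify({
      name: "harness errors",
      head_sha: headSha,
      status: "completed",
      conclusion: "failure",
      output: {
        title: "New harness errors",
        summary: "This PR introduces new harness errors.",
      },
      actions: [{
        label: "Ignore this",
        description: "Accept these harness errors as expected.",
        identifier: "ignore-harness-errors",
      }],
    }),
  });
}
```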

jgraham commented Aug 6, 2018

I'm unable to guess what this is a type of...

flagging.

With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example.

I don't see such a button, are you sure you don't have to be an admin of the repo to use it? That would be equivalent to the situation today, but we certainly want the PR author or reviewers to be able to override the checks.

foolip commented Aug 7, 2018

Yeah, I guess you need write access to the repo. Here's a screenshot:
[screenshot of the check run UI with its "Ignore this" button]

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 8, 2018
…est helper, a=testonly

Automatic update from web-platform-tests
Fix a t.assert_unreached typo in a CSP test helper (#12303)

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320
wpt-pr: 12303
foolip added a commit that referenced this issue Aug 9, 2018
Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via #10877 (comment).
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 14, 2018
…t, a=testonly

Automatic update from web-platform-tests
Wrap an assert in step_func in a CSP test (#12305)

Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 590992ecaa850723c3300417912a874d94295444
wpt-pr: 12305
@foolip
Copy link
Member

foolip commented Oct 9, 2018

Given that we'll stop running stability checks in Travis in favor of results+stability in Taskcluster, this issue isn't quite accurate. I'll change the title to reflect that it might not involve Travis.

@foolip foolip changed the title Make harness errors fail in Travis Make new harness errors block PRs (with bypass mechanism) Oct 9, 2018
@foolip foolip changed the title Make new harness errors block PRs (with bypass mechanism) Make harness errors block PRs (with bypass mechanism) Oct 9, 2018
@foolip foolip changed the title Make harness errors block PRs (with bypass mechanism) Make harness errors block PRs (with override) Oct 9, 2018
@mdittmer

Ping from your friendly neighbourhood ecosystem infra rotation.

If this is priority:roadmap, should it have an assignee?

@lukebjerring

So, in theory a harness error will be picked up as a regression by the wpt.fyi status check.
That check is currently neutral when it detects a regression, but we do have a feature for making it fail (and an Ignore action built into the check), which I think satisfies this issue's current title:

Make harness errors block PRs (with override)

So, currently blocked on launch of https://github.com/web-platform-tests/wpt.fyi/projects/6 ?

foolip commented Dec 12, 2018

@lukebjerring can this be assigned to you, or is harness errors further down the line than regressions? I guess harness error regressions are already prevented? I actually don't know that we need to do more than that...

@lukebjerring lukebjerring self-assigned this Dec 13, 2018
foolip commented Mar 22, 2019

This is something I think we might still want to do; leaving it as roadmap, but it probably won't get done in Q2 because @lukebjerring is OOO.

foolip commented Jun 10, 2019

@lukebjerring is this something we should look at again in Q2? I don't know how big of a problem it is that tests with new harness errors land vs. the effort required to prevent it. I guess it'd just be a failing wpt.fyi check, and a new action for "I really mean it"?
