Research: Why do non-Sauce Labs browsers report fewer than 100% of results? #478
As discussed previously, the term "completeness" can be interpreted in a few ways. If we define "completeness" in terms of "tests executed" (as opposed to "subtests executed"), we can calculate a score for a given result set procedurally. By that coarse-grained metric, we are now consistently collecting results for 100% of tests in both Chrome and Firefox.

For Safari, we are collecting results for 99% of the available tests. Most of the missing results are for "wdspec" (WebDriver specification) tests. This is expected because WPT does not run wdspec tests for Sauce Labs-mediated browsers. The relevant code is here, and you can verify experimentally by running

For all trials under consideration here, only a single "chunk" omits results. When we ran Safari with chunk 81 of 100 of WPT at revision 709111adbc, we failed to collect results for 6 tests. In the logs currently available at http://builds.wpt.fyi, the Sauce Connect tunnel closed abruptly near the end of that trial, which caused the WPT CLI to finish results collection without recording results for the six tests that remained. I can't explain why this occurred; the test being executed at the time was a nondescript reference test.

@foolip I think this puts to rest our concerns about high-level completeness.

As we've been discussing, understanding subtest-level completeness is much trickier. Not only is the expected number impossible to determine procedurally, it can also be influenced by implementation status (meaning disparities do not always reflect an error in results collection). Possible explanations include:
...and of course, there is the possibility of still other classes of errors that we have not yet identified. Due to the uncertainty here, I'm still interested in researching completeness at the subtest level. Following the introduction of experimental Firefox and Chrome, I'd like to spend 3-5 days investigating the results in those terms. I'd organize my findings into a report so that we could understand the scope of the problem and what fixing it would look like.

"Completeness" script

#!/bin/bash
shas='162e9dda47
2b3f901276
c218fe33f4
1331d68a18
6e2b4a77cb
7164dbb89f
939b30a8b2
24f7e6d2f6
af485dcf5f
3589b85af3
fc33be9acf
21be95d426
067f918f9a
d01ce1055e
709111adbc
149116dc79
a87d0554fa
55846d56d8
bc7c640b39
42f43956d4
6fca0b2cd6
383fd735a5
45d92ebc55
e96e038f14'
platforms='firefox-59.0-linux chrome-64.0-linux safari-11.0-macos-10.12-sauce'
echo sha,expected,platform,actual,completeness
for sha in $shas; do
git checkout $sha -- > /dev/null 2>&1
tests_expected=$(./wpt run --list-tests firefox | grep -E '^/' | wc -l)
for platform in $platforms; do
summary_url=https://storage.googleapis.com/wptd/$sha/$platform-summary.json.gz
results=$(curl --fail $summary_url 2> /dev/null)
if [ $? != '0' ]; then
continue
fi
count=$(echo $results | python -c 'import json; import sys; print len(json.load(sys.stdin))')
pct=$(python -c "print 100 * float($count) / $tests_expected")
printf '%s,%s,%s,%s,%s\n' $sha $tests_expected $platform $count $pct
done
done

"Completeness" script results
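The core computation in the script above — the number of tests present in a run's summary divided by the number the manifest says to expect — can be sketched in Python. The miniature summary object here is hypothetical; the real `*-summary.json.gz` files are assumed to be JSON objects keyed by test path, which is all this calculation relies on.

```python
import json

def completeness_pct(summary_json, expected_count):
    """Percentage of expected tests that have any recorded result.

    `summary_json` is assumed to be a JSON object keyed by test path,
    matching what the shell script above fetches as *-summary.json.gz.
    """
    results = json.loads(summary_json)
    return 100.0 * len(results) / expected_count

# Hypothetical four-test run in which only two tests produced results:
sample = '{"/dom/historical.html": [1, 1], "/dom/interfaces.html": [140, 145]}'
print(completeness_pct(sample, 4))  # 50.0
```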
Missing results identification script

#!/bin/bash
shas='c218fe33f4
3589b85af3
709111adbc
45d92ebc55'
platform='safari-11.0-macos-10.12-sauce'
for sha in $shas; do
git checkout $sha -- > /dev/null 2>&1 || exit 1
expected_tests=$(./wpt run --list-tests firefox | grep -E '^/')
summary_url=https://storage.googleapis.com/wptd/$sha/$platform-summary.json.gz
results=$(curl --fail $summary_url 2> /dev/null)
actual_tests=$(echo $results | python -c 'import json; import sys; print "\n".join(json.load(sys.stdin).keys())')
echo $sha
comm -23 <(echo "$expected_tests" | sort) <(echo "$actual_tests" | sort)
echo
done

Missing results from recent Safari trials

"Missing" wdspec tests have been removed.
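The `comm -23` comparison in the script above is just a set difference; the same check can be done in Python (the test paths below are made up for illustration):

```python
def missing_tests(expected, actual):
    """Tests the manifest expects but the summary lacks, like `comm -23`."""
    return sorted(set(expected) - set(actual))

expected = ['/a/one.html', '/a/two.html', '/webdriver/tests/spec.py']
actual = ['/a/one.html', '/a/two.html']
print(missing_tests(expected, actual))  # ['/webdriver/tests/spec.py']
```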
Excerpt of logs at the time of Safari crash
Excellent writeup hiding in my inbox, thanks for bringing my attention to it!

On wdspec, I agree that trying to run them over Sauce doesn't make sense, and that we can just not run them using

For Sauce, is each chunk not being retried 3 times? Does this mean that the Sauce tunnel was closed all three times for those incomplete Safari runs?

"Undetected browser/WebDriver crashes" seems like it would have to result in missing results and not partial subtest results, right? Addressing that and anything else that results in less than 100% manifest-test completion (modulo wdspec) seems important to me. Also bugs in

On testharness.js subtest completion, I think I would rank the issues roughly in order of tests fixed per time spent fixing, so:
At the end of this is roughly where I would put investigating subtest mismatches that also come with test failures/timeouts/etc. or harness errors. If we could count them (we can!) then it'd be easier to say how big the pile is, but my hunch is that it's big and only ~50-80% of the cases are ones we'd really want to fix, i.e. there's no way to lock in the wins.

Related to locking in the wins, fixing all harness errors and making Travis block on harness errors seems like a decent opportunity.
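Counting that "pile" of subtest mismatches could look something like the sketch below. It assumes two run summaries shaped as test path -> [passing, total], which is an assumption about the data, not a documented wpt.fyi API.

```python
def subtest_total_mismatches(summary_a, summary_b):
    """Tests present in both runs whose total subtest counts disagree.

    Summaries are assumed to map test path -> [passing, total].
    """
    common = set(summary_a) & set(summary_b)
    return sorted(t for t in common if summary_a[t][1] != summary_b[t][1])

a = {'/x.html': [3, 5], '/y.html': [2, 2]}
b = {'/x.html': [3, 4], '/y.html': [2, 2]}
print(subtest_total_mismatches(a, b))  # ['/x.html']
```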
The source code I linked to implements this for Sauce Labs-powered browsers. To
That recovery mechanism went by the wayside when I introduced Buildbot. Today,
Yup. This is the source of the consistent manifest-level incompleteness when
I have not been able to reproduce this locally in a Windows VM, so it may be
@rwaldron tells me that the referenced report was created at the end of last
The "ERROR" status is applied to tests which produce an uncaught exception.

Output from one of two consistently-incomplete Edge trials
Yes :) Commenting there.
Yeah, doing the work to figure out how many failures are in each of the interesting buckets, or other interesting ones you come up with, sounds very useful. Maybe it's easier starting from single JSON files per run, so starting to save those somewhere ASAP might be a good idea.
I think I may have become too acclimated to terrible things to see the problem here :) I don't actually know when "ERROR" is the test-level status, but that uncaught exceptions result in harness errors does make good sense to me. Without it

Would a serious effort to get rid of harness errors help, or is the "ERROR" status something different?
Actually, I somehow forgot that
We won't be able to answer this by looking at the report data, and that's why
I don't disagree, and making
Do you mean just looking to see in which directories most of the problems are? What answers couldn't we get by looking at the same information that is available to wpt.fyi?
I mean that because the "ERROR" status can be applied for a number of reasons, we'll need to review the specific conditions that produced it. Focusing on the groups of similar tests that report an ERROR is a way to avoid getting lost in the weeds.

I have to admit that I'm getting a little confused, though. I just discovered that uncaught exceptions are reported as "FAIL" for "single-page" tests. I can see why this would be the case, but it complicates the way I think about the different statuses. Further complicating the classification is the fact that all tests are assumed to be single-page tests until they use a function like

An example is geolocation-sensor/GeolocationSensor-enabled-by-feature-policy-attribute-redirect-on-load.https.html, which is reported as a "Failure" on wpt.fyi. Using only the data available there, we might say, "That is not a problem; it is simply a test failure which accurately describes implementation status." But if we agree that tests should not produce uncaught errors, then the truth is that the test needs to be re-written so that it runs to completion, defining (and failing) a number of sub-tests.

At this point, our conversation about the investigation is evolving into the investigation itself... except I'm not being nearly as disciplined about my approach as the problem deserves. Maybe it's enough to say this justifies dedicating time for creating that report.
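The ambiguity described here — an uncaught exception in an apparent single-page test surfacing as a plain "FAIL" — can at least be flagged heuristically. A sketch, assuming a report entry shaped like wptrunner's per-test results (a harness status plus a list of subtests); the exact report schema is an assumption:

```python
def possibly_masked_harness_error(report_entry):
    """Flag results that look like a single-page FAIL.

    These may be genuine single-page test failures, or multi-subtest files
    that threw an uncaught error before defining any subtests (effectively
    a harness error). Telling the two apart requires reading the test.
    """
    if report_entry['status'] != 'OK':
        return False
    subtests = report_entry['subtests']
    return len(subtests) == 1 and subtests[0]['status'] == 'FAIL'

entry = {'status': 'OK',
         'subtests': [{'status': 'FAIL', 'message': 'x is undefined'}]}
print(possibly_masked_harness_error(entry))  # True
```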
So, re:
Currently wptrunner does not distinguish these cases internally, but it could (we do something similar for timeouts, distinguishing between the case where the js timeout fires and the case where we are still running a few seconds after that timeout ought to have fired). I don't think it would be hard to fix things so that harness problems always result in an
To me, this sounds like the bug is in wptrunner insofar as we don't handle the Sauce Connect Tunnel closing abruptly?
Here's the script I wrote to analyze the results data:

analyze.py

import json
import os
import re
import sys
from functools import reduce


class Result(object):
    def __init__(self, outer_dir, test_file):
        self.name = test_file
        self.outer_dir = outer_dir
        with open(os.path.join(outer_dir, test_file)) as handle:
            self._data = json.load(handle)

    @property
    def url(self):
        return 'https://wpt.fyi/%s' % self.name

    @property
    def is_error(self):
        return self._data['status'] == 'ERROR'

    @property
    def is_timeout(self):
        return self._data['status'] == 'TIMEOUT'

    @property
    def is_okay(self):
        return self._data['status'] == 'OK'

    @property
    def total_subtests(self):
        return len(self._data['subtests'])

    @property
    def is_reference_failure(self):
        if self.total_subtests != 1:
            return False
        return bool(re.search('ReferenceError:| is undefined',
                              str(self._data['subtests'][0]['message'])))

    @property
    def potential_error(self):
        '''Identify single-page tests that are reported as failing. If a test
        file containing multiple sub-tests produces an uncaught error before
        any sub-tests are defined, it will be classified in this way (despite
        being technically a harness error).'''
        if not self.is_okay:
            return False
        if len(self._data['subtests']) > 1:
            return False
        return self._data['subtests'][0]['status'] == 'FAIL'


def walk_mirrored(dirs):
    ref = dirs[0]
    for root, _, file_names in os.walk(ref):
        wpt_dir = root[len(ref)+1:]
        for file_name in file_names:
            results = []
            for outer_dir in dirs:
                try:
                    result = Result(outer_dir, os.path.join(wpt_dir, file_name))
                except (IOError, OSError):
                    continue
                results.append(result)
            yield results


def compare(results):
    all_error = reduce(lambda acc, result: acc and result.is_error, results, True)
    all_timeout = reduce(lambda acc, result: acc and result.is_timeout, results, True)
    all_bad = reduce(lambda acc, result: acc and (result.is_error or result.is_timeout), results, True)
    any_bad = reduce(lambda acc, result: acc or (result.is_error or result.is_timeout), results, False)
    all_single_fail = reduce(lambda acc, result: acc and result.potential_error, results, True)
    subtest_counts_differ = False
    subtest_counts_differ_meaningfully = False
    total_subtests = results[0].total_subtests
    for result in results[1:]:
        if result.total_subtests == total_subtests:
            continue
        subtest_counts_differ = True
        if not result.is_reference_failure:
            subtest_counts_differ_meaningfully = True
    # Toggle the condition below to select a different bucket of tests:
    if all_error:
    #if all_timeout:
    #if not all_error and not all_timeout and all_bad:
    #if subtest_counts_differ and not any_bad:
    #if subtest_counts_differ_meaningfully and not any_bad:
    #if any_bad and not all_bad:
        print(results[0].url)


if __name__ == '__main__':
    directories = sys.argv[1:]
    if len(directories) < 1:
        raise ValueError('At least one results directory must be provided')
    for results in walk_mirrored(directories):
        compare(results)
FYI - I'm working on visual diffs.
https://experimental-diff-dot-wptdashboard.appspot.com/results/?label=chrome&diff
https://experimental-diff-dot-wptdashboard.appspot.com/results/?label=firefox&diff
On Wed, 2 May 2018 at 20:42 jugglinmike wrote:
Another interesting way to look at the data is to compare results between
stable and experimental releases of the same browser. We can't assign much
significance to the pass/fail rate itself (because as discussed above, more
advanced implementations may "unlock" more sub-tests and report an overall
lower percentage). However, it seems fair to expect that the number of
passing sub-tests (that is, ignoring the total number of sub-tests) should
never decrease. Even when an experimental release is subject to more
sub-tests than a stable release, it shouldn't fail where the stable release
passes.
It turns out that this is not always true! In 84 tests, the experimental
release of Chrome passes fewer sub-tests than the stable release. There are
51 tests where the stable/experimental Firefox results demonstrate this
pattern.
Three explanations for this come to mind:
- The experimental release of the browser has regressed in some way
(which seems highly unlikely given the rigor of each browser's release
process)
- The test or the browser is flaky, and we're observing the flakiness
between the two test executions (i.e. the difference in release channel is
only a coincidence)
- The test has been authored with some questionable practice
I've only just begun to look into these tests, but I have found an
explanation for 75% of the Chrome cases: the recent change in media
"autoplay" policy
<https://developers.google.com/web/updates/2017/09/autoplay-policy-changes>.
This doesn't quite fit into any of the explanations listed above.
Although we've identified this issue by comparing "stable" and
"experimental" results, the dataset is showing its age. Chrome 66 has
recently been promoted to "stable", so on https://wpt.fyi today, we can
see the errors introduced by the policy change. For example:
https://wpt.fyi/media-source/mediasource-redundant-seek.html?sha=43dd25c888
That makes the problem much more visible, so I've submitted a patch at
gh-552 <#552>
.
I'm curious to see if there are any other patterns in the tests that
remain.
Suspect tests from Chrome stable/experimental comparison
- "Ps" - sub-tests passing in stable release
- "Pe" - sub-tests passing in experimental release
Ps Pe Test
10 8 /webauthn/createcredential-badargs-user.https.html
<https://wpt.fyi/webauthn/createcredential-badargs-user.https.html?sha=709111adbc>
3 1 /webauthn/createcredential-extensions.https.html
<https://wpt.fyi/webauthn/createcredential-extensions.https.html?sha=709111adbc>
5 0 /webauthn/createcredential-badargs-authnrselection.https.html
<https://wpt.fyi/webauthn/createcredential-badargs-authnrselection.https.html?sha=709111adbc>
8 0 /webauthn/createcredential-badargs-rp.https.html
<https://wpt.fyi/webauthn/createcredential-badargs-rp.https.html?sha=709111adbc>
1 0 /media-source/mediasource-endofstream.html
<https://wpt.fyi/media-source/mediasource-endofstream.html?sha=709111adbc>
250 248
/WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.html
<https://wpt.fyi/WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.html?sha=709111adbc>
316 310 /css/cssom-view/interfaces.html
<https://wpt.fyi/css/cssom-view/interfaces.html?sha=709111adbc>
5 4 /html/browsers/the-window-object/window-open-noopener.html
<https://wpt.fyi/html/browsers/the-window-object/window-open-noopener.html?sha=709111adbc>
1 0
https://wpt.fyi/html/semantics/embedded-content/the-iframe-element/sandbox_002.htm
2 0
/html/semantics/embedded-content/media-elements/autoplay-with-broken-track.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/autoplay-with-broken-track.html?sha=709111adbc>
4 2
/html/semantics/embedded-content/media-elements/event_order_canplay_playing.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/event_order_canplay_playing.html?sha=709111adbc>
4 2 /html/semantics/embedded-content/media-elements/event_pause.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/event_pause.html?sha=709111adbc>
4 2 /html/semantics/embedded-content/media-elements/event_playing.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/event_playing.html?sha=709111adbc>
2 0 /html/semantics/embedded-content/media-elements/event_timeupdate.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/event_timeupdate.html?sha=709111adbc>
1 0
https://wpt.fyi/html/semantics/embedded-content/media-elements/video_008.htm
4 2
/html/semantics/embedded-content/media-elements/paused_false_during_play.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/paused_false_during_play.html?sha=709111adbc>
4 2
/html/semantics/embedded-content/media-elements/readyState_during_playing.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/readyState_during_playing.html?sha=709111adbc>
4 2 /html/semantics/embedded-content/media-elements/event_play.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/event_play.html?sha=709111adbc>
1 0
/html/semantics/embedded-content/media-elements/ready-states/autoplay-with-slow-text-tracks.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/ready-states/autoplay-with-slow-text-tracks.html?sha=709111adbc>
4 0
/html/semantics/embedded-content/media-elements/loading-the-media-resource/autoplay-overrides-preload.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/loading-the-media-resource/autoplay-overrides-preload.html?sha=709111adbc>
1 0
/html/semantics/embedded-content/media-elements/loading-the-media-resource/resource-selection-invoke-set-src-networkState.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/loading-the-media-resource/resource-selection-invoke-set-src-networkState.html?sha=709111adbc>
1 0
/html/semantics/embedded-content/media-elements/track/track-element/track-cues-pause-on-exit.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/track/track-element/track-cues-pause-on-exit.html?sha=709111adbc>
11 10 /webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html
<https://wpt.fyi/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html?sha=709111adbc>
1 0
/mediacapture-streams/MediaStreamTrack-MediaElement-disabled-audio-is-silence.https.html
<https://wpt.fyi/mediacapture-streams/MediaStreamTrack-MediaElement-disabled-audio-is-silence.https.html?sha=709111adbc>
3 2
/content-security-policy/reporting/report-same-origin-with-cookies.html
<https://wpt.fyi/content-security-policy/reporting/report-same-origin-with-cookies.html?sha=709111adbc>

Suspect tests from Firefox stable/experimental comparison
- "Ps" - sub-tests passing in stable release
- "Pe" - sub-tests passing in experimental release
Ps Pe Test
1 0 /preload/dynamic-adding-preload.html
<https://wpt.fyi/preload/dynamic-adding-preload.html?sha=709111adbc>
1 0 /preload/preload-with-type.html
<https://wpt.fyi/preload/preload-with-type.html?sha=709111adbc>
1 0 /preload/delaying-onload-link-preload-after-discovery.html
<https://wpt.fyi/preload/delaying-onload-link-preload-after-discovery.html?sha=709111adbc>
4 2 /intersection-observer/same-document-zero-size-target.html
<https://wpt.fyi/intersection-observer/same-document-zero-size-target.html?sha=709111adbc>
10 5 /intersection-observer/multiple-thresholds.html
<https://wpt.fyi/intersection-observer/multiple-thresholds.html?sha=709111adbc>
5 3 /intersection-observer/multiple-targets.html
<https://wpt.fyi/intersection-observer/multiple-targets.html?sha=709111adbc>
4 3 /intersection-observer/same-document-no-root.html
<https://wpt.fyi/intersection-observer/same-document-no-root.html?sha=709111adbc>
10 5
/webaudio/the-audio-api/the-convolvernode-interface/convolver-response-1-chan.html
<https://wpt.fyi/webaudio/the-audio-api/the-convolvernode-interface/convolver-response-1-chan.html?sha=709111adbc>
8 6 https://wpt.fyi/webdriver/tests/element_send_keys/form_controls.py
79 62 https://wpt.fyi/webdriver/tests/interaction/element_clear.py
11 9 /css/css-tables/table-model-fixup.html
<https://wpt.fyi/css/css-tables/table-model-fixup.html?sha=709111adbc>
10 7 /css/css-tables/html5-table-formatting-2.html
<https://wpt.fyi/css/css-tables/html5-table-formatting-2.html?sha=709111adbc>
22 19 /css/css-tables/table-model-fixup-2.html
<https://wpt.fyi/css/css-tables/table-model-fixup-2.html?sha=709111adbc>
5 2 /css/css-tables/html5-table-formatting-1.html
<https://wpt.fyi/css/css-tables/html5-table-formatting-1.html?sha=709111adbc>
12 4 /css/selectors/focus-within-009.html
<https://wpt.fyi/css/selectors/focus-within-009.html?sha=709111adbc>
156 118 /css/css-values/viewport-units-css2-001.html
<https://wpt.fyi/css/css-values/viewport-units-css2-001.html?sha=709111adbc>
8 7 /css/cssom/shorthand-values.html
<https://wpt.fyi/css/cssom/shorthand-values.html?sha=709111adbc>
17 15 /css/css-grid/layout-algorithm/grid-find-fr-size-gutters-001.html
<https://wpt.fyi/css/css-grid/layout-algorithm/grid-find-fr-size-gutters-001.html?sha=709111adbc>
2 1 /css/cssom-view/elementsFromPoint-iframes.html
<https://wpt.fyi/css/cssom-view/elementsFromPoint-iframes.html?sha=709111adbc>
5 1 /css/cssom-view/elementsFromPoint-simple.html
<https://wpt.fyi/css/cssom-view/elementsFromPoint-simple.html?sha=709111adbc>
4 0 /css/cssom-view/elementsFromPoint-table.html
<https://wpt.fyi/css/cssom-view/elementsFromPoint-table.html?sha=709111adbc>
4 0 /css/cssom-view/elementsFromPoint-svg.html
<https://wpt.fyi/css/cssom-view/elementsFromPoint-svg.html?sha=709111adbc>
40 36 /css/cssom-view/scrollintoview.html
<https://wpt.fyi/css/cssom-view/scrollintoview.html?sha=709111adbc>
3 1 /css/cssom-view/elementsFromPoint-invalid-cases.html
<https://wpt.fyi/css/cssom-view/elementsFromPoint-invalid-cases.html?sha=709111adbc>
2 1 /css/cssom-view/CaretPosition-001.html
<https://wpt.fyi/css/cssom-view/CaretPosition-001.html?sha=709111adbc>
5 4 /mathml/presentation-markup/spaces/space-1.html
<https://wpt.fyi/mathml/presentation-markup/spaces/space-1.html?sha=709111adbc>
5 4 /mathml/presentation-markup/scripts/underover-1.html
<https://wpt.fyi/mathml/presentation-markup/scripts/underover-1.html?sha=709111adbc>
5 4 /mathml/presentation-markup/scripts/subsup-3.html
<https://wpt.fyi/mathml/presentation-markup/scripts/subsup-3.html?sha=709111adbc>
2 0
/uievents/order-of-events/focus-events/focus-automated-blink-webkit.html
<https://wpt.fyi/uievents/order-of-events/focus-events/focus-automated-blink-webkit.html?sha=709111adbc>
2 0 /webvr/webvr-disabled-by-feature-policy.https.sub.html
<https://wpt.fyi/webvr/webvr-disabled-by-feature-policy.https.sub.html?sha=709111adbc>
1 0 /2dcontext/drawing-paths-to-the-canvas/drawFocusIfNeeded_005.html
<https://wpt.fyi/2dcontext/drawing-paths-to-the-canvas/drawFocusIfNeeded_005.html?sha=709111adbc>
1 0 /2dcontext/drawing-paths-to-the-canvas/drawFocusIfNeeded_001.html
<https://wpt.fyi/2dcontext/drawing-paths-to-the-canvas/drawFocusIfNeeded_001.html?sha=709111adbc>
1 0 /2dcontext/drawing-paths-to-the-canvas/drawFocusIfNeeded_004.html
<https://wpt.fyi/2dcontext/drawing-paths-to-the-canvas/drawFocusIfNeeded_004.html?sha=709111adbc>
14 0
/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/open-features-tokenization-width-height.html
<https://wpt.fyi/html/browsers/the-window-object/apis-for-creating-and-navigating-browsing-contexts-by-name/open-features-tokenization-width-height.html?sha=709111adbc>
128 126 /html/infrastructure/urls/resolving-urls/query-encoding/utf-8.html
<https://wpt.fyi/html/infrastructure/urls/resolving-urls/query-encoding/utf-8.html?sha=709111adbc>
128 126
/html/infrastructure/urls/resolving-urls/query-encoding/utf-16le.html
<https://wpt.fyi/html/infrastructure/urls/resolving-urls/query-encoding/utf-16le.html?sha=709111adbc>
112 110
/html/infrastructure/urls/resolving-urls/query-encoding/windows-1252.html
<https://wpt.fyi/html/infrastructure/urls/resolving-urls/query-encoding/windows-1252.html?sha=709111adbc>
128 126
/html/infrastructure/urls/resolving-urls/query-encoding/utf-16be.html
<https://wpt.fyi/html/infrastructure/urls/resolving-urls/query-encoding/utf-16be.html?sha=709111adbc>
1 0 /html/editing/focus/composed.window.html
<https://wpt.fyi/html/editing/focus/composed.window.html?sha=709111adbc>
1 0
/html/editing/focus/sequential-focus-navigation-and-the-tabindex-attribute/focus-tabindex-positive.html
<https://wpt.fyi/html/editing/focus/sequential-focus-navigation-and-the-tabindex-attribute/focus-tabindex-positive.html?sha=709111adbc>
1 0
/html/editing/focus/sequential-focus-navigation-and-the-tabindex-attribute/focus-tabindex-zero.html
<https://wpt.fyi/html/editing/focus/sequential-focus-navigation-and-the-tabindex-attribute/focus-tabindex-zero.html?sha=709111adbc>
1 0
/html/editing/focus/sequential-focus-navigation-and-the-tabindex-attribute/focus-tabindex-negative.html
<https://wpt.fyi/html/editing/focus/sequential-focus-navigation-and-the-tabindex-attribute/focus-tabindex-negative.html?sha=709111adbc>
2 0 /html/editing/focus/focus-management/focus-events.html
<https://wpt.fyi/html/editing/focus/focus-management/focus-events.html?sha=709111adbc>
60 0
/html/webappapis/system-state-and-capabilities/the-navigator-object/protocol.html
<https://wpt.fyi/html/webappapis/system-state-and-capabilities/the-navigator-object/protocol.html?sha=709111adbc>
1 0
/html/semantics/scripting-1/the-script-element/execution-timing/031.html
<https://wpt.fyi/html/semantics/scripting-1/the-script-element/execution-timing/031.html?sha=709111adbc>
30 0 /html/semantics/forms/textfieldselection/select-event.html
<https://wpt.fyi/html/semantics/forms/textfieldselection/select-event.html?sha=709111adbc>
1 0 /html/semantics/selectors/pseudo-classes/focus-autofocus.html
<https://wpt.fyi/html/semantics/selectors/pseudo-classes/focus-autofocus.html?sha=709111adbc>
5 0 /html/semantics/selectors/pseudo-classes/focus.html
<https://wpt.fyi/html/semantics/selectors/pseudo-classes/focus.html?sha=709111adbc>
1 0
/html/semantics/embedded-content/media-elements/track/track-element/track-cues-missed.html
<https://wpt.fyi/html/semantics/embedded-content/media-elements/track/track-element/track-cues-missed.html?sha=709111adbc>
7 1 /html/semantics/embedded-content/the-img-element/usemap-casing.html
<https://wpt.fyi/html/semantics/embedded-content/the-img-element/usemap-casing.html?sha=709111adbc>
14 13 /html/dom/usvstring-reflection.html
<https://wpt.fyi/html/dom/usvstring-reflection.html?sha=709111adbc>
Even if it doesn't explain all of the failures, I would expect this to happen regularly, actually, and it's one of the first things I'd like to look for. From https://experimental-diff-dot-wptdashboard.appspot.com/results/?label=chrome&diff, I found http://w3c-test.org/longtask-timing/longtask-in-sibling-iframe-crossorigin.html which really does seem to time out more easily on Chrome Dev, but it's disabled for being flaky so we didn't notice. Even if it's mostly flaky tests we find this way, that's still good :) Comparing one stable version to the next would also be very interesting, although by then it's too late to catch regressions.
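The stable-vs-experimental regression check described in the quoted comment can be sketched as a dictionary comparison. The summary shape (test path -> [passing, total]) is an assumption here, as are the example paths:

```python
def passing_regressions(stable, experimental):
    """Tests where the experimental channel passes fewer subtests than
    stable (i.e. "Ps" > "Pe" in the tables above), for tests present in
    both runs. Differing subtest totals are ignored deliberately: even
    when experimental runs more subtests, its pass count should not drop.
    """
    return {test: (stable[test][0], experimental[test][0])
            for test in set(stable) & set(experimental)
            if experimental[test][0] < stable[test][0]}

stable = {'/media/event_pause.html': [4, 4], '/dom/ok.html': [2, 2]}
experimental = {'/media/event_pause.html': [2, 4], '/dom/ok.html': [2, 2]}
print(passing_regressions(stable, experimental))
# {'/media/event_pause.html': (4, 2)}
```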