
Add /api/badge #674

Closed

wants to merge 10 commits into from

Conversation

lukebjerring
Contributor

Description

Copies @beaufortfrancois's badge code, but with some server-side caching benefits. It also handles a few bad-use cases, e.g. a path of /, and a few other query params (e.g. product=chrome[experimental]).

Review Information

Visit /api/badge with a product spec and a path, e.g. /api/badge?product=chrome&path=/webdriver
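
For anyone reading along, here is a minimal sketch of what a handler like this might look like. This is an illustration only, not the code in this PR; the shields.io redirect, the handler name, and the placeholder pass counts are assumptions, and the real endpoint computes results from (cached) wpt.fyi run data.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// badgeHandler is a hypothetical sketch of an /api/badge handler: it validates
// the product and path query params (rejecting the bad-use cases mentioned in
// the description, like a path of "/") and redirects to a badge image.
func badgeHandler(w http.ResponseWriter, r *http.Request) {
	product := r.URL.Query().Get("product") // e.g. "chrome" or "chrome[experimental]"
	path := r.URL.Query().Get("path")       // e.g. "/webdriver"
	if product == "" || path == "" || path == "/" {
		http.Error(w, "product and a non-root path are required", http.StatusBadRequest)
		return
	}
	// Placeholder numbers; the real handler would look up cached run results.
	passing, total := 10, 12
	badge := fmt.Sprintf("https://img.shields.io/badge/%s-%d%%2F%d-blue.svg",
		product, passing, total)
	http.Redirect(w, r, badge, http.StatusFound)
}

func main() {
	http.HandleFunc("/api/badge", badgeHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```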

@lukebjerring lukebjerring requested a review from foolip October 18, 2018 13:44
@wpt-pr-bot

Staging instance deployed by Travis CI!
Running at https://badge-dot-wptdashboard-staging.appspot.com

@beaufortfrancois

Yeah!

I don't see the label ;(
https://badge-dot-wptdashboard-staging.appspot.com/api/badge?product=chrome&path=/picture-in-picture

[Screenshot: the rendered badge, missing its label]

Moreover, why no color?
I liked the color to get a glimpse of implementation status.

@wpt-pr-bot

Staging instance deployed by Travis CI!
Running at https://badge-dot-announcer-dot-wptdashboard-staging.appspot.com

@lukebjerring
Contributor Author

Eh, my bad, a few bugs (needs content type too)
https://zenith-case.glitch.me/public/examples.html
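
For reference, a minimal sketch of the content-type fix mentioned above, assuming the badge is served as inline SVG (the handler name and the placeholder SVG are made up; whether the real endpoint serves SVG directly or redirects elsewhere is an implementation detail not settled here):

```go
package main

import (
	"log"
	"net/http"
)

// badgeSVG serves a (placeholder) badge as inline SVG. Declaring an explicit
// image content type is the kind of fix referred to above ("needs content type
// too"); without it the embedded badge image does not render.
func badgeSVG(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "image/svg+xml")
	w.Write([]byte(`<svg xmlns="http://www.w3.org/2000/svg" width="90" height="20"></svg>`))
}

func main() {
	http.HandleFunc("/api/badge", badgeSVG)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```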

@foolip
Member

foolip commented Oct 18, 2018

Should the colors align with what we arrive at in #423, or is there a de-facto color scheme for this type of badge that we should follow?

One thing that I'd be more concerned with here than on wpt.fyi proper is the discontinuity in color between 99% (yellow) and 100% (green), since once an engine gets 100% they might push back on changes that will make them yellow again, and argue technically correct but pedantic points about the spec to prevent actually useful tests from being shared.

@lukebjerring
Contributor Author

It's just what was in the prototype I mimicked. That sounds like a specific example of the concerns raised by @jgraham about producing problematic incentives. Is this concern based on empirical evidence, or are we paranoid programming here? (We already go green on wpt.fyi for 100%.)

Happy to change the color scheme - did you have a specific suggestion?

@foolip
Member

foolip commented Oct 19, 2018

So, when forced to suggest a specific color scheme that would lessen the potential for gamification, it seems implausible that color is an effective tool, and being consistent with wpt.fyi proper seems good.

Perhaps a better approach would be for us to recommend a little disclaimer for people to put after these badges in READMEs.

So, I suggest the same color scheme as in #423 and that we have a single definition of the shades of red, yellow and green, so that we keep them in sync.

@lukebjerring
Contributor Author

OK; so I'll bundle the color changes for this into work done (at a later date) resolving #423.

@foolip
Member

foolip commented Oct 26, 2018

Can you add API docs for this, explaining how people should embed it? Right beneath that might be a good place to offer advice on how to use it responsibly and maybe even suggest disclaimer wording.

api/README.md Outdated

Latest chrome experimental run:

![webdriver badge](https://staging.wpt.fyi/api/badge?product=chrome&path=/webdriver)
Contributor Author


As a preview for what this will render like, I added a commit that uses this PR's staging environment here:
1ed6008?short_path=d4b47e8#diff-d4b47e85ce206fdbe458cb599edf2a90

(View it in rich mode)

@beaufortfrancois

I'm looking forward to getting this PR merged! Thanks for working on this.

@jgraham
Contributor

jgraham commented Oct 29, 2018

FTR I still think that providing and promoting this API is a bad idea, and I don't think the changes meaningfully address those concerns.

Turning results into marketing buttons for people who don't even read the spec seems to have few benefits and significant drawbacks in terms of the incentives it promotes. We have had many conversations about the difficulty of understanding the meaning of test results, and allowing them to be presented in a context-free manner in READMEs and promotional material ignores the concern that interpreting the results is hard without careful reading of a specification or knowledge of an implementation to get some idea of how meaningful and exhaustive the tests are. For that reason I think there's a clear difference between presenting the results alongside the spec text, or on a dashboard aimed at browser vendors with clear disclaimers, compared to putting them in a README or other introductory material that will be the first port of call for people with a shallow understanding of the topic. Hoping that people add this context into their documentation seems optimistic at best.

@lukebjerring
Contributor Author

There are some assumptions being made about the reason people desire the badges, which I'd defer to @beaufortfrancois to clarify/justify. My understanding is that those authoring a specific feature like to track implementation progress in each major vendor.

As discussed elsewhere, the motivation to provide this API here is the "lesser of two evils", under the assumption that it's being done anyway; wpt.fyi is the suitable home for more control over cacheability, computation formulas, color consistency, context/disclaimers, etc.

That is to say, your argument's merits cascade past wpt.fyi and onto the endpoints (READMEs), and perhaps we should instead be arguing to prevent the badges from being used at all (whether provided by us or not).

@foolip
Member

foolip commented Oct 29, 2018

@jgraham, if you could totally control the presentation on https://github.com/WICG/picture-in-picture, how would you present the tests? Would replicating exactly the presentation of wpt.fyi, as follows, be unproblematic?
[Screenshot: wpt.fyi results table for picture-in-picture]

If we think that having badges or similar is going to be a net negative no matter how they're done, then we should just dissuade people from making/using them. That seems very likely to succeed given how small a community we are. In other words, I don't think the main reason to provide this API should be that we'll be in control and can react to bad usage, even if that's true.

Rather, I think that bringing attention to the test suite in an eye-catching way in READMEs and at the top of specs is likely to be a net positive: it signals that the web standards community values interop testing, and that doing it well is a point of pride. There is some non-zero amount of attention on the status of tests that is helpful to interop.

@beaufortfrancois perhaps you can elaborate a bit on what you'd like to convey in a compact form? Have you had any reactions from other vendors on the README yet and is there anything they'd like to see changed?

@beaufortfrancois

For posterity, here's how the Picture-in-Picture README.md file looks for now:
[Screenshot: the Picture-in-Picture README.md with per-browser badges]

I've added these custom badges for two reasons. The first was for me (as spec author): I wanted to make sure all web platform tests related to Picture-in-Picture would be as complete as possible. I didn't want some of my tests to not run, or to fail unexpectedly, in other browser vendors' test suites. I used this to keep track of my progress. And since I visit this page quite often, it also gives me a hint of where things are moving.

The second reason was for web developers interested in Picture-in-Picture support. At a quick glance, they can tell which browsers support it or are about to. And I do believe (I may be too naive) that it would empower web developers to raise bugs or even fix tests where possible.

For info, other browser vendors haven't raised any concerns so far. If they do, I'll be happy to discuss and remove the badges if needed.

@foolip
Member

foolip commented Oct 31, 2018

Thanks for the details, @beaufortfrancois. All, I'd like to point out that "didn't want some of my tests to not run or fail unexpectedly in other browser" is not just an aspiration, but that François put in a lot of effort to understand the issues:

@jgraham
Contributor

jgraham commented Oct 31, 2018

So I agree that these numbers can be useful if used responsibly. But one of the challenges of designing public-facing systems is to also consider how those systems can be used in ways that have unintended consequences. When we have previously discussed test results, presenting context-free per-browser numbers has been identified as leading to poor incentives around submitting new tests that make the numbers look worse, or submitting tests that intentionally make competing browsers look worse (e.g. by having large numbers of tests that all fail due to the same edge-case bug). These are both failure modes that we have seen in the past when people have tried to use test results in a marketing context.

So whilst I'm sure that everyone in the issue has good intentions, I'm still wary of the proposed changes; in my mind they look a lot like we would be adding an API specifically to encourage the kind of presentation that has previously proven problematic.

Concretely I have several possible suggestions:

  • Don't link the test results from the README; link them in the spec itself, preferably close to the text defining the feature under test. Presumably spec authors, and developers of the kind who are likely to work on improving test coverage, are looking at the spec itself, so this seems like it would satisfy the requirement to make the results available to those audiences.
  • Add badges based on the numbers in the interop view i.e. for the fraction of tests passed by 4/4 major engines, 3/4 etc. This is harder to understand, and maybe doesn't work for the use case of "I want to understand how different browsers are doing". But linking through to the wpt.fyi dashboard where there's more ability to set the context (and plans in the future to add e.g. annotations linking to per-engine bugs where available) makes it more likely that the results will be understood in a reasonable manner.
  • Have badges with super-broad categories like "No support", "Some passing tests", "Most tests pass". Again that wouldn't necessarily help with your use case of wanting to investigate the results in detail, but especially once we have annotations on wpt.fyi doing that in the context of the dashboard (so you can see which test failures are already triaged, etc.) is going to be more helpful anyway.
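
For illustration only, here is a sketch of what the third suggestion might look like as code; the 80% cut-off and the exact label strings are assumptions, not something proposed in this thread.

```go
package main

import "fmt"

// category maps a pass fraction onto one of the broad labels suggested above.
// The 0 and 0.8 cut-offs are illustrative only.
func category(passing, total int) string {
	if total == 0 || passing == 0 {
		return "No support"
	}
	frac := float64(passing) / float64(total)
	if frac >= 0.8 {
		return "Most tests pass"
	}
	return "Some passing tests"
}

func main() {
	fmt.Println(category(0, 12))  // No support
	fmt.Println(category(5, 12))  // Some passing tests
	fmt.Println(category(12, 12)) // Most tests pass
}
```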

@foolip
Member

foolip commented Oct 31, 2018

Thanks @jgraham! Getting the incentives right on wpt.fyi itself and when results are exposed in some form elsewhere is obviously very important, and a discussion we had at length at TPAC 2017, starting at "WPT Dashboard Interop Stats PRD". https://wpt.fyi/interop/ is what we built based on that discussion.

So the risks of bad incentives are pretty clear, and the fewer pixels we have the harder it is to convey nuance. But I think we should also take seriously the value of promoting testing discipline via wpt(.fyi) to a web platform engineer audience, and strike some sort of balance.

On the concrete suggestions:

Don't link the test results from the README but in the spec itself

@beaufortfrancois, have you tried out https://tabatkins.github.io/bikeshed/#testing? @tabatkins is planning some presentation changes there, maybe you'd like to try it and see if you have feedback?

However, I think this doesn't address "At a quick glance, they can tell which browsers support it or are about to." How to extract that answer from wpt is something we've discussed many, many times, but don't have a ready answer for. @beaufortfrancois, is there a single test for picture-in-picture whose results would be the answer to that question?

Add badges based on the numbers in the interop view

@beaufortfrancois, WDYT, would a summary of https://wpt.fyi/interop/picture-in-picture be useful?

Have badges with super-broad categories

(Suggested by @gsnedders on IRC.)

I'm not sure I understand how this changes incentives. With a fine gradation of 0/N–N/N, or 0–100%, small improvements make for small changes. If we divide the space into broad categories and people actually care about which category each browser is in, won't that just create weird incentives around the boundaries between categories? (This concern is why, ultimately, we went with small color differences between 99% and 100% in the new https://staging.wpt.fyi/ color scheme, instead of the originally planned jump from yellow to green.)
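
Purely as an illustration of that last point (the actual staging.wpt.fyi palette is not in this thread; the hex values and the simple red-to-green interpolation below are assumptions): a continuous ramp keeps 99% and 100% visually close, unlike a hard yellow-to-green jump at exactly 100%.

```go
package main

import "fmt"

// shade interpolates red -> green by pass fraction, so 99% and 100% differ
// only slightly. The colors are made up, not the real wpt.fyi scheme.
func shade(frac float64) string {
	r := int(255 * (1 - frac))
	g := int(255 * frac)
	return fmt.Sprintf("#%02x%02x00", r, g)
}

func main() {
	fmt.Println(shade(0.99)) // near-green
	fmt.Println(shade(1.00)) // full green, only slightly different
}
```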

@jgraham
Contributor

jgraham commented Oct 31, 2018

But I think we should also take seriously the value of promoting testing discipline via wpt(.fyi) to a web platform engineer audience

I strongly agree with the goal here. There is clear benefit to having engineers and their managers understand the state of implementations and have the information they need to prioritise issues likely to cause interop problems. However I'm unconvinced that this is the right way to achieve that.

[in-spec results] doesn't address "At a quick glance, they can tell which browsers support it or are about to."

That seems like the kind of thing that's handled by human-curated lists like caniuse and the various browser intent-to-implement dashboards that are floating around. As you say, extracting that specific information from test results is surprisingly tricky since there are many confounding factors. It's also questionable whether it's a useful goal; sites like html5test.com were mainly concerned with answering this kind of broad question, and it's certainly questionable whether the overall impact of such sites was positive.

I'm not sure I understand how [broad categories] changes incentives

It may not. But I think there's less chance of people getting hung up on the difference between "most tests passing" and "some tests passing" than on the difference between 100% and not-100%, and it makes it harder to press-gang the test results into some (likely misleading) metric of browser quality once there is more than one serious attempt at an implementation.

But I could certainly be persuaded that doing nothing is better than doing broad categories.

@foolip
Member

foolip commented Oct 31, 2018

there's less chance of people getting hung up on the difference between "most tests passing" and "some tests passing" than on the difference between 100% and not-100%

Ah, I agree with that: too much of a difference between 99% and 100% would be bad. I thought using lime green for 99% would mostly address that, but the number (or fraction) itself is still there.

@beaufortfrancois would you still find the badges useful if they elided the details and just said "good support" or something in the 80-100% range? (I'm finding it a bit tricky to find a phrasing that's not misleading for the 100%, and "mostly passing" seems to imply <100%.)

@beaufortfrancois

@foolip The number of tests (passing and total) is my main concern, to be honest.

@beaufortfrancois

What is the current status?

@lukebjerring
Contributor Author

We're waiting on a consensus: hoping either to find an alternative form of this feature that alleviates the concerns highlighted by @jgraham, or to get agreement that the concerns are noted and the feature should move ahead.

I believe the conversation has simply fallen by the wayside. @jgraham, where do you sit on this issue? @foolip? (Specifically, are we deciding to block this feature on the concerns, in which case I should close this PR, or do we think the feature's merits outweigh the concerns, in which case I should merge it?)

Personally, I still feel the feature adds more value than risk of misinterpretation.

@jgraham
Contributor

jgraham commented Nov 8, 2018

I think the status is that there's a fundamental incompatibility between the functionality requested and the concerns raised. @beaufortfrancois doesn't believe that any of the proposed alternatives would meet their requirements, and I don't believe that the unmodified feature is something we should go ahead with.

@lukebjerring
Contributor Author

Ok, closing this PR.

FTR, in my opinion this feels like throwing the baby out with the bath water. We're declining to support the feature because of its potential use in misleading/inappropriate contexts, even though there is demand for it and a clear value-add (and the badges are already being created and used anyway).

@foolip
Member

foolip commented Nov 15, 2018

Sorry for not getting back to this despite the mention.

I think walking away from this would be a real shame, even if we're not confident it's perfect. As I argued in #674 (comment), we could show exactly the views we have on wpt.fyi and link to wpt.fyi for further exploration. That would show the same numbers that @beaufortfrancois is getting with his current badges. If there is a fundamental incompatibility of goals here, we must already have that problem on wpt.fyi to some degree. (Being presented together with a long list of directories for other features can't be what makes the difference.)

(I'm not arguing that this would be the perfect presentation for inlining in READMEs, just that an acceptable presentation does exist.)

@lukebjerring, what would you think about seeking wider feedback, perhaps on public-test-infra, blink-dev, dev-platform and webkit-dev?

@beaufortfrancois

FWIW, this morning I discovered another bug in my picture-in-picture web platform tests simply by looking at my badges in the Picture-in-Picture spec repo. It is useful (at least for me).

@foolip
Member

foolip commented Nov 21, 2018

I'm glad to hear that, and also not surprised that looking at results brings problems to your attention. Ping @lukebjerring on #674 (comment).

@foolip
Member

foolip commented Dec 5, 2018

I finally stumbled upon the spec where I'd seen something like this in the header: https://w3c.github.io/hr-time/

That's using caniuse.com, but is the sort of presentation that would be great to enable based on WPT results.

@foolip
Member

foolip commented Dec 13, 2018

As another point in favor of increasing visibility of results, @beaufortfrancois tells me that he discovered the picture-in-picture regression that led to web-platform-tests/wpt#14495 using his own badges.

@lukebjerring
Contributor Author

Sorry for slow response. Happy to seek wider feedback.

@Hexcles Hexcles deleted the badge branch February 16, 2019 21:41
@Hexcles Hexcles restored the badge branch February 16, 2019 21:42
@Hexcles Hexcles deleted the badge branch October 23, 2019 03:12