Add spec_url values for all features for which we have the data #6765

Closed
sideshowbarker opened this issue Sep 26, 2020 · 25 comments
Labels
bulk_update An update to a mass amount of data, or scripts/linters related to such changes enhancement Nice to have features.

Comments

@sideshowbarker
Contributor

https://github.com/w3c/browser-compat-data is a fork of BCD in which basically the only difference is that it adds spec_url values to every feature for which a spec URL can be determined.

So this issue is for tracking the proposal that we upstream all that spec_url data from that fork to here.

#2983 (comment) is where @Elchi3 and I discussed first just adding spec_url data for all JavaScript features, with the idea that we’d eventually do it for all features.

So here’s where we can discuss a plan for taking the next steps to add spec URLs for the rest of the BCD features.

We wouldn’t necessarily need to upstream all the spec_url data from my fork for every feature all at once. I think at least that @Elchi3 believes it’d be better to do it in subtree “chunks” one-by-one — for example, starting with everything in the html subtree, then seeing how that goes, and then after that, maybe the css subtree, etc.

Given that I know @Elchi3 cares about the particulars of how this gets done, and when, I think we would want to wait until we have him back before we proceed on actually upstreaming any of the spec URL data. But perhaps in the meantime, we can try to discuss some of the logistics in preparation.

On that note, I want to mention here that the way the spec_url values get added to the fork is through a script I wrote at https://github.com/w3c/browser-compat-data/blob/master/scripts/add-specs.js. Some details:

  • the script works by scraping the Specification(s) tables in MDN articles
  • it’s written to be runnable either on the entirety of BCD, or else on any individual top-level directory or set of top-level directories (it takes a list of directory names as an optional command-line argument)
  • even after we add spec_url for any features, the script can be re-run periodically to add any new spec URLs it finds (from re-scraping the MDN articles), or to update any spec URLs that have changed (due to edits made to the MDN articles)
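Once the scraping is done, the remaining step is writing URLs into each feature's `__compat` object. The following is a minimal Python sketch of that step only. It is not the actual add-specs.js and does no MDN scraping; it assumes the scraped results already exist as a hypothetical `scraped` dict mapping dotted BCD feature paths to lists of spec URLs. Because it writes only when a value differs, re-running it is harmless, which matches the periodic re-run described above.

```python
from typing import Any


def add_spec_urls(tree: dict[str, Any], scraped: dict[str, list[str]], prefix: str = "") -> int:
    """Write __compat.spec_url for every feature that has scraped URLs.

    `scraped` (hypothetical input) maps dotted feature paths, e.g.
    "api.AbstractWorker", to the spec URLs found for that feature.
    Returns how many features had spec_url added or changed.
    """
    changed = 0
    for key, value in tree.items():
        # Skip the feature's own __compat object and any non-dict leaves.
        if key == "__compat" or not isinstance(value, dict):
            continue
        path = f"{prefix}.{key}" if prefix else key
        compat = value.get("__compat")
        urls = scraped.get(path)
        if compat is not None and urls:
            # Store a single URL as a string and several as a list.
            new_value = urls[0] if len(urls) == 1 else list(urls)
            if compat.get("spec_url") != new_value:
                compat["spec_url"] = new_value
                changed += 1
        # Recurse into subfeatures.
        changed += add_spec_urls(value, scraped, path)
    return changed


data = {"api": {"AbstractWorker": {"__compat": {"support": {}}}}}
scraped = {"api.AbstractWorker": ["https://html.spec.whatwg.org/multipage/workers.html#the-abstractworker-mixin"]}
print(add_spec_urls(data, scraped))  # 1
print(add_spec_urls(data, scraped))  # 0: nothing changes on a re-run
```
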
@queengooborg queengooborg added bulk_update An update to a mass amount of data, or scripts/linters related to such changes enhancement Nice to have features. labels Sep 26, 2020
@foolip
Contributor

foolip commented Oct 6, 2020

What are the open questions to be resolved here? Is it just about which anchor in each spec to use, or are there other things that could be debated?

FWIW, I think that what I'm proposing in w3c/webref#63 could help here. With that IDL scraped from specs we can get a mapping for many api.* entries to at least a spec shortname, but how to determine the best specific anchor is still not clear. Often there will be multiple anchors relating to a single IDL member to choose between: one inside the IDL block, and sometimes one for the prose definition of what the IDL member does.

@sideshowbarker
Contributor Author

What are the open questions to be resolved here? Is it just about which anchor in each spec to use, or are there other things that could be debated?

There are no open questions to be resolved. Specifically, there’s no ambiguity about which anchor in each spec to use: We have all the spec URLs already — collected by scraping the Specifications tables in MDN articles — and those are all already in the fork at https://github.com/w3c/browser-compat-data

The only decision that needs to be made here is whether to upstream the spec URLs from https://github.com/w3c/browser-compat-data all at once, or whether to do it in stages, and when.

FWIW, I think that what I'm proposing in w3c/webref#63 could help here.

That looks promising. Given that we already have spec URLs for features documented in MDN, I think what’s proposed there could be useful in identifying cases where the relevant MDN article is missing the spec URL for some reason, or else has the wrong spec URL, or where the feature isn’t documented at all yet in MDN.

With that IDL scraped from specs we can get a mapping for many api.* entries to at least a spec shortname, but how to determine the best specific anchor is still not clear.

Yeah, getting the anchors is the part that can’t be done programmatically in many cases.

Often there will be multiple anchors relating to a single IDL member to choose between: one inside the IDL block, and sometimes one for the prose definition of what the IDL member does.

Right. I suspect we’d end up finding that whatever we developed to identify the spec URLs programmatically would have some limitations. But I think combining it with the existing spec URL data we already have from MDN is what would make w3c/webref#63 really powerful for what we need in BCD.

@foolip
Contributor

foolip commented Oct 6, 2020

It's great that we have a whole lot of links already collected and ready to go. I wouldn't propose revisiting if all of those are exactly right. Still, is there a proposed guideline for which anchor to use for new data? https://github.com/mdn/browser-compat-data/blob/master/schemas/compat-data-schema.md#the-__compat-object doesn't spell it out.

More concretely, the case that I've often run into is whether to link to the section defining an interface vs. the definition in the Web IDL block itself: https://html.spec.whatwg.org/multipage/workers.html#the-abstractworker-mixin vs. https://html.spec.whatwg.org/multipage/workers.html#abstractworker

@sideshowbarker
Contributor Author

It's great that we have a whole lot of links already collected and ready to go. I wouldn't propose revisiting if all of those are exactly right. Still, is there a proposed guideline for which anchor to use for new data?

There are no such guidelines written down yet that I’m aware of. But I agree we should document some if we can.

https://github.com/mdn/browser-compat-data/blob/master/schemas/compat-data-schema.md#the-__compat-object doesn't spell it out.

More concretely, the case that I've often run into is whether to link to the section defining an interface vs. the definition in the Web IDL block itself: https://html.spec.whatwg.org/multipage/workers.html#the-abstractworker-mixin vs. https://html.spec.whatwg.org/multipage/workers.html#abstractworker

From looking through a lot of existing specifications links in MDN articles and from trying to take into consideration what will be most useful to web developers, IMHO it’s almost always better to link to the section defining an interface rather than to the definition in the Web IDL block itself.

@dontcallmedom
Contributor

Reading this thread, @tidoust and I realized that the automatically extracted list of definitions generated by tidoust/reffy in w3c/webref probably provides the right list of anchors for all the IDL interfaces/attributes/methods that need to feed into this spec_url field
https://github.com/w3c/webref/tree/master/ed/dfns

(or when it doesn't provide the right anchor, it feels like this should be fixed in the spec upstream in any case)

@foolip
Contributor

foolip commented Oct 8, 2020

In the case of AbstractWorker, it looks like https://raw.githubusercontent.com/w3c/webref/master/ed/dfns/html.json links to https://html.spec.whatwg.org/multipage/workers.html#abstractworker, which is what I'd expect from a scraping approach.

I do share @sideshowbarker's preference for linking to the section defining an interface, if the heading is right above at least, but I don't know how we'd achieve this. If one wants that, how would one fix it in the spec?

@dontcallmedom
Contributor

in general, the trend in respec/bikeshed specs is that the anchor of the definition will be in the prose description rather than the IDL block.

The HTML spec isn't doing that consistently yet, but to me that's a case of needing to upstream what we want there.

@dontcallmedom
Contributor

(alternatively, we could easily annotate definitions to refer to the closest preceding heading id)

@ddbeck
Collaborator

ddbeck commented Oct 9, 2020

@sideshowbarker

The only decision that needs to be made here is whether to upstream the spec URLs from https://github.com/w3c/browser-compat-data all at once, or whether to do it in stages, and when.

  1. Phases: I think the case made there for doing it in phases still stands. I read through the entirety of Add spec_url data for JavaScript features #2983 again. It's still a big change and reviewing them by topic is just going to be less onerous (honestly, I think this will go faster in phases). I'd suggest starting small, with the html.* or http.* data.

    I'll add to this by saying that I'm strongly inclined to do some randomized spot-checking of URLs, to make sure they actually make sense to humans. Splitting this by topic is going to be a lot easier.

  2. Timing: I suggest no earlier than the termination of mdn-browser-compat-data releases. This is mostly just selfishness, for my workload. So it's at least a few weeks before we can start on this in earnest.

    Also, I was thinking about introducing some additional rigor into BCD's approach to versioning (e.g., using semver minor versions to indicate large-scale non-breaking changes to the data). This would be easier to do when there's only one versioning sequence to consider.

    I'm also fine with blocking this for a bit while we await Florian's hoped-for return, especially if it becomes a more concrete expectation. That said, spec_url has been a success in JavaScript; that early work on Add spec_url data for JavaScript features #2983 paid off. Consequently, I feel like this is pretty low-risk to introduce to other parts of BCD, so I don't think we need to wait indefinitely.

@foolip

I wouldn't propose revisiting if all of those are exactly right. Still, is there a proposed guideline for which anchor to use for new data? https://github.com/mdn/browser-compat-data/blob/master/schemas/compat-data-schema.md#the-__compat-object doesn't spell it out.

I agree that it would be good to document a guideline. Since we already have JS spec_urls, I think starting work on a guideline is not blocked. I'd welcome a PR codifying our approach in JS, which could serve as a model for other spec URL fragment guidelines.

@foolip
Contributor

foolip commented Oct 22, 2020

In order to filter BCD to "everything Flexbox" and "everything Grid" it would be of great practical utility to me to start with URLs for those features. I'd also be a willing reviewer if we start there. At this point, I think only the termination of mdn-browser-compat-data releases is blocking. @ddbeck any projection of when that will happen?
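With spec_url in place, "everything Flexbox" reduces to a prefix match on the spec URL. A minimal Python sketch, assuming the data has been loaded into nested dicts shaped like BCD's JSON; the helper names are hypothetical and the fragment IDs in the sample data are placeholders, not checked against the specs. Note that a numbered URL (css-flexbox-1) would not match an unnumbered prefix, so a consistent URL form matters for this kind of filtering.

```python
from typing import Any, Iterator


def iter_spec_urls(tree: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (feature_path, spec_url) pairs from a BCD-shaped tree."""
    for key, value in tree.items():
        if key == "__compat" or not isinstance(value, dict):
            continue
        path = f"{prefix}.{key}" if prefix else key
        spec_url = value.get("__compat", {}).get("spec_url")
        # spec_url may be a single string or a list of strings.
        urls = [spec_url] if isinstance(spec_url, str) else (spec_url or [])
        for url in urls:
            yield path, url
        yield from iter_spec_urls(value, path)


def features_for_spec(tree: dict[str, Any], spec_prefix: str) -> list[str]:
    """Return sorted feature paths with at least one spec_url under spec_prefix."""
    return sorted({path for path, url in iter_spec_urls(tree) if url.startswith(spec_prefix)})


# Sample data; the fragment IDs are placeholders.
data = {
    "css": {
        "properties": {
            "flex-grow": {"__compat": {"spec_url": "https://drafts.csswg.org/css-flexbox/#placeholder-1"}},
            "grid-area": {"__compat": {"spec_url": ["https://drafts.csswg.org/css-grid/#placeholder-2"]}},
            "color": {"__compat": {}},
        }
    }
}
print(features_for_spec(data, "https://drafts.csswg.org/css-flexbox/"))  # ['css.properties.flex-grow']
```
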

@sideshowbarker
Contributor Author

In order to filter BCD to "everything Flexbox" and "everything Grid" it would be of great practical utility to me to start with URLs for those features.

I agree that set of features should be one of the very highest priorities for dealing with overall, so I also agree it’d be a great set to start with as far as adding spec URLs.

@ddbeck
Collaborator

ddbeck commented Oct 26, 2020

Introducing spec_url to the CSS flexbox/grid data first

I'm happy to see it land there first, but let's not do all of CSS as one big PR. CSS has many more open PRs, so I want to avoid accidentally creating conflicts on many PRs at once (at least until I can get the PR queue under better control).

What's blocked and not blocked

I think only the termination of mdn-browser-compat-data releases is blocking. @ddbeck any projection of when that will happen?

@foolip I need to double-check there have been no new issues raised on the package renaming, but barring any significant issues, I plan to deprecate all past releases of the mdn- package on Thursday, October 29 and merge master-scoped-package into master. After that, a bunch of things will be unblocked, including this issue.

In the meantime, I'd like to start to document how to choose a spec URL fragment though, which isn't blocked now (at least for JavaScript). I'd welcome even a very rough proposal, as it would help me better review the incoming spec URLs (even if we have not accepted the guideline yet).

@foolip
Contributor

foolip commented Oct 28, 2020

I tried to pull out just the Flexbox spec URLs in #7161 (review) and did some self-review, which immediately revealed some issues:

  • Existing data doesn't consistently link to section
  • Linking to the section is sometimes non-ideal, when that section defines multiple things
  • It's not trivial to judge if it's appropriate to link some things which don't define the properties but do add normative requirements for them.

To that I would add:

@sideshowbarker
Contributor Author

sideshowbarker commented Oct 29, 2020

Should we link to https://drafts.csswg.org/css-flexbox-1/ (with -1) or https://drafts.csswg.org/css-flexbox/ without the number? I would personally quite strongly prefer the unnumbered versions.

I strongly agree that’s what we should do — ideally.

But I’m not familiar enough with the CSS spec conventions to know, given a with-number spec URL and given a corresponding without-number URL, can we safely/confidently assume that a particular feature in the with-number spec actually exists in the corresponding without-number spec? (and further, if the feature does exist there, does it have the same fragment ID?)

My understanding is that the CSSWG doesn't direct the unnumbered variants to delta specs with the whole old spec missing, but that's the concern to look out for.

I don’t know what “doesn't direct the unnumbered variants to delta specs with the whole old spec missing” means…

@foolip
Contributor

foolip commented Oct 29, 2020

given a with-number spec URL and given a corresponding without-number URL, can we safely/confidently assume that a particular feature in the with-number spec actually exists in the corresponding without-number spec?

@tabatkins I've asked you about this before and IIRC you don't direct the unnumbered version at delta specs, right?

I've confirmed this to hold true for a few delta specs listed in https://github.com/w3c/browser-specs:

That's not quite the same as guaranteeing that no ID in an unnumbered URL will ever go away, but close enough for our purposes I think.

@tabatkins

Yeah, we point the unnumbered at the "current version", which is judged on an ad hoc basis, but generally will not ever point to a delta spec.
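Given that convention, rewriting numbered CSSWG draft URLs to their unnumbered form is a mechanical step. A minimal Python sketch, assuming only drafts.csswg.org URLs should be touched and that a level is a one- or two-digit suffix on the shortname (so a snapshot like css-2022 is left alone). The function name is hypothetical, and it does not verify that the fragment ID still exists in the "current version", which remains the open risk raised above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# A trailing level number on the shortname, e.g. "css-flexbox-1" or "css-grid-2".
# Limited to one or two digits so year-named snapshots are not rewritten.
_LEVEL_SUFFIX = re.compile(r"^/([a-z0-9-]+?)-\d{1,2}/")


def unnumbered_csswg_url(url: str) -> str:
    """Rewrite a numbered drafts.csswg.org URL to its unnumbered form.

    The fragment is kept as-is; whether that ID exists in the unnumbered
    ("current") version is NOT checked here.
    """
    parts = urlsplit(url)
    if parts.netloc != "drafts.csswg.org":
        return url
    path = _LEVEL_SUFFIX.sub(r"/\1/", parts.path, count=1)
    return urlunsplit(parts._replace(path=path))


print(unnumbered_csswg_url("https://drafts.csswg.org/css-flexbox-1/"))
# https://drafts.csswg.org/css-flexbox/
```
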

@tabatkins

And recall that Bikeshed does have the ability to enforce that specific IDs be present in a spec (via the Required IDs metadata); it should probably be possible for tooling to track which IDs are used by BCD and suggest adding them to the metadata.

@dontcallmedom
Contributor

my sense is that this all converges nicely toward having spec_urls point to exported definitions of the matching term; this means less ad-hoc manual processing, more consistent tracking, etc.

(in particular, I think exported definitions should be considered as Required IDs by Bikeshed, e.g. by checking that all the webref-extracted definitions marked as exported are present in the Bikeshed source at any time)
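A first step toward the tooling suggested here is collecting, per spec document, which fragment IDs BCD actually links to. A minimal Python sketch, assuming a flat list of spec_url strings has already been gathered from the data; the function name and the placeholder fragments are hypothetical, and turning the result into Bikeshed's Required IDs metadata (or checking it against exported definitions) is left out.

```python
from collections import defaultdict
from typing import Iterable
from urllib.parse import urldefrag


def ids_by_spec(spec_urls: Iterable[str]) -> dict[str, list[str]]:
    """Group the fragment IDs referenced by spec URLs by the document they point into."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for url in spec_urls:
        base, fragment = urldefrag(url)
        if fragment:  # URLs without a fragment don't depend on any ID
            grouped[base].add(fragment)
    return {base: sorted(ids) for base, ids in sorted(grouped.items())}


urls = [
    "https://drafts.csswg.org/css-flexbox/#placeholder-b",
    "https://drafts.csswg.org/css-flexbox/#placeholder-a",
    "https://drafts.csswg.org/css-grid/",
]
print(ids_by_spec(urls))
# {'https://drafts.csswg.org/css-flexbox/': ['placeholder-a', 'placeholder-b']}
```
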

@tidoust
Contributor

tidoust commented Nov 5, 2020

(alternatively, we could easily annotate definitions to refer to the closest preceding heading id)

@foolip, @sideshowbarker, I note that dfns extracts in Webref now associate definitions with the heading under which they are found. For instance, the definition of AbstractWorker in https://raw.githubusercontent.com/w3c/webref/master/ed/dfns/html.json gets a:

      "heading": {
        "id": "the-abstractworker-mixin",
        "title": "The\n  AbstractWorker mixin",
        "number": "10.2.6.1"
      }

(I still need to clean up the "\n " in there, but that's a no-brainer)

@foolip
Contributor

foolip commented Nov 5, 2020

@tidoust would you suggest that we always link to the heading, or just that we allow linking to the heading on a case-by-case basis?

@tidoust
Contributor

tidoust commented Nov 5, 2020

@tidoust would you suggest that we always link to the heading, or just that we allow linking to the heading on a case-by-case basis?

As raised by @dontcallmedom in #6765 (comment), I would rather converge towards linking to the definitions themselves, and would e.g. encourage the "put the dfn in the heading itself" pattern, as done in various specifications (e.g. The disableRemotePlayback attribute).

There won't be headings for all IDL definitions, but that seems fine as long as the dfn somewhat stands out. For instance, in WebRTC, iceGatheringState is on its own. Note the attribute is also under a section named Attributes, and it would not make a lot of sense to link to that section.

Linking to definitions does not work well for specs that have definitions in IDL blocks directly (one would want to link to Interface ShadowRoot and not to the IDL block). That is where some editorial effort could be needed to move away from such editorial patterns.

In the meantime, it may be doable to automate the choice of whether to link to the definition or the heading in most cases. For instance, the dfns extracts in webref could perhaps have something like an appearsIn property that would have values such as pre when the definition is in an IDL block or in a CSS value definition block, or proptable for CSS definitions such as font-family. This would signal the need to link to the heading instead of to the definitions.
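The selection rule proposed here is small enough to sketch. A minimal Python version, assuming a dfns entry shaped like the sample above (an `href` plus a `heading` object with an `id`) and assuming the proposed `appearsIn` property, which is only a suggestion in this thread and not part of the extracts. The function name and the set of "link to heading" values are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical values of the proposed (not yet existing) appearsIn property
# that signal the dfn sits in an IDL/value block or a property table.
LINK_TO_HEADING = {"pre", "proptable"}


def choose_spec_url(dfn: dict) -> str:
    """Pick the dfn anchor, or the enclosing heading when the dfn is in a block."""
    href = dfn["href"]
    heading = dfn.get("heading") or {}
    if dfn.get("appearsIn") in LINK_TO_HEADING and heading.get("id"):
        base, _ = urldefrag(href)
        return f"{base}#{heading['id']}"
    return href


abstract_worker = {
    "href": "https://html.spec.whatwg.org/multipage/workers.html#abstractworker",
    "appearsIn": "pre",  # assumed value for illustration
    "heading": {"id": "the-abstractworker-mixin"},
}
print(choose_spec_url(abstract_worker))
# https://html.spec.whatwg.org/multipage/workers.html#the-abstractworker-mixin
```

Definitions in prose (no `appearsIn` value, or one not in the set) keep their own anchor, which is the default tidoust prefers.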

Linking to definitions also does not work well when the definition appears in the middle of a long paragraph. That does not seem to be a common case though, and that seems harder to flag automatically.

Are there other cases where linking to the heading would make more sense than linking to the definition?

@Elchi3
Member

Elchi3 commented Nov 25, 2020

Hi all, I'm ready to get back into this.

  1. Phases: I think the case made there for doing it in phases still stands. I read through the entirety of Add spec_url data for JavaScript features #2983 again. It's still a big change and reviewing them by topic is just going to be less onerous (honestly, I think this will go faster in phases). I'd suggest starting small, with the html.* or http.* data.

This makes sense to me.

@sideshowbarker I feel like we could get into html.* given its size and your expertise with the spec. Do you want to open an initial PR for say HTML global attributes spec_urls and we take a look at it together?

I also see there are comments and a draft PR about api.* and css.* spec_urls, but it seems like starting with html.* would be a bit easier and smaller effort for now?

@sideshowbarker
Contributor Author

Do you want to open an initial PR for say HTML global attributes spec_urls and we take a look at it together?

Yup — just now raised #8064

@sideshowbarker
Contributor Author

I think we can consider this to be wholly done

@foolip
Contributor

foolip commented Sep 17, 2021

Awesome, thank you @sideshowbarker! Are there more things you'd like to see done or cleaned up when it comes to linking specs to BCD and vice versa? For example, is standard_track in sync with the existence of spec_url now, and do we have any monitoring of spec_urls that get broken over time?
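The first of those questions can be checked mechanically. A minimal Python sketch, assuming BCD-shaped nested dicts where `__compat.status.standard_track` is a boolean; the function name is hypothetical, and a flagged feature is not necessarily wrong, only worth a look. It does not address broken links, which would need each spec fetched and its IDs checked.

```python
from typing import Any, Iterator


def spec_status_mismatches(tree: dict[str, Any], prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (feature_path, problem) where standard_track and spec_url disagree."""
    for key, value in tree.items():
        if key == "__compat" or not isinstance(value, dict):
            continue
        path = f"{prefix}.{key}" if prefix else key
        compat = value.get("__compat")
        if compat is not None:
            has_spec_url = bool(compat.get("spec_url"))
            standard_track = compat.get("status", {}).get("standard_track")
            if standard_track is True and not has_spec_url:
                yield path, "standard_track but no spec_url"
            elif standard_track is False and has_spec_url:
                yield path, "spec_url but not standard_track"
        yield from spec_status_mismatches(value, path)


data = {
    "api": {
        "A": {"__compat": {"status": {"standard_track": True}}},
        "B": {"__compat": {"spec_url": "https://example.com/#b", "status": {"standard_track": False}}},
        "C": {"__compat": {"spec_url": "https://example.com/#c", "status": {"standard_track": True}}},
    }
}
print(list(spec_status_mismatches(data)))
# [('api.A', 'standard_track but no spec_url'), ('api.B', 'spec_url but not standard_track')]
```
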


8 participants