Figure out where we use tags and how much we can clean them up #3424

Closed
wbamberg opened this issue Mar 23, 2021 · 32 comments
Labels
MDN:Project Anything related to larger core projects on MDN

Comments

@wbamberg
Collaborator

In talking to @escattone we were wondering where and for what we use tags in MDN. Our tags were migrated wholesale from Kuma and in general are a real mess.

I think that now we don't expose them directly to users any more (?) but they are used in some macros. It would be good to understand which macros use them, and which specific tag values these macros are looking for, and whether we could clean out the other values.

@wbamberg wbamberg added the needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened. label Mar 23, 2021
@hamishwillee
Collaborator

No idea. I recently cleaned up an old pre-kuma syntax that allowed you to list all pages that had a particular tag - "because we don't do that anymore"; that's pretty much the only use of tags that I would want.

@peterbe
Contributor

peterbe commented Mar 26, 2021

We do index them in Elasticsearch. For no reason, because you can't search for them through the query API (/api/v1/search?). :)

We certainly don't display them in any UI on the document page itself.

So I think they're only used for some macros. Sidebar macros in particular.

@peterbe
Contributor

peterbe commented Mar 26, 2021

One thing I've noticed: if you think the English tags are a mess, wait till you see the translated-content tags. So often they are different!
If the sidebar rendering depends on the tags:

  1. The sidebars in translated content are going to suffer
  2. Can't the translated-content just piggyback on their en-US parent doc?

@hamishwillee
Collaborator

hamishwillee commented Mar 29, 2021

If the tags never appear in the UI, and we don't intend to show them in future, then there is no reason not to piggyback them off the en-US parent. This is particularly true because IMO it is desirable for sidebar navigation to be consistent across localisations.

But I'd consider dumping them for sidebar navigation (and MDN) altogether. My "experience" is that there are better ways to build navigation. They do make sense for something like Wikipedia, where navigation is by category rather than "guided" (which is where I see the value of sidebars).

@jpmedley
Collaborator

Something just struck me about this. Am I remembering correctly that I could previously click on a tag and see all the pages with that tag? If there's a way to look at stats of those URLs, someone should. Such stats may or may not be arguments for keeping tags, but they could be evidence of use cases that still need to be supported. For example, tags may have been the only way to see a complete list of all events.

@peterbe
Contributor

peterbe commented Mar 31, 2021

> Something just struck me about this. Am I remembering correctly that I could previously click on a tag and see all the pages with that tag? If there's a way to look at stats of those URLs, someone should. Such stats may or may not be arguments for keeping tags, but they could be evidence of use cases that still need to be supported. For example, tags may have been the only way to see a complete list of all events.

In other words; "hold on before we kill all tags" :)

I'll let you content gurus hash it out but I can predict that it would be trivial to add a site-search, by tag. You can actually already search by prefixes. E.g. https://developer.mozilla.org/api/v1/search?q=foreach&slug_prefix=web/htMl
Now, there's no UI for doing this. I'm just saying how relatively easily it can be accomplished. It would be easy to add &tag=TAG1&tag=TAG2 to the search API. But I'd first need to do a bit of work to make the ?q=... parameter optional (if you supply other params), and then we'd need to decide whether &tag=TAG1&tag=TAG2 means "TAG1" in doc.tags AND "TAG2" in doc.tags, or "TAG1" in doc.tags OR "TAG2" in doc.tags.
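
To make the AND/OR question concrete, here is a minimal Python sketch of both interpretations. The `filter_by_tags` helper, its `match` parameter, and the sample `docs` are hypothetical and are not part of the actual search API; the case-insensitive comparison is also an assumption.

```python
def filter_by_tags(docs, tags, match="all"):
    """Return the docs whose tags contain all (AND) or any (OR) of `tags`.

    Hypothetical helper for illustration only; tags are compared
    case-insensitively, which is an assumption.
    """
    wanted = {t.lower() for t in tags}
    combine = all if match == "all" else any
    results = []
    for doc in docs:
        doc_tags = {t.lower() for t in doc["tags"]}
        if combine(t in doc_tags for t in wanted):
            results.append(doc)
    return results


# Made-up sample documents.
docs = [
    {"slug": "Web/CSS/::before", "tags": ["CSS", "Pseudo-element"]},
    {"slug": "Web/CSS/:hover", "tags": ["CSS", "Pseudo-class"]},
]

# AND: both tags must be present on the document.
and_hits = filter_by_tags(docs, ["CSS", "Pseudo-class"], match="all")
# OR: either tag is enough.
or_hits = filter_by_tags(docs, ["Pseudo-element", "Pseudo-class"], match="any")
```

With AND semantics, repeating the parameter narrows the result set; with OR it widens it, so the choice mostly depends on whether tag search is meant for drilling down or for browsing.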

@peterbe
Contributor

peterbe commented Mar 31, 2021

By the way @jpmedley @wbamberg if you have ideas how to get rid of https://developer.mozilla.org/en-US/docs/Web/API/Index I'm all ears. That page is so huuuge that it's omitted from the site-search and our sitemaps.
But perhaps it would be of value to the user to be able to replace that page with a link to the site search, like:

<a href="/en-US/search?slug_prefix=web/api">Search all Web API pages</a>

...or something.

@wbamberg
Collaborator Author

> By the way @jpmedley @wbamberg if you have ideas how to get rid of https://developer.mozilla.org/en-US/docs/Web/API/Index I'm all ears. That page is so huuuge that it's omitted from the site-search and our sitemaps.
> But perhaps it would be of value to the user to be able to replace that page with a link to the site search, like:
>
> <a href="/en-US/search?slug_prefix=web/api">Search all Web API pages</a>
>
> ...or something.

I believe that this page was only used by the doc status pages, which don't exist any more. But @chrisdavidmills might know better than me.

@chrisdavidmills
Contributor

> > By the way @jpmedley @wbamberg if you have ideas how to get rid of https://developer.mozilla.org/en-US/docs/Web/API/Index I'm all ears. That page is so huuuge that it's omitted from the site-search and our sitemaps.
> > But perhaps it would be of value to the user to be able to replace that page with a link to the site search, like:
> >
> > <a href="/en-US/search?slug_prefix=web/api">Search all Web API pages</a>
> >
> > ...or something.
>
> I believe that this page was only used by the doc status pages, which don't exist any more. But @chrisdavidmills might know better than me.

I think Will is right here.

@jpmedley
Collaborator

jpmedley commented Apr 1, 2021

> In other words; "hold on before we kill all tags" :)

You got the gist of it. I was going more for: make sure we're clear on all the consequences before moving forward, and make sure we have new implementations for valid use cases such as (possibly) the events problem I alluded to above.

> By the way @jpmedley @wbamberg if you have ideas how to get rid of https://developer.mozilla.org/en-US/docs/Web/API/Index I'm all ears.

That doesn't look like any index I've ever seen. It looks like something you'd create to make up for a lack of site search. I'll call this misnamed and something that should probably go away. I'd add that the site has no index at all. If you wanted an index, that's a different discussion, but I'm not sure we do.

Taking your question at face value, an index would be a list of links only organized under letters of the alphabet as headings. Links would ideally be subjects rather than to single pages. For example:

S

...
sensors

  • API
  • AmbientLight
  • feature detecting
  • Gyroscope
  • ...
  • using

This kind of thing would be more useful in the expository content where locations of answers to specific questions are less predictable. It seems like there would also need to be separate indexes for the various sections. For example, it's unlikely that someone searching for information on packaging an extension would need to see entries related to PWAs. In the reference material, it's not just redundant with site search, it's, in a way, redundant with reference structure itself. If I've spent any time at all in the reference, I can predict where all of these topics will be found.

The fact that the items in that page seem to all link to single pages reinforces my feeling that it was a site search substitute.

@hamishwillee
Collaborator

The Deprecated and Non-standard tags in a page are used by the sidebar macros to automatically add the sidebar icon indicating these states. As discussed in mdn/yari#3818, I think it's a pain that we have this info in tags, BCD, and the header macro. It should probably just be in BCD if possible, imported into the page, and overridden by front matter if present; the deprecation header and other headers should be added automatically.

The only reason to not auto-generate the deprecation header is that it would be nice in cases where we have a deprecated API superseded by a clear replacement, if we could include the replacement as part of the macro.

That would allow removal of "state" tags.
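
As a rough illustration of "BCD first, front matter as override", here is a minimal Python sketch. The three flags (`experimental`, `standard_track`, `deprecated`) mirror the status block in BCD; the `status` front-matter key, both function names, and the defaults used when there is no BCD entry are assumptions made for this example.

```python
def resolve_status(bcd_status, front_matter):
    """Merge BCD status flags with an optional front-matter override.

    `front_matter["status"]` is a hypothetical key, and the defaults
    below (used when a page has no BCD entry) are assumptions.
    """
    status = {"experimental": False, "standard_track": True, "deprecated": False}
    status.update(bcd_status or {})
    # Front matter wins over BCD when both specify a flag.
    status.update(front_matter.get("status", {}))
    return status


def sidebar_badges(status):
    """Translate status flags into the badge names the state tags carry today."""
    badges = []
    if status["deprecated"]:
        badges.append("Deprecated")
    if status["experimental"]:
        badges.append("Experimental")
    if not status["standard_track"]:
        badges.append("Non-standard")
    return badges


# A page with a BCD entry and no override.
bcd = {"experimental": False, "standard_track": True, "deprecated": True}
from_bcd = sidebar_badges(resolve_status(bcd, {}))

# A page without a BCD entry that declares its state in front matter.
from_front_matter = sidebar_badges(
    resolve_status(None, {"status": {"standard_track": False}})
)
```

Under this scheme a page would not need Deprecated, Experimental, or Non-standard tags at all, because the badges are derived rather than declared.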

@jpmedley
Collaborator

> we could include the replacement as part of the macro.

Should this be in the BCD instead? That would make the information available to all BCD clients and open use cases we haven't thought of yet.

@peterbe
Contributor

peterbe commented May 20, 2021

Not sure if this helps, but looking at the original issue here from @wbamberg ...

  • CSSRef.ejs depends on:

    • Deprecated
    • Experimental
    • Overview
    • CSS Property
    • Selector
    • Pseudo-class
    • Pseudo-element
    • At-rule
    • CSS Data Type
    • Non-standard
  • APIRef.ejs depends on:

    • Property
    • Method
    • Constructor
    • Event
    • Experimental
    • Non-standard
    • Non standard
    • Deprecated
    • Obsolete
  • CSSInfo.ejs depends on:

    • CSS Function
    • CSS Data Type
    • Element
  • cssxref.ejs depends on:

    • CSS Function
    • CSS Data Type

Actually. I'm going to stop there. There are many more macros that depend on them.

▶ cd kumascript/macros

▶ rg 'hasTag\(' --count-matches
APIRef.ejs:11
WebExtAPISidebar.ejs:10
CSSInfo.ejs:3
SubpagesWithSummaries.ejs:2
LearnBox.ejs:2
cssxref.ejs:2
CSSRef.ejs:12
SVGRef.ejs:1
InterfaceOverview.ejs:12

So it's pretty clear it's still relevant and important.
But it's only in KS macros. I don't know of any other place where they're exposed.

But all of this reminds me, I should write a translation-differences checker that makes sure the translated document's tags match the en-US one.
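
A minimal Python sketch of what such a checker could compare is below. The `tag_differences` name and the case-insensitive comparison are assumptions; this is not an existing Yari tool.

```python
def tag_differences(en_us_tags, translated_tags):
    """Report tags missing from, or extra in, a translated document.

    Hypothetical helper; tags are lowercased before comparison.
    """
    en = {t.lower() for t in en_us_tags}
    tr = {t.lower() for t in translated_tags}
    return {"missing": sorted(en - tr), "extra": sorted(tr - en)}


# Made-up example: the translation localised one of its tags.
diff = tag_differences(["CSS", "Selector"], ["CSS", "Sélecteur"])
```

A non-empty `missing` list is the interesting case here, since those are the tags a sidebar macro might look for and not find.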

@jpmedley
Collaborator

It may be a longer project than I thought, but I'm still in favor of moving to BCD when we can. It seems like CSSRef could be altered to read from BCD. It might also prove useful if event, method, property, etc. were the value of a type attribute in BCD. These are just a few examples.

@wbamberg
Collaborator Author

> So it's pretty clear it's still relevant and important.

Yes, there are some tags that are used in macros. But there are many many tags that are not. That's why we would need to do the analysis of which tags are used, so we can clean them.

But, many of the places we use tags in macros are to identify the type of page something is. From your partial list above:

Overview
CSS Property
Selector
Pseudo-class
Pseudo-element
At-rule
CSS Data Type

Property
Method
Constructor
Event

CSS Function
CSS Data Type
Element

What I would like to do here is have a "page-type" front matter key instead, as in mdn/yari#3350. This would be useful for all sorts of things, including (for example) figuring out whether a page ought to contain a BCD table, or a specifications table, or a box listing event properties, or a link to a constructor page, ...

As you said in that issue, Peter, technically that's possible right now, thanks to making front matter available to KS.

Most of the other tags you've listed are to do with the lifecycle status of the feature:

Experimental
Non-standard
Non standard
Deprecated
Obsolete

Some of these are already represented in BCD, and some of them ("Obsolete") we are trying to get away from.

@peterbe
Contributor

peterbe commented May 20, 2021

Would it help to figure out which tags are in documents but never ever used in the hasTag() function in KS macros?
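
One way that analysis could be sketched in Python: pull the string literals out of `hasTag(...)` calls in the macro sources and subtract them from the tags found in documents. The regular expression, the function names, and the sample macro line are assumptions; tags passed as variables rather than literals would be missed.

```python
import re

# Matches hasTag(<first arg>, "Tag") or hasTag(<first arg>, 'Tag').
HAS_TAG_RE = re.compile(r"""hasTag\(\s*[^,()]+,\s*(['"])(.*?)\1\s*\)""")


def tags_used_in_macros(macro_sources):
    """Collect the lowercased tag literals passed to hasTag() in macro source text."""
    used = set()
    for source in macro_sources:
        for _quote, tag in HAS_TAG_RE.findall(source):
            used.add(tag.lower())
    return used


def unused_tags(document_tags, macro_sources):
    """Return the document tags that no hasTag() literal refers to."""
    used = tags_used_in_macros(macro_sources)
    return sorted({t for t in document_tags if t.lower() not in used})


# Made-up macro line and document tags.
macro = "if (page.hasTag(aPage, \"Deprecated\") || page.hasTag(aPage, 'Non-standard')) {}"
leftovers = unused_tags(["Deprecated", "NeedsContent", "CSS"], [macro])
```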

@peterbe
Contributor

peterbe commented May 20, 2021

Another thing that's occurred to me is that it's quite possible that tags in translated documents are incorrect. I.e. the CSS sidebar might get different results in French compared to English just because the tags aren't perfectly in sync.
So what we could do there is to "always" ignore the tags from the translated document and fall back to the English parent document's tags when building sidebars.
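
A minimal Python sketch of that fallback follows. It assumes a translated document shares its slug with its en-US parent and that documents are plain dicts; `tags_for_sidebar` is a hypothetical name.

```python
def tags_for_sidebar(doc, en_us_docs):
    """Return the tags a sidebar should use for `doc`.

    Translated documents never contribute their own tags; they inherit
    from the en-US document with the same slug (an assumption here).
    """
    if doc["locale"] == "en-US":
        return doc.get("tags", [])
    parent = en_us_docs.get(doc["slug"])
    return parent.get("tags", []) if parent else []


# Made-up documents.
en_us_docs = {
    "Web/CSS/color": {
        "locale": "en-US",
        "slug": "Web/CSS/color",
        "tags": ["CSS", "CSS Property"],
    }
}
french = {"locale": "fr", "slug": "Web/CSS/color", "tags": ["CSS", "Propriété"]}
inherited = tags_for_sidebar(french, en_us_docs)
```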

@hamishwillee
Collaborator

hamishwillee commented May 21, 2021

Some of this discussion moved on to https://github.com/mdn/content/discussions/5162. Just a few points.

In support of what @wbamberg said, the idea is to separate out the metadata related to document/object state and type from the more arbitrary "tags". The information would still be there in some form.
FWIW I think the core sidebar should actually either be explicitly listed as frontmatter or auto-added based on the type. It's metadata, not content.

> > we could include the replacement as part of the macro.
>
> Should this be in the BCD instead? That would make the information available to all BCD clients and open use cases we haven't thought of yet.

That would be great, but BCD owners are pretty protective of what goes in there (and rightly so).
More generally though, we are likely to have cases where there is no BCD entry but there is an MDN one. That implies we need a way to specify this stuff in content even if it normally comes from BCD.
Further, while we want to think of BCD as canonical, there may be cases where it makes sense for frontmatter to be an "over-ride". Not sure.

> Would it help to figure out which tags are in documents but never ever used in the hasTag() function in KS macros?

I think so - part of the analysis.

> Another thing that's occurred to me is that it's quite possible that tags in translated documents are incorrect

This is why fetching from BCD is so handy. But yes, IMO all metadata in translated pages should be ignored, and probably should be stripped. Only the "master" version should be used, and it should be used everywhere. The only exception might be "tags after we have pulled out all the useful ones into dedicated keys". Note, we're not rendering the metadata.

@jpmedley
Collaborator

> That would be great, but BCD owners are pretty protective of what goes in there (and rightly so).

You don't know until you ask.

> More generally though, we are likely to have cases where there is no BCD entry but there is an MDN one. That implies we need a way to specify this stuff in content even if it normally comes from BCD.

Can you provide an example? When I do reference pages I always do the BCD first, and I require that of the contractors I've been supervising the last year.

> Further, while we want to think of BCD as canonical, there may be cases where it makes sense for frontmatter to be an "over-ride". Not sure.

I'd need to see a use case before I'd be convinced that's necessary.

@peterbe
Contributor

peterbe commented May 21, 2021

> FWIW I think the core sidebar should actually either be explicitly listed as frontmatter or auto-added based on the type

That's off-topic but a great point!!
I had in mind that when we refactor sidebars away from KumaScript HTML, you'd type into the front-matter...

sidebar: css

but you could omit that stuff automatically if it says:

page-type: css-selector

because Yari could have a mapping of page-types => sidebar.
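
A minimal Python sketch of that lookup is below. Only `sidebar: css` and `page-type: css-selector` come from the comment above; the mapping table, the extra `css-property` entry, and the `sidebar_for` name are hypothetical and do not describe how Yari actually works.

```python
# Hypothetical mapping of page types to sidebars.
PAGE_TYPE_TO_SIDEBAR = {
    "css-selector": "css",
    "css-property": "css",  # assumed value, for illustration only
}


def sidebar_for(front_matter):
    """Pick a sidebar: an explicit `sidebar` key wins, else map the page type."""
    if "sidebar" in front_matter:
        return front_matter["sidebar"]
    return PAGE_TYPE_TO_SIDEBAR.get(front_matter.get("page-type"))


implicit = sidebar_for({"page-type": "css-selector"})
explicit = sidebar_for({"page-type": "css-selector", "sidebar": "learn"})
```

Keeping the explicit key as an override leaves room for the odd page that needs a different sidebar from the rest of its type.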

@wbamberg
Collaborator Author

Yes on using page-type to map to sidebars. https://github.com/mdn/content/discussions/5162 talks about that as a usage for page-type.

On BCD: I think the proper scope of BCD is to describe the level of browser support for web platform features. I'm uncomfortable with it becoming a general repository for data about the web platform. Or at least, we could decide we wanted to do this but it would be a definite change of scope. Instead I'd like us to consider using front matter for data about web platform features at least in some cases. (I think perhaps this ship has pretty much sailed for at least some things, though, like spec_url and deprecated.)

@jpmedley
Collaborator

I'm mindful of the concern with scope. My concern with front matter overriding what's in BCD is that it implies that the BCD is incorrect. I wouldn't want incorrect information showing up in the third-party apps that consume BCD.

@peterbe
Contributor

peterbe commented May 25, 2021

Where are we with this?
We've concluded that tags are messy and sometimes the mess can cause sidebars to not operate as expected. E.g. typos.
To begin with...

  1. How about removing all tags (in the front-matter) from all translated-content? Since they're useless at this point.
  2. Hardcode the list of valid tags, so that a document using Pseudo element (when it should be Pseudo-element) would lead to an error or a flaw.
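
The second point could be sketched in Python roughly as follows. The short `VALID_TAGS` list, the `tag_flaws` name, and the "did you mean" suggestion are assumptions for illustration; the real list would come from the macro analysis.

```python
import difflib

# Hypothetical, deliberately short list of valid tags.
VALID_TAGS = ["Pseudo-element", "Pseudo-class", "Deprecated", "Non-standard"]


def tag_flaws(tags, valid_tags=VALID_TAGS):
    """Return (tag, suggestion) pairs for every tag not in the valid list.

    The suggestion is the closest valid tag, or None if nothing is close.
    """
    flaws = []
    for tag in tags:
        if tag in valid_tags:
            continue
        suggestion = difflib.get_close_matches(tag, valid_tags, n=1)
        flaws.append((tag, suggestion[0] if suggestion else None))
    return flaws


flaws = tag_flaws(["Pseudo element", "Deprecated"])
```

Reporting the nearest valid tag makes a typo flaw easy to act on, whether a person or a script does the fixing.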

If it's helpful, I can write a script that logs exactly only the tags used in any page.hasTag() call.

@wbamberg
Collaborator Author

Yes, the next step, I would say, is to make a list of all the tags currently used by macros (I guess that is the "list of valid tags") and remove all the rest.

Then when we land page-type we can update macros to use that instead of tags, and remove all the tags that are proxies for page type.

(That's if we can agree that we should remove unused tags. I know Joe was concerned that we might be losing important use cases by doing this, but tbh the state of our tags at the moment is such that I can't see that data being useful.)

I've not done any work on this yet because it's not as much of a priority for me as getting the content Markdown-ready (like removing inline styles and updating live samples).

> If it's helpful, I can write a script that logs exactly only the tags used in any page.hasTag() call.

Is that a reliable way to know which tags are used across all our macros? Is that the only way tags are exposed? (e.g. there isn't a tags member of the page or anything like that). If so, then yes, that would be very helpful!

@peterbe
Contributor

peterbe commented May 25, 2021

For all en-US content the following tags are the only ones ever used in the KS API call to hasTag(doc, TAG):

ACTUALLY USED TAGS
Non-standard              1,069,187
Experimental              887,409
Deprecated                887,409
Pseudo-element            684,947
Pseudo-class              670,892
CSS Data Type             617,227
Overview                  608,113
CSS Property              608,113
Selector                  608,113
At-rule                   608,113
Obsolete                  604,345
Event                     408,246
Non Standard              358,061
Method                    258,700
Property                  258,680
Constructor               205,538
prototype                 21,035
SVG Attribute             19,256
Type                      10,413
CSS Function              9,122
SVG Filter                1,909
Element                   911
SVG Container             830
SVG Text Content          747
SVG Graphics              664
SVG Font                  664
SVG Animation             581
SVG Descriptive           249
SVG Gradient              249
Important                 223
junk                      223
Event Handler             106
SVG Attr                  83
SVG Light Source          83
SVG Reference             83
Read-only                 82

The number just represents how many times hasTag() was called with that tag. For example, Read-only 82 means that there were 82 calls to hasTag(doc, "Read-only").

A quick one that pops out to me is Non-standard vs. Non Standard :)

@peterbe
Contributor

peterbe commented May 25, 2021

Another interesting pair is Prototype and Junk.
There are roughly 325 documents that have the tag Prototype, but no kumascript macro code looks for it. Instead it looks for prototype 21k times.
The same goes for junk: SubpagesWithSummaries.ejs looks for junk, but not a single document uses that. It's always Junk, if it's set at all.

@peterbe
Contributor

peterbe commented May 25, 2021

Actually, the hasTag() function is made to be case insensitive so the problem with junk and prototype is exaggerated.
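
For readers following along, case-insensitive matching of this kind can be sketched in Python as below. This is a stand-in for the behaviour described, not the KumaScript implementation; `has_tag` and the dict-shaped document are assumptions.

```python
def has_tag(doc, tag):
    """Case-insensitive tag check (illustrative stand-in, not the KS code)."""
    wanted = tag.lower()
    return any(t.lower() == wanted for t in doc.get("tags", []))


# A document tagged Junk is found when a macro asks for junk.
archived = {"tags": ["Junk", "Prototype"]}
found = has_tag(archived, "junk")
```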

@peterbe
Contributor

peterbe commented May 25, 2021

@hamishwillee
Collaborator

@wbamberg Re #3424 (comment) "Yes, next step I would say is "

Yes, but in addition, "create a new key/keys for document state tags: Non-standard, Experimental, Deprecated". These are our three most used tags. They are what cause rendering of the little icons in sidebars and other macros. The info should be pulled out of BCD where possible, but we still need a way for the data to be inserted in the page if there is no BCD entry.

They are important enough to separate, and they would not be part of the "page type" data (I don't think?)

@jpmedley
Collaborator

jpmedley commented Jun 2, 2021

"Junk" was how pages on the wiki were marked for archival/deletion.

@peterbe
Contributor

peterbe commented Jun 2, 2021

I know that translated content is lower priority but MDN gets millions of people to the non-English documents so it's always worth keeping it in mind.

Just wanted to highlight that mdn/yari#3955 is coming.
If this lands, it means we can much more confidently delete ALL tags from the front-matter in all translated-content documents. Because, if we ever need it, it'll automatically inherit its tags from the en-US document.

Because at the moment, I think the tags are just a scary nuisance for the translators. They might unnecessarily worry about "Oh no, do I have to make sure that it always matches?!?"

@Rumyra Rumyra added MDN:Project Anything related to larger core projects on MDN and removed needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened. labels Jun 7, 2021
@sideshowbarker
Member

I propose we move this to the Discussions tracker.

@wbamberg wbamberg closed this as completed Jun 8, 2021
@mdn mdn locked and limited conversation to collaborators Jun 8, 2021

7 participants