HTML ➡️ Markdown: Web/JavaScript #5193
This branch is now deployed here:
@@ -29,7 +29,7 @@ includes(searchElement, fromIndex)
### Parameters

* `searchElement`
- * : The value to search for.
+ * : The value to search for. Hi Will!
Hi Gregor :)
comfortable with HTML and CSS. You may have to start small, and progress
gradually. To begin, let's examine how to add JavaScript to your page for
creating a *Hello world!* example. (*Hello world!*
is[ the standard for introductory programming examples](https://en.wikipedia.org/wiki/%22Hello,\_World!%22\_program).)
This link works but looks wrong: the standard for introductory programming examples
I had not realized this PR also includes files under /en-us/learn/. These should be excluded (the scope of this is /en-us/web/javascript)
This seems to be a faithful conversion of the broken original, isn't it?
content/files/en-us/learn/getting_started_with_the_web/javascript_basics/index.html
Line 42 in e7d3ed4
<p>However, getting comfortable with JavaScript is more challenging than getting comfortable with HTML and CSS. You may have to start small, and progress gradually. To begin, let's examine how to add JavaScript to your page for creating a <em>Hello world!</em> example. (<em>Hello world!</em> is<a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program"> the standard for introductory programming examples</a>.)</p> |
Ah yes, I mistook the comma in the URL as "syntax". It would be good if this was clever enough to note the leading space in the link text and put it before the link. Otherwise, please ignore - sorry
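The suggestion above (noticing leading whitespace in the link text and emitting it before the link) could look roughly like the following. This is only a sketch in Python for illustration; `link_to_markdown` is a hypothetical helper, not a function from the converter.

```python
def link_to_markdown(text: str, href: str) -> str:
    """Render a Markdown link, hoisting surrounding whitespace out of the brackets.

    Hypothetical helper for illustration only; not part of the real converter.
    """
    stripped = text.strip()
    if not stripped:
        # Whitespace-only link text: nothing sensible to link, keep the text.
        return text
    lead = text[: len(text) - len(text.lstrip())]   # leading whitespace
    trail = text[len(text.rstrip()):]               # trailing whitespace
    return f"{lead}[{stripped}]({href}){trail}"


print(link_to_markdown(" the standard", "https://example.com"))
```

With this, `is<a href="..."> the standard</a>` would come out as `is [the standard](...)` rather than `is[ the standard](...)`.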
creating a *Hello world!* example. (*Hello world!*
is[ the standard for introductory programming examples](https://en.wikipedia.org/wiki/%22Hello,\_World!%22\_program).)

<div class="warning">
Warning not converted properly. I guess you guys trusted "notecard warning".
Since this was in "en-us/learn" I didn't prep the content for it, but we should not see this in en-us/web/javascript.
Oops, I have yet to fix FOLDERSEARCH to be less fuzzy. Like Will said, `/learn` is an accidental inclusion!
<img alt="" src="hello-world.png" style="display: block; margin: 0px auto" />

<div class="note">
Note not converted properly
This one has an extra `<p>` wrapping its children. We've aimed to automatically cover the bulk of notecard cases without adding too many extra rules for outliers, which can be fixed by hand.
alternate the display of one of two images. This change will happen as a user
clicks the displayed image.

1. Choose an image you want to feature on your example site. Ideally, the image
Arguably these should be auto-numbered. I.e. in GFM you can number all of these as `1.` and they render properly. Also you don't need the spaces between lines.
Interesting idea! Personally, not being a tech writer per se, I like both: the 1-numbered for making it easy to add items in between, and the correctly-numbered for matching the output. I'd defer this decision to all of you and Will.
Unfortunately Prettier, which we use to format the markdown after the conversion, does not have a configuration for lists. It does have interesting behavior though in that it makes lists consistent, check out this example and see how the output changes when you remove the first element.
@wbamberg This is a question for us. I have a sneaking preference for auto-numbered lists, and that does generally reflect how the original HTML works. It also makes authoring easier.
I had assumed "correct" numbering, like:
1 one
2 two
3 three
...just because it makes the source files easier to read, and reading happens more than writing. But I can see the other side too.
Well, the reading happens on the rendered output mostly :-).
I tend to like manually numbered when I'm writing and I need to refer back to a particular item. On the other hand, moving things around is much easier if they aren't numbered, and if you do reference things, you have to change the number anyway.
I have a preference for autonumbered, but perhaps we should create a separate discussion.
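To make the two styles in this thread concrete, here is a small sketch that rewrites a flat ordered list either as all-`1.` ("auto-numbered") or as 1, 2, 3 ("correct" numbering). It is written in Python purely for illustration; `renumber` is a hypothetical helper, it is not what Prettier or the converter does, and it deliberately ignores nested lists.

```python
import re

# Matches "<indent><digits>.<spaces><text>" for a single ordered-list item.
ITEM = re.compile(r"^(\s*)\d+\.(\s+)(.*)$")


def renumber(lines, auto=True):
    """Renumber a flat ordered list.

    auto=True  -> every item becomes "1."   (easy to insert and reorder items)
    auto=False -> items become "1.", "2."...  (source matches rendered output)
    Assumes a single, non-nested list; the counter never resets.
    """
    out = []
    count = 0
    for line in lines:
        match = ITEM.match(line)
        if match is None:
            out.append(line)  # not a list item: leave untouched
            continue
        count += 1
        number = 1 if auto else count
        out.append(f"{match.group(1)}{number}.{match.group(2)}{match.group(3)}")
    return out


print(renumber(["1. one", "1. two", "1. three"], auto=False))
```

Both forms render as the same numbered list in GFM, so the choice only affects how the source reads and how noisy diffs are when items move.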
## See also

* [Dynamic client-side scripting with JavaScript](/en-US/docs/Learn/JavaScript)
Is this a definition list? Seems to be too much spacing following bullets
It is, and you are right: the spacing is peculiar, and throwing this Markdown into Prettier's playground gives a different, properly spaced output. Strange. I shall investigate!
Thank you for the comments, Hamish!
Thanks for the comments!
I don't understand this, could you elaborate?
I'm thinking we will be (and in fact are) running Prettier over all this, so we don't care what people do. But I guess it would be advisable for people to use Prettier locally as well? Which makes me think: if people are running Prettier locally, will we have a problem with the table-formatting hack we are adding to work around prettier/prettier#10950?
Could you point to an example?
Do you mean, HTML and MD versions of the same page? I think that won't happen: when we convert we'll remove the HTML version.
files/en-us/web/javascript/reference/global_objects/regexp/@@split/index.md
---
{{jsSidebar("Operators")}}

The **`function*`** keyword can be used to define a generator function inside an
Similarly, I wonder if it is worth doing a search for `*`, `**`, `_` used outside of `pre` or `code` blocks in our HTML. Of course you probably already addressed that.
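A search like the one suggested above could be sketched as follows. This uses Python's standard `html.parser` purely for illustration; `MarkdownCharFinder` and `find_markdown_chars` are hypothetical names, and the real source files may need extra handling (for example KumaScript macros) that this sketch ignores.

```python
from html.parser import HTMLParser


class MarkdownCharFinder(HTMLParser):
    """Collect text nodes containing * or _ that are not inside <pre>/<code>."""

    SKIP = {"pre", "code"}
    SPECIAL = ("*", "_")  # "**" is covered by "*"

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <pre>/<code> elements we are inside
        self.hits = []   # (line number, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and any(ch in data for ch in self.SPECIAL):
            line, _offset = self.getpos()
            self.hits.append((line, data.strip()))


def find_markdown_chars(html: str):
    finder = MarkdownCharFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


sample = "<p>The <code>function*</code> keyword and a * b</p>"
print(find_markdown_chars(sample))
```

Each hit is a candidate for a character that needs escaping once the page becomes Markdown.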
@Gregoor FWIW I did a much bigger check of the source and didn't find anything else. For me the unnecessary spaces after list items are the main "dealbreaker". Upshot: impressive conversion.
We very much appreciate you taking the time to look, Hamish! |
Thank you for the feedback everyone! I've updated the converter and this PR, |
Oh, and we merged the converter into yari-main, so now all you need to do to preview it locally is check out this branch and run
I've been digging through the report today. I was hoping to go through it all but ran out of time. But I noticed that sometimes, when the converter decides it can't convert something and wants to leave it in HTML, the output gets mangled. For example:
I will finish going through the whole report tomorrow but thought it was worth mentioning these now. I also think we could choose different strategies for many of these. For example, we should probably convert |
I had a good look at the conversion report: https://github.com/mdn/content/blob/h2m-js/md-conversion-problems-report.md. First, I think the report logs two sorts of problems:
In both these cases the converter handles it by not converting the HTML, and keeping it as-is. In general if there aren't too many of these cases, we don't mind a little "unplanned" HTML creeping in. But it's worth looking through the report in case there are unexpected problems or cases where we can tighten things up.

I've been through all the types in the summary at the top of the report. I've referred a few times in the analysis below to the conversion spreadsheet which is at https://docs.google.com/spreadsheets/d/1Nb-WUHveeUfi5YV0-pzVyHI1vR1IC8xF40IdkiceyQQ/edit#gid=1365998303, and which refers to #3350 (comment) sometimes where the conversion is more nuanced. If possible I'd like that spreadsheet to be a reasonably definitive guide for how we want to handle conversion to Markdown.

Tables

There are a lot of these and it's hard to check them all in detail, but all the ones I checked are "Invalid AST transformations" at the top level (

This seems fine: we expect that some tables are not convertible to GFM and will need to stay as HTML (https://developer.mozilla.org/en-US/docs/MDN/Contribute/Markdown_in_MDN#tables). It looks like 49 tables were unconvertible, which is almost exactly half the tables in the JS docs. Given how limited GFM table syntax is, this isn't surprising.

dfn

These are "Missing conversion rules". The conversion sheet gives a conversion of "GFM em?" for the

Note that the conversion of one of these

span.seoSummary

These are "Missing conversion rules". The conversion sheet points to "summary/seoSummary" in #3350 (comment), which basically says we should strip out this element when its text content matches the first paragraph of the doc, and log an error otherwise. See also #3923. But if this is too complicated we could leave these as unconverted, and deal with this in content.

Note that the conversion of some of these

kbd

These are "Missing conversion rules". We haven't written down what to do about this yet. I think we should keep them as HTML.

So this should stay as "Missing conversion rules".

span.blob-code-inner.blob-code-marker

These are "Missing conversion rules". This should be stripped out (captured by "anything else" in the "classes" tab of the conversion spreadsheet).

sub

These are "Missing conversion rules". Fine for this to stay HTML, per https://developer.mozilla.org/en-US/docs/MDN/Contribute/Markdown_in_MDN#superscript_and_subscript. So this should stay as "Missing conversion rules".

figure

This got added https://github.com/mdn/content/pull/4762/files, which explains why we didn't define a rule for it :(. I think we should remove this and use non-semantic markup in cases like this. But I'll do this manually in a content PR. Update: these got removed in #6128.

p.summary

Same as for

code

These are "Invalid AST transformations". This looks like a bug? In both these cases we're getting mangled output. I reported them at #5193 (comment).

pre.brush:.js.highlight:.[5]

These are "Missing conversion rules". According to #3512 and the conversion spreadsheet under the catch-all "(anything else)" in the "classes" tab, we should strip this attribute.

dl

These are "Invalid AST transformations". These are two cases where the converter didn't think it could convert the markup, so kept it as HTML. The HTML output looks OK and I don't mind a couple of

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Equality_comparisons_and_sameness

abbr[title="ECMAScript 5th edition"]

These are "Missing conversion rules". Should be stripped out according to the conversion spreadsheet.
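The `span.seoSummary` rule described above (strip the wrapper when its text matches the first paragraph, log an error otherwise) might be sketched like this. It is Python for illustration only; `handle_seo_summary` is a hypothetical name, and comparing after collapsing whitespace is an assumption rather than something the spreadsheet specifies.

```python
import re


def normalise(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not affect the match."""
    return re.sub(r"\s+", " ", text).strip()


def handle_seo_summary(summary_text: str, first_paragraph_text: str, problems: list) -> bool:
    """Return True if the span.seoSummary wrapper can safely be stripped.

    On a mismatch, append a message to `problems` (standing in for the
    conversion report) and return False so the HTML is kept for a manual fix.
    """
    if normalise(summary_text) == normalise(first_paragraph_text):
        return True
    problems.append(
        f"seoSummary does not match first paragraph: {normalise(summary_text)!r}"
    )
    return False


report = []
print(handle_seo_summary("The  new.target\n property", "The new.target property", report))
print(handle_seo_summary("Only part of it", "A longer first paragraph", report))
print(report)
```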
Thank you so much Will, this is great! I'll address the points below.

Invalid Markdown/HTML

It looks like we have two prettier-related problems. One was easy to fix, namely I was keeping the newline it was adding to prettified HTML, which busted some markdown nodes. Fixing that one fixed these pages:

But then there is a more gnarly one where prettier once again has interesting ideas wrt how to format nodes, see here. I am now wondering if we want to move the prettier pendulum a bit back again and only format HTML tables, as these seem to not cause problems, and with these weird cases I'm really at a bit of a loss right now. I could try another quick-fix a la what we did for the multi-line closing tag (https://github.com/mdn/yari/blob/b0dbaed4bc4135b51217400f750179b4a3bebc28/markdown/h2m/utils.ts#L38-L40) but I think that would be trickier here as it would also need to account for elements with attributes. So long story short, my recommendation is we'd limit prettifying HTML to

dfn

span.seoSummary

It looks like the

.blob-code-*

Will now be stripped.

code

After some thinking I think this is what we decided on (not the broken markup, that should be fixed with the prettier workaround mentioned above). Let's look at the first one which is:

<code><strong><em>function</em>.arguments</strong></code>

This could be turned into markdown code like this:

**_`function`_`.arguments`**

Which would require changing the nesting. That is something we do for simple cases like

pre.brush:.js.highlight:.[5]

Added stripping for these cases!

abbr

Added a rule for stripping those tags away (and ignoring their title).

Thank you again Will, I think this is getting more and more refined. I will create and deploy a new build once I've fixed the summary-issue and merged the PR with all the fixes into yari.
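To illustrate what "changing the nesting" means for the `function.arguments` case above, here is a toy sketch that pushes a `code` wrapper down to the text leaves and then renders Markdown. It is Python for illustration only, uses a made-up tuple tree rather than the converter's real AST, and covers only this simple shape (no `code` nested inside `code`, no escaping).

```python
# A node is either a string (text) or a (tag, children) tuple.
MARKERS = {"strong": "**", "em": "_", "code": "`"}


def wrap_text_in_code(node):
    """Wrap every text leaf under `node` in its own code node."""
    if isinstance(node, str):
        return ("code", [node])
    tag, children = node
    return (tag, [wrap_text_in_code(child) for child in children])


def push_code_down(node):
    """Return a list of nodes equivalent to `node`, with code moved to the leaves."""
    if isinstance(node, str):
        return [node]
    tag, children = node
    if tag == "code":
        # Drop the outer code wrapper and re-apply it around each text leaf.
        return [wrap_text_in_code(child) for child in children]
    new_children = []
    for child in children:
        new_children.extend(push_code_down(child))
    return [(tag, new_children)]


def render(node):
    """Render the toy tree as Markdown."""
    if isinstance(node, str):
        return node
    tag, children = node
    mark = MARKERS[tag]
    return mark + "".join(render(child) for child in children) + mark


# <code><strong><em>function</em>.arguments</strong></code>
tree = ("code", [("strong", [("em", ["function"]), ".arguments"])])
print("".join(render(n) for n in push_code_down(tree)))
```

The output matches the Markdown shown in the comment: the emphasis markers end up outside the backticks, where Markdown can still interpret them.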
Thank you, Gregor!
So to be clear this would still prettify Markdown, but only prettify HTML in tables? I think this is a good idea. I don't expect there will be a lot of HTML apart from tables, and most of it will be inline (
That's fine. As long as we can look in the report to find summaries that didn't get converted because of a no-match, then we can fix those in content. I did notice that the report listed one case where the summary would not match, in new.target - I thought about fixing it in content but then thought it might be better to leave it in for a while so it could be a test.
You are right of course! We should keep these as HTML. We can fix them in content if necessary. But anyway 2 cases out of 1000 pages is not worth worrying about IMO. I should update the docs for this case.
Closed in favour of #7092.
This is still a work in progress, but I wanted to give people the opportunity to look at it early.
If you want to preview it locally all you have to do is check out this branch and run
That should be it!
If you want to reproduce this PR:
First you have to set up yari. It may be advisable to point your Yari at a different clone of mdn/content from the one you use for writing work.
Then you can run:
    cd wherever-you-have-stored-yari/
    yarn
    yarn md h2m web/javascript --mode=replace

Then you will see the converted `.md` files in your mdn/content tree.

(Note that `--mode=replace` tells the tool to replace `.html` files with the `.md` files. Otherwise it just adds the `.md` files. You probably want replace, because it is closer to the final state and may show up extra problems that could be hidden by the presence of the old `.html` files.)

To see the rendered pages, run the following to get the preview server running on localhost:3000 (it takes some time to start up):
Conversion Report
The partner PR in yari, which adds Markdown conversion rules and overall support for it, is here:
mdn/yari#3843
I will try to sync this PR with both converter and content changes every ~other work day.