Convert the spec to Markdown #1357

Closed
wants to merge 3 commits into from


@yakimun yakimun commented Nov 28, 2022

#1335

In this PR, I've tried to start working on the spec conversion from XML to Markdown (currently only the Core document).

I couldn't find a working solution quickly, so I wrote a small program that does most of the conversion. Things like references are a bit more complicated, so it was faster to convert them manually.

The Markdown flavor used here is GFM, but it could be changed quite easily.
GFM has tools for rendering to HTML, so it can be used to publish a specification to a website.

Not all IETF XML elements can be represented as-is in GFM, so I used the following rules in the conversion process:

| XML | GFM |
| --- | --- |
| `<title>` | 1st-level header |
| `<abstract>`, `<note>` (from `<front>` block) | block with 2nd-level header |
| `<section>` | block with n-level header (where n is nesting level) |
| `<t>`, `<preamble>`, `<postamble>` | new paragraph |
| `<list>` | `-` or `1.` list |
| `<eref>` | plain URL (will be rendered as link) |
| `<xref>` | `[text](target)` |
| `<cref>` | footnote |
| `<artwork>` | code block |
| `<spanx>`, `<sourcecode>` | inline code block |

Also, all JSON code blocks are marked as json, so syntax highlighting is applied for them.
JSON was previously formatted inconsistently, so this has been fixed as well (JSON formatting rules can easily be changed in the future).
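
The conversion rules described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the converter used for this PR: the function names (`inline`, `artwork`, `convert`), the sample document, the `#target` link form, the 4-space JSON indent, and the choice to render top-level sections as 2nd-level headers are all hypothetical. Lists and footnotes are left out for brevity.

```python
import json
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence


def inline(el):
    """Render an element's text plus its inline children as one GFM paragraph."""
    parts = [el.text or ""]
    for child in el:
        if child.tag == "xref":
            # Assumption: targets are in-document anchors, hence the leading '#'.
            target = child.get("target", "")
            parts.append(f"[{child.text or target}](#{target})")
        elif child.tag == "eref":
            parts.append(child.get("target", ""))  # plain URL, GFM autolinks it
        elif child.tag in ("spanx", "sourcecode"):
            parts.append(f"`{child.text or ''}`")
        else:
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())  # collapse XML indentation


def artwork(el):
    """Render <artwork> as a fenced block; tag and re-indent it if it parses as JSON."""
    text = (el.text or "").strip("\n")
    lang = ""
    try:
        text = json.dumps(json.loads(text), indent=4)
        lang = "json"
    except ValueError:
        pass  # not JSON: keep the text verbatim in an untagged block
    return f"{FENCE}{lang}\n{text}\n{FENCE}"


def convert(el, depth=1):
    """Return the list of GFM blocks produced by the children of `el`."""
    blocks = []
    for child in el:
        if child.tag == "title":
            blocks.append("# " + inline(child))
        elif child.tag in ("abstract", "note"):
            blocks.append("## " + child.get("title", child.tag.capitalize()))
            blocks.extend(convert(child, 2))
        elif child.tag == "section":
            blocks.append("#" * (depth + 1) + " " + child.get("title", ""))
            blocks.extend(convert(child, depth + 1))
        elif child.tag in ("t", "preamble", "postamble"):
            blocks.append(inline(child))
        elif child.tag == "artwork":
            blocks.append(artwork(child))
        else:
            blocks.extend(convert(child, depth))  # containers: front, middle, figure
    return blocks


SAMPLE = """
<rfc>
  <front>
    <title>Example Spec</title>
    <abstract><t>A short abstract.</t></abstract>
  </front>
  <middle>
    <section title="Overview">
      <t>See <xref target="details">the details</xref> and <spanx style="verb">type</spanx>.</t>
      <section title="Details" anchor="details">
        <figure>
          <preamble>An example schema:</preamble>
          <artwork>{"type":"string"}</artwork>
        </figure>
      </section>
    </section>
  </middle>
</rfc>
"""

markdown = "\n\n".join(convert(ET.fromstring(SAMPLE)))
print(markdown)
```

Note that round-tripping through `json.loads` and `json.dumps` can change how numbers are spelled, so a real conversion would need to check that reformatted examples still say what the spec intends.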

Notes

  • <list style="hanging"> - I don't see any way to represent it in GFM, so for now it is converted as - *hangText* text.
  • <cref> - converted as a footnote, but a blockquote seems like a good option too.
  • <preamble> and <postamble> - in IETF XML these elements are attached to <artwork>, but in MD they are just new paragraphs.
  • <spanx>, <sourcecode> - I don't see any difference between them in the XML spec, so both elements are represented as inline code blocks.
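
The hanging-list workaround from the first note can be sketched as follows. This is a minimal illustration, not the code used for this PR: `convert_list` and the sample terms are hypothetical, and only the `hanging` and `numbers` styles get special handling.

```python
import xml.etree.ElementTree as ET


def convert_list(list_el):
    """Convert an IETF XML <list> element into GFM list lines."""
    style = list_el.get("style", "empty")
    lines = []
    for number, item in enumerate(list_el.findall("t"), start=1):
        # Flatten the item's text and collapse XML indentation.
        text = " ".join("".join(item.itertext()).split())
        if style == "hanging":
            # GFM has no definition lists, so the hang text is emphasized instead.
            lines.append(f"- *{item.get('hangText', '')}* {text}")
        elif style == "numbers":
            lines.append(f"{number}. {text}")
        else:
            lines.append(f"- {text}")
    return "\n".join(lines)


# Hypothetical sample input, not taken from the spec.
SAMPLE = """
<list style="hanging">
  <t hangText="term:">Definition of the term.</t>
  <t hangText="other term:">Another definition.</t>
</list>
"""

print(convert_list(ET.fromstring(SAMPLE)))
```

Emphasizing the hang text keeps the term visually distinct while staying inside plain GFM; multi-paragraph list items would need extra indentation handling that this sketch skips.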

Questions:

  • Is GFM a good choice?
  • The block before the title (containing "workgroup", "published", "expires", etc.) and the "Status of This Memo" and "Copyright Notice" blocks are not present in the current version of the MD spec. Should they be added?
  • The links generated by <xref> element are rendered as RFC 1111 [RFC1111] and Some header (Section 1.2.3). Is it necessary to follow these rendering rules in MD?

Problems

  • References (links) require more attention than in IETF XML (but a good editor makes this problem less significant).
  • Generation of the References block is automated in IETF XML. In MD it's done manually.
  • I have absolutely no idea how to validate this. The only option I see is to read the entire document very carefully.

@handrews
Contributor

This is a bit premature as there are a number of discussions going on right now about how the next version of the spec should be structured. Those issues marked "SDLC" are not yet ready for PRs.

@handrews
Contributor

@yakimun BTW a lot of the currently open issues are not ready for PR. We welcome PRs from new contributors but you probably want to confirm that an issue is ready before working on one. I know not all issues have a discussion about PR-readiness but that's usually because the longer-term contributors have some other context for deciding it's ready (that we should probably be better about documenting).

@jdesrosiers
Member

@yakimun Thanks for doing this! I haven't reviewed the result in detail, but it looks like a good start and gets us over the biggest hurdle.

> there are a number of discussions going on right now about how the next version of the spec should be structured.

This shouldn't matter. The first step is just to convert what we have to markdown. Any restructuring that needs to be done should be done as a separate step.

The thing that I think makes this a bit premature is that we haven't yet talked about how we are going to make this transition. We'll probably want to do a code freeze on the spec, then convert to markdown and delete the XML, then continue work on the markdown.

Have there been updates to the spec since you did this conversion?

How long would it take to re-convert given the script you've developed? In other words, would it be reasonable for us to ask you to do the conversion again in a month or two if the team decides not to code freeze and switch over just yet?

> Is GFM a good choice?

Yep. We don't know exactly what our needs will be for presenting on the website, so we might need to change at some point, but GFM is a good place to start.

> The block before the title (containing "workgroup", "published", "expires", etc.) and the "Status of This Memo" and "Copyright Notice" blocks are not present in the current version of the MD spec. Should they be added?

I think that's all just IETF stuff and it's ok to leave it out.

> The links generated by <xref> element are rendered as RFC 1111 [RFC1111] and Some header (Section 1.2.3). Is it necessary to follow these rendering rules in MD?

No. We can render it however we want.

> I have absolutely no idea how to validate this. The only option I see is to read the entire document very carefully.

Agreed. It's just going to need eyes on it. We're going to need to go through everything very closely for the next release, so I don't think it's terribly important right now to catch every broken link or visual issue that might be lurking.

@handrews
Contributor

It should be easier to determine the restructuring and convert each piece in stages rather than do it all right now, which is a big change that is difficult to review.

@jdesrosiers
Member

@handrews I strongly disagree. Let's discuss more at the OCWM.

@yakimun
Author

yakimun commented Nov 29, 2022

@jdesrosiers

> Have there been updates to the spec since you did this conversion?

I've used the last version (https://github.com/json-schema-org/json-schema-spec/blob/38bc78a0ba5c05e6f5c20cb93bd9f3e1c8a9ba0b/jsonschema-core.xml).

It took a couple of hours to manually fix the MD after the automatic conversion. If the XML spec changes a lot in the future, it's not a big problem to repeat the conversion process from scratch. If the changes in the XML are minor, I would just copy them to the MD.

> No. We can render it however we want.

So what do you think about removing the second part of the link? (Some link text [SomeTarget] -> Some link text). From my point of view, it makes everything cleaner.

@jdesrosiers
Member

> it's not a big problem to repeat the conversion process from scratch.

That's good to hear. We'll try to avoid having to do that, but it's good to know it's an option if we need it. It might be a little while before we have agreement to merge this, so don't be surprised if there's no activity here for a few weeks. It doesn't mean we've forgotten about it.

> So what do you think about removing the second part of the link?

I agree. Go ahead and remove it.

@awwright
Member

awwright commented Dec 8, 2022

Like @handrews said, there are still many details to figure out before we can actually start editing the specification in Markdown. It's a bit like changing the side of the road that a whole country drives on. I have a couple of different solutions I'm evaluating myself, and we'll want to review these too. Can you share the script you’re using instead?

@gregsdennis
Member

While I can appreciate everyone's hesitation on doing this now, in light of https://github.com/orgs/json-schema-org/discussions/282, I'd like to seriously consider making this conversion earlier rather than later. Yes, there are questions to resolve, but this is still something that we'll need to eventually do.

@handrews
Contributor

handrews commented Dec 8, 2022

Yes, I'm on board with changing over now-ish - things changed soon after that first comment of mine. Comparing a few script solutions, as @awwright suggests, seems reasonable as long as we don't take too long on it. If there's a script that's already really well road-tested, that would mean less work verifying correctness.

The exact presentation of the new markdown is less important, as we can change that easily in the future with less disruption than the XML->markdown change.

@yakimun
Author

yakimun commented Dec 9, 2022

@awwright https://github.com/yakimun/json-schema-spec-converter

This solution can now automatically generate links, section numbers and other things. So, the output requires almost no manual fixes (autoformatting with any good tool is a good next step though).

The solution quickly gets messy, but that shouldn't be a problem for such a short-lived project.

The conversion of some tags produces output that is not perfectly formatted. For example, there are extra empty lines, and lists with multiline elements are not padded nicely enough.
But that's easy to fix with any autoformatter. Also, it doesn't affect anything when the spec is rendered.

@yakimun
Author

yakimun commented Dec 9, 2022

I've added all the specs converted to MD. Also, everything has been updated with the changes from the latest commits. I hope this helps.

@awwright
Member

awwright commented Mar 28, 2023

I've been able to compile and run the converter, so I'm going to see how many PRs we can merge in or close, before I run this.

Since I'll have to re-run the converter after those other PRs are merged in, I'll close this out, and let's continue the discussion on #1335.
