Support GitHub Flavored Markdown or at least CommonMark #401

Closed
f3ndot opened this issue Jun 7, 2022 · 8 comments

Comments

@f3ndot
Contributor

f3ndot commented Jun 7, 2022

Problem Description

Given the proliferation and ubiquity of GitHub, I believe many unassuming developers (myself included) naively assume that the Markdown syntax supported on GitHub is part of the original 2004 Markdown spec. Even developers who know the difference would probably prefer GitHub's flavour, as that's what's most common. #64 suggests this conflation is a problem.

I personally stumbled across this when I wanted a URL to be hyperlinked automatically by pdoc, not realizing that the original spec requires URLs to be wrapped in < and > characters. In a phrase: I've become spoiled by GFM.
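
To make the difference concrete, here is a minimal sketch (standard library only) of a preprocessing step that wraps bare URLs in < and > so that a renderer following the original spec would link them. The function name and the regular expression are illustrative assumptions, not part of pdoc or markdown2, and the pattern is deliberately simplistic.

```python
import re

# Assumption: a deliberately simple pattern for bare http(s) URLs.
# The lookbehind skips URLs that already follow "<" (autolinks),
# "(" (inline link targets) or a word character.
# The final character class keeps trailing punctuation out of the link.
_BARE_URL = re.compile(r"(?<![<(\w])(https?://[^\s<>()]*[^\s<>().,;:!?])")


def wrap_bare_urls(text: str) -> str:
    """Wrap bare URLs in angle brackets, leaving existing links untouched."""
    return _BARE_URL.sub(r"<\1>", text)


print(wrap_bare_urls("See https://pdoc.dev for docs."))
```

A real implementation would also have to skip code spans and code blocks, which this sketch does not attempt.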

Proposal

Support the GitHub Flavored Markdown spec.

This can take place in a few possible ways:

  • via a new -d gfm or -d github-markdown flag, reserving the -d markdown flag for the original 2004 spec
  • replace the Markdown support in pdoc with the GFM version by default, optionally exposing a -d original-markdown for the 2004 spec
  • A "pick n choose" approach where the most popular/used parts of GFM are supported on top of the current Markdown implementation (I don't really advocate for this one)

Alternatives

I guess not supporting GFM at all?

Independent of my proposal, pdoc should probably declare which version or flavour of what we colloquially call "Markdown" it supports. The original was released in 2004 as a Perl script and has remained largely unchanged, warts and all.

CommonMark appears to be the, well, common and well-defined and versioned specification of the original Markdown. Indeed, GFM is derived from CommonMark!

I would hope that pdoc would choose a version of the CommonMark spec and support that version explicitly.

Additional context

The Differences from original Markdown section of commonmark-spec's README is enlightening.

f3ndot changed the title from "Support GitHub Flavored Markdown" to "Support GitHub Flavored Markdown or at least CommonMark" on Jun 7, 2022
@mhils
Member

mhils commented Jun 7, 2022

Thanks for raising this! I do agree that getting close to GitHub Flavored Markdown is desirable, but there are a few practical reasons why that's a bit tricky. For some context, pdoc currently uses markdown2 with some extras enabled:

markdown2 is a fast and complete Python implementation of Markdown. It was written to closely match the behaviour of the original Perl-implemented Markdown.pl. Markdown2 also comes with a number of extensions (called "extras") for things like syntax coloring, tables, header-ids.

What's very nice about markdown2 is that it has an excellent track record of being maintained since 2008, it is written in pure Python (which makes distribution much easier), and it is a self-contained file (we include a copy in pdoc, which ensures that we're always API-compatible).

GitHub uses https://github.com/github/cmark-gfm/, which is written in C. While there are Python bindings available (e.g. https://github.com/theacodes/cmarkgfm), they all require binary wheels or a working compiler on the user's system. This limits us in multiple ways:

  • Python 3.10 shipped on October 4th 2021, but cmarkgfm's first release with 3.10 binary wheels was on December 14th 2021. This means we wouldn't have been able to fully support Python 3.10 for more than two months after its initial release.
  • Likewise, we couldn't easily test the Python 3.11 betas right now because binary wheels for that are not available at https://pypi.org/project/cmarkgfm/#files (which is a scary long list nonetheless).

Now of course we could switch to another pure Python parser, but that hardly improves the situation - they all deviate from GFM in one way or another, and we're already relatively close to GFM anyways. For the record, pure CommonMark is missing a few extras that we want to keep, so it's not a good option either. Long story short, for now I'd suggest we do the following two things:

  1. We enable the link-patterns extra to auto-link URLs.
  2. We wait until the Python GFM situation improves. For example, the situation would be much better if cmarkgfm would provide abi3 wheels that are forward-compatible.
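
To illustrate why abi3 wheels would help, here is a rough sketch of how a wheel's filename tags determine which CPython versions can use it. It relies only on the standard wheel filename layout (name, version, Python tag, ABI tag, platform tag); the function name and the example filenames are hypothetical, and platform compatibility is ignored entirely.

```python
import re


def wheel_supports(filename: str, minor: int) -> bool:
    """Rough check: can CPython 3.<minor> use this wheel (platform ignored)?"""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    # The last three dash-separated fields are the Python, ABI and platform tags.
    python_tag, abi_tag, _platform = filename[: -len(".whl")].split("-")[-3:]
    match = re.fullmatch(r"cp3(\d+)", python_tag)
    if match is None:
        # Pure-Python wheels (e.g. py3-none-any) are not bound to one version.
        return python_tag.startswith("py3") or python_tag == "py2.py3"
    built_for = int(match.group(1))
    if abi_tag == "abi3":
        # Stable-ABI wheels are forward-compatible with later 3.x releases.
        return minor >= built_for
    # Version-specific wheels only work on the exact minor version.
    return minor == built_for


# Hypothetical filenames, for illustration only.
print(wheel_supports("example_ext-1.0-cp310-cp310-manylinux_2_17_x86_64.whl", 11))
print(wheel_supports("example_ext-1.0-cp36-abi3-manylinux_2_17_x86_64.whl", 11))
```

With version-specific wheels, every new Python release needs a fresh upload before users can install without a compiler; a single abi3 wheel keeps working.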

@f3ndot
Contributor Author

f3ndot commented Jun 7, 2022

Thanks for responding so quickly! You raise some compelling points, and I can see how that's led to the current state of the project.

While there are Python bindings available (e.g. https://github.com/theacodes/cmarkgfm), they all require binary wheels or a working compiler on the user's system

Would you consider then letting the user decide by taking advantage of extras? e.g.:

  • pip install pdoc is current behaviour
  • pip install pdoc[cmarkgfm] (or similarly named) for those who are willing to opt in and possibly compile from source when binary wheels are missing; pdoc would then render GFM by default

Under this regime you could also offer a pip install pdoc[commonmark] for those who wish to use CommonMark, even if it lacks some extensions you like.
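
As a rough sketch of what I mean, a tool could check at runtime which optional backend is importable and fall back to the bundled one. The function name and the candidate order are assumptions for illustration; nothing like this exists in pdoc today.

```python
from importlib.util import find_spec
from typing import Optional, Sequence


def pick_backend(candidates: Sequence[str] = ("cmarkgfm", "markdown2")) -> Optional[str]:
    """Return the first importable top-level module name, or None if none is."""
    for name in candidates:
        # find_spec returns None for a top-level module that is not installed.
        if find_spec(name) is not None:
            return name
    return None


print(pick_backend())
```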

We enable the link-patterns extra to auto-link URLs.

Hey, I mean, that's nice :)

I think this may reveal a greater issue: what I want and find reasonable may conflict with what others want and find reasonable. For every person who doesn't want <> required, I'm sure there are others who would prefer the explicitness.

Maybe this should be another discussion/issue, but I think the markdown2 extensions pdoc enables should be user-configurable with pdoc choosing sane defaults. Could be CLI flags, could be pdoc.ini, could be whatever makes sense with the least amount of pain. What do you think?

@mhils
Member

mhils commented Jun 7, 2022

Would you consider then letting the user decide by taking advantage of extras?

I would be very hesitant to do that because it adds additional complexity[1] and long-term maintenance overhead. We need to keep in sync with multiple other Markdown implementations[2], which have different feature sets.[3] I don't think the benefit justifies the costs.

Could be CLI flags, could be pdoc.ini, could be whatever makes sense with the least amount of pain. What do you think?

One objective for pdoc is to be simple and not require any configuration. Adding additional configuration knobs works against this goal. From my experience, good documentation is usually not held back by a lack of doc tool configuration options. As mentioned in the README, there comes a point where you want to check out Sphinx instead if you require more customizability. We provide a set of command line options for the most common needs, but I don't personally think this crosses the bar. Sorry. :)

If you strongly prefer GFM, you can of course always use pdoc as a library:

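#!/usr/bin/env python3
import sys

import cmarkgfm  # third-party package, must be installed separately
from pdoc import render
from pdoc.__main__ import cli

# Replace pdoc's Markdown-to-HTML template filter with cmarkgfm's GFM renderer.
render.env.filters["to_html"] = cmarkgfm.github_flavored_markdown_to_html
# Hand over to pdoc's regular command-line interface.
cli(sys.argv[1:])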

The implicit downside is that this is code you need to maintain yourself. 😉

Footnotes

  1. Simply using extras doesn't work because we can't assume a clean venv with pdoc.

  2. See e.g. https://github.com/Zopieux/py-gfm: "The maintainer is also fatigued by Markdown's library constantly breaking their internal API with each minor release."

  3. For example, cmarkgfm doesn't generate a table of contents data structure like the one we use in the menu. Now you also need to communicate to users under which circumstances this might be missing...

mhils added a commit to mhils/pdoc that referenced this issue Jun 7, 2022
@f3ndot
Contributor Author

f3ndot commented Jun 7, 2022

Ok makes sense! Thanks for entertaining the request. I may submit a PR adding documentation or a section into the README describing that pdoc supports the original Markdown specification along with a discrete set of extensions/extras to disambiguate/reduce confusion for people in the future.

@daniele-niero

Hello @mhils

Maybe a silly question, but is render.env.filters["to_html"] = cmarkgfm.github_flavored_markdown_to_html all it takes to use a different Markdown backend?

I'm using https://github.com/executablebooks/markdown-it-py for other things, and if I could just shoehorn it into pdoc (at my peril, of course) it would make things more consistent for us.
We want a CommonMark-compliant, extensible Markdown library, so we would really prefer markdown-it-py over markdown2 (despite the horrible name).

@mhils
Member

mhils commented Nov 15, 2022

Maybe a silly question, but is render.env.filters["to_html"] = cmarkgfm.github_flavored_markdown_to_html all it takes to use a different Markdown backend?

Pretty much, yes. Two minor limitations come to mind:

  • pdoc.search currently uses pdoc.render_helpers.to_html directly, and will not pick up the change. I'd be happy to merge a PR that makes it use env if that is important to you.
  • markdown2 generates a table of contents for the top-level docstring, which we work into the "Contents" section in the navigation (see e.g. here: https://pdoc.dev/docs/pdoc.html). Unlikely that another library implements that the same way, but it can probably be mocked somehow.
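
One way the table of contents could be mocked is sketched below: wrap any str-to-str renderer so that its result also carries a toc_html attribute. This assumes pdoc reads such an attribute from the rendered string, mirroring what markdown2's return value provides; the class and function names are hypothetical, the renderer is a stand-in, and whether an empty value is handled gracefully by the templates is untested.

```python
import html
from typing import Callable


class HtmlWithToc(str):
    """A str that also carries a toc_html attribute (assumed interface)."""

    toc_html: str = ""


def with_empty_toc(render: Callable[[str], str]) -> Callable[[str], HtmlWithToc]:
    """Adapt a plain str -> str renderer so its output exposes toc_html."""

    def to_html(markdown_text: str) -> HtmlWithToc:
        rendered = HtmlWithToc(render(markdown_text))
        rendered.toc_html = ""  # no table of contents available
        return rendered

    return to_html


def stand_in_renderer(text: str) -> str:
    """Placeholder for a real Markdown library's render function."""
    return f"<p>{html.escape(text)}</p>"


to_html = with_empty_toc(stand_in_renderer)
result = to_html("a < b")
print(result, repr(result.toc_html))
```

The wrapped function could then be assigned to render.env.filters["to_html"] in the same way as in the script earlier in this thread.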

@daniele-niero

Well, it just worked!
Maybe I will find some issues later, but this is great so far.
I think you should put it in the documentation, as an unsupported but possible option. :)

@mhils
Member

mhils commented Nov 15, 2022

Done, thanks for the good suggestion! :)
