Footnotes extension #332
See `_scan_footnote_definition` in cmark-gfm.
It started out limited but now it covers all types of links/images, knows about destination and title, etc.
This turned out to be tricky, and GitHub gets some of it wrong. If anyone ever wants us to be bug-compatible, it should be relatively straightforward to emulate GitHub by just running the initial reference search over everything (including definitions) and then not bothering with finding more at the end.
I've found some interesting behaviors (bugs) in GitHub's implementation while working on this. E.g.:
That shouldn't render anything, because only 2 is referenced but from another footnote that is not rendered. But GitHub renders "Two" as the only footnote, with a "back" link pointing nowhere. Another one:
The order of footnotes should be One (referenced first in the text), then Two (referenced from a footnote). But GitHub renders Two first (because it finds the

I've decided not to follow GitHub's implementation for these edge cases, but instead go for the nicest result. See docs on FootnoteHtmlNodeRenderer.java for more about this. If bug-for-bug compatibility is required at some point, it should be simple enough to add as an option.
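The two edge cases above amount to a reachability rule: a footnote is rendered only if it is referenced from the main text, or from a footnote that is itself rendered, and footnotes referenced from the text come first. The following is a minimal sketch of that rule, not the code in FootnoteHtmlNodeRenderer.java. The class `FootnoteOrderSketch`, its `order` method and the map-based input are hypothetical, and the relative order of footnotes that are only reachable through other footnotes is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: decides which footnotes to render and in which order.
class FootnoteOrderSketch {

    /**
     * @param bodyReferences       labels referenced from the main text, in document order
     * @param definitionReferences for each defined label, the labels referenced inside
     *                             that definition, in order
     * @return labels of the footnotes to render, in rendering order
     */
    static List<String> order(List<String> bodyReferences,
                              Map<String, List<String>> definitionReferences) {
        // Step 1: footnotes referenced from the main text, first reference wins.
        Set<String> seen = new LinkedHashSet<>();
        for (String label : bodyReferences) {
            if (definitionReferences.containsKey(label)) {
                seen.add(label);
            }
        }
        List<String> result = new ArrayList<>(seen);

        // Step 2: follow references found inside footnotes that will be rendered.
        // The list grows while we iterate, so nested chains are picked up too.
        for (int i = 0; i < result.size(); i++) {
            for (String nested : definitionReferences.get(result.get(i))) {
                if (definitionReferences.containsKey(nested) && seen.add(nested)) {
                    result.add(nested);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Footnote 1 references footnote 2, but nothing in the text references 1.
        Map<String, List<String>> definitions = Map.of("1", List.of("2"), "2", List.of());
        System.out.println(order(List.of(), definitions));
        System.out.println(order(List.of("1"), definitions));
    }
}
```

Emulating GitHub would roughly mean feeding the references from all definitions into step 1 and skipping step 2, as suggested in the earlier comment.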
The resulting implementation for the footnotes extension is much nicer. It also cleans up LinkInfo and makes images less of a special case. Additionally, this allows inline parsing of markers that are not part of links. I could have done this without this change, but noticed it here and decided to fix it.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #332 +/- ##
============================================
- Coverage 95.05% 95.01% -0.05%
Complexity 254 254
============================================
Files 131 136 +5
Lines 4185 4350 +165
Branches 600 617 +17
============================================
+ Hits 3978 4133 +155
- Misses 111 116 +5
- Partials 96 101 +5
This adds a new extension `commonmark-ext-footnotes` (class `org.commonmark.ext.footnotes.FootnotesExtension`) to implement footnotes syntax as in GitHub Flavored Markdown (see docs). Fixes #273.

An example:

The `[^1]` is parsed as a `FootnoteReference`, with `1` being the label.

The line with `[^1]: ...` is a `FootnoteDefinition`, with the contents as child nodes (can be a paragraph like in the example, or other blocks like lists).

Apart from the parsing, the extension also comes with rendering of footnotes for HTML and Markdown.
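To make the two syntactic shapes concrete, here is a toy sketch that tells a definition line (`[^label]: ...`) apart from references (`[^label]`) and extracts the raw label. It is only an illustration of the surface syntax, not how the extension parses: the class `FootnoteSyntaxSketch`, its methods and the regular expressions are assumptions made for this example, and the sample lines are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the surface syntax only; the real parser works on blocks and inlines.
class FootnoteSyntaxSketch {

    // Start of a definition line: up to 3 spaces of indentation, then "[^label]:".
    // Assumption: a label is one or more characters other than "]" and whitespace.
    private static final Pattern DEFINITION = Pattern.compile("^ {0,3}\\[\\^([^\\]\\s]+)\\]:");

    // A reference anywhere in a line: "[^label]".
    private static final Pattern REFERENCE = Pattern.compile("\\[\\^([^\\]\\s]+)\\]");

    /** Returns the label if the line starts a footnote definition, otherwise null. */
    static String definitionLabel(String line) {
        Matcher m = DEFINITION.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the raw labels of all footnote references found in the line. */
    static List<String> referenceLabels(String line) {
        List<String> labels = new ArrayList<>();
        Matcher m = REFERENCE.matcher(line);
        while (m.find()) {
            labels.add(m.group(1));
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(referenceLabels("Main text[^1]"));
        System.out.println(definitionLabel("[^1]: Additional text in a footnote"));
    }
}
```

Note that the label is taken verbatim from the source text, so a label such as `*foo*` keeps its asterisks instead of being interpreted as emphasis.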
Extension mechanisms
In order to implement this as a separate extension, the following APIs were added to commonmark core:
- `DefinitionMap`: New class for storing and looking up definitions by a label, with label normalization as for link reference definitions
- `BlockParser`: New method `getDefinitions` that can be implemented to return definitions that can later be accessed during inline parsing (the built-in `ParagraphParser` also uses that mechanism now; previously it was a special case in the parser)
- `LinkProcessor`: New interface that can be implemented to customize link/image processing. This is used to turn `[^1]` into `FootnoteReference` nodes.
- `NodeRenderer`: New methods `beforeRoot` and `afterRoot` that are called before/after rendering a document; used to render footnotes at the end of the document

Alternatives considered
PostProcessor
Could footnote reference parsing have been implemented as a `PostProcessor` step after inline parsing? No, because a footnote reference like `[^*foo*]` would have been turned into emphasis by inline parsing, whereas footnote parsing needs the raw `*foo*` as a label.

InlineContentParser
I considered using the recently-added inline parsing customization API, using `[` as the trigger character. That would work for simple cases, but not for others. E.g. in this:

That is not a footnote followed by `(/url)`, but instead it's an inline link. In other words, if parsing as a link is possible, that is preferred.

That means our custom inline parser for `[` would have to be able to parse the full link syntax in order to give preference to links, which is quite tricky. In addition to that, it would have to trigger on `!`, for a footnote like `![^foo]`, which normally would be parsed as an image node.

So that's what `LinkProcessor` solves: it keeps the tricky link parsing in the inline parser, but allows extensions to decide to treat certain things not as links, but different types of nodes, or maybe even parse things that come after a link (e.g. image attributes could be implemented on top of this).
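The preference rule described above (an inline link wins whenever link parsing succeeds, and only otherwise may an extension reinterpret the bracket) can be sketched in isolation. This is a deliberately simplified, self-contained illustration and does not use the actual `LinkProcessor` interface: the class `BracketDecisionSketch`, the `Kind` enum and the `classify` method are hypothetical. It ignores nested brackets, reference links, the `!` prefix and label normalization.

```java
import java.util.Set;

// Hypothetical sketch of "links are preferred, footnotes come second".
class BracketDecisionSketch {

    enum Kind { INLINE_LINK, FOOTNOTE_REFERENCE, PLAIN_TEXT }

    /**
     * Classifies input that starts with a bracketed span such as "[^foo]" or "[^foo](/url)".
     *
     * @param input                 text starting at the opening bracket
     * @param definedFootnoteLabels labels that have a footnote definition
     */
    static Kind classify(String input, Set<String> definedFootnoteLabels) {
        int close = input.indexOf(']');
        if (!input.startsWith("[") || close < 0) {
            return Kind.PLAIN_TEXT;
        }
        String text = input.substring(1, close);
        String rest = input.substring(close + 1);

        // Link syntax is checked first: "[...](...)" is an inline link,
        // even if the bracket text looks like a footnote label.
        if (rest.startsWith("(") && rest.indexOf(')') > 0) {
            return Kind.INLINE_LINK;
        }

        // Only when it is not a link may the bracket become a footnote reference,
        // and only if a definition with that label exists.
        if (text.startsWith("^") && definedFootnoteLabels.contains(text.substring(1))) {
            return Kind.FOOTNOTE_REFERENCE;
        }
        return Kind.PLAIN_TEXT;
    }

    public static void main(String[] args) {
        Set<String> labels = Set.of("foo");
        System.out.println(classify("[^foo](/url)", labels));
        System.out.println(classify("[^foo]", labels));
        System.out.println(classify("[^bar]", labels));
    }
}
```

Keeping the link check in one place is the point of the design: an extension only needs to answer the second question, and never has to reimplement link destination and title parsing.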