markdown: parse fenced_code_attributes extension #445

edwintorok · 2024-01-17T00:12:39Z

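Pandoc supports this extension: https://pandoc.org/MANUAL.html#extension-fenced_code_attributes

````
``` {#identifier .language attr="value"}
```
````

And this:

````
``` language {#identifier attr="value"}
```
````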

Recognize them in the lexer. To limit the complexity of the regular expression, parsing of the attributes is split off into a separate `parse` rule (otherwise we hit automaton size limits in `ocamllex`).
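The PR does this recognition inside the `ocamllex` lexer. As a rough plain-OCaml illustration of what has to be recognized (not the PR's code), the hypothetical `split_info_string` below separates an optional bare language word from the raw contents of a trailing `{...}` block, covering both the `{#identifier .language attr="value"}` and the `language {#identifier attr="value"}` forms.

```ocaml
(* Hypothetical helper, for illustration only: split a fenced-code info
   string into an optional bare language word and the raw text inside a
   trailing {...} attribute block. *)
let split_info_string (info : string) : string option * string option =
  let info = String.trim info in
  match String.index_opt info '{' with
  | None ->
      (* No attribute block: the whole info string is the language. *)
      ((if info = "" then None else Some info), None)
  | Some start ->
      let len = String.length info in
      if info.[len - 1] <> '}' then
        (* No closing brace at the end: treat it all as a plain language. *)
        (Some info, None)
      else
        (* Text before '{' is the optional language; text between the
           braces is the raw attribute block. *)
        let lang = String.trim (String.sub info 0 start) in
        let attrs = String.sub info (start + 1) (len - start - 2) in
        ((if lang = "" then None else Some lang), Some (String.trim attrs))
```

Keeping the attribute text raw at this stage mirrors the idea of deferring detailed attribute parsing to a later step.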

According to https://quarto.org/docs/authoring/markdown-basics.html#ordering-of-attributes the ordering has to be:

* #identifiers
* .classes
* key-value attributes
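A minimal sketch of what this ordering rule means, assuming each whitespace-separated token is classified only by its first character (`#` for an identifier, `.` for a class, anything else a key-value attribute). `rank` and `well_ordered` are hypothetical names, not part of the PR.

```ocaml
(* Rank of an attribute token in the required order:
   #identifiers (0), then .classes (1), then key-value attributes (2).
   Classifying by the first character is a simplifying assumption. *)
let rank (token : string) : int =
  if token = "" then 2
  else
    match token.[0] with
    | '#' -> 0
    | '.' -> 1
    | _ -> 2

(* True when ranks never decrease along the list, i.e. the tokens
   follow the #identifiers, .classes, key-value order. *)
let well_ordered (tokens : string list) : bool =
  let rec go prev = function
    | [] -> true
    | t :: rest ->
        let r = rank t in
        r >= prev && go r rest
  in
  go 0 tokens
```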

For now, on output we always normalize to this order (which isn't ideal, but could be improved later).
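A sketch of such a normalization, using a hypothetical `attrs` record (the PR's actual types may differ): tokens are grouped into identifiers, classes and key-value attributes, keeping their relative order within each group, and are printed back in the required order.

```ocaml
(* Hypothetical representation of a parsed attribute block.
   Key-value attributes are kept as raw strings. *)
type attrs = {
  ids : string list;      (* without the leading '#' *)
  classes : string list;  (* without the leading '.' *)
  kvs : string list;      (* raw key=value text *)
}

(* Group tokens by their first character, preserving the original
   relative order inside each group (fold_right keeps list order). *)
let of_tokens (tokens : string list) : attrs =
  (* Drop the one-character '#' or '.' prefix. *)
  let strip t = String.sub t 1 (String.length t - 1) in
  List.fold_right
    (fun t acc ->
      if t = "" then acc
      else
        match t.[0] with
        | '#' -> { acc with ids = strip t :: acc.ids }
        | '.' -> { acc with classes = strip t :: acc.classes }
        | _ -> { acc with kvs = t :: acc.kvs })
    tokens
    { ids = []; classes = []; kvs = [] }

(* Print in the normalized order: #identifiers, .classes, key-value. *)
let to_string (a : attrs) : string =
  let parts =
    List.map (fun i -> "#" ^ i) a.ids
    @ List.map (fun c -> "." ^ c) a.classes
    @ a.kvs
  in
  "{" ^ String.concat " " parts ^ "}"
```

With this approach an input written as `{.ocaml attr="value" #id}` is printed back as `{#id .ocaml attr="value"}`, so the output is not byte-identical to the input.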

I initially tried to fully parse the attributes, but that exceeded the maximum size of the `ocamllex` automaton, so I kept it simple in this PR and do some minimal parsing in OCaml later. Note that key-value pairs aren't split correctly, but when joined back together they retain the original value.
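The description does not say exactly how the key-value pairs are split. One way the stated behaviour can arise, assuming (hypothetically) a naive split on single spaces that ignores quotes: a quoted value containing a space is broken into two tokens, yet joining the tokens with a single space restores the original text exactly. `naive_tokens` and `rejoin` are illustrative names only.

```ocaml
(* Hypothetical naive tokenizer: split the attribute text on single
   spaces without honouring quotes. Empty strings produced by repeated
   spaces are kept, so that joining is exact. *)
let naive_tokens (s : string) : string list = String.split_on_char ' ' s

(* Joining with a single space is the inverse of split_on_char ' ',
   so the original attribute text is recovered unchanged. *)
let rejoin (tokens : string list) : string = String.concat " " tokens
```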

Pandoc supports this extension: https://pandoc.org/MANUAL.html#extension-fenced_code_attributes

````
``` {#identifier .language attr="value"}
```
````

And this:

````
``` language {#identifier attr="value"}
```
````

Recognize them in the lexer. Try to limit the complexity of the regular expression by splitting off parsing of attributes into a separate 'parse' (otherwise we hit automata size limits in `ocamllex`).

According to https://quarto.org/docs/authoring/markdown-basics.html#ordering-of-attributes the ordering has to be:

* #identifiers
* .classes
* key-value attributes

Signed-off-by: Edwin Török <[email protected]>

edwintorok · 2024-01-17T00:17:17Z

(This PR might need some wider testing to check that it doesn't break backwards compatibility. Is there a larger corpus you'd normally test changes like this on, e.g. the realworldocaml book, or anything else?)

edwintorok mentioned this pull request Jan 17, 2024

ocamllex raises Invalid_argument String.sub/Bytes.sub and sometimes segfaults ocaml/ocaml#12901 (Closed)
