more strict RST inline markup parsing #17827

a-mr · 2021-04-23T19:24:24Z

Handles cases like:

allow end single backticks with `, e.g. fixes bug 14 from RST minor bugs #17340
```
`\``
```
allow end inline markup with the same symbol as its end-string, e.g. this is legal RST *emphasis** which will be displayed as emphasized emphasis*. This pertains equally to all kinds of inline mark-up, e.g. now it's possible to avoid escaping \ in most cases in single backticks:
```
`proc `+``
```
delete non-standard and undocumented feature that allowed words immediately follow backticks like in `NimNode`s. Now it should be written as `NimNode`\s. Benefit: 2nd syntax can be parsed by other RST implementations.

The difficult part with 1 and edge cases from 2 is to handle escaping by \. The source of bug 14 is that lexer represents repeating punctuation symbols as 1 token, e.g. point 1 example contain 3 tokens `, \, and ``. And e.g. *example\*** is represented *, example, \, and ***. I checked 2 solutions:
A. purely at parser stage
B. add basic escaping at lexer stage, so the example is tokenized as *, example, \, *, **.

Option B. appeared much more easier and it's implemented here. (Option A. lead to much more edge cases handling, I were not able to finish it even preliminarily.)

cc @narimiran

a-mr · 2021-04-23T20:23:50Z

Interestingly, this change fixed interpretation of backspaces in some cases:

check \\ this

check \\\ this

check \\\\ this

check \\\\\ this

check \\\\\\ this

Before:

After (correct):

Araq · 2021-04-26T09:46:25Z

@a-mr While we really appreciate all the hard work you're doing to bring Nim's RST implementation closer to the RST spec, I cannot help but wonder if we should not pursue a different strategy:

In the end Nim will be more compatible with the less successful markup language -- Markdown did win over RST and many people have requested Markdown support. In the past I was against using Markdown instead of RST for Nim's documentation, esp. the Nim manual but since then I've changed my opinion, as long as Nim's builtin tooling can handle Nim's documentation, moving to Markdown is acceptable. (And as long as we don't use Markdown's trailing whitespace feature...)

Araq · 2021-04-26T09:49:54Z

lib/packages/docutils/rst.nim

@@ -277,6 +277,7 @@ type
    line*, col*, baseIndent*: int
    skipPounds*: bool
    adornmentLine*: bool
+    escapeNext*: bool


Interactions between the lexer and the parser via different lexing states is a most ugly setup. I understand why you did it this way and as you said, it's likely the cleanest solution, but these special cases then live in the code for good. Which is why I'm now musing over moving to Markdown instead.

but these special cases then live in the code for good

I don't know, I'm inclined to think it's OK and it should work this way, because \ does have the special escaping meaning for good — in both RST and Markdown.

a-mr · 2021-04-27T13:56:22Z

@Araq thanks for positive words.

I'm not a fan of RST but there is nowhere to move to currently. Having checked major Markdown implementations (Pandoc, Python-markdown, Github) I found that they have very small "common denominator" w.r.t. to RST. There is no even common way to do comments. May be in future one of Markdown version will take a clear lead so we can take as the basis...

May be we can just view RST as extension to Markdown and continue development on basis of rst.nim? Currently we have 2 modes: pure RST and "RST+Markdown extensions". What about adding 3rd mode, which is "Markdown+RST extensions"?

mode	name	`RstParseOption`
1	pure RST
2	RST+Markdown	`roSupportMarkdown`
3	Markdown+RST	`roPreferMarkdown` (+ `roSupportMarkdown`)

And make roPreferMarkdown the default option?
This option would do:

include treatment of backticks in Markdown way while still allowing to add :role: specifier to it
allow to input body elements in Markdown style e.g. don't indent 2nd line in enumeration lists (which often confuses users):

1. list
continuation

Turn off confusing RST block quotes behavior when adding additional indentation transforms block of text to quote.

If your original thought was about completely throwing away all RST features that have Markdown counterpart then I'm against it: in such a case we would implement only our only variant of Markdown which is not used anywhere else. I feel like we still need to implement some standard even if it won't be default option and a user will have to select it explicitly.

Araq · 2021-04-29T15:31:07Z

Unrelated CI failures, merging.

* more strict RST inline markup parsing * add test for unexpected bonus * introduce `roPreferMarkdown`

more strict RST inline markup parsing

9ca4746

add test for unexpected bonus

8990b98

Araq reviewed Apr 26, 2021

View reviewed changes

introduce roPreferMarkdown

9a77e39

Araq merged commit 1640508 into nim-lang:devel Apr 29, 2021

This was referenced May 14, 2021

rst parser makes a line starting with > silently disappear #17987

Closed

RST minor bugs #17340

Open

PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022

more strict RST inline markup parsing (nim-lang#17827)

80146b2

* more strict RST inline markup parsing * add test for unexpected bonus * introduce `roPreferMarkdown`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

more strict RST inline markup parsing #17827

more strict RST inline markup parsing #17827

a-mr commented Apr 23, 2021

a-mr commented Apr 23, 2021

Araq commented Apr 26, 2021 •

edited

Loading

Araq Apr 26, 2021

a-mr Apr 28, 2021

a-mr commented Apr 27, 2021

Araq commented Apr 29, 2021

more strict RST inline markup parsing #17827

more strict RST inline markup parsing #17827

Conversation

a-mr commented Apr 23, 2021

a-mr commented Apr 23, 2021

Araq commented Apr 26, 2021 • edited Loading

Araq Apr 26, 2021

Choose a reason for hiding this comment

a-mr Apr 28, 2021

Choose a reason for hiding this comment

a-mr commented Apr 27, 2021

Araq commented Apr 29, 2021

Araq commented Apr 26, 2021 •

edited

Loading