Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more strict RST inline markup parsing #17827

Merged
merged 3 commits into from
Apr 29, 2021
Merged

Conversation

a-mr
Copy link
Contributor

@a-mr a-mr commented Apr 23, 2021

Handles cases like:

  1. allow end single backticks with `, e.g. fixes bug 14 from RST minor bugs #17340
    `\``
    
  2. allow end inline markup with the same symbol as its end-string, e.g. this is legal RST *emphasis** which will be displayed as emphasized emphasis*. This pertains equally to all kinds of inline mark-up, e.g. now it's possible to avoid escaping \ in most cases in single backticks:
    `proc `+``
    
  3. delete non-standard and undocumented feature that allowed words immediately follow backticks like in `NimNode`s. Now it should be written as `NimNode`\s. Benefit: 2nd syntax can be parsed by other RST implementations.

The difficult part with 1 and edge cases from 2 is to handle escaping by \. The source of bug 14 is that lexer represents repeating punctuation symbols as 1 token, e.g. point 1 example contain 3 tokens `, \, and ``. And e.g. *example\*** is represented *, example, \, and ***. I checked 2 solutions:
A. purely at parser stage
B. add basic escaping at lexer stage, so the example is tokenized as *, example, \, *, **.

Option B. appeared much more easier and it's implemented here. (Option A. lead to much more edge cases handling, I were not able to finish it even preliminarily.)

cc @narimiran

@a-mr
Copy link
Contributor Author

a-mr commented Apr 23, 2021

Interestingly, this change fixed interpretation of backspaces in some cases:

check \\ this

check \\\ this

check \\\\ this

check \\\\\ this

check \\\\\\ this

Before:
image

After (correct):
image

@Araq
Copy link
Member

Araq commented Apr 26, 2021

@a-mr While we really appreciate all the hard work you're doing to bring Nim's RST implementation closer to the RST spec, I cannot help but wonder if we should not pursue a different strategy:

In the end Nim will be more compatible with the less successful markup language -- Markdown did win over RST and many people have requested Markdown support. In the past I was against using Markdown instead of RST for Nim's documentation, esp. the Nim manual but since then I've changed my opinion, as long as Nim's builtin tooling can handle Nim's documentation, moving to Markdown is acceptable. (And as long as we don't use Markdown's trailing whitespace feature...)

@@ -277,6 +277,7 @@ type
line*, col*, baseIndent*: int
skipPounds*: bool
adornmentLine*: bool
escapeNext*: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interactions between the lexer and the parser via different lexing states is a most ugly setup. I understand why you did it this way and as you said, it's likely the cleanest solution, but these special cases then live in the code for good. Which is why I'm now musing over moving to Markdown instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but these special cases then live in the code for good

I don't know, I'm inclined to think it's OK and it should work this way, because \ does have the special escaping meaning for good — in both RST and Markdown.

@a-mr
Copy link
Contributor Author

a-mr commented Apr 27, 2021

@Araq thanks for positive words.

I'm not a fan of RST but there is nowhere to move to currently. Having checked major Markdown implementations (Pandoc, Python-markdown, Github) I found that they have very small "common denominator" w.r.t. to RST. There is no even common way to do comments. May be in future one of Markdown version will take a clear lead so we can take as the basis...

May be we can just view RST as extension to Markdown and continue development on basis of rst.nim? Currently we have 2 modes: pure RST and "RST+Markdown extensions". What about adding 3rd mode, which is "Markdown+RST extensions"?

mode name RstParseOption
1 pure RST
2 RST+Markdown roSupportMarkdown
3 Markdown+RST roPreferMarkdown (+ roSupportMarkdown)

And make roPreferMarkdown the default option?
This option would do:

  1. include treatment of backticks in Markdown way while still allowing to add :role: specifier to it
  2. allow to input body elements in Markdown style e.g. don't indent 2nd line in enumeration lists (which often confuses users):
1. list
continuation
  1. Turn off confusing RST block quotes behavior when adding additional indentation transforms block of text to quote.

If your original thought was about completely throwing away all RST features that have Markdown counterpart then I'm against it: in such a case we would implement only our only variant of Markdown which is not used anywhere else. I feel like we still need to implement some standard even if it won't be default option and a user will have to select it explicitly.

@Araq
Copy link
Member

Araq commented Apr 29, 2021

Unrelated CI failures, merging.

@Araq Araq merged commit 1640508 into nim-lang:devel Apr 29, 2021
PMunch pushed a commit to PMunch/Nim that referenced this pull request Mar 28, 2022
* more strict RST inline markup parsing
* add test for unexpected bonus
* introduce `roPreferMarkdown`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants