Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown parsing bug: inline code in list item #6284

Closed
lierdakil opened this issue Apr 14, 2020 · 4 comments · Fixed by #6289
Closed

Markdown parsing bug: inline code in list item #6284

lierdakil opened this issue Apr 14, 2020 · 4 comments · Fixed by #6289

Comments

@lierdakil
Copy link
Contributor

Seems like this is as of yet unreported, sorry for the noise if it is. Anywho, given the following Markdown:

1. If `(1) x`, then `2`

Pandoc produces:

[OrderedList (1,Decimal,Period)
 [[Plain [Str "If",Space,Str "`(1)",Space,Str "x",Code ("",[],[]) ", then",Str "2`"]]]]

This is unexpected, I would expect something along the lines of

[OrderedList (1,Decimal,Period)
 [[Plain [Str "If",Space,Code ("",[],[]) "(1) x",Str ",",Space,Str "then",Space,Code ("",[],[]) "2"]]]]

Parser seems to be confused specifically by the (1) part, without it, it works as expected. Also works as expected outside of list items.

Also breaks in the same way with things like

1. `#. x`

Seems like a precedence conflict between inline code and list markers?

@mb21
Copy link
Collaborator

mb21 commented Apr 14, 2020

Indeed, -f commonmark parses it as expected... even with:

1. If `1. x`, then `2`

@jgm jgm added the bug label Apr 15, 2020
@jgm
Copy link
Owner

jgm commented Apr 15, 2020

The cause is pretty clear:

-- parses inline code, between n `s and n `s
code :: PandocMonad m => MarkdownParser m (F Inlines)
code = try $ do
  starts <- many1 (char '`')
  skipSpaces
  result <- (trim . T.concat) <$>
            manyTill (notFollowedBy (inList >> listStart) >>
                      (many1Char (noneOf "`\n") <|> many1Char (char '`') <|>
                       (char '\n' >> notFollowedBy' blankline >> return " ")))                      (try (skipSpaces >> count (length starts) (char '`') >>
                      notFollowedBy (char '`')))

I think the notFollowedBy (inList >> listStart) is there for the following sort of case

1.  hi `there
    1. ok`

which pandoc's markdown doesn't want to treat as a nested list. commonmark makes a saner decision and lets block structure trump inline structure here, so we get a nested list.

See commit 9c4ba81 and #5627. I'm tempted to revert this part of the change; in general, I'd prefer to move towards convergence with commonmark on this kind of thing. But we can see if @leungbk has any comments.

@lierdakil
Copy link
Contributor Author

lierdakil commented Apr 15, 2020

@jgm, not entirely sure I understand.

1.  hi `there
    1. ok`

works along the lines of what I would expect, treating the second line as a lazy continuation of the first line:

<ol type="1">
<li>hi <code>there     1. ok</code></li>
</ol>

Adding a newline also works more or less as I would expect, evidently choosing the block structure over inline code:

1.  hi `there

    1. ok`

produces

<ol type="1">
<li><p>hi `there</p>
<ol type="1">
<li>ok`</li>
</ol></li>
</ol>

These two cases seemingly should be handled by (char '\n' >> notFollowedBy' blankline >> return " ") alternative, although I admit that I'm a bit rusty on parsec.

This, however, doesn't do what I would expect:

1.  hi `1. there ok`

produces

<ol type="1">
<li>hi `1. there ok`</li>
</ol>

Which is seemingly the effect of notFollowedBy (inList >> listStart) -- which checks inList >> listStart exclusively at the start (immediately after the opening backtick and possibly spaces), or after a sequence of backticks, or after a newline.

As far as I can tell, the reason for notFollowedBy (inList >> listStart) is to catch cases like this:

1.  hi `x
2. ok`

But it also (incorrectly) catches cases like

1.  hi `x``2. ok`

which produces

<ol type="1">
<li>hi `x```2. ok`</li>
</ol>

instead of what I would expect (and what commonmark does), i.e.

<ol>
<li>hi <code>x```2. ok</code></li>
</ol>

and the case in the OP.

P.S. From what I can gather (and I might be mistaken!) the fix for the immediate issue and the related case with backticks described above would be to move notFollowedBy (inList >> listStart) check to after char '\n'?

P.P.S. After looking through the code and testing my suspicions, apparently there's also an issue of

1.  `- x`
2.  `* x`
3.  `+ x`

not being parsed as inline code, because -*+ can all start a bullet list.

@jgm
Copy link
Owner

jgm commented Apr 15, 2020

Oh yes, I guess you're right, it's for cases like

1.  hi `x
2. ok`

The small change you recommend sounds good to me for now, even though I'd be more tempted simply to remove the check, which would align more with commonmark behavior.

lierdakil added a commit to lierdakil/pandoc that referenced this issue Apr 15, 2020
@jgm jgm closed this as completed in #6289 Apr 15, 2020
jgm pushed a commit that referenced this issue Apr 15, 2020
Closes #6284.

Previously inline code containing list markers was sometimes parsed incorrectly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants