Markdown parsing bug: inline code in list item #6284

lierdakil · 2020-04-14T07:17:26Z

Seems like this is as of yet unreported, sorry for the noise if it is. Anywho, given the following Markdown:

1. If `(1) x`, then `2`

Pandoc produces:

[OrderedList (1,Decimal,Period)
 [[Plain [Str "If",Space,Str "`(1)",Space,Str "x",Code ("",[],[]) ", then",Str "2`"]]]]

This is unexpected, I would expect something along the lines of

[OrderedList (1,Decimal,Period)
 [[Plain [Str "If",Space,Code ("",[],[]) "(1) x",Str ",",Space,Str "then",Space,Code ("",[],[]) "2"]]]]

Parser seems to be confused specifically by the (1) part, without it, it works as expected. Also works as expected outside of list items.

Also breaks in the same way with things like

1. `#. x`

Seems like a precedence conflict between inline code and list markers?

The text was updated successfully, but these errors were encountered:

mb21 · 2020-04-14T18:17:46Z

Indeed, -f commonmark parses it as expected... even with:

1. If `1. x`, then `2`

jgm · 2020-04-15T05:39:33Z

The cause is pretty clear:

-- parses inline code, between n `s and n `s
code :: PandocMonad m => MarkdownParser m (F Inlines)
code = try $ do
  starts <- many1 (char '`')
  skipSpaces
  result <- (trim . T.concat) <$>
            manyTill (notFollowedBy (inList >> listStart) >>
                      (many1Char (noneOf "`\n") <|> many1Char (char '`') <|>
                       (char '\n' >> notFollowedBy' blankline >> return " ")))                      (try (skipSpaces >> count (length starts) (char '`') >>
                      notFollowedBy (char '`')))

I think the notFollowedBy (inList >> listStart) is there for the following sort of case

1.  hi `there
    1. ok`

which pandoc's markdown doesn't want to treat as a nested list. commonmark makes a saner decision and lets block structure trump inline structure here, so we get a nested list.

See commit 9c4ba81 and #5627. I'm tempted to revert this part of the change; in general, I'd prefer to move towards convergence with commonmark on this kind of thing. But we can see if @leungbk has any comments.

lierdakil · 2020-04-15T16:12:17Z

@jgm, not entirely sure I understand.

1.  hi `there
    1. ok`

works along the lines of what I would expect, treating the second line as a lazy continuation of the first line:

<ol type="1">
<li>hi <code>there     1. ok</code></li>
</ol>

Adding a newline also works more or less as I would expect, evidently choosing the block structure over inline code:

1.  hi `there

    1. ok`

produces

<ol type="1">
<li><p>hi `there</p>
<ol type="1">
<li>ok`</li>
</ol></li>
</ol>

These two cases seemingly should be handled by (char '\n' >> notFollowedBy' blankline >> return " ") alternative, although I admit that I'm a bit rusty on parsec.

This, however, doesn't do what I would expect:

1.  hi `1. there ok`

produces

<ol type="1">
<li>hi `1. there ok`</li>
</ol>

Which is seemingly the effect of notFollowedBy (inList >> listStart) -- which checks inList >> listStart exclusively at the start (immediately after the opening backtick and possibly spaces), or after a sequence of backticks, or after a newline.

As far as I can tell, the reason for notFollowedBy (inList >> listStart) is to catch cases like this:

1.  hi `x
2. ok`

But it also (incorrectly) catches cases like

1.  hi `x``2. ok`

which produces

<ol type="1">
<li>hi `x```2. ok`</li>
</ol>

instead of what I would expect (and what commonmark does), i.e.

<ol>
<li>hi <code>x```2. ok</code></li>
</ol>

and the case in the OP.

P.S. From what I can gather (and I might be mistaken!) the fix for the immediate issue and the related case with backticks described above would be to move notFollowedBy (inList >> listStart) check to after char '\n'?

P.P.S. After looking through the code and testing my suspicions, apparently there's also an issue of

1.  `- x`
2.  `* x`
3.  `+ x`

not being parsed as inline code, because -*+ can all start a bullet list.

jgm · 2020-04-15T16:46:56Z

Oh yes, I guess you're right, it's for cases like

1.  hi `x
2. ok`

The small change you recommend sounds good to me for now, even though I'd be more tempted simply to remove the check, which would align more with commonmark behavior.

Closes #6284. Previously inline code containing list markers was sometimes parsed incorrectly.

mb21 added format:Markdown reader labels Apr 14, 2020

jgm added the bug label Apr 15, 2020

lierdakil added a commit to lierdakil/pandoc that referenced this issue Apr 15, 2020

[Markdown Reader] Fix inline code in lists (jgm#6284)

f0f69fa

lierdakil mentioned this issue Apr 15, 2020

[Markdown Reader] Fix inline code in lists (#6284) #6289

Merged

jgm closed this as completed in #6289 Apr 15, 2020

jgm pushed a commit that referenced this issue Apr 15, 2020

Markdown Reader: Fix inline code in lists (#6284)

83c1ce1

Closes #6284. Previously inline code containing list markers was sometimes parsed incorrectly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Markdown parsing bug: inline code in list item #6284

Markdown parsing bug: inline code in list item #6284

lierdakil commented Apr 14, 2020

mb21 commented Apr 14, 2020 •

edited

Loading

jgm commented Apr 15, 2020

lierdakil commented Apr 15, 2020 •

edited

Loading

jgm commented Apr 15, 2020

Markdown parsing bug: inline code in list item #6284

Markdown parsing bug: inline code in list item #6284

Comments

lierdakil commented Apr 14, 2020

mb21 commented Apr 14, 2020 • edited Loading

jgm commented Apr 15, 2020

lierdakil commented Apr 15, 2020 • edited Loading

jgm commented Apr 15, 2020

mb21 commented Apr 14, 2020 •

edited

Loading

lierdakil commented Apr 15, 2020 •

edited

Loading