Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown parser gets multiple-underscores-inside-italics wrong #389

Closed
ljw1004 opened this issue Mar 22, 2016 · 10 comments · Fixed by #760
Closed

Markdown parser gets multiple-underscores-inside-italics wrong #389

ljw1004 opened this issue Mar 22, 2016 · 10 comments · Fixed by #760

Comments

@ljw1004
Copy link

ljw1004 commented Mar 22, 2016

The markdown parser gets underscores-inside-italics wrong. For instance,

A *asterisk_one_two* B

On both GitHub and CommonMark, it's rendered as an italic with underscores:

A asterisk_one_two B

But FSharp.Formatting.Markdown parses it as a nested emphasis:

Emphasis [ Literal "asterisk", Emphasis [ Literal "one"], Literal "two"]

(Actually, to be honest, I don't know what is the correct behavior. I'm assuming that CommonMark is correct here.)

Here are some more examples:

A *asterisk_one_two* B

A *asterisk_one_two_three* B

A *asterisk_one_two_three_four* B

A _underscore_one_two_ B

A _underscore_one_two_three_ B

A _underscore_one_two_three_four_ B

A asterisk_one_two B

A asterisk_one_two_three B

A asterisk_one_two_three_four B

A underscore_one_two B

A underscore_one_two_three B

A underscore_one_two_three_four B

@ljw1004
Copy link
Author

ljw1004 commented Mar 24, 2016

I wonder if this is the same as #376 ?

@matthid
Copy link
Member

matthid commented Mar 24, 2016

thanks for reporting, yes we have tests for the full spec (Im pretty sure for both issues you reported). If you need a near/mid term fix feel free to enable and fix them. There is not much that can go wrong.

@matthid
Copy link
Member

matthid commented Mar 24, 2016

If you need any help/pointers feel free to ask.

@tpetricek
Copy link
Member

I think the behaviour was a bit unspecified before CommonMark.

With CommonMark, I think the rules are that there should be spaces if you want nesting:

this *is_just_underscores*
this *is _nested_ emphasis*

@ljw1004
Copy link
Author

ljw1004 commented Apr 6, 2016

This might be related to #376

@tpetricek
Copy link
Member

Yes, it is. In the previous issue, @amazingant points to the relevant bits of the CommonMark spec:

I can't find a part of the CommonMark spec that explicitly states it rather than just being implied by the left/right flanking delimiter rules, but emphasis with the underscore like some_words_here should be translated as some_words_here, but come out of FSharp.Markdown as some<em>words</em>here. Spec has a few examples of this; makes snake-case value names output wrong 😬

@deviousasti
Copy link
Member

Has there been any resolution to this?
Working around this is a bit ugly.

Does DelimitedMarkdown need to be rewritten?
It looks like the spec has a lot of corner cases.

@abelbraaksma
Copy link
Member

This is a rather unfortunate bug and as issue #2411 shows, leads to "real world" issues. In that particular scenario for Fantomas, the headers are the names of configuration tokens, which all have underscores, making the file as a whole hard to read.

I don't mind taking a stab at this sometime soon (no promises), perhaps @dsyme or @tpetricek know off-hand where things like this get parsed/rendered? Seems to me that the parser needs some look-ahead/behind to determine that _ has spaces around.

@nhirschey
Copy link
Collaborator

I believe it’s here

/// Recognizes some form of emphasis using `**bold**` or `*italic*`
/// (both can be also marked using underscore).
/// TODO: This does not handle nested emphasis well.
let (|Emphasised|_|) =
function
| (('_'
| '*') :: _tail) as input ->
match input with
| DelimitedMarkdown [ '_'; '_'; '_' ] (body, rest)
| DelimitedMarkdown [ '*'; '*'; '*' ] (body, rest) ->
Some(body, Emphasis >> List.singleton >> (fun s -> Strong(s, None)), rest)
| DelimitedMarkdown [ '_'; '_' ] (body, rest)
| DelimitedMarkdown [ '*'; '*' ] (body, rest) -> Some(body, Strong, rest)
| DelimitedMarkdown [ '_' ] (body, rest)
| DelimitedMarkdown [ '*' ] (body, rest) -> Some(body, Emphasis, rest)
| _ -> None
| _ -> None

@abelbraaksma
Copy link
Member

Thanks for fixing this @nojaf and fr merging it in @dsyme!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants