Converting emphasis with angled quotation marks #645

Closed
alexeyfv opened this issue Jun 19, 2022 · 2 comments · Fixed by #649

@alexeyfv

Hi,

I'm trying to convert a document that contains the string "«_word_»". As the example below shows, the parser does not recognize it as emphasis:

var html1 = Markdown.ToHtml("«_word_»"); // "<p>«_word_»</p>\n"

But "_«word»_" is converted correctly:

var html2 = Markdown.ToHtml("_«word»_"); // "<p><em>«word»</em></p>\n"

I'm using Markdig 0.30.2. Is this a bug? If so, is there a workaround? Thanks.

@xoofx xoofx added the question label Jun 19, 2022
@xoofx
Owner

xoofx commented Jun 19, 2022

Oh, interesting... you might have hit an edge case of the spec, as the different CommonMark parsers disagree on the result here.

The spec section on emphasis is here, and I would think that this is not a bug, per the rule:

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

I haven't checked, but it is highly likely that the characters « and » are Unicode punctuation characters.
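To make the quoted rule concrete, here is a minimal sketch of the flanking test for a run of `_`, assuming the CommonMark 0.30 definitions. It is not Markdig's implementation: `FlankingSketch`, `CanOpenUnderscore`, and both punctuation predicates are hypothetical names introduced for illustration, and `char.IsPunctuation` is used only as a rough stand-in for the spec's punctuation set (it does not cover ASCII symbols such as `$` or `+`). The point is that the answer for the `_` after « depends on whether « is classified as punctuation.

```csharp
using System;
using System.Globalization;

static class FlankingSketch
{
    // CommonMark 0.30 "Unicode whitespace": category Zs, or tab, LF, FF, CR.
    static bool IsSpecWhitespace(char c) =>
        c == '\t' || c == '\n' || c == '\f' || c == '\r' ||
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

    // Can a run of '_' open emphasis? `before` and `after` are the characters
    // around the run; null stands for the beginning or end of the line,
    // which the spec treats as whitespace.
    public static bool CanOpenUnderscore(char? before, char? after, Func<char, bool> isPunct)
    {
        bool wsBefore = !before.HasValue || IsSpecWhitespace(before.Value);
        bool wsAfter = !after.HasValue || IsSpecWhitespace(after.Value);
        bool pBefore = before.HasValue && isPunct(before.Value);
        bool pAfter = after.HasValue && isPunct(after.Value);

        bool leftFlanking = !wsAfter && (!pAfter || wsBefore || pBefore);
        bool rightFlanking = !wsBefore && (!pBefore || wsAfter || pAfter);

        // '_' opens emphasis only if the run is left-flanking and either
        // not right-flanking or preceded by punctuation.
        return leftFlanking && (!rightFlanking || pBefore);
    }

    static void Main()
    {
        // Hypothetical classifiers: one that knows « is punctuation,
        // and one that only recognizes ASCII punctuation.
        Func<char, bool> unicodeAware = char.IsPunctuation;
        Func<char, bool> asciiOnly = c => c < 128 && char.IsPunctuation(c);

        // The first '_' in "«_word_»" sits between '«' and 'w'.
        Console.WriteLine(CanOpenUnderscore('«', 'w', unicodeAware)); // True
        Console.WriteLine(CanOpenUnderscore('«', 'w', asciiOnly));    // False
    }
}
```

With a classifier that treats « as punctuation, the run is left-flanking but not right-flanking, so it can open emphasis. With one that does not, the run is both left- and right-flanking and not preceded by punctuation, so `_` cannot open emphasis there.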

cc: @MihaZupan thoughts?

@MihaZupan MihaZupan added the bug label Jul 17, 2022
@MihaZupan MihaZupan self-assigned this Jul 17, 2022
@MihaZupan
Collaborator

MihaZupan commented Jul 17, 2022

This is a bug: our CheckUnicodeCategory helper does not match what CommonMark defines as Unicode whitespace and Unicode punctuation.

Specifically, we are off in the 128-255 range (where « and » are) and with our Unicode space categories.

  11 (U+000B) Space should be False
 133 (U+0085) Space should be False
 161 ('¡') Punctuation should be True
 167 ('§') Punctuation should be True
 171 ('«') Punctuation should be True
 182 ('¶') Punctuation should be True
 183 ('·') Punctuation should be True
 187 ('»') Punctuation should be True
 191 ('¿') Punctuation should be True
8232 (U+2028) Space should be False
8233 (U+2029) Space should be False

IsWhitespace also doesn't match the spec right now.
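For reference, a rough sketch of classifiers that follow the CommonMark 0.30 definitions might look like the code below. This is not the fix that went into Markdig: `SpecCategories` and its methods are hypothetical names, and the result for any given character depends on the Unicode data shipped with the runtime. The loop prints the classification for the code points listed above, next to what the built-in `char.IsWhiteSpace` reports.

```csharp
using System;
using System.Globalization;

static class SpecCategories
{
    // CommonMark 0.30 "Unicode whitespace": category Zs, or tab, LF, FF, CR.
    // Vertical tab (U+000B), NEL (U+0085), U+2028 and U+2029 do not qualify.
    public static bool IsUnicodeWhitespace(char c) =>
        c == '\t' || c == '\n' || c == '\f' || c == '\r' ||
        char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

    // ASCII punctuation per the spec: U+0021-002F, U+003A-0040,
    // U+005B-0060, U+007B-007E.
    public static bool IsAsciiPunctuation(char c) =>
        (c >= '!' && c <= '/') || (c >= ':' && c <= '@') ||
        (c >= '[' && c <= '`') || (c >= '{' && c <= '~');

    // CommonMark 0.30 "Unicode punctuation": ASCII punctuation, or any
    // character in categories Pc, Pd, Pe, Pf, Pi, Po, Ps.
    public static bool IsUnicodePunctuation(char c)
    {
        if (IsAsciiPunctuation(c)) return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.DashPunctuation:
            case UnicodeCategory.OpenPunctuation:
            case UnicodeCategory.ClosePunctuation:
            case UnicodeCategory.InitialQuotePunctuation:
            case UnicodeCategory.FinalQuotePunctuation:
            case UnicodeCategory.OtherPunctuation:
                return true;
            default:
                return false;
        }
    }

    static void Main()
    {
        // Code points taken from the list in the comment above.
        int[] codePoints = { 11, 133, 161, 167, 171, 182, 183, 187, 191, 8232, 8233 };
        foreach (int cp in codePoints)
        {
            char c = (char)cp;
            Console.WriteLine(
                $"{cp,4} U+{cp:X4} space={IsUnicodeWhitespace(c)} " +
                $"punctuation={IsUnicodePunctuation(c)} " +
                $"char.IsWhiteSpace={char.IsWhiteSpace(c)}");
        }
    }
}
```

The explicit character checks are there because `char.IsWhiteSpace` also accepts the line and paragraph separators and several control characters, which is broader than the spec's definition of whitespace.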
