Incorrect emphasis handling #383

Closed
mity opened this issue May 12, 2021 · 1 comment

Comments

@mity
Contributor

mity commented May 12, 2021

(Distilled from https://talk.commonmark.org/t/i-dont-understand-how-emphasis-is-parsed/3866)

Input:

*****Hello*world****

Actual Output:

<p>*****Hello<em>world</em>***</p>

Expected Output:

<p>**<em><strong>Hello<em>world</em></strong></em></p>

More detailed rationale can be found in this comment: https://talk.commonmark.org/t/i-dont-understand-how-emphasis-is-parsed/3866/8

@jgm
Member

jgm commented Jun 15, 2021

Reading the algorithm at the end of the spec, I think I see the issue. We have an openers_bottom table that limits how far back you have to look for an opener. It is indexed by the type of delimiter (_ or *) and the length of the closing delimiter mod 3. So after we fail to match the opener ***** to *, we set the openers_bottom for (*, 1) to the location of *, effectively removing the ***** as a possible opener for any run of *s with a length mod 3 of 1, including the final **** in this example. This procedure ignores the fact that the length mod 3 restriction only matters if one of the delimiters can be both an opener and a closer.
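
To make the length mod 3 restriction concrete, here is a minimal Python sketch of the check as the CommonMark spec words it: if one of the two delimiters can both open and close, the sum of the run lengths must not be a multiple of 3 unless both lengths are multiples of 3. The function name `rule_of_three_allows` and its parameters are illustrative assumptions, not identifiers from cmark, and the flanking flags are worked out by hand for the input in this issue.

```python
def rule_of_three_allows(opener_len, opener_can_close, closer_len, closer_can_open):
    """Return True if the 'multiple of 3' rule lets these two runs pair up.

    Lengths are the original lengths of the delimiter runs.
    """
    # The restriction applies only if one run can both open and close.
    if opener_can_close or closer_can_open:
        if (opener_len + closer_len) % 3 == 0:
            # Exempt only when both lengths are themselves multiples of 3.
            return opener_len % 3 == 0 and closer_len % 3 == 0
    return True


# Runs in `*****Hello*world****` (flags assumed from the flanking rules):
#   `*****` can open only, the middle `*` can open and close, `****` can close only.
blocked = rule_of_three_allows(5, False, 1, True)   # `*****` vs middle `*`
allowed = rule_of_three_allows(5, False, 4, False)  # `*****` vs final `****`
```

Here `blocked` is `False` (5 + 1 = 6 and the middle `*` can both open and close), while `allowed` is `True` even though 5 + 4 = 9, because neither run can both open and close. A failed match of the first kind therefore says nothing about the second.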

jgm added a commit that referenced this issue Jun 15, 2021
jgm added a commit that referenced this issue Jun 16, 2021
@jgm jgm closed this as completed in dc9366c Jun 16, 2021
jgm added a commit that referenced this issue Jun 17, 2021
jgm added a commit to commonmark/commonmark-spec that referenced this issue Jun 17, 2021
jgm added a commit to commonmark/commonmark.js that referenced this issue Jun 17, 2021
jgm added a commit that referenced this issue Jun 19, 2021
jgm added a commit that referenced this issue Jun 19, 2021
This reverts commit dc9366c.
jgm added a commit that referenced this issue Jun 19, 2021
The problem arose as follows. The input was

```
*****Hello*world****
```

We have an `openers_bottom` table that limits how far back you have to
look for an opener. It is indexed by the type of delimiter (`_` or `*`)
and the length of the closing delimiter mod 3. So after we fail to match
the opener `*****` to `*`, we set the `openers_bottom` for `(*, 1)` to the
location of `*`, effectively removing the `*****` as a possible opener for
any run of `*`s with a length mod 3 of 1, including the final `****` in this
example. This procedure ignores the fact that the length mod 3
restriction only matters if one of the delimiters can be both an opener
and a closer.

To fix this problem, we index the `openers_bottom` table not just by
the type of delimiter and the length of the closing delimiter mod 3,
but also by whether the closing delimiter can itself be an opener.
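
The change of index can be sketched in Python as follows. The helper names `old_key` and `new_key` are hypothetical, and the tuples only stand in for whatever table layout an implementation actually uses; the point is that the two closers in this input share a slot under the old index and get separate slots under the new one.

```python
def old_key(delim_char, closer_len):
    """Index used before the fix: delimiter type and closer length mod 3."""
    return (delim_char, closer_len % 3)


def new_key(delim_char, closer_len, closer_can_open):
    """Index after the fix: also distinguishes closers that can open."""
    return (delim_char, closer_can_open, closer_len % 3)


# The two closers in `*****Hello*world****`, as (char, length, can_open).
# The can_open flags are assumed from the flanking rules for this input.
middle = ("*", 1, True)   # `*` between "Hello" and "world"
final = ("*", 4, False)   # trailing `****`

old_collides = old_key(middle[0], middle[1]) == old_key(final[0], final[1])
new_collides = new_key(*middle) == new_key(*final)
```

With the old index, `old_collides` is `True`: the bottom recorded after the middle `*` failed to match also hides `*****` from the final `****`. With the new index, `new_collides` is `False`, so the final run can still look back to `*****`.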
jgm added a commit to commonmark/commonmark.js that referenced this issue Jun 19, 2021
rlidwka added a commit to markdown-it/markdown-it that referenced this issue Jun 30, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Feb 11, 2022
0.2.1.1
* Fix bug in prettyShow for SourceRange (#80). The bug led to an infinite
  loop in certain cases.

0.2.1
* Use official 0.30 spec.txt.
* Update HTML block parser for recent spec changes.
* Fix test case from commonmark/cmark#383. We need to index the list of
  stack bottoms not just by the length mod 3 of the closer but by whether
  it can be an opener, since this goes into the calculation of whether the
  delimiters can match.

0.2
* Commonmark.Inlines: export LinkInfo(..) [API change].
* Commonmark.Inlines: export pLink [API change].
* Commonmark.ReferenceMap: Add linkPos field to LinkInfo [API change].
* Commonmark.Tokens: normalize unicode to NFC before tokenizing
  (#57). Normalization might affect detection of flankingness, recognition
  of reference links, etc.
* Commonmark.Html: add data-prefix to non-HTML5 attributes, as pandoc does.
* Remove unnecessary build-depends.
* Use lightweight tasty-bench instead of criterion for benchmarks.