fix: Total rework of Emphasis/Strong #1864
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/markedjs/markedjs/7lc52k1xy
These tests look like they existed solely to cover the CommonMark examples with Strong and Em together that Marked wasn't passing because it output them backwards: `<strong><em>` instead of `<em><strong>`. This is no longer necessary.
// em
if (token = this.tokenizer.em(src, maskedSrc, prevChar)) {
// em & strong
if (token = this.tokenizer.emStrong(src, maskedSrc, prevChar)) {
This would definitely be a breaking change since the tokenizers are part of the public API. Can we do this without combining them? Can we just switch the order of em and strong to change `<strong><em>a</em></strong>` into `<em><strong>a</strong></em>`?
Unfortunately, they kind of need to be tackled together to get the right sequence of `<em><strong>` to work, which isn't just a stylistic thing. Even though it renders the same as `<strong><em>`, the processing to get to that point also clears up several other bugs, especially regarding uneven `**text*****` delimiters on both sides.
Edit for clarification: processing em/strong in this way allows following more of the CommonMark specs in a "natural" way that I think will be much easier to maintain (instead of a monstrous, fiddly regex). However, this also means you don't really know if the output is going to be an `em` or a `strong` until the very end of the process (see the very end of the Tokenizer).
This might be worth putting into a v2.
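To illustrate why the choice between `em` and `strong` can only be made at the end, here is a minimal sketch. It is not the code from this PR: `resolveDelimiters` is a hypothetical helper, and it assumes a single `*` delimiter type, an opening run that can only open, a closing run that can only close, and no delimiters inside the text. It follows the CommonMark rule of wrapping in `strong` while both runs still have at least two delimiters, then in `em`, and leaving any leftover delimiters as literal text.

```javascript
// Hypothetical helper (not part of Marked): resolve one opening run and one
// closing run of `*` around plain text, working from the inside outwards.
function resolveDelimiters(openLen, text, closeLen) {
  let html = text;
  while (openLen > 0 && closeLen > 0) {
    if (openLen >= 2 && closeLen >= 2) {
      // Both sides can still spare two delimiters: strong emphasis.
      html = `<strong>${html}</strong>`;
      openLen -= 2;
      closeLen -= 2;
    } else {
      // Only one delimiter left on at least one side: regular emphasis.
      html = `<em>${html}</em>`;
      openLen -= 1;
      closeLen -= 1;
    }
  }
  // Unmatched delimiters stay in the output as literal asterisks.
  return '*'.repeat(openLen) + html + '*'.repeat(closeLen);
}

console.log(resolveDelimiters(3, 'a', 3));    // <em><strong>a</strong></em>
console.log(resolveDelimiters(2, 'text', 5)); // <strong>text</strong>***
```

Under these assumptions the outermost tag depends on how many delimiters remain after the inner pairs are consumed, which is why `***a***` comes out as `<em><strong>` rather than `<strong><em>`.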
After researching quite a few dependants I think it should be fine to combine them since most dependants will change the renderer instead of the tokenizer. This will have to be a major bump to v2 though. I do want to get a few other breaking changes together before releasing v2 so it might be a while before I get to fully reviewing this PR.
That sounds appropriate. I need to review the other PRs you have waiting as well that should go out before this anyway...
There are some other changes I've seen in the issues list that I'd like to lump into a v2 bump as well.
Co-authored-by: Steven <[email protected]>
🎉 This PR is included in version 2.0.0 🎉 Your semantic-release bot 📦🚀
Description
Fixes #1860 (em and strong (`***〜***`)), fixes #1811 (Asterisks are not properly escaped)
Also fixes CommonMark/GFM examples:
Noticeable speedup, especially on the GFM benchmark (~8.7 sec -> ~8.1 sec, pretty consistent over 5 runs on my laptop)
What was attempted
- […] `src` string in Lexer, also mask out escaped `\*` and `\_`, which further simplifies a lot of regex
- […] `<em><strong>text</strong></em>` over `<strong><em>text</em></strong>`
- […] `*text*********`
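As a rough sketch of the masking idea mentioned above (not the PR's actual implementation): escaped `\*` and `\_` are replaced in a copy of the source so that later delimiter matching cannot mistake them for real delimiters. The function name `maskEscapedDelimiters` and the `++` placeholder are assumptions made for this example.

```javascript
// Hypothetical helper: return a masked copy of `src` in which every escaped
// `\*` or `\_` is replaced by a two-character placeholder. The copy keeps the
// same length as the original, so match indices found in the masked string
// can be used directly on `src`.
function maskEscapedDelimiters(src) {
  // Consume every backslash escape as a pair so that an escaped backslash
  // (`\\`) followed by `*` is not mistaken for an escaped asterisk.
  return src.replace(/\\([\s\S])/g, (match, ch) =>
    ch === '*' || ch === '_' ? '++' : match
  );
}

console.log(maskEscapedDelimiters('a \\* b')); // a ++ b
```

Matching would then run against the masked string while the token text is sliced from the original `src`.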
Note this involves significant changes in the Lexer and Tokenizer APIs, which should be noted in the update.
The new regex should be pretty benign compared to the earlier stuff. It literally checks for sequences of the pattern `a***b`, that is, runs of `*` or `_` between a single character on each side.
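A minimal sketch of a pattern in the spirit of the `a***b` description, not the regex merged in this PR. The names `delimiterRun` and `findDelimiterRuns` are hypothetical, and the sketch only finds runs made of a single delimiter type that have a non-delimiter character on both sides.

```javascript
// Hypothetical pattern: one non-delimiter character, a run of `*` or `_`,
// and a lookahead for one non-delimiter character after the run. The
// lookahead leaves the trailing character unconsumed so it can also serve
// as the leading character of the next run.
const delimiterRun = /([^*_])(\*+|_+)(?=([^*_]))/g;

function findDelimiterRuns(src) {
  const runs = [];
  for (const m of src.matchAll(delimiterRun)) {
    // m.index points at the leading character, so the run starts one later.
    runs.push({ before: m[1], run: m[2], after: m[3], index: m.index + 1 });
  }
  return runs;
}

console.log(findDelimiterRuns('a***b'));
// [ { before: 'a', run: '***', after: 'b', index: 1 } ]
```

Runs at the very start or end of the string, and runs that mix `*` and `_`, are deliberately out of scope here.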