Inconsistent slugs with unicode (emoji) characters #73
Comments
Heya!
This statement is incorrect. This package uses
Not completely. Can you describe more of the problem you’re experiencing? What isn’t working?
@wooorm thanks for the quick reply! I got ahead of myself and hadn't looked into the real culprit here: React Markdown 🤦♂️. It looks like somewhere inside, it does not parse the heading correctly and outputs the slug with composite emojis included in it, which is actually not what
Sorry for not diving deep enough into the problem, and thank you for the great tool!
👍 are you on the latest
@wooorm I'm on
Initial checklist
Problem
In my project, I have Unicode characters (emoji) in the headings, for example "🏃♂️ Heading".
When the TOC is generated, the output URL slugs sometimes contain those emojis, although, according to the `github-slugger` docs, this should not be the case.

I have noticed that `mdast-util-toc`, at the time of writing this issue, lists version `1.0.0` of `github-slugger` in its `package.json`, which was released way back on September 22nd, 2015. Since then, the emoji standard has evolved quite drastically, and some newer emojis are not stripped during slug creation. Thus, the slug for `🏃♂️ Heading` becomes `#%EF%B8%8F-heading`, while e.g. `🏷 Another Heading` has its emoji stripped correctly: `#-another-heading`.
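To make the `%EF%B8%8F` prefix less mysterious, here is a minimal sketch in plain JavaScript. It assumes the heading's emoji is the "man running" ZWJ sequence, and that an outdated emoji pattern strips everything except the trailing variation selector (U+FE0F). That second assumption is inferred from the slug above, not taken from the `github-slugger` source.

```javascript
// Assumed input: runner + zero-width joiner + male sign + variation selector-16.
const heading = "\u{1F3C3}\u200D\u2642\uFE0F Heading";

// List the code points that make up the emoji part of the heading.
const codePoints = Array.from(heading.split(" ")[0]).map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
);
console.log(codePoints); // [ 'U+1F3C3', 'U+200D', 'U+2642', 'U+FE0F' ]

// Assumption: only U+FE0F survives the outdated stripping step.
const leftover = "\uFE0F-heading";

// Percent-encoding the leftover reproduces the slug reported above.
const encoded = encodeURIComponent(leftover);
console.log("#" + encoded); // #%EF%B8%8F-heading
```

The three bytes `EF B8 8F` are simply the UTF-8 encoding of U+FE0F, so the slug still carries an invisible character rather than a visible emoji.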
Solution
While searching the `github-slugger` issues, I found this particular one, which suggests that their emoji detection algorithm was at least outdated (or even broken).

As seen in their latest update, `1.4.0`, they now include the generated regex from `emoji-regex` in their source code, which is kept up to date automatically.

The solution I propose is to keep the `github-slugger` dependency up to date and bump its version to `1.4.0` in the `package.json`, which should solve the outdated emoji detection problem.

Maybe it's also worth including some of the newer emojis in the tests to verify it's still working correctly. The only emoji present in the `unicode` test (❤️) was also broken in `github-slugger` at some point and later fixed in their `1.1.2` release.

Alternatives
Alternatively, it may be cool to include a config option to transform/map the URL slug on the fly, so that it can be modified before actually landing in the parsed AST. Something like a `mapSlug` function (`(slug: string) => string`) or regexp-based stripping like `stripSlug: RegExp` in the `search.js` function.

But I'd rather just bump the `github-slugger` version and verify the absence of regressions in tests, as another config option would be overhead, IMO.
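For illustration only, the `mapSlug` / `stripSlug` idea could look roughly like the sketch below. Neither option exists in `mdast-util-toc`; `applySlugOptions` is a hypothetical helper name, and where such a step would be wired into `search.js` is left open.

```javascript
/**
 * Hypothetical post-processing of an already generated slug.
 * `stripSlug` (RegExp) and `mapSlug` ((slug: string) => string) are the
 * proposed options, not part of any existing API.
 */
function applySlugOptions(slug, options = {}) {
  let result = slug;
  if (options.stripSlug) {
    // Use a global regexp if every match should be removed, not just the first.
    result = result.replace(options.stripSlug, "");
  }
  if (options.mapSlug) {
    result = options.mapSlug(result);
  }
  return result;
}

// Strip leftover joiners and variation selectors from a slug.
const stripped = applySlugOptions("\uFE0F-heading", {
  stripSlug: /[\u200D\uFE0F]/g,
});
console.log(stripped); // -heading

// Map the slug with an arbitrary function, here dropping leading hyphens.
const mapped = applySlugOptions("-another-heading", {
  mapSlug: (slug) => slug.replace(/^-+/, ""),
});
console.log(mapped); // another-heading
```

Applying `stripSlug` before `mapSlug` is an arbitrary choice in this sketch; a real option would need to document the order.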