Inconsistent slugs with unicode (emoji) characters #73

Closed
4 tasks done
drizzer14 opened this issue Nov 8, 2021 · 4 comments
Labels
🤷 no/invalid (This cannot be acted upon), 👎 phase/no (Post cannot or will not be acted on)

Comments

@drizzer14

Initial checklist

Problem

In my project, I have Unicode characters (emoji) in the headings, for example "🏃‍♂️ Heading".
When the TOC is generated, the output URL slugs sometimes contain those emoji, although according to the github-slugger docs this should not be the case.

I have noticed that mdast-util-toc, as of time of me writing this issue, contains version 1.0.0 of github-slugger in its package.json, which was released way back on September 22nd 2015. Since then, the emoji standard has evolved quite drastically, and some newer emoji are not handled during slug creation. Thus, the slug for 🏃‍♂️ Heading becomes #%EF%B8%8F-heading, while e.g. 🏷 Another Heading has its emoji stripped correctly: #-another-heading.
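To make the reported slug easier to read, here is a small sketch that decodes it. The `%EF%B8%8F` prefix is the percent-encoded UTF-8 form of U+FE0F (variation selector-16), the last code point of the 🏃‍♂️ sequence. The `naiveSlug` function is purely hypothetical: it is not the code of github-slugger, mdast-util-toc, or any other library, and only illustrates that the reported slug is consistent with every code point except the variation selector being stripped.

```javascript
// "🏃‍♂️" written with escapes so every code point is visible:
// runner, zero-width joiner, male sign, variation selector-16.
const runner = '\u{1F3C3}\u200D\u2642\uFE0F';

// Array.from iterates by code point, not by UTF-16 code unit.
const codePoints = Array.from(
  runner,
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase()
);
console.log(codePoints.join(' ')); // U+1F3C3 U+200D U+2642 U+FE0F

// U+FE0F takes three bytes in UTF-8, hence three percent-escapes.
const encodedSelector = encodeURIComponent('\uFE0F');
console.log(encodedSelector); // %EF%B8%8F

// ASSUMPTION: a hypothetical stripper that knows the runner, the joiner,
// and the male sign, but not the variation selector.
function naiveSlug(heading) {
  return heading
    .toLowerCase()
    .replace(/[\u{1F3C3}\u200D\u2642]/gu, '')
    .replace(/ /g, '-');
}

const reported = encodeURIComponent(naiveSlug(runner + ' Heading'));
console.log(reported); // %EF%B8%8F-heading
```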

Solution

While searching github-slugger issues, I have found this particular one, which suggests that their emoji detection algorithm was at least outdated (or even broken).
As seen in their latest update 1.4.0, they now include the generated regex from emoji-regex in their source code, which is kept up-to-date automatically.

The solution I propose is to keep the github-slugger dependency up-to-date and bump its version to 1.4.0 in package.json, which should solve the outdated emoji detection problem.

It may also be worth including some of the newer emoji in the tests to verify that slugging still works correctly. The only emoji present in the unicode test (❤️) was also broken in github-slugger at some point and later fixed in its 1.1.2 release.

Alternatives

Alternatively, it may be useful to include a config option to transform/map the URL slug on the fly, so that it can be modified before it lands in the parsed AST: something like a mapSlug function ((slug: string) => string), or regexp stripping such as stripSlug: RegExp, in the search.js function.
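A minimal sketch of what such an option could look like, assuming a post-processing step applied to each generated slug. Neither `mapSlug` nor `stripSlug` exists in mdast-util-toc; both option names come from the proposal above, and `finalizeSlug` is a hypothetical helper invented for illustration.

```javascript
// Hypothetical helper: applies the proposed (non-existent) options to a slug
// that has already been generated by the slugger.
function finalizeSlug(slug, options = {}) {
  const {mapSlug, stripSlug} = options;
  let result = slug;

  // stripSlug: remove everything the regular expression matches.
  if (stripSlug instanceof RegExp) {
    result = result.replace(stripSlug, '');
  }

  // mapSlug: let the caller rewrite the slug arbitrarily.
  if (typeof mapSlug === 'function') {
    result = mapSlug(result);
  }

  return result;
}

// Drop a leftover variation selector (U+FE0F).
const stripped = finalizeSlug('\uFE0F-heading', {stripSlug: /\uFE0F/g});
console.log(stripped); // -heading

// Trim the leading hyphen left behind by a removed emoji.
const mapped = finalizeSlug('-another-heading', {
  mapSlug: (slug) => slug.replace(/^-+/, '')
});
console.log(mapped); // another-heading
```

Note that any such rewrite would have to be applied identically wherever the heading ids themselves are produced, otherwise the TOC links would stop matching their targets.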

But I'd rather just bump the github-slugger version and verify the absence of regressions in tests, as another config option would be overhead, IMO.

@github-actions bot added the 👋 phase/new (Post is being triaged automatically) and 🤞 phase/open (Post is being triaged manually) labels, and removed the 👋 phase/new label, on Nov 8, 2021
@wooorm
Member

wooorm commented Nov 8, 2021

Heya!

> mdast-util-toc, as of time of me writing this issue, contains version 1.0.0 of github-slugger in its package.json

This statement is incorrect. This package uses ^1.0.0 (note the ^). That means that all the work done on github-slugger over the last 5 years is pulled in already.
For more information on how semver works, see https://semver.npmjs.com: you can enter github-slugger and ^1.0.0 there and see that all of those versions are matched.
If you have an older github-slugger in your node_modules, you can run npm update to update.
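As a rough illustration of the caret rule, here is a deliberately simplified sketch. It is not the `semver` package and makes several assumptions: it only accepts plain `x.y.z` versions, ignores prerelease tags, and refuses `0.x` ranges, where the caret behaves differently. `satisfiesCaret` is a hypothetical name.

```javascript
// Parse a plain "x.y.z" version into three integers.
function parseVersion(version) {
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new TypeError('expected a plain x.y.z version, got: ' + version);
  }
  return version.split('.').map(Number);
}

// Simplified caret check: same major version, and not older than the base.
function satisfiesCaret(version, range) {
  if (!range.startsWith('^')) {
    throw new TypeError('this sketch only handles caret ranges');
  }
  const [major, minor, patch] = parseVersion(version);
  const [baseMajor, baseMinor, basePatch] = parseVersion(range.slice(1));

  if (baseMajor === 0) {
    throw new RangeError('this sketch does not cover 0.x ranges');
  }
  if (major !== baseMajor) return false;
  if (minor !== baseMinor) return minor > baseMinor;
  return patch >= basePatch;
}

console.log(satisfiesCaret('1.0.0', '^1.0.0')); // true
console.log(satisfiesCaret('1.1.2', '^1.0.0')); // true
console.log(satisfiesCaret('1.4.0', '^1.0.0')); // true
console.log(satisfiesCaret('2.0.0', '^1.0.0')); // false
```

Which version actually ends up in node_modules still depends on the lockfile and on when the install was last refreshed, which is why npm update is suggested above.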

> As seen in their latest update 1.4.0, they now include the generated regex from emoji-regex in their source code, which is kept up-to-date automatically.

Not completely. emoji-regex is not what GitHub uses; instead, it was removed in 1.4.0: Flet/github-slugger@af59f34#diff-e727e4bdf3657fd1d798edcd6b099d6e092f8573cba266154583a746bba0f346R1.


Can you describe more of the problem you’re experiencing? What isn’t working?

@drizzer14
Author

@wooorm thanks for the quick reply!

I got ahead of myself and hadn't looked into the real culprit here: React Markdown 🤦‍♂️. It looks like somewhere inside, it does not parse the heading correctly and outputs the slug with composite emoji included, which is not what mdast-util-toc does in isolation.

Sorry for not diving deep enough into the problem and thank you for the great tool!
I suppose this issue can be closed, gonna fight with React Markdown from now on 😄.

@wooorm
Member

wooorm commented Nov 8, 2021

👍

Are you on the latest react-markdown? It should all just work. It could definitely be a bug somewhere, but I’m not sure what the root problem is!

@wooorm closed this as completed on Nov 8, 2021
@wooorm added the 🤷 no/invalid (This cannot be acted upon) and 👎 phase/no (Post cannot or will not be acted on) labels on Nov 8, 2021
@github-actions bot removed the 🤞 phase/open (Post is being triaged manually) label on Nov 8, 2021
@drizzer14
Author

@wooorm I'm on 7.0.1, while the latest is already 7.1.0. Gonna check out the minor update and see if it really just works.


2 participants