Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking changes in 1.1 #4

Closed
revin opened this issue Apr 22, 2016 · 5 comments
Closed

Breaking changes in 1.1 #4

revin opened this issue Apr 22, 2016 · 5 comments

Comments

@revin
Copy link
Contributor

revin commented Apr 22, 2016

Hello! Greenkeeper just notified me that github-slugger 1.1 broke npm/marky-markdown; the issue is that in 1.0, unicode emoji characters in headings were being stripped out, but now they're being converted to HTML entities.

For example, given ## 😄-😄 unicode hyphen unicode:

  • 1.0 rendered --unicode-hyphen-unicode
  • 1.1 now renders 😄-😄-unicode-hyphen-unicode

...which has broken a handful of our tests. Is there a way to get the old behavior? Thanks! 👍

@wooorm
Copy link
Collaborator

wooorm commented Apr 26, 2016

Thanks @revin!

The introduced changes were trying to fix a bug in github-slugger, which can be seen is this heading as slugged by GitHub: https://github.com/wooorm/gh-and-npm-slug-generation#Привет-non-latin-你好, unfortunately, it seems to be too loose.

Now I’m wondering what exact characters are allowed by GH in slugs; white space and punctuation, and emoji? Thoughts?

@Flet
Copy link
Owner

Flet commented Apr 26, 2016

Apologies @revin! If we need to revert and/or redo this I'm fine with it!

@revin
Copy link
Contributor Author

revin commented Apr 26, 2016

GitHub says they use vmg/redcarpet (which is mostly C code) to do the markdown rendering. So the answers should be there, or at least that's the first place I plan on looking when I get a chance.

@wooorm
Copy link
Collaborator

wooorm commented Apr 26, 2016

There’s also github/markup, but I couldn’t find any reference to the jargon word “slug” in those repos.

I’ve included the input/output from wooorm/gh-and-npm-slug-generation and chrisdickinson/emoji-slug-example in the tests locally, and the only difference from the current algorithm seem to be the emoji. I’ll look into integrating mathiasbynens/emoji-regex, and if that fixes things.

@revin
Copy link
Contributor Author

revin commented Apr 26, 2016

That might do it. I was reading through https://github.com/vmg/redcarpet/blob/master/ext/redcarpet/html.c#L273-L322 and now have a bunch of tabs open to see what the unicode support situation is on C's standard library functions. 😕

wooorm added a commit that referenced this issue Apr 26, 2016
This update ensures emoji are correctly stripped (GH-4), and non-Latin
characters are not incorrectly lower-cased (GH-6), and adds a bunch of
(verified) test cases to ensure the future holds less broken builds.

Closes GH-4.
Closes GH-5.
Closes GH-8.
@Flet Flet closed this as completed in #8 Apr 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants