Support languages like Chinese, Japanese, Thai, etc. #1
Comments
I'm curious, what would be the preferable result in this case?
It is possible to convert Chinese to Pinyin for example:
Reading that answer, it appears that there is no single way to slugify Chinese characters. Even when converting them to Pinyin, it would be very hard to provide the correct conversion, as the last answer in the question you linked to points out. If you have the translations handy, you can add them to your project and then slugify the translation. That would probably be easier than asking slugify to also convert from one language to another. I believe that's outside the scope of what the library was designed to do.
Could we just leave CJK characters unchanged? Like
Wikipedia URLs contain Unicode characters in their paths, so I figured that was OK and I was looking for a lib to do the same for my non-English site.
PR welcome for an opt-in option for it.
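As a rough illustration of what such an opt-in could look like, here is a minimal sketch. It is not the library's API: the function name `slugifyKeepingScripts` and the `preserveScripts` option are hypothetical, and the approach (Unicode script property escapes in a regular expression) is only one possible way to leave CJK characters unchanged.

```javascript
// Hypothetical helper, NOT the library's API: a tiny slugifier with an opt-in
// `preserveScripts` option that keeps characters from the named Unicode
// scripts instead of deleting them.
function slugifyKeepingScripts(input, { preserveScripts = [] } = {}) {
  // Character class of what survives: ASCII letters and digits, plus any
  // opted-in scripts via Unicode property escapes (needs the `u` flag).
  const kept = ['a-z0-9', ...preserveScripts.map((s) => `\\p{Script=${s}}`)].join('');
  const notKept = new RegExp(`[^${kept}]+`, 'gu');

  return input
    .normalize('NFD')                  // split accented Latin letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')   // drop only the Latin-style combining marks
    .normalize('NFC')                  // recompose everything else (e.g. kana with dakuten)
    .toLowerCase()
    .replace(notKept, '-')             // each run of other characters becomes one dash
    .replace(/^-+|-+$/g, '');          // trim leading and trailing dashes
}

console.log(slugifyKeepingScripts('我的冬季'));
// → '' (nothing is kept by default)
console.log(slugifyKeepingScripts('Déjà Vu 我的冬季!', { preserveScripts: ['Han'] }));
// → 'deja-vu-我的冬季'
```

Keeping the behaviour opt-in means existing users still get ASCII-only slugs, while sites that are happy with Unicode paths can name the scripts they want to keep.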
mark
@sindresorhus Yeah, "Ignores Chinese" is a bad title. The Japanese get no love either. 残念。。。
@brandonpittman I definitely intend to support languages like Chinese, Japanese, Thai, etc., but it's more work and will take some time. Help is always welcome though.
If anyone wants to work on this, see the feedback given in sindresorhus/slugify#30.
We are currently using https://www.npmjs.com/package/transliteration but I'd love to use this library instead. Even basic/minimal support for Chinese/Japanese characters would be good enough for what we need.
A little tip about the idea of converting Chinese to Pinyin: the conversion can never be 100% accurate, but in most cases the results are totally fine to use as slugs. However, if the generated slugs are expected to be unique, then Pinyin is not a good idea, because it's highly likely that completely different Chinese characters get converted to the same Pinyin. For example, all
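The collision problem can be shown with a small sketch. The lookup table below is a toy written for this illustration (a real converter needs a full dictionary and context handling), and the names `TONELESS_PINYIN`, `toPinyinSlug`, and `findCollisions` are hypothetical.

```javascript
// Toy lookup table for illustration only: toneless Pinyin for a handful of
// characters. This is an assumption-laden stand-in for a real converter.
const TONELESS_PINYIN = new Map([
  ['是', 'shi'], // shì, "to be"
  ['事', 'shi'], // shì, "matter"
  ['市', 'shi'], // shì, "city"
  ['买', 'mai'], // mǎi, "to buy"
  ['卖', 'mai'], // mài, "to sell"
]);

// Convert each known character to toneless Pinyin and join with dashes.
// Characters missing from the toy table are simply skipped.
function toPinyinSlug(text) {
  return Array.from(text, (ch) => TONELESS_PINYIN.get(ch) ?? '')
    .filter(Boolean)
    .join('-');
}

// Group inputs by the slug they produce; any group with more than one
// member is a collision.
function findCollisions(titles) {
  const bySlug = new Map();
  for (const title of titles) {
    const slug = toPinyinSlug(title);
    bySlug.set(slug, [...(bySlug.get(slug) ?? []), title]);
  }
  return new Map([...bySlug].filter(([, group]) => group.length > 1));
}

console.log(findCollisions(['是', '事', '市', '买', '卖']));
// → Map(2) { 'shi' => [ '是', '事', '市' ], 'mai' => [ '买', '卖' ] }
```

Five different titles collapse into two slugs here, so a site that needs unique URLs would have to add a disambiguator (an ID suffix, for instance) on top of the Pinyin.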
As the original author of this issue, this popped up in my email. I read & write in basic Chinese. I have 2 thoughts about this:
I didn't. I mean that for most cases they are totally fine to use as slugs unless uniqueness is required, which depends entirely on the actual use case. The reason I mentioned this is that I noticed that the current slugify process for supported languages produces unique slugs, though that might be just an unintended side effect.
That is a good idea to reduce the chance of collisions.
What I can say is that I needed slugs for URLs. For example, someone writes a post titled “我的冬季” or something like that. So instead of having a URL with an ID like this:
I guess everyone will have a different use case. I've already started looking into developing a unique solution with strokes and tones, but this will be just for fun, and it will be a heavy library which most likely won't be front-end friendly.
Can we add other languages like https://en.wikipedia.org/wiki/Tifinagh (for Berber languages) to this issue, or is it only related to Asian languages? The approach of leaving some Unicode ranges untouched (proposed in the closed pull request sindresorhus/slugify#30) would be enough for my needs, but I understand it can be a bit difficult to use. Here, the range would be
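For readers unfamiliar with the "untouched ranges" idea, here is a minimal sketch of how it might work. It does not reproduce the code from sindresorhus/slugify#30; `slugifyWithRanges` and `TIFINAGH_BLOCK` are hypothetical names, and the block boundaries U+2D30 to U+2D7F are taken from the Unicode charts as an assumption, not recovered from the comment above.

```javascript
// Assumption: the Tifinagh block as listed in the Unicode charts.
const TIFINAGH_BLOCK = [0x2d30, 0x2d7f];

// Hypothetical sketch: keep [a-z0-9] plus any character whose code point
// falls inside one of the caller-supplied [low, high] ranges; every run of
// other characters collapses into a single dash.
function slugifyWithRanges(input, keepRanges = []) {
  const isKept = (ch) => {
    const cp = ch.codePointAt(0);
    return /[a-z0-9]/.test(ch) || keepRanges.some(([lo, hi]) => cp >= lo && cp <= hi);
  };

  let slug = '';
  for (const ch of input.toLowerCase()) {   // for...of iterates by code point
    if (isKept(ch)) slug += ch;
    else if (slug !== '' && !slug.endsWith('-')) slug += '-';
  }
  return slug.replace(/-+$/, '');           // drop a trailing dash, if any
}

// Usage: pass the ranges to leave untouched.
const word = String.fromCodePoint(0x2d5c, 0x2d30, 0x2d4e, 0x2d30, 0x2d63, 0x2d49, 0x2d56, 0x2d5c);
console.log(slugifyWithRanges(`Hello ${word}!`, [TIFINAGH_BLOCK]));
```

Raw code point ranges are flexible but easy to get wrong, which is probably the "difficult to use" concern; named scripts or presets would be a friendlier surface for the same mechanism.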
Hey, it's 2024 and I think a bit of extra tech can be used. I made a GPT for slugify-ing any Chinese text for my blog: https://chat.openai.com/g/g-1jvs433lo-slugifyzhuan-jia I've posted the prompt as a gist here so everyone can reproduce and edit it. Hope this helps in some way.
It's an interesting idea indeed :-)
This URL has changed to https://symbl.cc/
It's a cool library, but I'm fearful that it won't slugify everything.
Chinese characters are just deleted.