Special characters in taxonomy and slugs #1180
Comments
This isn't hard to fix and I understand the motivation for it. We already do some URL normalization of the taxonomies, but probably didn't think about monsieur Depardieu back then. This might be a breaking change (as someone will have some URLs changed), but it's the right thing to do.
So gerard-depardieu not gérard-depardieu etc. Fixes gohugoio#1180
OK, great for the first and easy part! Thanks :-) There's one more problem, though: I don't want the accent in the URL, but I do want it on the archive page (like on the screenshot). With nothing more, I don't see how it could work. Am I wrong?
You are wrong. The accents (and some other characters) are ONLY stripped for the paths (on disk and in the URL presented to the user). The taxonomy name will be preserved as written. I added the "Gérard Depardieu" tag to one of my posts to make sure. It has nothing to do with the actor, but I might publish it just to confuse people.
OK, I retract the last thing I said above -- there is one more fix to do, will check on that tomorrow.
I can get this to work in a hackish kind of way, but will have to look at this later -- to do a proper fix.
Well, thanks! I'm impressed, we are definitely not on the WordPress pace here… :-)
So the taxonomy `Gérard Depardieu` gives paths of the form `gerard-depardieu`. Unfortunately this introduces two imports from `golang.org/`, but Unicode normalization isn't something we'd want to write from scratch. See https://blog.golang.org/normalization. See gohugoio#1180
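For readers who want to see the idea behind that commit message, here is a minimal sketch in Python, using the standard `unicodedata` module as a stand-in for the Go packages. It is not Hugo's code: `strip_accents` and `make_path` are hypothetical names, and the approach (decompose to NFD, then drop nonspacing marks) is assumed to be close to what the linked Go blog post describes.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def make_path(name: str) -> str:
    """Hypothetical helper: strip accents, lower-case, hyphenate whitespace."""
    cleaned = strip_accents(name).lower().strip()
    return re.sub(r"\s+", "-", cleaned)


print(make_path("Gérard Depardieu"))  # gerard-depardieu
```

Only the path is transformed here; the original string is left untouched, so it can still be shown as the taxonomy name.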
Before this commit, taxonomy names were hyphenated, lower-cased and normalized -- then fixed and titleized on the archive page. So what you entered in the front matter wasn't necessarily what you got in the final site. To preserve backwards compatibility, `PreserveTaxonomyNames` defaults to `false`. Setting it to `true` will preserve what you type (the first character is upper-cased for titles), while still normalizing it in URLs. This also means that, if you manually construct URLs to the archive pages, you will have to pass the taxonomy names through the `urlize` func. Fixes gohugoio#1180
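The difference between the two modes can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not Hugo's implementation: `path_for` is a hypothetical stand-in for `urlize` (it only lower-cases and hyphenates), and `archive_title` and its `preserve_names` parameter are made-up names mirroring the `PreserveTaxonomyNames` option.

```python
import re


def path_for(name: str) -> str:
    """Hypothetical stand-in for `urlize`: lower-case and hyphenate whitespace."""
    return re.sub(r"\s+", "-", name.strip().lower())


def archive_title(name: str, preserve_names: bool = False) -> str:
    """Return the title shown on the archive page for a taxonomy name."""
    if preserve_names:
        # Keep what the author typed; only the first character is upper-cased.
        return name[:1].upper() + name[1:]
    # Old behaviour: rebuild the title from the normalized key.
    return path_for(name).replace("-", " ").title()


for preserve in (False, True):
    print(preserve, archive_title("iOS Tips", preserve), path_for("iOS Tips"))
```

With the flag off, the title is rebuilt from the key and the original capitalization is lost; with it on, the typed name survives and only the path is normalized.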
Interesting @bep, because when you "normalize" Japanese and remove the "accent" from katakana, the meaning changes completely. In some cases it's unrecognizable, or at least quite humorous.
OK, so that part may have been a bad idea ... I can revert that if I'm convinced ... Hmm, languages. @nicolinuxfr
Hmm, it wasn't a bad idea for me, anyway. I hope I will be able to keep this feature, which is really important to me.
@bep, I can give you precise information about which characters in Japanese are losing their "accent" if that will help. For instance:
Please advise how I can assist in figuring it out.
@nicolinuxfr yes, that was the input I wanted (how important it is). @RickCogley I think the solution is to add an option around this, defaulting to the old behaviour. I will fix this later tonight. BTW: This is just about the URLs/file paths.
He he, it's not that we love these, but the meaning is completely different without accents… :-) Thanks for trying to satisfy everyone here!
@bep, the bit of the character in Japanese that is getting stripped is called a "dakuten" https://en.wikipedia.org/wiki/Dakuten. There is one that looks like a double quote and one that looks like a circle. After rendering to public using
I'm using "topics" as a taxonomy here. The last 3 lines in the
But Hugo strips the dakuten, and combines the four into two. That is, ba バ and pa パ both become ハ.
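The collapse described above can be reproduced outside Hugo. The following Python sketch uses the standard `unicodedata` module and assumes accent removal works by decomposing to NFD and dropping nonspacing marks; `strip_marks` is a hypothetical helper name, not a Hugo function.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, drop nonspacing marks (Mn), recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


for ch in ("バ", "パ"):
    # Show the code points each katakana decomposes into.
    parts = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]
    print(ch, parts, "->", strip_marks(ch))
```

Both characters decompose into the base ハ (U+30CF) plus a combining mark (U+3099 for the dakuten, U+309A for the handakuten), so removing the mark leaves the same base character in each case.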
@RickCogley I know what we strip and how not to strip them ... Will fix tonight.
@nicolinuxfr please add `RemovePathAccents = true` to your config to keep the behavior you want.
Great, thanks for keeping me satisfied along with everyone else! :-) Is it merged yet, so I can try using Homebrew, or should I compile it manually?
Didn't you have some success with go get -u .... ? And yes, it's merged.
@nicolinuxfr you can install the absolute latest version with
@dunn are you sure? I tried to upgrade that way and the build seems old:
EDIT: oh, it seems I have a failed build because of dependencies. Well, it doesn't matter, @bep; for me, everything is still fine!
@RickCogley yeah, new dependencies have to be added manually, so it can break. I just opened Homebrew/legacy-homebrew#40794; thanks for the heads-up, @nicolinuxfr!
@dunn ah, I see. Thanks. I hadn't realized that.
And default off. Fixes gohugoio#1180
7297c1172 Add note about caching for Hugo Pipes. c91be3403 minor markdown, capitalization and spelling fixes (#1183) fd4a103bf Fix several 404 errors (#1162) 69378bc20 Update related.md 28c24e95f Add note on setting baseURL 7b1502c99 minor typo fix (#1180) 33abeb4fe Update related.md 4887563f6 Update js.md ee5f1de2e Hugo 0.74.3 986ea0c8e releaser: Add release notes to /docs for release of 0.74.3 3299b44bd Fix Asciidoctor args bcb950347 resources/js: Add option for setting bundle format 3f8324918 resources/js: Add es5 build target git-subtree-dir: docs git-subtree-split: 7297c1172754078511ac1c10ca0dfd4cab629506
Thanks!!!! This is what I was looking for, for my site in Spanish :)
I'm trying Hugo for my personal blog, which has a lot of taxonomies. And as I'm writing in French, many taxonomies have special characters in them, like an accented letter.
Right now, I'm using WordPress, which has the perfect behavior on this matter. The taxonomy name can have any special characters (for example, "Gérard Depardieu"), while the slug associated with it only has standard characters (`gerard-depardieu`). But when you display the taxonomy archive, you still get the special characters: in this case, you would not have "Gerard Depardieu", but "Gérard Depardieu". You can see the example live here: http://voiretmanger.fr/acteur/gerard-depardieu/.
I don't know if Hugo could do the same. I know WordPress has a database, so it's easier. But I can see some solutions (or hacks) to make it work: either look in the metadata associated with the post to display the name of the taxonomy on the archive page, or have a "table" (a YAML/TOML config file, I guess) with all the correspondences.
An idea, to end my Gérard Depardieu example:
I hope a solution will be feasible, because it's the main thing that would keep me away from Hugo and on WordPress. I think I can find a solution for every other problem I have…
Thanks anyway for your time!