Special characters in taxonomy and slugs #1180

Closed
nicolinuxfr opened this issue May 29, 2015 · 25 comments

@nicolinuxfr

I'm trying Hugo for my personal blog, which has a lot of taxonomies. As I'm writing in French, many taxonomies have special characters in them, such as accented letters.

Right now, I'm using WordPress, which has the perfect behavior on this matter. The taxonomy name can have any special characters (for example, "Gérard Depardieu"), while the slug associated with it only has standard characters (gerard-depardieu). But when you display the taxonomy archive, you still see the special characters: in this case, you would get "Gérard Depardieu", not "Gerard Depardieu". You can see the example live here: http://voiretmanger.fr/acteur/gerard-depardieu/.

[Image: taxonomy-example]

I don't know if Hugo could do the same. I know WordPress has a database, so it's easier. But I can see some solutions (or hacks) to make it work: either look in the metadata associated with the post to display the name of the taxonomy on the archive page, or have a "table" (a YAML/TOML config file, I guess) with all the correspondences.

An idea, to finish my Gérard Depardieu example:

gerard-depardieu: "Gérard Depardieu"
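
The lookup-table idea above could be sketched roughly as follows. This is a minimal Python illustration, not Hugo code: the `TAXONOMY_NAMES` table and the `display_name` helper are hypothetical names, and the fallback rule (title-casing the slug) is an assumption made only for the example.

```python
# Hypothetical lookup table: URL slug -> display name, as proposed above.
# In practice it would be loaded from a YAML/TOML data file.
TAXONOMY_NAMES = {
    "gerard-depardieu": "Gérard Depardieu",
}


def display_name(slug: str) -> str:
    """Return the display name for a slug, falling back to a title-cased slug."""
    fallback = slug.replace("-", " ").title()
    return TAXONOMY_NAMES.get(slug, fallback)


print(display_name("gerard-depardieu"))  # table entry, accents preserved
print(display_name("unlisted-term"))     # no entry: falls back to "Unlisted Term"
```

The drawback of this design is that the table has to be maintained by hand for every term that contains special characters.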

I hope a solution is feasible, because it's the main thing keeping me on WordPress and away from Hugo. I think I can find a solution for every other problem I have…

Thanks anyway for your time!

@bep
Member

bep commented May 29, 2015

This isn't hard to fix and I understand the motivation for it.

We already do some URL normalization of the taxonomies, but probably didn't think about monsieur Depardieu back then. This might be a breaking change (as someone will have some URLs changed), but it's the right thing to do.

@bep bep added this to the v0.15 milestone May 29, 2015
@bep bep self-assigned this May 29, 2015
bep added a commit to bep/hugo that referenced this issue May 29, 2015
So gerard-depardieu not gérard-depardieu etc.

Fixes gohugoio#1180
@nicolinuxfr
Author

OK, great for the first and easy part! Thanks :-)

There's one more problem though: I don't want the accent in the URL, but I want it on the archive page (like in the screenshot).

With nothing more, I don't see how it could work. Am I wrong?

@bep
Member

bep commented May 29, 2015

You are wrong. The accents (and some others) are ONLY stripped for the paths (on disk and the URL presented to the user). The taxonomy name will be preserved as written.

I added the "Gérard Depardieu" tag to one of my posts to make sure. It has nothing to do with the actor, but I might publish it just to confuse people.

@bep
Member

bep commented May 29, 2015

OK, I retract the last thing I said above -- there is one more fix to do, will check on that tomorrow.

@bep
Member

bep commented May 29, 2015

I can get this to work in a hackish-kind-of way, but will have to look at this later -- to do a proper fix.

@bep bep removed their assignment May 30, 2015
@nicolinuxfr
Author

Well, thanks! I'm impressed, we are definitely not at WordPress pace here… :-)

bep added a commit to bep/hugo that referenced this issue May 30, 2015
So the taxonomy `Gérard Depardieu` gives paths of the form `gerard-depardieu`.

Unfortunately this introduces two imports from `golang.org/`, but Unicode normalization isn't something we'd want to write from scratch.

See https://blog.golang.org/normalization

See gohugoio#1180
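
The accent stripping described in this commit message can be illustrated with a short sketch. It assumes the usual approach from the linked normalization post: decompose the text (NFD), then drop the combining marks. Hugo itself is written in Go; the Python below uses the standard `unicodedata` module as a stand-in, and `strip_accents` and `taxonomy_path` are hypothetical names, not Hugo functions.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop nonspacing marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def taxonomy_path(name: str) -> str:
    """Hypothetical helper: accent-free, lower-cased, hyphenated path segment."""
    return "-".join(strip_accents(name).lower().split())


print(taxonomy_path("Gérard Depardieu"))  # gerard-depardieu
```

Only the path is transformed here; the original name stays untouched and can still be shown on the archive page.
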
bep added a commit to bep/hugo that referenced this issue May 30, 2015
Before this commit, taxonomy names were hyphenated, lower-cased and normalized -- then fixed and titleized on the archive page.

So what you entered in the front matter isn't necessarily what you got in the final site.

To preserve backwards compatibility, `PreserveTaxonomyNames` is `false` by default.

Setting it to `true` will preserve what you type (the first character is upper-cased for titles), while still normalizing it in URLs.

This also means that, if you manually construct URLs to the archive pages, you will have to pass the Taxonomy names through the `urlize` func.

Fixes gohugoio#1180
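
To make the difference between the two modes concrete, here is a rough sketch. It only approximates the behavior described in the commit message: `old_title` and `preserved_title` are hypothetical helpers, the exact old normalization rules are assumed, and "GitHub tips" is an invented example term.

```python
def old_title(name: str) -> str:
    """Approximate the old round trip: normalize to a key, then titleize it back."""
    key = name.lower().replace(" ", "-")  # what was stored internally
    return key.replace("-", " ").title()  # what the archive page showed


def preserved_title(name: str) -> str:
    """Keep the name as typed, upper-casing only the first character."""
    return name[:1].upper() + name[1:]


print(old_title("GitHub tips"))        # Github Tips (original casing is lost)
print(preserved_title("GitHub tips"))  # GitHub tips
```
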
bep added a commit that referenced this issue May 31, 2015
So the taxonomy `Gérard Depardieu` gives paths of the form `gerard-depardieu`.

Unfortunately this introduces two imports from `golang.org/`, but Unicode normalization isn't something we'd want to write from scratch.

See https://blog.golang.org/normalization

See #1180
@bep bep closed this as completed in be38acd May 31, 2015
@nicolinuxfr
Author

Just a quick note to thank bep for his work… it works exactly as I wanted! So it's perfect as far as I am concerned. :-)

[Image: screenshot, 2015-06-01 17:44:13]

@RickCogley
Contributor

Interesting @bep, because when you "normalize" Japanese, and remove the "accent" from katakana, the meaning changes completely. In some cases it's unrecognizable or at least quite humorous.

@bep
Member

bep commented Jun 16, 2015

OK, so that part may have been a bad idea ... I can revert that if I'm convinced ... Hmm, languages. @nicolinuxfr

@bep bep reopened this Jun 16, 2015
@nicolinuxfr
Author

Hmm, it wasn't a bad idea for me, anyway. I hope I will be able to keep this feature, which is really important to me.

@RickCogley
Contributor

@bep, I can give you precise information about which characters in Japanese are losing their "accent" if that will help.

For instance:

ビ  going to ヒ

Please advise how I can assist in figuring it out.

@bep
Member

bep commented Jun 16, 2015

@nicolinuxfr yes, that was the input I wanted (how important it is). @RickCogley I think the solution is to add an option around this, defaulting to the old behaviour.

I will fix this later tonight. BTW: This is just about the URLs/file paths.

@nicolinuxfr
Author

He he, it's not that we love these, but the meaning is completely different without accents… :-)

Thanks for trying to satisfy everyone here!

@RickCogley
Contributor

@bep, the part of the Japanese character that is getting stripped is called a "dakuten" (https://en.wikipedia.org/wiki/Dakuten). There is one that looks like a double quote (the dakuten proper) and one that looks like a circle (the handakuten). After rendering to public using hugo server, I see this:

rcogley@jrcmbp2015:~/dev/eSolia/public/ja/topics|rc-working-2⚡
⇒  ll
total 8
drwxr-xr-x  4 rcogley  staff   136 Jun 16 22:22 about
-rw-r--r--  1 rcogley  staff     0 Jun 16 22:23 index.html
-rw-r--r--  1 rcogley  staff  1928 Jun 16 22:23 index.xml
drwxr-xr-x  4 rcogley  staff   136 Jun 14 20:58 professional
drwxr-xr-x  4 rcogley  staff   136 Jun 14 20:58 お問い合わせ
drwxr-xr-x  4 rcogley  staff   136 Jun 16 22:25 はひふへほ
drwxr-xr-x  4 rcogley  staff   136 Jun 14 20:58 サーヒス
drwxr-xr-x  4 rcogley  staff   136 Jun 16 22:25 ハヒフヘホ

I'm using "topics" as a taxonomy here. The last three entries in the ll output have lost these marks; the terms are supposed to be:

topics:
  - About
  - ばびぶべぼ
  - バビブベボ
  - ぱぴぷぺぽ
  - パピプペポ

But Hugo strips the dakuten, and combines the four into two. That is, ba (バ) and pa (パ) both become ha (ハ).
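
The collision can be reproduced outside Hugo. The sketch below is Python using the standard `unicodedata` module, not Hugo's Go code, and it assumes the stripping works by decomposing to NFD and dropping combining marks; `strip_marks` is a hypothetical helper name.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop nonspacing marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# NFD splits a voiced kana into a base kana plus a combining mark.
print([f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", "ビ")])
# ['U+30D2', 'U+3099']  (ヒ followed by the combining voiced sound mark)

print(strip_marks("バビブベボ"))  # ハヒフヘホ
print(strip_marks("パピプペポ"))  # ハヒフヘホ
```

Because the dakuten and handakuten are combining marks just like a French acute accent, a rule that removes every mark cannot tell the two cases apart.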

@bep
Member

bep commented Jun 16, 2015

@RickCogley I know what we strip and how not to strip them ... Will fix tonight.

@bep bep closed this as completed in 4b7c134 Jun 16, 2015
@bep
Member

bep commented Jun 16, 2015

@nicolinuxfr please add `RemovePathAccents = true` to your config to keep the behavior you want.
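
The effect of such an opt-in switch can be sketched as below. This is an illustration in Python, not Hugo's implementation: `make_path` and its `remove_path_accents` parameter are hypothetical and only mirror the `RemovePathAccents` option named above, with the default left off as in the old behaviour.

```python
import unicodedata


def make_path(name: str, remove_path_accents: bool = False) -> str:
    """Hypothetical path builder; the flag mirrors the RemovePathAccents option."""
    if remove_path_accents:
        decomposed = unicodedata.normalize("NFD", name)
        name = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return "-".join(name.lower().split())


print(make_path("サービス"))  # default: dakuten kept
print(make_path("Gérard Depardieu", remove_path_accents=True))  # accents removed
```

With the flag off by default, Japanese sites keep their paths intact, while sites that want accent-free URLs enable it explicitly.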

@nicolinuxfr
Author

Great, thanks for keeping me satisfied along with everyone else! :-)

Is it merged yet, so I can try it using Homebrew, or should I compile it manually?

@bep
Member

bep commented Jun 16, 2015

Didn't you have some success with go get -u ....? And yes, it's merged.

@dunn
Contributor

dunn commented Jun 16, 2015

@nicolinuxfr you can install the absolute latest version with brew install --HEAD hugo

@nicolinuxfr
Author

@dunn are you sure? I tried to upgrade that way and the build seems old:

Hugo Static Site Generator v0.14 BuildDate: 2015-05-26T18:46:46+02:00

EDIT: oh, it seems I have a failed build because of dependencies. Well, it doesn't matter, go get -u -v github.com/spf13/hugo worked perfectly.

@bep for me, everything is still fine!

@RickCogley
Contributor

@bep, thanks, after a recompile with go get -u -v github.com/spf13/hugo, I have proper Japanese again. :-)

@dunn, I stopped using brew install --HEAD hugo after finding it wonky a while back, but the above always works for me.

@dunn
Contributor

dunn commented Jun 16, 2015

@RickCogley yeah, new dependencies have to be added manually, so it can break. I just opened Homebrew/legacy-homebrew#40794; thanks for the heads-up, @nicolinuxfr!

@RickCogley
Contributor

@dunn ah, I see. Thanks. I hadn't realized that.

@eugenioej

> @nicolinuxfr please add RemovePathAccents = true to your config to keep the behavior you want.

Thanks!!!! This is what I was looking for, for my site in Spanish :)

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 23, 2022