Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DirHTML builder generates wrong canonical links #9730

Open
lexming opened this issue Oct 12, 2021 · 12 comments · Fixed by #9731
Open

DirHTML builder generates wrong canonical links #9730

lexming opened this issue Oct 12, 2021 · 12 comments · Fixed by #9731

Comments

@lexming
Copy link
Contributor

lexming commented Oct 12, 2021

Describe the bug

Using the dirhtml builder results in pages with canonical links of the form

<link rel="canonical" href="https://www.url.com/search.html" />

for the index document _website/search/index.html

How to Reproduce

  1. Set html_baseurl
html_baseurl = 'https://www.url.com'
  1. Build documentation using dirhtml builder
sphinx-build -b dirhtml . _website/

Expected behavior

The canonical link has either the form

<link rel="canonical" href="https://www.url.com/search/" />

or

<link rel="canonical" href="https://www.url.com/search/index.html" />

Your project

sphinx-canonical-links.zip

Screenshots

No response

OS

Unix

Python version

3.9

Sphinx version

4.x

Sphinx extensions

No response

Extra tools

No response

Additional context

sphinx-canonical-links.zip

@davidism
Copy link

davidism commented Nov 10, 2021

This is affecting the Pallets docs on Read the Docs. Apparently some browsers "Share Link" feature uses a canonical URL if it's available, so sharing links to docs shares invalid links. Is there anything I can do to help get this merged and released?

Also, somehow I didn't realize this has been generated by Sphinx since 1.8 in 2018, both Read the Docs and Pallets-Sphinx-Theme have code for generating this tag as well.

@davidism
Copy link

I added a workaround for this issue to Pallets-Sphinx-Themes. A hook will detect if the dirhtml builder is in use and if the pageurl value ends with .html and replace it with the correct URL. pallets/pallets-sphinx-themes#53

@benjaoming
Copy link
Contributor

A quick suggestion for fixing this bug:

Since a lot of projects and themes may have developed behavior based on how the currently undocumented pageurl context variable is implemented, it might be better to define and implement an additional context variable that is 100% intended to render the canonical URL?

For instance: tldr; I don't really know if pageurl is supposed to link to /search/index.html or /search/.. this might be useful for local builds (that don't have an HTTP server in front). I also don't know if (when using it for canonical URLs), it should contain the baseurl or if indeed we need a different canonical domain from baseurl. For instance, we might have CSP enabled which means having the current domain as baseurl for assets and hrefs and a different canonical domain for search engines to index.

Idea? 💡

Starting from scratch with 3 new variables might be easier to handle and customize canonical URLs?

  • canonical_baseurl (default value: baseurl)
  • canonical_pageurl (default value: similar to pageurl but with DirHTML fixes applied)
  • canonical_absolute_url is a helper function that combines canonical_baseurl and canonical_pageurl.
  • pageurl is fixed for DirHTML by setting it to /search/ and then it's added to documentation.

@lexming
Copy link
Contributor Author

lexming commented Jan 16, 2023

@benjaoming thanks for the suggestion, having a specific variable for the canonical URL can indeed be useful. Let me clarify though that the core of this issue is that the current pageurl with DirHTML is broken and points to a non-existing document. For instance, /search.html instead of /search or /search/index.html. So any developed behavior based on current pageurl is not being questioned, since it is not working.

@benjaoming
Copy link
Contributor

So any developed behavior based on current pageurl is not being questioned, since it is not working.

I wrote this, and I think it works... but not sure if we'll merge it nor if it's a recommendable approach https://github.com/readthedocs/sphinx_rtd_theme/blob/6beecce44759257ed584f2fdb0357f2aec0ad01a/sphinx_rtd_theme/layout.html#L62-L77

(afaict, it's also going to keep on working, even if pageurl becomes consistent with DirHTML builder)

@benjaoming
Copy link
Contributor

benjaoming commented Jan 19, 2023

@AA-Turner I would like to move forwards with a fix for this issue. What do you think about the approach to add new context variables specifically for canonical URLs as laid out in #9730 (comment) ? It would be nice to have some level of assurance that the approach is well-understood before implementing it.

@angelinekwan
Copy link

angelinekwan commented Jan 21, 2023

@benjaoming Referencing to your suggestion, since my project is not using the rtd theme but furo theme, I tried to apply the same check to the _templates/base.html under {%- block extrahead %} . I was expecting this to overwrite the existing canonical tag but it adds another canonical tag resulting in duplicate tag.

Do you have any advice? What I want to achieve is remove the default canonical and set my own in the base.html to render in each page under <head> but I can't remove html_baseurl because the sphinx-sitemap extension is using it.

Thanks in advance.

@benjaoming
Copy link
Contributor

@angelinekwan The reason why we're doing it in the template of sphinx-rtd-theme is... well it's complicated 😇 But if given the choice, it's better to do create new context variables like @davidism's approach => #9730 (comment)

I'm not sure why you'd end up with duplicate tags, sounds like a template block isn't being overwritten as intended... but if it's possible to fix the existing context variable, you wouldn't have to manipulate the template. Sorry, if it's not helpful.

It's worrying that both template developers and project owners are doing workarounds for this bug, it really needs to be addressed upstream. Would appreciate if a Sphinx developer can give some sort of endorsement to the suggested approach?

@angelinekwan
Copy link

I think it ended up duplicate tags because the overwrite is not in the header template file which has the default canonical tag - I don't think i can access it. Agree this issue should be fix on the upstream... Thanks for checking!

kai687 added a commit to kai687/sphinxawesome-theme that referenced this issue Apr 16, 2023
Upstream Sphinx doesn't construct canonical links correctly for the
DirectoryHTMLBuilder.
When I introduced #650, it included a bug for `index.html` pages.

This PR adds a proper fix with tests this time to make sure that
canonical links are what they should be. This PR is a follow up of #1258
and reverts the change in `layout.html`. Checking the canonical link
shouldn't be done in the template.

It's reported here:
[#9730](sphinx-doc/sphinx#9730)
@AA-Turner AA-Turner reopened this Oct 5, 2024
@AA-Turner
Copy link
Member

Starting from scratch with 3 new variables might be easier to handle and customize canonical URLs?

* `canonical_baseurl` (default value: `baseurl`)

* `canonical_pageurl` (default value: similar to `pageurl` but with DirHTML fixes applied)

* `canonical_absolute_url` is a helper function that combines `canonical_baseurl` and `canonical_pageurl`.

* `pageurl` is fixed for DirHTML by setting it to `/search/` and then it's added to documentation.

@benjaoming @lexming do you still support this approach? If so I would accept a PR for it.

A

@lexming
Copy link
Contributor Author

lexming commented Oct 7, 2024

Thanks for the update on this @AA-Turner . I'm not really interested in that fine-grained customization of canonical URLs though. I guess it can be useful to some, but the original purpose of this issue was about generating non-existent canonical URLs, which is already fixed. So I suggest to open a different issue for those customization options.

@benjaoming
Copy link
Contributor

hey! I'm gonna have to ghost this, as I'm not able to find time and get far enough into the substance again right now 🙈

But I think the original comment still stands, a good solution to the problem would be to start from scratch with new variables & definitions. You won't get bogged down by all the hacks that are floating around for the old variables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants