Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CanonicalURLMiddleware should support trailing slash redirection #8882

Closed
dnsl48 opened this issue Mar 26, 2019 · 27 comments
Closed

CanonicalURLMiddleware should support trailing slash redirection #8882

dnsl48 opened this issue Mar 26, 2019 · 27 comments

Comments

@dnsl48
Copy link
Contributor

dnsl48 commented Mar 26, 2019

Affected Version

4.x

Description

CanonicalURLMiddleware should be able to force redirection to URLs with trailing slashes in case it's not ending with a file extension.

@emteknetnz
Copy link
Member

Few notes:

  • I believe that the trailing slash version is slightly more correct i.e. /some/page/ vs /some/page
  • It seems that trailing slashes suit the concept of nested pages a lot nicer
  • Would be good to globally default to the trailing slash version, though maybe allow admins to globally change this to a non-trailing slash version (though also maybe better to not allow this)
  • Should avoid 301's for redirection due to number of things that can go wrong, recommend you simply use a 302

@michalkleiner
Copy link
Contributor

Even though there's a module for this https://github.com/axllent/silverstripe-trailing-slash, I'd support a move to have that as the standard from SS 4.4 or SS 5.

@chillu
Copy link
Member

chillu commented Apr 9, 2019

Google says that it's OK to serve identital content under both URLs, although you can improve crawl efficiency by redirecting (and it's best practice). There's other layers this can be fixed in as well (webserver). So I'm marking this as an impact/low enhancement. Meaning: If there's no takers to send a pull request soon, the issue will likely be closed.

@brynwhyman
Copy link
Contributor

brynwhyman commented Dec 10, 2020

Looks like that Google article might have been a bit old, these duplicate URLs can lead to "unwanted behaviour". We're expecting this to be causing issues with google advertising bids (among other things). From Google:

If you don't explicitly tell Google which URL is canonical, Google will make the choice for you, or might consider them both of equal weight, which might lead to unwanted behavior, as explained below in Why should I choose a canonical URL?

Does anyone have further feedback on @emteknetnz's comment before I define some ACs for this following those points?

@emteknetnz
Copy link
Member

emteknetnz commented Dec 10, 2020

Guess we'll need to decide a couple of things. For the solution we go with it's probably either:
a) Use a canonical tag
b) Use a 302 redirect
c) Both?

And secondly:
a) Should the trailing slash be the default, OR
b) Should the no trailing slash be the default

We should research both of these and discuss before deciding on either

@michalkleiner
Copy link
Contributor

michalkleiner commented Dec 10, 2020

c) both and a) trailing slash default

@brynwhyman
Copy link
Contributor

brynwhyman commented Dec 11, 2020

There's a few other cases to consider that this (awesome) module axllent/silverstripe-trailing-slash helps call out, including:

  • When not to redirect, i.e example.com/contact.html & example.com/contact?test
  • Only act on $_GET requests as not to interfere with any posted data
  • Create additional "ignore_paths" by creating a yaml config
  • Backwards compatibility support for projects that have implemented their own redirects (especially if they're using a different default to what's set here)

@chillu
Copy link
Member

chillu commented Jan 4, 2021

I think this needs to go into 4.x to be useful. And of the three options @emteknetnz outlined, I only see b) as something we can do in a minor release.

With a) and c), a canonical tag default would need to be injected via $MetaTags in the template. Not everyone keeps this marker in their template, and many custom templates would already have their own canonical link markup. Which would lead to duplicate (and potentially different) canonical tags. A common use case is to declare a main domain when showing content on other vanity domains.

We could add <link rel=canonical href=$RelativeLink> to our default themes, but that won't apply to any existing sites. I would also be careful with $AbsoluteLink (incl. the domain) since that can mess with static publishing (serving files under multiple domains, production domain aliases, etc).

So in conclusion, canonical tags are a bit of a minefield for us as a framework. I'd focus this issue on trailing slash redirections. @michalkleiner Do you want to have a go at a pull request for this?

@chillu
Copy link
Member

chillu commented Jan 4, 2021

On HTTP codes, I believe we should default this to a 301 Permanent rather than a 302 Temporary, since that's the Google recommendation: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls. It also avoids increasing crawler activity, since crawlers would always send two requests with a 302 Temporary. We'd have to be pretty certain that it doesn't mess up the URL (keeps query params and hashes intact).

@chillu
Copy link
Member

chillu commented Jan 4, 2021

Just realised that you can set canonical via a HTTP header as well: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#expandable-7. That has the same issues though - we can't tell if a custom template already makes use of <link rel=canonical>, at least without string matching the whole HTML response (which I'd rather avoid).

Google recommends absolute paths in canonical (incl. domain). We could only build this into our default templates (<link> tag) or response (rel="canonical") if we forced devs to configure their "main" domain in every Silverstripe installation. At the moment this is optional, and only required for subsites. Given how easy it is to add a <link> tag to your own template, I don't think it's worth the framework pushing itself into the middle of this.

image

@sminnee
Copy link
Member

sminnee commented Jan 13, 2021

Personally, I like URLs without trailing slashes. I wonder if approaching this as an optional feature that can either force the addition or removal of slashes makes sense?

I would note that if you have 302 redirects, it's not clear to me what additional value the canonical meta tag will provide - there will never be an incorrect URL that actually returns HTML!

Introducing it as optional in the first instance would at least be a way of testing it in production before forcing it on people...?

@chillu
Copy link
Member

chillu commented Jan 13, 2021

I'm ambivalent about URLS with or without trailing slashes. There's a slight argument for trailing slashes in a system which generally produces deeply nested URL paths. I don't think Google recommends either way as the "right" one.

If it's an optional feature, we've got that use case covered in the module already IMHO. I'm happy with a phased rollout in order to work through the kinks (e.g. a feature flag in 4.8, default in 4.9). But unless the end goal is a default setting, this is perfectly fine in a module.

As an aside, as an ecosystem we're not very good at surfacing those optional best practices. The addons site lists "popular" modules, but it's still hard to figure out what you should add to your own project starter (kinglozzer/metatitle is another good example). Separate discussioni, but worth mentioning. Adding features and defaults to core should be a last resort, because it increases the maintenance surface for a very small team of core maintainers.

@emteknetnz
Copy link
Member

emteknetnz commented Jan 13, 2021

Would pretty strongly recommend against 301's as a default and default to 302's instead. While 301's are better for SEO, 301's get cached in browsers which means there's no central mechanism available to undo them if you do something wrong. I've seen it go wrong and customers really don't like it :-/

302's are reversible so it allows:

  • CMS configuration where you toggle trailing/non-trailing slashses
  • regular silverstripe functionality such as renaming page urls e.g. moving a page in the site tree

Still worth including 301's as an option in a drop-down with a text warning, so people who feel confident can still easily do this

@chillu
Copy link
Member

chillu commented Jan 13, 2021

Sorry but I disagree. Google recommends 301s, and they don't cause duplicate requests that impact server performance/capacity. If we're unsure that we can introduce this in a minor release as a new default with a 301, I'd rather keep it as a module.

I'm very aware that you need to get 301s right if you're sending those with cache headers for the browser. But in the specific case of 301s for slashes, what's the scenario where this could cause issues for customers?

The main point of contention for any new default behaviour (301 or 302, slashes or no slashes) is that a developer has already configured it one way or another on the webserver level (CDN, nginx config or .htaccess). This might cause infinite redirection loops, e.g. if we decide to redirect to slashes, but the dev removes the slash in .htaccess. We can detect the Referer header to break that loop?

@emteknetnz
Copy link
Member

emteknetnz commented Jan 13, 2021

Google recommends 301s, and they don't cause duplicate requests that impact server performance/capacity

I agree that 301's have definite benefits, namely:

  • Server performance (less HTTP requests)
  • Browser performance (less HTTP requests to wait for a response from)
  • SEO (though this is hard/impossible to quantify)

However they come with the downsides of:

  • less flexibility / ability to change things later
  • they can break your site if they're mis-configured

I see it as a tradeoff between performance and ease-of-use. As a general rule of thumb, ease-of-use tends to be a better thing to design for rather than performance and causes less headaches.

But in the specific case of 301s for slashes, what's the scenario where this could cause issues for customers?

If we allow configuration in the module to let customers toggle between non-slash and slash. Even if we don't allow toggling and we simply say "this is the correct way", I'd still expect some customers to ask their developers to change from this way to other other because their customer (or their SEO specialist, or marketing specialist, or just the developer) thinks it's the correct/prettier way of doing it. If there's an existing browser cached 301 then it'll cause an infinite loop for browsers that have it cached once it's toggled.

Still, I'm not saying don't allow 301's, I'm just saying make it the responsibility of the end user to switch it on and now they are responsible for their site getting stuck in infinite loops if they do it wrong. However if we make 301's the default now WE are responsible for their site being stuck in an infinite loop because we did not consider the 50 different edge cases.

Running composer update and now prod is broken is a pretty bad user experience after all ;-)

This might cause infinite redirection loops, e.g. if we decide to redirect to slashes, but the dev removes the slash in .htaccess .. we can detect the Referer header to break that loop?

How would we do that? Would that work in every infrastructure configuration?

@chillu
Copy link
Member

chillu commented Jan 14, 2021

OK, I think we have the following options:

  • Option A: Enable by default, 301 redirects.
  • Option B: Enable by default, 302 redirects.
  • Option C: Opt-in in core, with devs deciding on 301 vs. 302 redirects
  • Option D: Keep as a current module functionality

How widespread is this issue of non-canonical URLs being indexed by search engines actually? In a site that uses the current 4.x defaults, generated links will contain a slash. As long as that's the only way links are created and crawled by search engines, there's no issue. If the framework generates links without that slash, then we should fix that inconsistency, which doesn't require us to build any redirection into core.

How about we just ensure the framework generates URLs consistently (e.g. in SiteTree->Link()), make that configurable (slash or no slash), and then call it a day? If you really care about SEO and are afraid that somehow outside of framework, search engines will pick up on non-canonical URLs, then ... there's a module for that.

@emteknetnz
Copy link
Member

There's also another option

Option E: Similar to what was intended with mysql utf8mb4 where it was enabled by default for new projects and opt-in for existing projects.

Option C: Opt-in in core, with devs deciding on 301 vs. 302 redirects

My concern with is this would be that this will likely result in low uptake as I'm assuming that most devs and site-owners won't be aware this is even a thing

@chillu
Copy link
Member

chillu commented Jan 14, 2021

OK, I'll wait for @silverstripe/core-team to weigh in on this for a few days, but my vote goes to "Option D: Keep as a current module functionality", and I'm planning to close this issue. If anyone feels strongly about this, I would recommend that they offer to write an SEO focused blog post on ss.org which flags this module as a potential default addition for custom project starter kits. And add it to their own project starter kit. I'm open to ideas on how we can promote these better as well.

@dhensby
Copy link
Contributor

dhensby commented Jan 15, 2021

First off, I really hate how much collective time this relatively unimportant vanity issue takes up across all software/website projects in the world - I find it unbelievable people are worrying about such a mundane problem that for almost all sites will have no meaningful impact whatsoever and the additional redirects people propose will actively hurt their site's SEO performance (TTFB is increased for no user benefit - any changes intended solely for search engines' benefit is not good SEO and users don't care if your URLs end in slashes or not or both).

To address some of the points:

  1. Redirects (if used) should be 301. If we can't make them work reliably and so have to use 302s to cover that, then we shouldn't release the feature.
  2. Canonical tags act like 301s as far as SEs are concerned, so that is perfectly acceptable result; as @chillu points out, we can't really inject this into $MetaTags because lots of template creators will already have added the canonical tags and some may not even use the $MetaTags template variable.
  3. Again, as @chillu points out, we should have more consistent URL generation rather than forcing a middleware to issue excessive redirects
  4. People who care about this will be using solutions outside the framework - this really isn't a framework related issue.

Overall I agree with @chillu, leave this as functionality in a separate module; it seems like a lot more edge-cases and features have been considered and implemented there, we don't want to increase the maintenance burden on the core team for a vanity feature, and (if this is really a big deal) it would actually be more correct for us to serve 404s on one of the options.

@dhensby dhensby closed this as completed Jan 15, 2021
@dhensby dhensby reopened this Jan 15, 2021
@chillu
Copy link
Member

chillu commented Jan 18, 2021

Thanks for the validation and summary, Dan! Closing this now.

@kinglozzer
Copy link
Member

I agree that this is just a vanity issue but it’s something that keeps coming up - we get bugged about it from clients who want “Twitter-style” URLs without trailing slashes (why they think Twitter invented it I have no idea!). I also agree that redirects should be left off the table - they’re too risky - but I do think we could look at fixing up our URL generation.

Would anyone object to passing all framework-generated links through some form of “helper” class? I’m thinking places like SiteTree::Link(), and even Controller::join_links(), could return VanityURLRewriter::rewrite($link);. It would make it much easier to encapsulate the 3 behaviours I think we need to support:

  • "Legacy" mode, default for existing projects, where the rewriter would just return $link;
  • "Remove slashes" mode, where the rewriter would return rtrim($link, '/');
  • "Force slashes" mode, where the rewriter would return rtrim($link, '/') . '/';

I might be oversimplifying slightly because we’d need to ensure asset URLs aren’t rewritten (hopefully just a case of choosing not to pass them through the rewriter), but I think that approach would avoid ad-hoc implementations around core and would help guarantee backward compatibility if we do decide to try to adjust our URL generation.

@dhensby
Copy link
Contributor

dhensby commented Jan 28, 2021

I don't know that we need an extra function to run links through, maybe we just need a central link generator?

The issue isn't just the links that are generated, but the URLs the application accepts. Should SS route /test/ the same as /test? Stricter engines would not.

Dealing with assets should be fairly straightforward... Just refuse to add trailing slashes to URLs that have a . in them

@chillu
Copy link
Member

chillu commented Jan 28, 2021

Which links generated by core are realistically affected by such a helper method? You can already use an update hook on SiteTree::Link(), and asset links are already meant to be excluded from this logic. Nobody cares about this in CMS links, right? I'm in favour of keeping things simple here.

@xini
Copy link
Contributor

xini commented Jan 28, 2021

Sorry to keep going with this topic. I wasn't aware of this discussion when I submitted silverstripe/silverstripe-cms#2631.

The intent of my PR is to create more consistent URLs across a site.

As far as I know, only SiteTree ads a trailing slash to its links, and only if no action is present.

As pointed out in my PR Controller::join_links() should also be simplified to make sure there are no trailing slashes.

Regarding routing, I think SS should still route /test/ the same as /test. It's a simple fact that in SS they are the same.

re what @chillu mentioned, unfortunately the update hook in SiteTree::RelativeLink() is called before the trailing slash is added. Of course this can be overwritten in Link() and AbsoluteLink(). But just saying.. ;)

@xini
Copy link
Contributor

xini commented Jan 28, 2021

Also, some classes don't use Controller::join_links() at all, e.g. https://github.com/silverstripe/silverstripe-framework/blob/4/src/ORM/PaginatedList.php#L446-L453. They use the current request, which never has a trailing slash.

@kinglozzer
Copy link
Member

Yeah maybe I’m trying to over-engineer this, my main concerns are that we make any changes to URL-generation opt-in and that we can guarantee backward compatibility if you choose not to. I’m also not keen on us deciding “no trailing slash is correct” on behalf of everyone (though it’s arguably better than the status-quo). I don’t see any nice way of achieving that without abstracting this out somewhere new, or we’re going to end up with a bunch of duplicated logic everywhere we generate links.

Also, some classes don't use Controller::join_links() at all

Even those that do use Controller::join_links() still generate inconsistent URLs: some modules add a trailing slash to the arguments, some modules don’t. I think we’d ideally also adjust join_links() and add the same logic.

Perhaps a slightly more generic URLGenerator class would be a more worthwhile addition? We could move other methods like Controller::join_links() and HTTP::setGetVar() into it. I know we don’t want to take on any significant maintenance burden, but it seems to me like it’d be a fairly simple class to maintain.

I don’t mind “owning” the task of implementing this if it gets a vote of confidence, though I’m not sure when I’ll get round to it as I have another big open source job on my plate at the moment. If anyone else wants to take it on instead that’s great - I just want to clarify that I’d be willing to do it, as I know I have a habit of proposing solutions that other people have to spend time implementing 😉

@chillu
Copy link
Member

chillu commented Feb 2, 2021

I don't mind too much either way - a URLGenerator class would sure be handy. But the devil is in the details: Are you also updating docs and recommending usage of the new class? How about writing release notes and upgrade guides? Ensuring that the "core" modules (and potentially Supported Modules) use the new API? If you are happy taking this change through those steps, I'm happy to support it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants