Consider changing behaviour of requests to /.well-known/traffic-advice to stop 404 log entries #17

Open
adamseabrook opened this issue Jun 9, 2021 · 6 comments

Comments

@adamseabrook

Semi-related to #16

We monitor for requests from Google IP ranges (specifically 66.*) that get anything other than a 200 response. This helps us identify issues where a page in our CMS may have been unpublished, or someone changed a slug without adding a 301, and Googlebot is now running into 404s or other errors.

In the past 7 days we got 3,290 404s for the https://www.betterteam.com/.well-known/traffic-advice file, which triggered a number of alerts. I know we can ignore this directory entirely, but it would be ideal if another method could be found that does not cause 404s.
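
For anyone with similar monitoring, here is a minimal sketch of the "ignore this path" workaround mentioned above. It assumes access logs in Common Log Format and that a simple `66.` prefix check stands in for the real Google IP range match; the names `IGNORED_PATHS`, `GOOGLE_PREFIX` and `should_alert` are made up for illustration and are not part of any real tool.

```python
import re

# Hypothetical settings for illustration; adjust to your own monitoring rules.
IGNORED_PATHS = {"/.well-known/traffic-advice"}
GOOGLE_PREFIX = "66."

# Matches the start of a Common Log Format line:
# ip ident user [time] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def should_alert(line: str) -> bool:
    """Return True for a non-200 response to a 66.* client, except on ignored paths."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # unparseable lines are skipped in this sketch
    if not match.group("ip").startswith(GOOGLE_PREFIX):
        return False
    # Drop any query string before comparing against the ignore list.
    path = match.group("path").split("?", 1)[0]
    if path in IGNORED_PATHS:
        return False
    return match.group("status") != "200"
```

This only hides the alerts; the 404s still reach the origin, which is why a server-side fix is preferable.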

404s also bypass our cache, so the extra requests hit our origin server.

We added the missing file to https://www.betterteam.com/.well-known/traffic-advice so it now gets a 200.
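
In case it helps others, this is roughly what serving the file can look like outside a CDN. It is a sketch, not what we actually run: the media type `application/trafficadvice+json` and the `user_agent` / `disallow` fields are taken from my reading of the traffic-advice proposal, the policy shown (asking the prefetch proxy not to send traffic) is just one example, and the WSGI app itself is hypothetical.

```python
import json

TRAFFIC_ADVICE_PATH = "/.well-known/traffic-advice"
# Assumption: media type and field names as described in the traffic-advice proposal.
TRAFFIC_ADVICE_TYPE = "application/trafficadvice+json"
# Example policy only; pick whatever advice suits your site.
ADVICE = [{"user_agent": "prefetch-proxy", "disallow": True}]


def app(environ, start_response):
    """Tiny WSGI app: 200 with traffic advice at the well-known path, 404 elsewhere."""
    if environ.get("PATH_INFO") == TRAFFIC_ADVICE_PATH:
        body = json.dumps(ADVICE).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", TRAFFIC_ADVICE_TYPE),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    body = b"Not Found"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library should be enough.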

@buettner
Owner

buettner commented Jun 9, 2021

Sorry this is causing you trouble.

FWIW, the volume of requests should go down soon as we're implementing another caching layer.

Two questions:

  1. Would it be easier for you to add a field to your DNS entry instead of using traffic-advice to control prefetching? We considered this, but based on other feedback that modifying DNS records is often hard for developers but adding a file is easy, we settled on the traffic-advice approach.
  2. Were there challenges to constructing and adding the traffic-advice file? Issue #16 ("Remove the requirement for a specialised MIME type and rename the file to have .json extension") suggested there can be challenges, and we'd like to know if that is the common case.

@adamseabrook
Author

I think either a DNS entry, something in the <head> section, or server headers would make the most sense. Using headers or the <head> section would also mean SEO plugins like Yoast could add an option to turn this on or off, as they do with other robots-related settings:
https://wordpress.org/support/plugin/wordpress-seo/ (WordPress)
https://plugins.craftcms.com/sprout-seo (Craft)
https://github.com/nystudio107/craft-seomatic (Craft)

Adding a file with a custom MIME type will, I think, be beyond your average user (I am not sure they will even care about this, though). The users that do care about this are probably not going to be too happy about any process that requires actual development, as one of the comments in #16 mentioned. A DNS entry would be the easiest but won't give you page-level control.

For us it was easy as we just added it as an advanced response in Fastly (see below). I took a quick look in Cloudflare and could not see any way to create a response there with a custom mime type.

[image: Fastly advanced response configuration]

@buettner
Owner

The limitation with <head> is that the proxy can't see the page content, only the browser can. The proxy can fetch traffic-advice and cache it to stop prefetch traffic from reaching the site.

I'm happy to see that Fastly supports this well at least.

The tension here is that we should follow best practices. The /.well-known URL RFC says that a "good practice" is "Using an application-specific media type in the Content-Type header field, and requiring clients to fail if it is not used", and the W3C Web Platform Design Principles states, "Always define a corresponding MIME type and extend existing APIs to support this type for any new data format."
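
To make the "requiring clients to fail if it is not used" practice concrete, here is a minimal sketch of a strict client-side check. It is not the proxy's actual implementation; the function name `parse_traffic_advice` is hypothetical, and the expected media type `application/trafficadvice+json` is assumed from the traffic-advice proposal.

```python
import json

# Assumption: the application-specific media type from the traffic-advice proposal.
EXPECTED_TYPE = "application/trafficadvice+json"


def parse_traffic_advice(content_type: str, body: bytes):
    """Parse a traffic-advice response, failing if the media type is not the expected one."""
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != EXPECTED_TYPE:
        raise ValueError(f"unexpected media type: {media_type!r}")
    return json.loads(body.decode("utf-8"))
```

Under this strict reading, a file served as plain `application/json` or `text/plain` would be rejected even if its contents are valid, which is exactly the hurdle the earlier comments describe.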

@robrwo

robrwo commented Oct 26, 2021

Adding a DNS entry is not available to most developers. It also requires an additional skillset that not every developer has.

As for the application-specific media type, I would point out that there are several well-known URLs with the .json extension, and others with the .txt extension.

@Pino4

Pino4 commented May 12, 2022

At one of the largest hosting providers in the world (SiteGround), it is not possible on all hosting plans to set the MIME type in the .well-known folder. SiteGround's response:
"the .well-known folder has a separate configuration in nginx so you would not be able to change the MIME type of any files within. Our system has a unified setup on all servers and we cannot exclude the .well-known folder from Nginx or add any custom rules for it. This folder is used for various internal checks and SSL verification files.

I am afraid that this Private Prefetch Proxy option is not compatible with our servers at the moment."

@jeremyroman
Contributor

Thanks for sharing that.

As noted, the .well-known directory is sometimes specially managed because it's used for other potentially sensitive origin-wide features, like TLS certificate issuance (ACME HTTP-01), with similar needs to be assured of being presented by the site owner.

Your hosting provider could support this in the future by either serving the traffic advice themselves (and giving customers some UI affordance for controlling it, possibly even dynamically) or by rewriting it internally (not by serving a redirect) to some URL that customers do have configuration control over, such as:

location = /.well-known/traffic-advice {
  rewrite ^/\.well-known/traffic-advice$ /.some-other-path/traffic-advice;
}

Of course, neither of these helps you immediately and it's useful to know that this is a barrier to some.
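
For stacks where the front end is a Python application rather than nginx, the same internal-rewrite idea could be sketched as WSGI middleware. This is an illustration only: the class name `RewriteTrafficAdvice` is made up, and the target path is the same placeholder used in the nginx snippet above.

```python
WELL_KNOWN = "/.well-known/traffic-advice"
# Placeholder target, mirroring the nginx example; a real host would choose its own path.
TARGET = "/.some-other-path/traffic-advice"


class RewriteTrafficAdvice:
    """WSGI middleware that rewrites the well-known path internally (no redirect is sent)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == WELL_KNOWN:
            # Copy the environ so the caller's dict is left untouched,
            # then hand the request on under the customer-controlled path.
            environ = dict(environ, PATH_INFO=TARGET)
        return self.app(environ, start_response)
```

The client still sees a direct response at /.well-known/traffic-advice; only the wrapped application sees the rewritten path, where the customer can control the content and its headers.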


5 participants