Description of Changes Made
This PR makes 2 notable changes:
Serve simpler 404 pages when possible
Our 404 page is a fancy HTML page, composed of multiple templates and requiring a number of DB queries to create (not many queries, granted). If a person in a browser loads a page, we want to show them this "fancy" 404 page for a better user experience. However, if the request shouldn't return HTML (e.g. it's a missing static file) or the user never asked for HTML, we shouldn't spend the time creating a fancy 404 page that's never going to be viewed.
Instead, when possible, we show a simplified HTML page, which just contains text. This requires far fewer resources to generate, and is quicker to serve.
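As a rough illustration of the decision described above, here is a minimal, framework-free sketch. The function names (`parse_accept`, `wants_html`) are hypothetical and are not taken from this PR, and treating a bare `*/*` as "never asked for HTML" is an assumption made for the example rather than a statement of what the PR does.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # q defaults to 1 when not given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((media_type.lower(), q))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def wants_html(accept_header: str | None) -> bool:
    """Return True only if the client explicitly lists text/html with q > 0.

    Assumption: a missing header is treated as "*/*", and a bare "*/*"
    does not count as asking for HTML, so it gets the simple 404.
    """
    entries = parse_accept(accept_header or "*/*")
    return any(media_type == "text/html" and q > 0 for media_type, q in entries)
```

Under these assumptions, a typical browser header such as `text/html,application/xhtml+xml,*/*;q=0.8` would get the fancy page, while a request for `image/png` or one with no `Accept` header would get the simple one.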
Cache 404 pages
This one might be controversial. 😬
If a page returns a 404, chances are it'll still be a 404 in 10 minutes' time, or even longer. Therefore, it's probably something which can be cached to reduce system load.
According to RFC 2616, 404s are not cacheable by default; a cache may only reuse them when explicit cache headers allow it. However, for our use case, I think it's worth it. The TTL is intentionally shorter than it probably could be, but this could be increased in future.
In Wagtail, a request will always do a database query, potentially several depending on how much of the path exists. Therefore, missing pages can result in higher than expected usage, and won't be cached by an edge cache. Worse still, because the 404 pages usually shown are fancy HTML versions, they may run queries of their own (e.g. for navigation), making 404s more expensive still.
By caching the 404, we reduce the cost of serving it to future visitors. This is especially useful if a site is being crawled, as many frontend caches will normalise URLs before caching (ours sure does).
If a 404 has been cached, and a page is created in its place, Wagtail's existing frontend caching will purge the cached 404 during publishing.
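To make the cache-then-purge flow concrete, here is a toy in-memory model. It is only a sketch: `ToyEdgeCache`, `make_origin` and the `NOT_FOUND_TTL` value are hypothetical names and numbers chosen for illustration, and it does not use Wagtail's frontend cache API or reflect the TTL set in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

NOT_FOUND_TTL = 600  # seconds; hypothetical value, not the TTL used in this PR


@dataclass
class ToyEdgeCache:
    """Caches status codes per URL until they expire or are purged."""

    _entries: dict[str, tuple[int, float]] = field(default_factory=dict)
    origin_hits: int = 0  # how many requests reached the origin

    def fetch(self, url: str, origin: Callable[[str], int], now: float) -> int:
        cached = self._entries.get(url)
        if cached is not None and cached[1] > now:
            return cached[0]  # served from cache, origin untouched
        self.origin_hits += 1
        status = origin(url)
        self._entries[url] = (status, now + NOT_FOUND_TTL)
        return status

    def purge(self, url: str) -> None:
        """Drop the cached entry, as a purge on publish would."""
        self._entries.pop(url, None)


def make_origin(published: set[str]) -> Callable[[str], int]:
    """Stand-in for the site: 200 for published paths, 404 otherwise."""
    return lambda url: 200 if url in published else 404


published: set[str] = set()
cache = ToyEdgeCache()
origin = make_origin(published)

cache.fetch("/new/", origin, now=0)    # miss: origin returns 404, now cached
cache.fetch("/new/", origin, now=300)  # hit: cached 404, no origin work
published.add("/new/")                 # the page is created...
cache.purge("/new/")                   # ...and publishing purges the URL
status_after_publish = cache.fetch("/new/", origin, now=301)
```

In this model the repeated request for a missing page costs the origin nothing, and the purge is what stops the stale 404 from being served once the page exists.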
Related reading:
How to Test
This can be tested in the browser, by confirming the correct 404 is shown. The unit tests give a few useful examples. Similarly, `curl` can be used to manually exercise the header.
Note: If no `Accept` header is passed, Django assumes `*/*`.
MR Checklist
Unit tests
Documentation
Browser testing
Data protection
Accessibility
Sustainability
Pattern library
I've upstreamed some helper methods which would make this kind of content negotiation much simpler in future: django/django#18415