-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
send longer cache headers by default #1725
Conversation
Generated by 🚫 dangerJS |
10 Minutes sounds like a good starting point, would definitely be better if in the future we could set a default per badge. Looks like there was some test in the past with caching the Would be good to also keep the I think i remember @paulmelnikow mentioning an variable that is set on the production server when working with the if (longCache) {
outQueryParams.maxAge = '2592000';
- }
+ } else {
+ outQueryParams.maxAge = '0';
+ } |
Cheers. I've made some updates based on your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, hopefully this will take some stress off the servers.
lib/request-handler.spec.js
Outdated
beforeEach(function (done) { | ||
camp = Camp.start({ port: config.port, hostname: '::' }); | ||
camp.on('listening', () => done()); | ||
initialLongCache = process.env.LONG_CACHE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be reset before each test rather than just setting initially?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to set it back to the global/original value after each test. If you move this from beforeEach()
/afterEach()
to beforeAll()
/afterAll()
, some of the tests are going to be running with LONG_CACHE
set to whatever value we set it to in the most recent test instead of the default value for the test environment (I realise at the moment this is probably undefined
, but in principle this might change). This could introduce an unexpected behaviour. It is better if all tests in the suite inherit the default value for LONG_CACHE
consistently. If a particular test needs it to be set to a particular value, we do it explicitly in that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it makes sense to reset process.env.LONG_CACHE = initialLongCache
after/beforeEach()
,
But couldn't we set const initialLongCache = process.env.LONG_CACHE
,
which would reset LONG_CACHE
to its initial value after/before each test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean now. Updated in a9d06bd
Since this is quite a major change, I'm going to leave this open for a bit and see if @espadrine or @paulmelnikow want to comment on this.. |
I fully agree with this.
And this is a great idea. I think all env variables are defined here. What do you think about adding a new badge with the current server date as a value? I think it would be useful in debugging/testing caching. |
Maybe we should have an option to change cache value more often? I see several options for this:
|
All of that sounds like where we should aim for. Do you think we need to implement variable cache lengths for different badges as an opening gambit? Maybe another good question here would be "does setting I've not put a huge amount of thought into how to implement this yet, but this is my intial thought: We're currently straddling 2 architectures - we've got a small handful of integrations packaged as neat classes, and we've also got a huge legacy of badges defined in There is probably going to need to be some kind of transition period and a sensible default to fall back on.
That sounds like a good idea. I'll add that to this PR over the weekend
Think the rest of these points need feedback from Paul or Thaddée. FWIW I'd also like to see us deploy more often and I am willing to put work in to help that happen if possible. Simultaneously that probably should not be a predicate for making this particular change. |
This gives us a category for badges like 'flip' or 'servertime' for debug purposes which is excluded from the examples.
to help with cache header debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, well done!
I know we've been waiting for more opinions before merging this, but I would personally be inclined to move forward as quickly as possible. Our service is really not in a good shape at the moment (see #1568), and I'm concerned we'll have to wait months and months if we miss next deployment. If we merge, we could see how this behaves on our staging environment, build more confidence on this piece of work and promptly iterate if need-be. 😉
@espadrine @paulmelnikow |
Any update on this? There are times when badges work but most of the time they don't... |
Just a quick note for the moment to say that https://github.com/badges/ServerScript/blob/master/start-shields.sh Rather than overload I'd love to give this a more thoughtful review. I can do that in the next few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @chris48s for testing this with the live copy of camo. It’s really valuable to know that it’s respecting the headers.
I have one thought which is the default maxAge seems too long. It’s fine for users who aren’t too concerned about freshness, or for values which don’t change very often, but since it could leave stale values in the cache for a long time, it doesn’t meet users’ expectation of responsiveness. I’m okay with encouraging longer maxAge settings when people prefer reliability to responsiveness but think the default should be lower, to meet developers expectations of troubleshooting, debugging, and catching changes in a reasonable amount of time. Part of why we use badges is to gamify our work, and if I’ve just fixed the build or published a new npm version, it’s really satisfying to see the updated badge after a few refreshes, without waiting several minutes.
Given how much work went into fast updates, and after reading some of the various discussions in other projects citing this project’s early work in #221, I’m reluctant to dramatically alter that trade off.
I realize the problem we’re trying to fix is far worse. However this should make a big impact on the current situation. Upon refreshing the page, badges won’t disappear until expiration (after they’ve appeared once) whereas before this change camo starts from scratch on every page load.
I think two things our supplement this change really well:
- A big request cache in a cloud key–value store
- Amped up hosting (i.e. scaling out)
lib/all-badge-examples.js
Outdated
@@ -2219,6 +2219,9 @@ function findCategory(wantedCategory) { | |||
|
|||
function loadExamples() { | |||
loadServiceClasses().forEach(ServiceClass => { | |||
if (ServiceClass.category === 'debug') { | |||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you leave a comment in the code about the intent of this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Perhaps it should check an env var like ENABLE_DEBUG_SERVICES
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but can we replace it with this?
if (!ServiceClass.examples) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels like a good check to make, but perhaps it should fail a test or issue a warning, unless category is debug
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally if we wanted to create a badge but not add it to the examples page (for some reason), it would have been sufficient to define:
static get examples() {
return [];
}
The reason I have specifically added a check here is because I've defined this badge with a category 'debug' which doesn't exist in the examples. If we remove that check, the next line const category = findCategory(ServiceClass.category);
throws Error: Unknown category debug referenced in Time
. If I had (wrongly IMO) defined this badge as being in some other category like 'downloads', just returning []
from examples()
would be fine and I wouldn't have added this check. We want some way to ensure we don't call findCategory()
on a category that doesn't exist.
It is probably also useful if:
- There is some way to explicitly declare that particular badges should be excluded from the examples (we also do this with
flip
, for example) - loadExamples() throws an exception if
example()
is not implemented at all on some class, as in most cases that probably represents an error or oversight.
I don't have particularly strong opinions on the structure or convention we choose to to do that. Thoughts? (Note that []
is 'truthy' in javascript.)
Perhaps it should check an env var like ENABLE_DEBUG_SERVICES
Remember we're assembling examples for the front-end here. I'm not sure it is too important that it is configurable whether we show this debug badge on the examples page (presumably you'd want that on in local/dev if anywhere)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! It didn't register that this was in loadExamples
.
Could you move the check into the block right below?
i.e.
if (category === undefined) {
if (ServiceClass.category === 'debug') {
return;
} else {
throw …
}
}
lib/request-handler.js
Outdated
@@ -79,13 +80,18 @@ function handleRequest (makeBadge, handlerOptions) { | |||
return (queryParams, match, end, ask) => { | |||
const reqTime = new Date(); | |||
|
|||
let maxAge = envFlag(process.env.LONG_CACHE, false) ? 900 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making maxAge a handler option? That would simplify testing. Also I’d suggest we make the environment variable BADGE_MAX_AGE_SECONDS
which would facilitate granular config changes without code changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before default cache value (900 seconds -> 15 min) is too long in my opinion. Anything > 0 have a change to improve performance. Let's start with a really low value, e.g. 30s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d suggest we make the environment variable
BADGE_MAX_AGE_SECONDS
which would facilitate granular config changes without code changes.
👍 Done.
As I said before default cache value (900 seconds -> 15 min) is too long in my opinion.
Making this change usefully abstracts the issue of what BADGE_MAX_AGE_SECONDS
should be set to in production. If its not set, we default to 0 (sensible default for dev, staging or self-hosting). The source code itself is now not opinionated on this issue. Discussion of what value BADGE_MAX_AGE_SECONDS
should be set to on our production servers can occur in a pull request to the badges/ServerScript repo modifying https://github.com/badges/ServerScript/blob/master/start-shields.sh once this code change is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good. There's one caveat, which is there's been some discussion of me attempting a deploy without logs or shell access: #1742 (comment). I do have the ability to access the remote repo where the code is, though I am not sure where to find ServerScript.
lib/request-handler.js
Outdated
// Cache management - no cache, so it won't be cached by GitHub's CDN. | ||
// always let queryParams.maxAge override the default | ||
// regardless of env.LONG_CACHE setting | ||
maxAge = queryParams.maxAge; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should maxAge
be the floor, at least until we have a use case that demands it? I’d hate to have everyone tuning this to a lower number on their own if we can centralize the tuning instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should limit the minimum maxAge
for now,
maybe once we have a maxAge per type of badge.
We currently have no cache and I think that the few that will know maxAge=0
is an option shouldn't affect the service too much.
(I would assume most people just leave the badges default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a really strong opinion either way on this - I'd be OK with either in principle.
If we set the cache too long and loads of users start adding ?maxAge=0
to work around it, that is going to be really hard to mop up once we've tuned it better as the problem becomes distributed. That leads me towards using a floor.
However one concern I would have with making the default the floor would be this scenario:
We set the default to 300 secs
As a user I call mybadge.svg?maxAge=600
and get back the header Cache-Control: max-age=600
Then I call mybadge.svg?maxAge=60
and get back the header Cache-Control: max-age=300
That's very unexpected behaviour that will manifest in a really subtle way. As an end user I'd be going out of my mind trying to debug that once I noticed it.. Is there a way we could make that scenario less confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set the cache too long and loads of users start adding
?maxAge=0
to work around it, that is going to be really hard to mop up once we've tuned it better as the problem becomes distributed. That leads me towards using a floor.
That's the situation I'm concerned about, too.
That's very unexpected behaviour that will manifest in a really subtle way. As an end user I'd be going out of my mind trying to debug that once I noticed it.. Is there a way we could make that scenario less confusing?
Two less confusing options when the maxAge parameter is out of the configured range could be:
- Return an error. Clear, but heavy handed and fairly unfriendly.
- Return a 302 temporary redirect. At least if people are trying to debug and inspect closely what's happening, use
curl
, etc., they will see it very clearly. This makes two requests instead of one, which isn't great.
Something I could ask, which might help us decide: what is the use case for setting maxAge
to a small value? Is it so a developer can debug a badge that seems slow to update, or is it because we want to let them opt out of our caching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option could be two config settings: a default maxAge and a minimum maxAge. I'm leaning toward having a really small default but if down the line we decided to bump it up to 300 sec e.g., I could see letting people reduce it to a reasonable lower number like 30 or 60. Probably not zero though.
When both numbers are zero we end up thinking in circles. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I could ask, which might help us decide: what is the use case for setting
maxAge
to a small value?
Yes - this is a good question to think about.. As things stand right now, the default is no cache so the only reason someone is going to be using this right now is to set a long cache. No user is using this right now expecting it to make the cache shorter, so modifying the feature so that it can only make the cache longer (nor shorter) is broadly consistent with current expectations of what this feature is used for.
Also based on some highly scientific research (GitHub advanced search for "shields.io" "maxAge=" in markdown files), most of the usage of this in the wild is with large values (larger than the ones we're suggesting for a default), so we're unlikely to cause much/any unexpected behaviour by ignoring small values.
I think given that context, it is sensible not to allow users to manually set a shorter cache. In terms of signalling, lets start off by communicating this by documenting the limit and see what happens..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a nice idea to document the limit. However the approach here won't work well, because it's pulling the value from the build-time environment instead of the server environment. If that setting changed post-deploy (which is what we're trying to accomplish by making this env-configurable), the front end wouldn't update. I suppose you could say "the server's configured default" which gives a hint that there is one…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point. Lets keep it vague for now but at least provide a hint..
lib/request-handler.js
Outdated
// regardless of env.LONG_CACHE setting | ||
maxAge = queryParams.maxAge; | ||
} | ||
if (maxAge === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m a little confused whether maxAge is a number or a string at this point. Should we coerce the query param?
ask.res.setHeader('Expires', reqTime.toGMTString()); | ||
} else { | ||
ask.res.setHeader('Cache-Control', 'max-age=' + maxAge); | ||
ask.res.setHeader('Expires', new Date(+reqTime + maxAge * 1000).toGMTString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to set Expires along with Cache-Control: max-age? To my reading of the spec, Cache-Control: max-age takes priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's explained here #1651 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that pointer!
From #1651 (comment):
It may be pointless in a perfect world but different implementations take different approach, there will always be someone not following the spec or maybe also not supporting Cache-Control but supporting Expires.
It seems like we're acting on a kind of superstition. It would be great to have an example of a client which does this, or a reliable report that this is true.
I'm fine leaving it in for now, though let's add a reference that stays with the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Cache-Control header was defined as part of the HTTP/1.1 specification would be ignored by any HTTP client which implements the HTTP/1.0 standard but not HTTP/1.1. This is definitely an edge case and we should expect any modern web browser or downstream cache to implement HTTP/1.1 but we can't safely assume that only clients implementing HTTP/1.1 will connect to us. I've added a comment clarifying this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining that!
lib/request-handler.spec.js
Outdated
beforeEach(function (done) { | ||
camp = Camp.start({ port: config.port, hostname: '::' }); | ||
camp.on('listening', () => done()); | ||
process.env.LONG_CACHE = initialLongCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting this in an afterEach seems slightly safer. It’ll run even if there’s a crash in some of this camp setup.
services/time/time.js
Outdated
|
||
module.exports = class Time extends BaseService { | ||
|
||
async handle({library}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this key library
could be removed.
Thanks for additional comments. I've slightly forgotten where we got to with this and I will be away over the weekend so it might be next week before I get a chance to get back into it and push more commits/respond to comments, but I will follow it up. |
- avoid overloading LONG_CACHE - move confguration of cache length to deploy settings instead of source code - default to no cache for dev mode
I think I've now either fixed or replied to all of the new review comments |
@chris48s Thanks! Any thoughts on this one? #1725 (comment) |
Apparently I had not either fixed or replied to all of the new review comments 😄 have added a reply. |
Why not set minimum cache length of 5 / 10 minutes to "all" badges (and a minimum of 1 hour for the ones that doesn't really need to much updates like the license badge)? In my opinion the 0 value should be blocked in all cases. This PR is already deployed but the default is still 0. |
This was turned on in #1846 which just went live. |
Following on from conversation in #1723 (comment) ...
Having thought about this a bit, I think a default cache time of 1 hour is a bit long. For a badge like licence that we don't expect to change much, it is probably too short, but for something like a CI build status or coverage amount, I think an hour would be annoying as a default behaviour. In some cases, users switch to this service due to this behaviour (e.g: lemurheavy/coveralls-public#971 (comment) ). Maybe a good future improvement would be to define a default cache length for different types of badges?
My hypothesis is that even quite a short cache will help. Also, due to the infrequent nature of deploys, I think it would be better to start small and crank it up gradually if needed. If we introduce a long cache, annoy all our users and can't get a fix deployed for ~3 months, I think that is worse than if we introduce a short cache and it only improves performance very slightly and we have to wait ~3 months to bump it up a bit. What do you think @RedSparr0w ?
Having done some reading on this (e.g: github/markup#224 (comment) ) I was slightly nervous about camo respecting the headers, but I set up ngrok on my dev copy and tested this configuration with various URLs (http, https, with and without GET params etc) via GitHub's camo proxy using markdown files in gist. Camo respects the headers as expected in all cases so I think we can safely introduce this.
So far so good, but in its current state this will make development really annoying. What we want to do is use these headers in production and use
'no-cache, no-store, must-revalidate'
in dev, but I'm not sure about how the prod environment is configured. What's the best way to set setNODE_ENV
default safely on this project? Can anyone advise?