Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send longer cache headers by default #1725

Merged
merged 15 commits into from
Jul 20, 2018
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 11 additions & 5 deletions lib/request-handler.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
'use strict';

const envFlag = require('node-env-flag');
// eslint-disable-next-line node/no-deprecated-api
const domain = require('domain');
const request = require('request');
Expand Down Expand Up @@ -79,13 +80,18 @@ function handleRequest (makeBadge, handlerOptions) {
return (queryParams, match, end, ask) => {
const reqTime = new Date();

let maxAge = envFlag(process.env.LONG_CACHE, false) ? 900 : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making maxAge a handler option? That would simplify testing. Also I’d suggest we make the environment variable BADGE_MAX_AGE_SECONDS which would facilitate granular config changes without code changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before default cache value (900 seconds -> 15 min) is too long in my opinion. Anything > 0 have a change to improve performance. Let's start with a really low value, e.g. 30s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d suggest we make the environment variable BADGE_MAX_AGE_SECONDS which would facilitate granular config changes without code changes.

👍 Done.

As I said before default cache value (900 seconds -> 15 min) is too long in my opinion.

Making this change usefully abstracts the issue of what BADGE_MAX_AGE_SECONDS should be set to in production. If its not set, we default to 0 (sensible default for dev, staging or self-hosting). The source code itself is now not opinionated on this issue. Discussion of what value BADGE_MAX_AGE_SECONDS should be set to on our production servers can occur in a pull request to the badges/ServerScript repo modifying https://github.com/badges/ServerScript/blob/master/start-shields.sh once this code change is merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. There's one caveat, which is there's been some discussion of me attempting a deploy without logs or shell access: #1742 (comment). I do have the ability to access the remote repo where the code is, though I am not sure where to find ServerScript.

if (queryParams.maxAge !== undefined && /^[0-9]+$/.test(queryParams.maxAge)) {
ask.res.setHeader('Cache-Control', 'max-age=' + queryParams.maxAge);
ask.res.setHeader('Expires', new Date(+reqTime + queryParams.maxAge * 1000).toGMTString());
} else {
// Cache management - no cache, so it won't be cached by GitHub's CDN.
// always let queryParams.maxAge override the default
// regardless of env.LONG_CACHE setting
maxAge = queryParams.maxAge;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maxAge be the floor, at least until we have a use case that demands it? I’d hate to have everyone tuning this to a lower number on their own if we can centralize the tuning instead.

Copy link
Member

@RedSparr0w RedSparr0w Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should limit the minimum maxAge for now,
maybe once we have a maxAge per type of badge.

We currently have no cache and I think that the few that will know maxAge=0 is an option shouldn't affect the service too much.
(I would assume most people just leave the badges default)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a really strong opinion either way on this - I'd be OK with either in principle.

If we set the cache too long and loads of users start adding ?maxAge=0 to work around it, that is going to be really hard to mop up once we've tuned it better as the problem becomes distributed. That leads me towards using a floor.

However one concern I would have with making the default the floor would be this scenario:

We set the default to 300 secs

As a user I call mybadge.svg?maxAge=600 and get back the header Cache-Control: max-age=600
Then I call mybadge.svg?maxAge=60 and get back the header Cache-Control: max-age=300

That's very unexpected behaviour that will manifest in a really subtle way. As an end user I'd be going out of my mind trying to debug that once I noticed it.. Is there a way we could make that scenario less confusing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set the cache too long and loads of users start adding ?maxAge=0 to work around it, that is going to be really hard to mop up once we've tuned it better as the problem becomes distributed. That leads me towards using a floor.

That's the situation I'm concerned about, too.

That's very unexpected behaviour that will manifest in a really subtle way. As an end user I'd be going out of my mind trying to debug that once I noticed it.. Is there a way we could make that scenario less confusing?

Two less confusing options when the maxAge parameter is out of the configured range could be:

  • Return an error. Clear, but heavy handed and fairly unfriendly.
  • Return a 302 temporary redirect. At least if people are trying to debug and inspect closely what's happening, use curl, etc., they will see it very clearly. This makes two requests instead of one, which isn't great.

Something I could ask, which might help us decide: what is the use case for setting maxAge to a small value? Is it so a developer can debug a badge that seems slow to update, or is it because we want to let them opt out of our caching?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option could be two config settings: a default maxAge and a minimum maxAge. I'm leaning toward having a really small default but if down the line we decided to bump it up to 300 sec e.g., I could see letting people reduce it to a reasonable lower number like 30 or 60. Probably not zero though.

When both numbers are zero we end up thinking in circles. 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I could ask, which might help us decide: what is the use case for setting maxAge to a small value?

Yes - this is a good question to think about.. As things stand right now, the default is no cache so the only reason someone is going to be using this right now is to set a long cache. No user is using this right now expecting it to make the cache shorter, so modifying the feature so that it can only make the cache longer (nor shorter) is broadly consistent with current expectations of what this feature is used for.

Also based on some highly scientific research (GitHub advanced search for "shields.io" "maxAge=" in markdown files), most of the usage of this in the wild is with large values (larger than the ones we're suggesting for a default), so we're unlikely to cause much/any unexpected behaviour by ignoring small values.

I think given that context, it is sensible not to allow users to manually set a shorter cache. In terms of signalling, lets start off by communicating this by documenting the limit and see what happens..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice idea to document the limit. However the approach here won't work well, because it's pulling the value from the build-time environment instead of the server environment. If that setting changed post-deploy (which is what we're trying to accomplish by making this env-configurable), the front end wouldn't update. I suppose you could say "the server's configured default" which gives a hint that there is one…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. Lets keep it vague for now but at least provide a hint..

}
if (maxAge === 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a little confused whether maxAge is a number or a string at this point. Should we coerce the query param?

ask.res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
ask.res.setHeader('Expires', reqTime.toGMTString()); // Proxies, GitHub, see #221.
ask.res.setHeader('Expires', reqTime.toGMTString());
} else {
ask.res.setHeader('Cache-Control', 'max-age=' + maxAge);
ask.res.setHeader('Expires', new Date(+reqTime + maxAge * 1000).toGMTString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to set Expires along with Cache-Control: max-age? To my reading of the spec, Cache-Control: max-age takes priority.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's explained here #1651 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that pointer!

From #1651 (comment):

It may be pointless in a perfect world but different implementations take different approach, there will always be someone not following the spec or maybe also not supporting Cache-Control but supporting Expires.

It seems like we're acting on a kind of superstition. It would be great to have an example of a client which does this, or a reliable report that this is true.

I'm fine leaving it in for now, though let's add a reference that stays with the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cache-Control header was defined as part of the HTTP/1.1 specification would be ignored by any HTTP client which implements the HTTP/1.0 standard but not HTTP/1.1. This is definitely an edge case and we should expect any modern web browser or downstream cache to implement HTTP/1.1 but we can't safely assume that only clients implementing HTTP/1.1 will connect to us. I've added a comment clarifying this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining that!

}

ask.res.setHeader('Date', reqTime.toGMTString());
Expand Down
24 changes: 22 additions & 2 deletions lib/request-handler.spec.js
Original file line number Diff line number Diff line change
Expand Up @@ -33,16 +33,19 @@ describe('The request handler', function() {
before(analytics.load);

let camp;
let initialLongCache;
beforeEach(function (done) {
camp = Camp.start({ port: config.port, hostname: '::' });
camp.on('listening', () => done());
initialLongCache = process.env.LONG_CACHE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be reset before each test rather than just setting initially?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to set it back to the global/original value after each test. If you move this from beforeEach()/afterEach() to beforeAll()/afterAll(), some of the tests are going to be running with LONG_CACHE set to whatever value we set it to in the most recent test instead of the default value for the test environment (I realise at the moment this is probably undefined, but in principle this might change). This could introduce an unexpected behaviour. It is better if all tests in the suite inherit the default value for LONG_CACHE consistently. If a particular test needs it to be set to a particular value, we do it explicitly in that test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it makes sense to reset process.env.LONG_CACHE = initialLongCache after/beforeEach(),
But couldn't we set const initialLongCache = process.env.LONG_CACHE,
which would reset LONG_CACHE to its initial value after/before each test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean now. Updated in a9d06bd

});
afterEach(function (done) {
clearRequestCache();
if (camp) {
camp.close(() => done());
camp = null;
}
process.env.LONG_CACHE = initialLongCache;
});

describe('the options object calling style', function() {
Expand Down Expand Up @@ -105,15 +108,32 @@ describe('The request handler', function() {
expect(handlerCallCount).to.equal(1);
});

it('should set the expires header to current time', async function () {
it('should set the expires header to current time + 15 mins if LONG_CACHE on', async function () {
process.env.LONG_CACHE = 1;
const res = await fetch(`${baseUri}/testing/123.json`);
expect(res.headers.get('expires')).to.equal(res.headers.get('date'));
const expectedExpiry = new Date(+(new Date(res.headers.get('date'))) + 900000).toGMTString();
expect(res.headers.get('expires')).to.equal(expectedExpiry);
expect(res.headers.get('cache-control')).to.equal('max-age=900');
});

it('should set the expires header to current time + max-age', async function () {
const res = await fetch(`${baseUri}/testing/123.json?maxAge=3600`);
const expectedExpiry = new Date(+(new Date(res.headers.get('date'))) + 3600000).toGMTString();
expect(res.headers.get('expires')).to.equal(expectedExpiry);
expect(res.headers.get('cache-control')).to.equal('max-age=3600');
});

it('should set Cache-Control: no-cache, no-store, must-revalidate if maxAge=0', async function () {
const res = await fetch(`${baseUri}/testing/123.json?maxAge=0`);
expect(res.headers.get('expires')).to.equal(res.headers.get('date'));
expect(res.headers.get('cache-control')).to.equal('no-cache, no-store, must-revalidate');
});

it('should set Cache-Control: no-cache, no-store, must-revalidate if LONG_CACHE off', async function () {
process.env.LONG_CACHE = 0;
const res = await fetch(`${baseUri}/testing/123.json`);
expect(res.headers.get('expires')).to.equal(res.headers.get('date'));
expect(res.headers.get('cache-control')).to.equal('no-cache, no-store, must-revalidate');
});

describe('the cache key', function () {
Expand Down