Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images have no-store, must-revalidate which absolutely kills performance #1723

Closed
kornelski opened this issue Jun 6, 2018 · 5 comments
Closed
Labels
core Server, BaseService, GitHub auth, Shared helpers performance-improvement Related to performance or throughput of the badge servers

Comments

@kornelski
Copy link

The responses from img.shields.io have

cache-control: no-cache, no-store, must-revalidate

which is absolutely the worst case for HTTP requests. It means every client has to make a full request, all the way up to your origin server, with no help of the CDN, every. single. time. it displays the badge.

This seems unreasonably restrictive and clearly your origin server is struggling (I'm seeing responses taking over 10 seconds per badge). Even if you'd like the responses to be super fresh and update as soon as the data changes, allowing cache of even just 1 second would allow the CDN to take some traffic off your back-end servers without affecting freshness of the responses:

cache-control: max-age=1

In case the no-store, must-revalidate is not added by your origin server, but the CDN, consider configuring the CDN to allow some caching (e.g. Page Rules can be used for this in Cloudflare).

@chris48s
Copy link
Member

chris48s commented Jun 7, 2018

Thanks for raising this.

We don't use cloudflare as a downstream cache. It has been discussed it in the #maintainers channel but there are some concerns about CloudFlare's use of tracking cookies: #608 and the conversation has somewhat fizzled out. Maybe this is a good opportunity to re-open that conversation though.

Aside from that, modifying the cache-control headers as suggested should give us benefit when using someone else's CDN (e.g: GitHub's camo proxy - a very common case). This seems like a good idea to me, but looking at the code, it does seem to be a very deliberate decision to avoid being cached by camo:

// Cache management - no cache, so it won't be cached by GitHub's CDN.
ask.res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
ask.res.setHeader('Expires', reqTime.toGMTString()); // Proxies, GitHub, see #221.

also see #221

.. but I am not so clear on why this is such a bad thing. Is anyone more familiar with the history? Why this is something we need to avoid?

@RedSparr0w
Copy link
Member

RedSparr0w commented Jun 7, 2018

I assume it was done to avoid outdated badges,
But at the current rate i feel like the badges should at least default to a ~1 hour cache (with option to set ?maxAge=0 for more important badges).

I've been tracking the badges uptime here

(note the uptime is based on the badge having a response time < 4 seconds)

1 second delay:
1 second
2 second delay:
2 seconds
3 second delay:
3 seconds
3.9 second delay:
3 seconds
3.95 second delay:
3 seconds
4 second delay: (always fails)
4 seconds


It would be good if we could get something merged before the next deploy as #1568 is a pretty big issue (with 40 👍 already)

@paulmelnikow paulmelnikow added core Server, BaseService, GitHub auth, Shared helpers operations Hosting, monitoring, and reliability for the production badge servers performance-improvement Related to performance or throughput of the badge servers and removed operations Hosting, monitoring, and reliability for the production badge servers labels Jun 16, 2018
@chris48s
Copy link
Member

#1725 now merged. There's still a bit of additional work to do in order to deploy this and find the right value for env.BADGE_MAX_AGE_SECONDS.

@paulmelnikow
Copy link
Member

This was turned on in #1846 which just went live. Will be curious to see how this goes in the morning!

@paulmelnikow
Copy link
Member

I’m glad to say this has had a huge effect. Today’s peak traffic is being handled like weekend traffic, with 99% of requests coming in underneath the 4 second camo timeout. I’m thrilled!

That should give us a little time to sort out our hosting. We’re still relatively slow on a number of badges, particularly the static badges which should be instant, so we still have a capacity issue to address. Likely this is due to ~10% growth over the last several months. I have proposed moving to Zeit Now to fix the capacity issue and solve our sysadmin bottleneck at the same time. This proposal is blocked awaiting response from @espadrine who owns the servers and load balancer.

Nice work, @chris48s, and everyone else who participated in the discussion and reviews. Thanks also to @kornelski for putting this solution clearly on the radar.

Let’s leave this open for a couple days and as long as things still look good, close it out. Then we can open new issues for any follow-on discussions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Server, BaseService, GitHub auth, Shared helpers performance-improvement Related to performance or throughput of the badge servers
Projects
None yet
Development

No branches or pull requests

4 participants