Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate CertFetcher with flag-protection into AMP Packager. #349

Merged
merged 23 commits into from
Nov 13, 2019
Merged
Changes from 2 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
3eae168
Update ACME config to include email adddress and acme challenge port.…
banaag Sep 17, 2019
6e6936f
Update ACME config to include email adddress and acme challenge port.…
banaag Sep 18, 2019
da28431
Fix build errors in last commit.
banaag Sep 18, 2019
53252ce
Add config file error checking for PopulateCertCache.
banaag Sep 18, 2019
848dbff
Add more logic to handle initial conditions with invalid cert and to …
banaag Sep 18, 2019
bfd3a6d
Code refactor/cleanup involving certs.
banaag Sep 23, 2019
058fdff
Add DNS and TLS challenges, added them to load from config, cleaned u…
banaag Sep 27, 2019
5d0800d
go mod tidy, go mod vendor updates
banaag Sep 27, 2019
3869517
remove debug log statements
banaag Sep 27, 2019
e793128
Fixed CSR Loading, added it to config
banaag Sep 27, 2019
f5ba299
Fixed bugs with checking for cert expiry.
banaag Sep 30, 2019
7e81054
Added support for saving the fetched certs to disk and for certs to b…
banaag Oct 21, 2019
f2cdf94
Fixed gateway server call to certcache. Removed go module files insid…
banaag Oct 22, 2019
234a353
Merge branch 'master' into certfetcher-integrate
banaag Oct 23, 2019
f9bf520
Fixed certcache_test.go after merge.
banaag Oct 23, 2019
9633713
Fixed bugs in certcache, also fixed unit test.
banaag Oct 24, 2019
86f850b
Added locking for reading/writing certs.
banaag Oct 25, 2019
99e49b2
Ran go fmt on files that have incorrect formatting
banaag Oct 25, 2019
eeedb8f
Fix twifkak first-pass comments except the logic change comment which…
banaag Oct 31, 2019
eeae624
Fix twifkak comments for ocsp refresh logic and ocsp cache purge.
banaag Oct 31, 2019
8ac5827
Fix additional twifkak comments.
banaag Nov 7, 2019
1539d48
Fix 2nd round of twifkak comments.
banaag Nov 13, 2019
2a668e1
Fix gregable@ comments.
banaag Nov 13, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
158 changes: 119 additions & 39 deletions packager/certcache/certcache.go
Original file line number Diff line number Diff line change
Expand Up @@ -49,22 +49,24 @@ var infiniteFuture = time.Unix(1<<63-62135596801, 999999999)
// 1MB is the maximum used by github.com/xenolf/lego/acmev2 in GetOCSPForCert.
// Alternatively, here's a random documented example of a 20K limit:
// https://www.ibm.com/support/knowledgecenter/en/SSPREK_9.0.0/com.ibm.isam.doc/wrp_stza_ref/reference/ref_ocsp_max_size.html
const MaxOCSPResponseBytes = 1024 * 1024
const maxOCSPResponseBytes = 1024 * 1024

// How often to check if OCSP stapling needs updating.
const OcspCheckInterval = 1 * time.Hour
const ocspCheckInterval = 1 * time.Hour

// How often to check if certs needs updating.
const CertCheckInterval = 24 * time.Hour
const certCheckInterval = 24 * time.Hour

// Max number of OCSP request tries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that this will time out after 2^10 minutes = ~16 hours.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You put it on certCheckInterval, but I think it should go on maxOCSPTries, since that's the one that does exponential backoff.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, fixed.

const MaxOCSPTries = 10
const maxOCSPTries = 10

// Recommended renewal duration for certs. This is duration before next cert expiry.
// 8 days is recommended duration to start requesting new certs to allow for ACME server outages.
// It's 6 days + 2 days renewal grace period.
twifkak marked this conversation as resolved.
Show resolved Hide resolved
// 6 days so that generated SXGs are valid for their full lifetime, plus 2 days in front of that to allow time for the new cert
// to be obtained.
// TODO(banaag): make 2 days renewal grace period configurable in toml.
const CertRenewalInterval = 8 * 24 * time.Hour
const certRenewalInterval = 8 * 24 * time.Hour

type CertHandler interface {
GetLatestCert() *x509.Certificate
Expand All @@ -79,30 +81,30 @@ type CertCache struct {
// If certFetcher is not set, that means cert auto-renewal is not available.
certFetcher *certfetcher.CertFetcher
renewedCertsMu sync.RWMutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The certName should change on a cert renewal (the sha256 computed by util.CertName is different), I think? So I think we need a CertCache.renewedCertName field also.
  2. I think we need to put any reads/writes of certName and renewedCertName under their respective RWMutex.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

renewedCertName string
renewedCerts []*x509.Certificate
ocspUpdateAfterMu sync.RWMutex
ocspUpdateAfter time.Time
// TODO(twifkak): Implement a registry of Updateable instances which can be configured in the toml.
ocspFile Updateable
ocspFile Updateable
ocspFilePath string
client http.Client

// "Virtual methods", exposed for testing.
// Given a certificate, returns the OCSP responder URL for that cert.
extractOCSPServer func(*x509.Certificate) (string, error)
// Given an HTTP request/response, returns its cache expiry.
httpExpiry func(*http.Request, *http.Response) time.Time

// Domains to validate
Domains []string
CertFile string
NewCertFile string

// Is CertCache initialized to do cert renewal or OCSP refreshes?
isInitialized bool

// "Virtual methods", exposed for testing.
// Given a certificate, returns the OCSP responder URL for that cert.
extractOCSPServer func(*x509.Certificate) (string, error)
// Given an HTTP request/response, returns its cache expiry.
httpExpiry func(*http.Request, *http.Response) time.Time
}

// Callers need to call Init() on the returned CertCache before the cache can auto-renew certs.
// Callers can use the uninitialized CertCache either for testing certificates (without doing OCSP or
// Callers can use the uninitialized CertCache for testing certificates (without doing OCSP or
// cert refreshes).
func New(certs []*x509.Certificate, certFetcher *certfetcher.CertFetcher, domains []string,
certFile string, newCertFile string, ocspCache string) *CertCache {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this comment optional.

The long argument list makes the callsites tricky to read and easy to get wrong, especially if several of the arguments have the same type.

An alternative pattern would be to create an IsInitialized() bool or similarly named function that verifies all of the required fields have been set. Then callers can just set fields in the struct by name and assert IsInitialized before doing anything with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO.

Expand All @@ -125,6 +127,7 @@ func New(certs []*x509.Certificate, certFetcher *certfetcher.CertFetcher, domain
// want to have all of them hammering the OCSP server - ideally,
// you'd have one request, in the backend, and updating them all.
ocspFile: &Chained{first: &InMemory{}, second: &LocalFile{path: ocspCache}},
ocspFilePath: ocspCache,
client: http.Client{Timeout: 60 * time.Second},
extractOCSPServer: func(cert *x509.Certificate) (string, error) {
if cert == nil || len(cert.OCSPServer) < 1 {
Expand Down Expand Up @@ -198,10 +201,10 @@ func (this *CertCache) GetLatestCert() *x509.Certificate {
this.updateCertIfNecessary()
return this.getCert()
}
if d >= time.Duration(CertRenewalInterval) {
if d >= time.Duration(certRenewalInterval) {
// Cert is still valid.
return this.getCert()
} else if d < time.Duration(CertRenewalInterval) {
} else if d < time.Duration(certRenewalInterval) {
// Cert is still valid, but we need to start process of requesting new cert.
log.Println("Current cert is close to expiry threshold, attempting to renew in the background.")
return this.getCert()
Expand Down Expand Up @@ -234,6 +237,9 @@ func (this *CertCache) ocspMidpoint(bytes []byte, issuer *x509.Certificate) (tim

func (this *CertCache) ServeHTTP(resp http.ResponseWriter, req *http.Request) {
params := mux.Params(req)

this.certsMu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be RLock/RUnlock? The only use of this.certs is a read, down on line 333.

Also, should it be closer to the read (i.e. inside the for loop)?

Copy link
Collaborator Author

@banaag banaag Nov 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already RLock and RUnlock in the current version (maybe you were looking at an older file?). Also, this is supposed to guard the read on this.certName. In the current version, I don't see anything on line 333 that needs to be guarded.

defer this.certsMu.Unlock()
if params["certName"] == this.certName {
// https://tools.ietf.org/html/draft-yasskin-httpbis-origin-signed-exchanges-impl-00#section-3.3
// This content-type is not standard, but included to reduce
Expand Down Expand Up @@ -319,12 +325,12 @@ func (this *CertCache) readOCSP() ([]byte, time.Time, error) {
// If certFetcher is nil, that means we are not auto-renewing so don't retry OCSP.
maxTries = 1
} else {
maxTries = MaxOCSPTries
maxTries = maxOCSPTries
}

for numTries := 0; numTries < maxTries; {
ocsp, err = this.ocspFile.Read(context.Background(), this.shouldUpdateOCSP, func(orig []byte) []byte {
return this.fetchOCSP(orig, &ocspUpdateAfter, numTries > 0)
return this.fetchOCSP(orig, this.certs, &ocspUpdateAfter, numTries > 0)
})
if err != nil {
if numTries >= maxTries-1 {
Expand Down Expand Up @@ -390,7 +396,7 @@ func (this *CertCache) maintainOCSP(stop chan struct{}) {
// has trouble getting a request, hopefully it does something
// smarter than just retry in a busy loop, hammering the OCSP server
// into further oblivion.
ticker := time.NewTicker(OcspCheckInterval)
ticker := time.NewTicker(ocspCheckInterval)

for {
select {
Expand Down Expand Up @@ -475,22 +481,50 @@ func (this *CertCache) findIssuer() *x509.Certificate {
return nil
}

// Finds the issuer of the specified cert (i.e. the second from the bottom of the
// chain).
func (this *CertCache) findIssuerUsingCerts(certs []*x509.Certificate) *x509.Certificate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminate the duplication: implement findIssuer() using this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if certs == nil || len(certs) == 0 {
return nil
}
issuerName := this.certs[0].Issuer
for _, cert := range this.certs {
// The subject name is guaranteed to match the issuer name per
// https://tools.ietf.org/html/rfc3280#section-4.1.2.4 and
// #section-4.1.2.6. (The latter guarantees that the subject
// name will be in the subject field and not the subjectAltName
// field for CAs.)
//
// However, the definition of "match" is more complicated. The
// general "Name matching" algorithm is defined in
// https://www.itu.int/rec/T-REC-X.501-201610-I/en. However,
// RFC3280 defines a subset, and pkix.Name.String() defines an
// ad hoc canonical serialization (as opposed to
// https://tools.ietf.org/html/rfc1779 which has many forms),
// such that comparing the two strings should be sufficient.
if cert.Subject.String() == issuerName.String() {
return cert
}
}
return nil
}

// Queries the OCSP responder for this cert and return the OCSP response.
func (this *CertCache) fetchOCSP(orig []byte, ocspUpdateAfter *time.Time, isRetry bool) []byte {
issuer := this.findIssuer()
func (this *CertCache) fetchOCSP(orig []byte, certs []*x509.Certificate, ocspUpdateAfter *time.Time, isRetry bool) []byte {
issuer := this.findIssuerUsingCerts(certs)
if issuer == nil {
log.Println("Cannot find issuer certificate in CertFile.")
return orig
}
// The default SHA1 hash function is mandated by the Lightweight OCSP
// Profile, https://tools.ietf.org/html/rfc5019 2.1.1 (sleevi #4, see above).
req, err := ocsp.CreateRequest(this.getCert(), issuer, nil)
req, err := ocsp.CreateRequest(certs[0], issuer, nil)
if err != nil {
log.Println("Error creating OCSP request:", err)
return orig
}

ocspServer, err := this.extractOCSPServer(this.getCert())
ocspServer, err := this.extractOCSPServer(certs[0])
if err != nil {
log.Println("Error extracting OCSP server:", err)
return orig
Expand Down Expand Up @@ -534,7 +568,7 @@ func (this *CertCache) fetchOCSP(orig []byte, ocspUpdateAfter *time.Time, isRetr
// expiry earlier than we'd usually follow.
*ocspUpdateAfter = this.httpExpiry(httpReq, httpResp)

respBytes, err := ioutil.ReadAll(io.LimitReader(httpResp.Body, 1024*1024))
respBytes, err := ioutil.ReadAll(io.LimitReader(httpResp.Body, maxOCSPResponseBytes))
if err != nil {
log.Println("Error reading OCSP response:", err)
return orig
Expand All @@ -544,7 +578,7 @@ func (this *CertCache) fetchOCSP(orig []byte, ocspUpdateAfter *time.Time, isRetr
// 2. Validate the server responses to make sure it is something the client will accept.
// and also per sleevi #4 (see above), as required by
// https://tools.ietf.org/html/rfc5019#section-2.2.2.
resp, err := ocsp.ParseResponseForCert(respBytes, this.getCert(), issuer)
resp, err := ocsp.ParseResponseForCert(respBytes, certs[0], issuer)
if err != nil {
log.Println("Error parsing OCSP response:", err)
return orig
Expand Down Expand Up @@ -575,7 +609,7 @@ func (this *CertCache) fetchOCSP(orig []byte, ocspUpdateAfter *time.Time, isRetr
func (this *CertCache) maintainCerts(stop chan struct{}) {
// Only make one request per certCheckInterval, to minimize the impact
// on servers that are buckling under load.
ticker := time.NewTicker(CertCheckInterval)
ticker := time.NewTicker(certCheckInterval)

for {
select {
Expand All @@ -600,6 +634,18 @@ func (this *CertCache) getCert() *x509.Certificate {
return this.certs[0]
}

// Returns true iff cert cache renewal contains at least 1 cert.
func (this *CertCache) hasRenewalCert() bool {
return len(this.renewedCerts) > 0 && this.renewedCerts[0] != nil
}

func (this *CertCache) getRenewalCert() *x509.Certificate {
if !this.hasRenewalCert() {
return nil
}
return this.renewedCerts[0]
}

// Set current cert with mutex protection.
func (this *CertCache) setCerts(certs []*x509.Certificate) {
this.certsMu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to make certsMu an RWMutex, and RLock-protect all reads also.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certsMU is already an RWMutex (you maybe looking at an older version of the code when you commented), added RLock-protect for all reads also. Did the same for renewedCerts.

Expand All @@ -620,12 +666,14 @@ func (this *CertCache) setNewCerts(certs []*x509.Certificate) {
this.renewedCerts = certs

if this.renewedCerts == nil {
this.renewedCertName = ""
err := certloader.RemoveFile(this.NewCertFile)
if err != nil {
log.Printf("Unable to remove file: %s", this.NewCertFile)
}
return
}
this.renewedCertName = util.CertName(certs[0])

err := certloader.WriteCertsToFile(this.renewedCerts, this.NewCertFile)
if err != nil {
Expand All @@ -650,13 +698,14 @@ func (this *CertCache) updateCertIfNecessary() {
d, err = util.GetDurationToExpiry(this.certs[0], time.Now())
}
if err != nil {
// Current cert is already invalid, check if we have a pending renewal cert.
if this.renewedCerts != nil {
// If renewedCerts is set, copy that over to certs
// and set renewedCerts to nil.
// TODO(banaag): do the same cert setting dance on disk.
// Purge OCSP cache? Make shouldUpdateOCSP() return true?
this.setCerts(this.renewedCerts)
this.setNewCerts(nil)
// Purge OCSP cache
certloader.RemoveFile(this.ocspFilePath)
return
}
// Current cert is already invalid. Try refreshing.
Expand All @@ -669,30 +718,61 @@ func (this *CertCache) updateCertIfNecessary() {
this.setCerts(certs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this one have an OCSP cache purge, too? If so, maybe move the certloader.RemoveFile call inside the setCerts impl.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done.

return
}
if d >= time.Duration(CertRenewalInterval) {
if d >= time.Duration(certRenewalInterval) {
// Cert is still valid, don't do anything.
} else if d < time.Duration(CertRenewalInterval) {
// Cert is still valid, but we need to start process of requesting new cert.
log.Println("Warning: Current cert crossed threshold for renewal, attempting to renew.")
certs, err := this.certFetcher.FetchNewCert()
if err != nil {
log.Println("Error trying to fetch new certificates from CA: ", err)
return
} else if d < time.Duration(certRenewalInterval) {
// Check if we already have a renewal cert waiting, fetch a new cert if not.
if this.renewedCerts == nil {
// Cert is still valid, but we need to start process of requesting new cert.
log.Println("Warning: Current cert crossed threshold for renewal, attempting to renew.")
certs, err := this.certFetcher.FetchNewCert()
if err != nil {
log.Println("Error trying to fetch new certificates from CA: ", err)
return
}
this.setNewCerts(certs)
} else {
var ocspUpdateAfter time.Time

ocsp, _, errorOcsp := this.readOCSP()
if errorOcsp != nil {
newOcsp := this.fetchOCSP(ocsp, this.renewedCerts, &ocspUpdateAfter, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this is going to try to fetch the OCSP response for the new cert once a day, rather than using the backoff retry logic you added to readOCSP. Is that desired? Maybe that's OK for now, but can you add a TODO to revisit?

I realize it's difficult to use readOCSP here, since it's hard-coded to use this.certs and friends, rather than this.renewedCerts and friends. That makes me think two things:

  1. The extraction of the retry logic from readOCSP would be useful.
  2. We should bundle certName, certs, certsMu, ocspFile, ocspFilePath, ocspUpdateAfter, and ocspUpdateAfterMu into a new struct type, and then have two copies of that in certcache - one for current certs and one for new certs.

OK for future PR... TODO? 🐑 😁

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for the certs is by design and is as follows:
updateCertIfNecessary() will be called:

  1. On demand, during initialization or when GetLatestCert() is called and we already have an expired cert.
  2. Every 'certCheckInterval' hours (currently every 24 hours).

The logic for retries in readOCSP is specific to just reading OCSP and retrying when we got a new cert because of Digicert's setup.

Inside updateCertIfNecessary, there's a case where we already got the renewalCert (waiting in this.renewedCerts) and in this case, we make sure we also get the OCSP by calling readOCSP with retries set to true.

I understand your suggestion of possibly reusing the readOCSP retry logic in this edge case, but given that the updateCertIfNecessary() calls are a day apart, chances are the fetchOCSP might fail the first time, but will succeed the next day. Given that, it will still be nice-to-have the refactor of readOCSP so I added this as a TODO.

// Check if newOcsp != ocsp and that there are no errors, health-wise with new ocsp.
if !bytes.Equal(newOcsp, ocsp) && this.isHealthy(newOcsp) == nil {
// We were able to fetch new OCSP with renewal cert, time to switch to new certs.
this.setCerts(this.renewedCerts)
this.setNewCerts(nil)
// Purge OCSP cache
certloader.RemoveFile(this.ocspFilePath)
}
}
}
this.setNewCerts(certs)
}
}

func (this *CertCache) reloadCertIfExpired() {
d := time.Duration(0)
err := errors.New("")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of having these incorrect default values. Perhaps it's worth extracting the first half of this function as a:

func (this *CertCache) doesCertNeedReloading() bool {
  if !this.hasCert() { return true }
  d, err := ...
  return err != nil || d < certRenewalInterval
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if this.hasCert() {
d, err = util.GetDurationToExpiry(this.certs[0], time.Now())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/spaces/tabs/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran go fmt certcache.go

if err == nil && d >= time.Duration(certRenewalInterval) {
// Cert not expired and still have time before renewing, do nothing.
return
}

// If we get to here, the cert was either expired, or it's time to renew.
// We always validate the certs here. If we are in development mode and the certs don't validate,
// it doesn't matter because the old certs won't be overriden (and the old certs are probably invalid, too).
// it doesn't matter because the old certs won't be overridden (and the old certs are probably invalid, too).
certs, err := certloader.LoadAndValidateCertsFromFile(this.CertFile, true)
if err != nil {
log.Println(errors.Wrap(err, "Can't load cert file."))
certs = nil
}
if certs != nil {
this.setCerts(certs)
// Purge OCSP cache
certloader.RemoveFile(this.ocspFilePath)
}

newCerts, err := certloader.LoadAndValidateCertsFromFile(this.NewCertFile, true)
Expand Down