Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow additional time for PKI certificates to valid #1394

Merged
merged 4 commits into from
Jul 23, 2020

Conversation

jasonodonnell
Copy link
Contributor

@jasonodonnell jasonodonnell commented Jul 20, 2020

Vault adds 30 seconds to pad NotBefore (configurable) field for certificates issued by the PKI secret engine. This pad compensates for clockskew.

An issue I'm seeing with rendering certificates is Consul Template calculates the sleep time using cert.NotAfter - cert.NotBefore of the certificate and configures the read loop for 85-95% of that value, which has the padding included. In some situations, specifically when TTL is low, the certificate expires before the application serving it has time to react.

This PR changes Consul Template to instead use the expiration value sent along with the certificates. This means no cert inspection is required and we can do our calculations based on expiration - time.Now. This will give us a better duration to wait.

If the client has significant clockskew and requested certificates have a low TTL (maybe ~1m), this could still result in rendering new certs after the cert has already expired. As TTL grows, though, this issue quickly goes away.

This logic only applies to certificates that do not have a lease (generate_lease=false is the default). Leased certs use a different code path and renew instead of updating.

@eikenb eikenb added bug vault Related to the Vault integration labels Jul 20, 2020
@eikenb eikenb requested a review from a team July 20, 2020 19:01
@eikenb
Copy link
Contributor

eikenb commented Jul 20, 2020

Is this 30second padding documented? Maybe as part of the API, SDK or configuration?

I ask as my one concern with this is that we are working around an internal detail of Vault that could change in the future without notice.

@jasonodonnell
Copy link
Contributor Author

Hi @eikenb, it's not documented but I agree working around this internal detail isn't great. One thing nice about this code path, though, is if the duration goes below zero, it will default to the 5 minute loop.

Happy to discuss different changes but we're seeing reports of this causing applications to crash because the short time between being valid and rendering can cause errors. It's possible we could tune the random variance of the final duration calculation specifically for certs to be lower? Maybe 65%-75%.

@eikenb
Copy link
Contributor

eikenb commented Jul 21, 2020

In that second suggestion, were you talking about the block in that code where it does?

`sleep = sleep * (.85 + rand.Float64()*0.1)`

@jasonodonnell
Copy link
Contributor Author

That's what I was thinking @eikenb.

@eikenb
Copy link
Contributor

eikenb commented Jul 21, 2020

Bah... I think I'll just accept this as is.. Lowering the duration of those certs by an arbitrary % seems almost like obfuscating the real reason. Better ugly and obvious than clean but hidden.

I was trying to find where in Vault it sets the 30 second padding without luck. Could you point me in the right direction?

@eikenb eikenb added this to the 0.26.0 milestone Jul 21, 2020
@jasonodonnell
Copy link
Contributor Author

jasonodonnell commented Jul 22, 2020

@eikenb I was a little wrong about it being added to validTo field. The extra 30 seconds is actually coming from a padding of the NotBefore field: https://github.com/hashicorp/vault/blob/dcf2308b5cf4f09e6013ba1ce718212f87e0a880/sdk/helper/certutil/helpers.go#L514
https://www.vaultproject.io/api/secret/pki#not_before_duration

Since we're doing validTo - notBefore, we get an extra 30 seconds by default. I don't think we want to recommend tuning this to be lower because it can introduce subtle clock skew bugs.

I think the better solution here is to change the durationFromCert function to do validTo - time.Now. This will give us a more realistic base duration to do additional calculations from.

Thoughts?

@eikenb
Copy link
Contributor

eikenb commented Jul 22, 2020

This sounds promising, but I'd like to see it in code to help me fully get it.
Could you update the PR with your idea?

@kalafut kalafut requested review from calvn and removed request for kalafut July 22, 2020 22:25
@calvn
Copy link
Contributor

calvn commented Jul 23, 2020

Since the padding adjustment is a fixed value on consul template's end, can this issue be re-introduced if the not_before_duration is a non-default value? For instance, if it's 60s instead, there's now again a difference of 30s between when CT renders and when the cert actually expires.

@jasonodonnell
Copy link
Contributor Author

@calvn That's correct based on what I found. I suggested using cert.NotAfter - time.Now. It seems to work but I'm still fixing broken tests that have hardcoded certificates from 2018.

@jasonodonnell
Copy link
Contributor Author

jasonodonnell commented Jul 23, 2020

@calvn @eikenb I think I found a better path forward. After inspecting Vault's reply to a PKI issue request, I noticed one of the fields is expiration. We can use this instead of inspecting certificates.

I updated the code, removed the cert inspection code and added a test.

Sample of the returned values from Vault if you're curious:

/ $ vault write pki/issue/hashicorp-com common_name=www.hashicorp.com ttl=43799h
Key                 Value
---                 -----
certificate         -----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
expiration          1753195959
issuing_ca          -----BEGIN CERTIFICATE-----
...
Lf3EhhGsihUqD25mptQt
-----END CERTIFICATE-----
private_key         -----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
private_key_type    rsa
serial_number       6b:09:69:d5:8e:8d:e9:05:72:66:16:b7:20:79:d8:50:22:a3:6e:53

Copy link
Contributor

@eikenb eikenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Less code, no hacks. Tests.
LGTM!

@jasonodonnell jasonodonnell merged commit afe1806 into master Jul 23, 2020
@jasonodonnell jasonodonnell deleted the vault-cert-rotation branch July 23, 2020 20:42
@eikenb eikenb modified the milestones: 0.26.0, 0.25.1 Jul 23, 2020
eikenb added a commit to hashicorp/hcat that referenced this pull request Jul 23, 2020
manual cherry pick fix from consul-template (github.com/hashicorp/consul-template/pull/1394)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug vault Related to the Vault integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants