[WIP] allow server cert rotation without a restart #1672
🥇
This reads and parses the certificates on every request. I feel it would be appropriate to do this periodically instead and atomically swap them.
Yep, and it was the second item on my to do list in the PR description 😸
I already refactored it to compare mod times and reload only when they differ, but I will think about how to improve it further. By the way, the Prometheus scrape client currently does this on every request, which I agree should be improved.
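For readers following along, the mod-time comparison described above could look roughly like the sketch below. It is not the code from this PR: `modTimeCache`, `readFile`, and `demo` are hypothetical names, and the loader returns a plain string where a real implementation would parse a key pair (for example with `tls.LoadX509KeyPair`). One caveat of this approach is that a replacement file with an identical modification time would go unnoticed.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"time"
)

// modTimeCache is a hypothetical helper, not code from this PR. It re-runs
// load only when the file's modification time differs from the last load.
type modTimeCache struct {
	path string
	load func(path string) (string, error) // stand-in for e.g. tls.LoadX509KeyPair

	mu      sync.Mutex
	loaded  bool
	modTime time.Time
	value   string
	loads   int // how many times load actually ran
}

// Get stats the file and reloads it only when the mod time has changed.
func (c *modTimeCache) Get() (string, error) {
	info, err := os.Stat(c.path)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.loaded || !info.ModTime().Equal(c.modTime) {
		v, err := c.load(c.path)
		if err != nil {
			return "", err
		}
		c.value, c.modTime, c.loaded = v, info.ModTime(), true
		c.loads++
	}
	return c.value, nil
}

func readFile(path string) (string, error) {
	b, err := os.ReadFile(path)
	return string(b), err
}

// demo writes a file, reads it twice, replaces it, and reads it once more.
// It returns the final value and the number of real loads.
func demo() (string, int, error) {
	f, err := os.CreateTemp("", "cert-*.pem")
	if err != nil {
		return "", 0, err
	}
	f.Close()
	defer os.Remove(f.Name())

	if err := os.WriteFile(f.Name(), []byte("old cert"), 0o600); err != nil {
		return "", 0, err
	}
	c := &modTimeCache{path: f.Name(), load: readFile}
	for i := 0; i < 2; i++ { // the second call is served from the cache
		if _, err := c.Get(); err != nil {
			return "", 0, err
		}
	}

	// Simulate a rotation: new content plus an explicitly different mod time.
	if err := os.WriteFile(f.Name(), []byte("new cert"), 0o600); err != nil {
		return "", 0, err
	}
	future := time.Now().Add(time.Hour)
	if err := os.Chtimes(f.Name(), future, future); err != nil {
		return "", 0, err
	}
	v, err := c.Get()
	return v, c.loads, err
}

func main() {
	v, loads, err := demo()
	fmt.Println(v, loads, err)
}
```

The mutex keeps the swap of value and mod time consistent for concurrent callers; an `atomic.Value` holding an immutable snapshot would be another reasonable choice.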
50afeaf
to
41aeb30
Compare
Sorry was too hasty. All good :) |
pkg/tls/tls.go
Outdated
return nil, errors.Wrap(err, "building client CA certificate")
}

// ????? test if true...... This is a pointer it will update the underlying tls config pool.
I guess this is still a work in progress because of these comments?
Yep, it is one of the items in my todo list:
test whether the CA pool gets updated through the pointer
Do we have a story for compatible rotation? E.g. how will this work between server and client? (: E.g. does the rotation have to happen at the same time for both server and client?
Good point. I think Thanos should only be responsible for rotating the certs when new ones are presented; the rest should be handled by config management.
That said, at the moment the setup allows only a single CA, and maybe it should be changed to allow multiple CA files for easier rotation.
Even with a single file, CAs can usually be loaded as bundles, so in reality there can be multiple CAs in one file.
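To illustrate the bundle point, here is a minimal sketch of loading a PEM file that contains both an old and a new CA into one pool, which is what makes an overlapping rotation possible. The helper names (`newCAPEM`, `loadBundle`, `demo`) and the CA names are made up for this example; only the standard `crypto/x509` and `encoding/pem` calls are real.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newCAPEM returns a throwaway self-signed CA certificate in PEM form.
func newCAPEM(cn string, serial int64) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// loadBundle adds every CERTIFICATE block in bundle to a single pool and
// reports how many certificates it found.
func loadBundle(bundle []byte) (*x509.CertPool, int, error) {
	pool := x509.NewCertPool()
	n := 0
	for rest := bundle; ; {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			break // no more PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, 0, err
		}
		pool.AddCert(cert)
		n++
	}
	if n == 0 {
		return nil, 0, errors.New("no certificates found in bundle")
	}
	return pool, n, nil
}

// demo concatenates two CAs into one bundle and counts what gets loaded.
func demo() (int, error) {
	oldCA, err := newCAPEM("old-ca", 1)
	if err != nil {
		return 0, err
	}
	newCA, err := newCAPEM("new-ca", 2)
	if err != nil {
		return 0, err
	}
	_, n, err := loadBundle(append(oldCA, newCA...))
	return n, err
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

When only the pool is needed, `(*x509.CertPool).AppendCertsFromPEM` does the same job in one call; the explicit loop is used here just to make the count visible.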
Now, after writing the tests, I realized that there is another problem. So with this new info I think there are 3 options:
Again, I think this matters mainly for the Thanos receiver, as I imagine that with all other components a new connection and a new TLS handshake would be created with every request, so the certs would be reloaded as expected. cc @bwplotka, @s-urbaniak
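The distinction between long-lived and fresh connections can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not the PR's code: `selfSigned`, `peerSerial`, and `demo` are hypothetical helpers, the certificates are throwaway self-signed ones, and the client skips verification purely to keep the example short. The server hands out certificates through `tls.Config.GetCertificate`, which runs once per handshake.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"io"
	"math/big"
	"sync/atomic"
	"time"
)

// selfSigned returns a throwaway self-signed certificate with the given serial.
func selfSigned(serial int64) (*tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: "demo"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return &tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// peerSerial returns the serial of the certificate the server presented on conn.
func peerSerial(conn *tls.Conn) int64 {
	return conn.ConnectionState().PeerCertificates[0].SerialNumber.Int64()
}

// demo reports the serial seen before rotation, the serial the old connection
// still reports after rotation, and the serial a new connection sees.
func demo() (before, afterOld, afterNew int64, err error) {
	cert1, err := selfSigned(1)
	if err != nil {
		return
	}
	cert2, err := selfSigned(2)
	if err != nil {
		return
	}
	var current atomic.Value // holds *tls.Certificate
	current.Store(cert1)

	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{
		// Runs once per handshake, so it serves whatever is current at that time.
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			return current.Load().(*tls.Certificate), nil
		},
	})
	if err != nil {
		return
	}
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go io.Copy(io.Discard, c) // the first Read drives the server handshake
		}
	}()

	clientCfg := &tls.Config{InsecureSkipVerify: true} // demo only: self-signed certs
	oldConn, err := tls.Dial("tcp", ln.Addr().String(), clientCfg)
	if err != nil {
		return
	}
	defer oldConn.Close()
	before = peerSerial(oldConn)

	current.Store(cert2) // "rotate" the certificate

	afterOld = peerSerial(oldConn) // an established connection keeps its handshake result
	newConn, err := tls.Dial("tcp", ln.Addr().String(), clientCfg)
	if err != nil {
		return
	}
	defer newConn.Close()
	afterNew = peerSerial(newConn)
	return
}

func main() {
	fmt.Println(demo())
}
```

A connection that stays open never repeats the handshake, which is why a component holding long-lived connections needs some way to force reconnects after a rotation.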
@squat, @bwplotka, @s-urbaniak waiting for some feedback on which option you think is best for this implementation.
@krasi-georgiev Sorry for the late reply. I am voting for option "1. keep the current implementation with a cli flag for MaxConnectionAge" I believe this is a sensible setting and less disruptive than the other options. Also, it does not require external coordination (someone has to implement that timeout). Finally cert rotation happens in timespans of minutes, maybe even hours so a MaxConnectionAge in that timespan sounds sensible to me too. @bwplotka @brancz Do you have additional thoughts or objections? |
How did we work around this one? So I think MaxConnectionAge here might be too short by default ( What's the requirement for those cert rotations in terms of delay? I think ~minutes is ok, so maybe we could start with this, but larger |
agreed, the ~minutes ballpark sounds good to me. |
37a00c4
to
445a68d
Compare
Signed-off-by: Krasi Georgiev <[email protected]>
445a68d
to
7da85ca
Compare
Signed-off-by: Krasi Georgiev <[email protected]>
After a bit of a struggle, the TLS rotation is implemented with full e2e tests. As mentioned in the first PR comment, server CA rotation is missing because it has not yet been added to the Go standard library, so I left a TODO note to add it once it is. I don't think this is a blocker for this PR, and I will follow the tracking Go issues for updates. @squat I also noticed that there are no e2e tests for the receiver TLS. Should I add them to this PR, or would you prefer to open another PR after this one is merged?
Signed-off-by: Krasi Georgiev <[email protected]>
Looks quite good so far!
Signed-off-by: Krasi Georgiev <[email protected]>
let's do it in a new PR to keep the concerns separate :) I can make the PR |
ping @bwplotka |
@krasi-georgiev are there any numbers on the performance impact of the current implementation? 👀 does Windows exhibit the same behavior with caching? |
probably, but only for single syscall to get the
I would say yes, but is Windows even supported by Thanos?
Thanks for this PR, it is awesome to automatically reload certificates when they are modified. It works perfectly on Kubernetes with certificates automatically renewed by Cert-Manager 👍 EDIT: Sorry guys, I edited this comment because of a huge PEBCAK on my side... I have tested this PR and it seems to be working perfectly!
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions. |
Still much needed. Looking for more time to review it properly. Help wanted.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
It is indeed (:
still valid (:
For review, help wanted.
Kind Regards,
Bartek Płotka (@bwplotka)
stale, someone else needs to take over. |
Why do we need to set |
I did a quick test, and reading the file from disk on every request is OK since the kernel keeps it in the page cache, so in practice the file is read from memory when it is unchanged and from disk only when it is replaced with a new one.
GetServerConfig, as mentioned in "proposal: Add GetClientCAs to tls.Config" golang/go#16066 (comment). Tracking issue: "proposal: crypto/tls: add GetConfigForServer callback to *tls.Config" golang/go#22836
Should be able to use VerifyPeerCertificate for this: "proposal: crypto/tls: add GetConfigForServer callback to *tls.Config" golang/go#22836 (comment)
Related modules that use the same idea:
https://godoc.org/golang.org/x/crypto/acme/autocert
https://github.com/johanbrandhorst/certify
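As a rough idea of what the `VerifyPeerCertificate` route mentioned above could look like on the client side, the sketch below verifies the peer chain against a root pool that is looked up at handshake time. This is an assumption-laden illustration, not something this PR implements: `verifyWith`, `newCert`, `demo`, and the name `thanos.example` are hypothetical. In real use the callback would be set on a `tls.Config` together with `InsecureSkipVerify: true`; that setting turns off the built-in chain and hostname checks, so the callback has to perform both itself.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"sync/atomic"
	"time"
)

// verifyWith builds a callback for tls.Config.VerifyPeerCertificate that checks
// the peer chain against whatever roots() returns at handshake time.
func verifyWith(roots func() *x509.CertPool, serverName string) func([][]byte, [][]*x509.Certificate) error {
	return func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
		if len(rawCerts) == 0 {
			return errors.New("peer presented no certificate")
		}
		inter := x509.NewCertPool()
		var leaf *x509.Certificate
		for i, raw := range rawCerts {
			c, err := x509.ParseCertificate(raw)
			if err != nil {
				return err
			}
			if i == 0 {
				leaf = c
			} else {
				inter.AddCert(c)
			}
		}
		_, err := leaf.Verify(x509.VerifyOptions{Roots: roots(), Intermediates: inter, DNSName: serverName})
		return err
	}
}

// newCert issues a throwaway certificate; a nil parent yields a self-signed CA.
func newCert(cn string, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		DNSNames:              []string{cn},
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
	}
	signer, signerKey := tmpl, key
	if parent == nil {
		tmpl.IsCA, tmpl.KeyUsage = true, x509.KeyUsageCertSign
	} else {
		signer, signerKey = parent, parentKey
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, signer, &key.PublicKey, signerKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// demo checks a leaf issued by the new CA, first with only the old CA trusted,
// then after the root pool has been swapped for one that includes the new CA.
func demo() (okOldRoots, okNewRoots bool, err error) {
	oldCA, _, err := newCert("old-ca", nil, nil)
	if err != nil {
		return
	}
	newCA, newKey, err := newCert("new-ca", nil, nil)
	if err != nil {
		return
	}
	leaf, _, err := newCert("thanos.example", newCA, newKey)
	if err != nil {
		return
	}
	var roots atomic.Value // holds *x509.CertPool; swapped on rotation
	verify := verifyWith(func() *x509.CertPool { return roots.Load().(*x509.CertPool) }, "thanos.example")

	pool := x509.NewCertPool()
	pool.AddCert(oldCA)
	roots.Store(pool)
	okOldRoots = verify([][]byte{leaf.Raw}, nil) == nil

	pool = x509.NewCertPool()
	pool.AddCert(oldCA)
	pool.AddCert(newCA)
	roots.Store(pool)
	okNewRoots = verify([][]byte{leaf.Raw}, nil) == nil
	return
}

func main() {
	fmt.Println(demo())
}
```

Because the pool is fetched inside the callback, swapping it takes effect on the next handshake without rebuilding the `tls.Config`.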
I did a quick test and reading the same file from disk is not a problem as the kernel caches it in the memory when the file is not changed.
Test for reading the same file again doesn't hit the disk