Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure a consistent TLS configuration #173

Merged
merged 12 commits into from
Jan 12, 2023
Merged

Conversation

benashz
Copy link
Contributor

@benashz benashz commented Nov 25, 2022

Previously, it was possible for the http.Client's Transport to be missing the necessary root CAs to ensure that all TLS connections between the auth engine and the Kubernetes API were validated against a configured set of CA certificates.

This fix ensures that the http.Client's Transport is always consistent with the configured CA cert chain, by introducing a periodic TLS configuration checker that is started as part of the backend's initialization.

Other fixes:

  • only update the client's transport when the CA certificate pool has changed.

Closes #169 #170

Previously, it was possible for the http.Client's Transport to be
missing the necessary root CAs to ensure that all TLS connections
between the auth engine and the Kubernetes API were validated against a
configured set of CA certificates.

This fix ensures that the http.Client's Transport is always consistent
with the configured CA cert chain, but introducing a periodic TLS
configuration checker that is started as part of the backend's
initialization. The periodic checker provides a signal handler that will
update the backend's TLS configuration on SIGHUP.

Other fixes:
- only update the client's transport when the CA certificate pool has
  changed.
@anthonyralston
Copy link

Hey @benashz, do you have an ETA for when this PR will make it out in a Vault release? As mentioned in #169, we're seeing this issue in our production Vault cluster.

@benashz
Copy link
Contributor Author

benashz commented Dec 5, 2022

Hi @anthonyralston , we are hoping to have this PR merged this week. I can't say for sure when the Vault release will be out. We should have more information about that soon.

backend.go Outdated Show resolved Hide resolved
@benashz benashz marked this pull request as ready for review December 6, 2022 23:49
@benashz benashz requested a review from tomhjp December 6, 2022 23:50
backend.go Outdated Show resolved Hide resolved
backend.go Show resolved Hide resolved
@tsaarni
Copy link
Contributor

tsaarni commented Dec 7, 2022

I'd be curious to learn why is timer based polling needed?

In the scenario where the CA certificate is read from a local file, the logic in this PR might have some unintentional interactions with the previous logic in cachingFileReader.ReadFile(). The reader was introduced to periodically reload the token (and CA) from disk (#122). The caller always calls ReadFile() and in case when the file was already loaded, then "fast path" implementation returns a copy from memory. When configured caching period has passed, then it proceeds with "slow path" and re-reads it from the disk.

If I understood correctly, the go routine in this PR would result in a call ReadFile() once every 30 seconds? Since caching for CA cert file is set up for 1 hour, wouldn't it mean that actual disk read still happens only every 120th tick i.e. once every hour?

// caReloadPeriod is the time period how often the in-memory copy of local
// CA cert can be used, before reading it again from disk.
caReloadPeriod = 1 * time.Hour

In general, I suppose CA certificate is not updated too often. It is likely very long-lived certificate. Long(ish) 1 hour period made sense if having reload at all. I guess the whole reload logic only became relevant now due to the error in storage backend - causing it not to be loaded at all.

I guess the TLSClientConfig could simply be initialized each time getHTTPClient() is called (either from caching reader or config) without too much impact on performance?

@benashz
Copy link
Contributor Author

benashz commented Dec 7, 2022

If I understood correctly, the go routine in this PR would result in a call ReadFile() once every 30 seconds? Since caching for CA cert file is set up for 1 hour, wouldn't it mean that actual disk read still happens only every 120th tick i.e. once every hour?

In the case where the CA source is a local file, then yes. Ideally the read would only happen whenever the file has changed, not based on an arbitrary TTL. There would only ever be a 30 second delay between CA certificate rotation in that case.

I guess the TLSClientConfig could simply be initialized each time getHTTPClient() is called (either from caching reader or config) without too much impact on performance?

Yes, that is definitely one approach, but we would prefer to have the setup/update of the tlsConfig be decoupled from the HTTP client's use.

@tsaarni
Copy link
Contributor

tsaarni commented Dec 7, 2022

In the case where the CA source is a local file, then yes. Ideally the read would only happen whenever the file has changed, not based on an arbitrary TTL. There would only ever be a 30 second delay between CA certificate rotation in that case.

In the other use case, where CA is in config: does it make sense to poll storage every 30 seconds, since storage cannot change without us knowing about it?

@tsaarni
Copy link
Contributor

tsaarni commented Dec 7, 2022

In the case where the CA source is a local file, then yes. Ideally the read would only happen whenever the file has changed, not based on an arbitrary TTL. There would only ever be a 30 second delay between CA certificate rotation in that case.

In the other use case, where CA is in config: does it make sense to poll storage every 30 seconds, since storage cannot change without us knowing about it?

Or is it so that it will poll storage every 30 seconds, regardless of CA being on disk or in config? Is that OK in performance wise?

@benashz
Copy link
Contributor Author

benashz commented Dec 7, 2022

In the case where the CA source is a local file, then yes. Ideally the read would only happen whenever the file has changed, not based on an arbitrary TTL. There would only ever be a 30 second delay between CA certificate rotation in that case.

In the other use case, where CA is in config: does it make sense to poll storage every 30 seconds, since storage cannot change without us knowing about it?

It is possible that a config update might fail to update the TLS config, in this case the poller will ensure that the issue is mitigated without the need for user intervention. We feel that the cost of an extra storage request and cert parsing is worth it here.

@tsaarni
Copy link
Contributor

tsaarni commented Dec 7, 2022

Ok, fair enough! The issue could be avoided e.g. by checking only at getHTTPClient() when the config is anyways fetched - and which is also the only moment when the TLSClientConfig is actually used. I was thinking maybe the original problem is relatively infrequent issue, and therefore not worthy of having a background job that will run constantly regardless of the information being used (or changed).

@benashz benashz requested review from ncabatoff and tvoran December 8, 2022 15:39
@heatherezell heatherezell linked an issue Dec 8, 2022 that may be closed by this pull request
backend.go Outdated Show resolved Hide resolved
backend.go Show resolved Hide resolved
Further clarify the docs related to the horizon vars.
Copy link
Contributor

@tomhjp tomhjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Lots of comments on subtle concurrency concerns, so I may have got some wrong.

backend.go Outdated Show resolved Hide resolved
common_test.go Show resolved Hide resolved
path_login.go Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Show resolved Hide resolved
path_config.go Show resolved Hide resolved
path_login.go Outdated Show resolved Hide resolved
backend.go Show resolved Hide resolved
backend.go Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
@ncabatoff ncabatoff removed their request for review December 13, 2022 14:01
@benashz benashz requested a review from tomhjp January 11, 2023 17:31
Copy link
Contributor

@tomhjp tomhjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just some non-functional nit suggestions

backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
backend.go Outdated Show resolved Hide resolved
@benashz benashz merged commit 6fb49ec into main Jan 12, 2023
@benashz benashz deleted the VAULT-11590/handle-failed-init branch January 12, 2023 20:36
@benashz benashz added backport/vault-1.12.x backport assistant tag and removed backport/vault-1.12.x backport assistant tag labels Jan 13, 2023
benashz added a commit that referenced this pull request Jan 13, 2023
* Ensure a consistent TLS configuration for k8s API requests

Previously, it was possible for the http.Client's Transport to be
missing the necessary root CAs to ensure that all TLS connections
between the auth engine and the Kubernetes API were validated against a
configured set of CA certificates.

This fix ensures that the http.Client's Transport is always consistent
with the configured CA cert chain, by introducing a periodic TLS
configuration checker that is started as part of the backend's
initialization.

Other fixes:
- only update the client's transport when the CA certificate pool has
  changed.

Co-authored-by: Tom Proctor <[email protected]>
benashz added a commit that referenced this pull request Jan 13, 2023
* Ensure a consistent TLS configuration for k8s API requests

Previously, it was possible for the http.Client's Transport to be
missing the necessary root CAs to ensure that all TLS connections
between the auth engine and the Kubernetes API were validated against a
configured set of CA certificates.

This fix ensures that the http.Client's Transport is always consistent
with the configured CA cert chain, by introducing a periodic TLS
configuration checker that is started as part of the backend's
initialization.

Other fixes:
- only update the client's transport when the CA certificate pool has
  changed.

Co-authored-by: Tom Proctor <[email protected]>
@benashz benashz added this to the v0.14.1 milestone Jan 13, 2023
@benashz benashz added the backport/vault-1.11.x backport-assistant label Jan 25, 2023
benashz added a commit that referenced this pull request Mar 23, 2023
* Ensure a consistent TLS configuration for k8s API requests

Previously, it was possible for the http.Client's Transport to be
missing the necessary root CAs to ensure that all TLS connections
between the auth engine and the Kubernetes API were validated against a
configured set of CA certificates.

This fix ensures that the http.Client's Transport is always consistent
with the configured CA cert chain, by introducing a periodic TLS
configuration checker that is started as part of the backend's
initialization.

Other fixes:
- only update the client's transport when the CA certificate pool has
  changed.

Co-authored-by: Tom Proctor <[email protected]>
benashz added a commit that referenced this pull request Mar 23, 2023
#190)

* Ensure a consistent TLS configuration (#173) (#178)

* Ensure a consistent TLS configuration for k8s API requests

Previously, it was possible for the http.Client's Transport to be
missing the necessary root CAs to ensure that all TLS connections
between the auth engine and the Kubernetes API were validated against a
configured set of CA certificates.

This fix ensures that the http.Client's Transport is always consistent
with the configured CA cert chain, by introducing a periodic TLS
configuration checker that is started as part of the backend's
initialization.

Other fixes:
- only update the client's transport when the CA certificate pool has
  changed.

Co-authored-by: Tom Proctor <[email protected]>

* Repo hygiene (#162)

* update go and k8s versions

go 1.19.1
k8s up to 1.25.0

* updated x/net and x/sys

go get golang.org/x/[email protected]
go get golang.org/x/[email protected]

* make fmt

making gofumpt happy

* update chart and vault version

Default to k8s 1.25.0
use chart 0.22.0
use vault 1.11.3

* Remove unused import add from cherry-pick

---------

Co-authored-by: Tom Proctor <[email protected]>
Co-authored-by: Theron Voran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/vault-1.11.x backport-assistant backport/vault-1.12.x backport assistant tag
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CA cert on local disk is not reloaded if changed TLS errors after failed plugin initialization
7 participants