Provide support for certificate rotation for xDS connection in Envoy container images #9359
Comments
I was also thinking a bit about a built-in way. It could be more elegant: no need for a parent process monitoring its children, and no need to run two Envoys during the draining period. The control plane connection is established rarely, so at its simplest Envoy could reload certificates unconditionally before connecting, picking up updates automatically without watching the files. But since the Envoy gRPC client seems to share its TLS-related code with the data plane, maybe this simple approach is not feasible without impacting data plane performance? Furthermore, I saw that Istio uses the external way (pilot-agent uses hot restart for control plane certificate reload). |
Is the root of the issue that the bootstrap has some certs that are necessary before we are even able to do SDS? If so, why not do SDS with a filesystem subscription? You can update this with a move operation, and Envoy picks up the inotify event. |
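As a rough illustration of the "update with a move operation" idea, here is a minimal Python sketch. It assumes the updater and the watched file live on the same filesystem, and the helper name `atomic_write` is made up for this example; it is not part of Envoy or any of its tooling.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # A rename within one filesystem is atomic, so a reader never sees a
        # half-written file, and a directory watcher sees a move event for
        # the final name.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

An updater would call something like `atomic_write("/etc/envoy/tls-certificate.yaml", new_contents)` whenever the SDS file needs to change.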
Cool idea! I tried SDS with a filesystem subscription, but there are a couple of problems:
Here is my cluster for the control plane using SDS config:
static_resources:
  clusters:
  - name: control_plane
    type: LOGICAL_DNS
    connect_timeout: 1s
    load_assignment:
      cluster_name: control_plane
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: controlplane
                port_value: 8080
    http2_protocol_options: {}
    transport_socket:
      name: "envoy.transport_sockets.tls"
      typed_config:
        "@type": "type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext"
        common_tls_context:
          tls_certificate_sds_secret_configs:
            sds_config:
              path: /etc/envoy/tls-certificate.yaml
          validation_context_sds_secret_config:
            sds_config:
              path: /etc/envoy/validation-context.yaml
and here are the DiscoveryResponse files:
/etc/envoy/tls-certificate.yaml:
resources:
- "@type": "type.googleapis.com/envoy.api.v2.auth.Secret"
  tls_certificate:
    certificate_chain:
      filename: /etc/envoy/envoy.pem
    private_key:
      filename: /etc/envoy/envoy-key.pem
/etc/envoy/validation-context.yaml:
resources:
- "@type": "type.googleapis.com/envoy.api.v2.auth.Secret"
  validation_context:
    trusted_ca:
      filename: /etc/envoy/control-plane-root.pem
To overcome problem 2, I could use
Any ideas / recommendations are more than welcome! |
@tsaarni in older versions of Envoy, we only responded to move inotify events. It looks like ba1ecbb#diff-fa160b55c5f1fd25e87dfd27cdb98646 added support for watching modification events though. For (2), yeah, inlining could work. I think you could also either write fresh files and update the reference in the |
Thanks for the tips! I did some experiments to find out more.
@htuch: I think solving (1) is problematic because of the way Kubernetes does file updates with symlinks (details follow below). I wonder would it be acceptable to somehow change
Yet another approach to this issue would be to not solve this in the Envoy binary, but instead develop a sidecar container that is able to watch a Kubernetes Secret with certificate and key files, convert that into inlined
Problem 1: SDS file subscription is never fired on Kubernetes
Here is how
[*] I see that the watch is always added for the directory according to this comment, which is good, since otherwise this would not work at all, or the watches would need to be re-added at every update cycle that Kubernetes does. I cannot figure out how to solve this, so I temporarily just removed the condition:
diff --git a/source/common/filesystem/inotify/watcher_impl.cc b/source/common/filesystem/inotify/watcher_impl.cc
index 9b4c67372..7ad7f0096 100644
--- a/source/common/filesystem/inotify/watcher_impl.cc
+++ b/source/common/filesystem/inotify/watcher_impl.cc
@@ -84,10 +84,8 @@ void WatcherImpl::onInotifyEvent() {
}
for (FileWatch& watch : callback_map_[file_event->wd].watches_) {
- if (watch.file_ == file && (watch.events_ & events)) {
- ENVOY_LOG(debug, "matched callback: file: {}", file);
- watch.cb_(events);
- }
+ ENVOY_LOG(debug, "matched callback: file: {}", file);
+ watch.cb_(events);
}
index += sizeof(inotify_event) + file_event->len;
This is of course not acceptable for the general use case. But with this I got Kubernetes Secret updates to trigger the callback logic, and I could investigate the second problem. When Envoy is used with Contour, I believe file watches are not used for anything else, so there probably would not be any side effects.
Problem 2: SDS only notices changes in
|
For (1), I think trying to figure out which inotify events the k8s update actions imply would be a good start. I think it should be reasonable to have Envoy inotify watches support these, but it depends on the details. For (2), it looks like the issue we were hitting before is that SDS (and other APIs) don't refresh a resource if it appears to be identical on the wire. This is WAI, but it's clear that something is missing for SDS. Either we need to also have it consider file contents or have SDS take out an additional inotify watch on the local file. Either could work. |
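To make the "also consider file contents" option concrete, here is a small Python sketch. It is only an illustration of the idea, not Envoy's actual comparison logic; `secret_fingerprint` and its arguments are hypothetical names. The fingerprint covers both the serialized resource and the bytes of every file the resource refers to, so it changes when a certificate file is rewritten even though the resource itself is identical on the wire.

```python
import hashlib
from typing import Iterable


def secret_fingerprint(resource_bytes: bytes, referenced_files: Iterable[str]) -> str:
    """Hash the resource together with the contents of the files it names."""
    h = hashlib.sha256()
    h.update(resource_bytes)
    for name in sorted(referenced_files):
        # Separate entries so that file boundaries affect the digest.
        h.update(b"\0" + name.encode() + b"\0")
        with open(name, "rb") as f:
            h.update(f.read())
    return h.hexdigest()
```

A subscriber could store the last fingerprint and treat a different value as "resource changed", regardless of whether the resource text differs.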
The details of the inotify events on K8s when kubelet updates a secret volume mount are as follows. Here I have mounted a secret volume on
root@shell-7747b58c9f-nqpld:/# ls -laR /run/secrets/certs/
/run/secrets/certs/:
total 4
drwxrwxrwt 3 root root 140 Jan 17 10:23 .
drwxr-xr-x 4 root root 4096 Jan 17 10:24 ..
drwxr-xr-x 2 root root 100 Jan 17 10:23 ..2020_01_17_10_23_43.853725759
lrwxrwxrwx 1 root root 31 Jan 17 10:23 ..data -> ..2020_01_17_10_23_43.853725759
lrwxrwxrwx 1 root root 20 Jan 17 10:23 envoy-key.pem -> ..data/envoy-key.pem
lrwxrwxrwx 1 root root 16 Jan 17 10:23 envoy.pem -> ..data/envoy.pem
lrwxrwxrwx 1 root root 27 Jan 17 10:23 root-ca.pem -> ..data/root-ca.pem
/run/secrets/certs/..2020_01_17_10_23_43.853725759:
total 12
drwxr-xr-x 2 root root 100 Jan 17 10:23 .
drwxrwxrwt 3 root root 140 Jan 17 10:23 ..
-rw-r--r-- 1 root root 1675 Jan 17 10:23 envoy-key.pem
-rw-r--r-- 1 root root 1155 Jan 17 10:23 envoy.pem
-rw-r--r-- 1 root root 1050 Jan 17 10:23 root-ca.pem
So the actual files such as
The following happens when the secret content is updated:
Currently with Envoy the only event I get is MOVE_TO ..data in step 4. All the actual changes happen in the "timestamped" subdirectories. If I follow the symlink and watch for a file in that directory, I need to set up new inotify watches at every file update, since the directory will be deleted by Kubernetes at the next update. So I experimented a little with the code, with the following assumption and code changes:
Here (a) and (c) are more like a hack which happens to work. However this way I'm pretty sure I'm not breaking any existing functionality. And I got working certificate hot-reloading. I have these changes for viewing in my fork here: master...Nordix:issue-9359 |
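For readers who want to reproduce the layout locally, the update sequence can be imitated with a short Python sketch. The exact steps below are an assumption reconstructed from the directory listing and from the rename onto ..data mentioned above (including the temporary link name `..data_tmp`); they are not copied from kubelet's code, and `write_version` is a made-up helper.

```python
import os
import shutil
import tempfile


def write_version(mount: str, stamp: str, files: dict) -> None:
    """Publish a new set of files via a timestamped directory and a symlink swap."""
    new_dir = os.path.join(mount, stamp)
    os.mkdir(new_dir)                                   # 1. new timestamped directory
    for name, content in files.items():                 # 2. write the payload into it
        with open(os.path.join(new_dir, name), "wb") as f:
            f.write(content)
    data_link = os.path.join(mount, "..data")
    old_target = os.readlink(data_link) if os.path.islink(data_link) else None
    tmp_link = os.path.join(mount, "..data_tmp")
    os.symlink(stamp, tmp_link)                         # 3. temporary symlink (assumed name)
    os.replace(tmp_link, data_link)                     # 4. atomic rename onto ..data
    for name in files:                                  # user-visible links, created once
        link = os.path.join(mount, name)
        if not os.path.lexists(link):
            os.symlink(os.path.join("..data", name), link)
    if old_target is not None:
        shutil.rmtree(os.path.join(mount, old_target))  # 5. remove the old directory
```

Calling `write_version` twice with different `stamp` values on a scratch directory, while running `inotifywait -m` on that directory, should show that the only event for a user-visible name is the move onto `..data`.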
Cool. I think (a) shouldn't have to be necessary. We should just have inotify watch on the filesystem subscription as one thing, and the inotify watch on the secrets as another. With that in mind, we should be able to just watch for the events on the Can we just refer to the resources by their |
Setting up a separate inotify watch for the certificate and key files is a great idea! I should have thought of that myself :) Thanks! I added a new watch to the SdsApi class, which is used for the file resources: master...Nordix:issue-9359
But the inotify target is still an unresolved pain. It did not help to refer to the files by
root@envoy-57454b4598-2h7cw:/# inotifywait -m /certs/..data
Setting up watches.
Watches established.
/certs/..data OPEN,ISDIR
/certs/..data ACCESS,ISDIR
/certs/..data CLOSE_NOWRITE,CLOSE,ISDIR
/certs/..data OPEN envoy.pem
/certs/..data ACCESS envoy.pem
/certs/..data CLOSE_NOWRITE,CLOSE envoy.pem
/certs/..data OPEN,ISDIR
/certs/..data ACCESS,ISDIR
/certs/..data DELETE internal-root-ca.pem
/certs/..data DELETE envoy.pem
/certs/..data DELETE envoy-key.pem
/certs/..data CLOSE_NOWRITE,CLOSE,ISDIR
/certs/..data DELETE_SELF
Inotify behavior is that the watch itself gets removed when the target being watched is deleted. One would need to re-arm inotify at every change to work around this type of use case. The symlink and subdirectory swap trick aims to update a bunch of files atomically. When this trick is used, inotify watches at file level just stop working. Users should know whether they use this trick on their system. So I wonder: would it be acceptable to have some kind of switch to change the behavior of |
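The re-arming problem can be shown without inotify at all. The Python sketch below only tracks which real directory a symlinked path resolves to; the class and method names are hypothetical, and a real implementation would add a fresh inotify watch where `rearm` is called (for example after seeing DELETE_SELF).

```python
import os


class DirWatchTarget:
    """Tracks which real directory a symlinked path currently resolves to."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.resolved = os.path.realpath(path)

    def needs_rearm(self) -> bool:
        """True when the symlink now points elsewhere, so an old watch is stale."""
        return os.path.realpath(self.path) != self.resolved

    def rearm(self) -> str:
        """Record the new target; a real watcher would add a new watch here."""
        self.resolved = os.path.realpath(self.path)
        return self.resolved
```

The cost is visible in the sketch: every update invalidates the previous target, so the watcher has to do this bookkeeping on each rotation.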
@htuch I'm very grateful for all your guidance so far! I understand this takes time and effort. I've prepared some more material with a few alternative proposals. This is about the problem with inotify watches that I'm currently blocked on. I'm sorry that this inevitably seems to become a really complicated and long discussion.
Visualization of inotify events during secret update
Initial state: during pod startup, kubelet has created a ramdisk volume and set up an initial directory structure that looks like this:
/secret-mountpoint/file1 # symbolic link to ..data/file1
/secret-mountpoint/file2 # symbolic link to ..data/file2
/secret-mountpoint/..data # symbolic link to ..timestamp-1
/secret-mountpoint/..timestamp-1 # directory
/secret-mountpoint/..timestamp-1/file1 # initial version of file1
/secret-mountpoint/..timestamp-1/file2 # initial version of file2
The purpose of this layout is to serve atomic rename - it allows kubelet to prepare a number of files under
The update procedure is shown in the figure below. The callouts depict the inotify events emitted by each path component if we had inotify watches added to each of them.
After the update, the directory looks like this:
/secret-mountpoint/file1 # symbolic link to ..data/file1
/secret-mountpoint/file2 # symbolic link to ..data/file2
/secret-mountpoint/..data # symbolic link to ..timestamp-2
/secret-mountpoint/..timestamp-2 # new directory
/secret-mountpoint/..timestamp-2/file1 # new version of file1
/secret-mountpoint/..timestamp-2/file2 # new version of file2
Proposals
Here are a few potential approaches to the problem. With the information from my experiments so far, I'd prefer A or B.
Proposal A: use watches only as information - some files may have changed
Watch for any
Known problems:
Proposal B: use timer based polling
Do not use inotify events for secrets at all, but instead implement new timer based polling. Compare files by their hash to find out if they have changed.
Known problems:
Proposal C: add watches to leaf paths
This proposal discusses the alternative where a watch is added for paths like
Known problems:
A variation of this approach would be to watch
Known problems:
It is not clear to me if anything could be achieved by watching leaves in the case of Kubernetes. |
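Proposal B above could look roughly like the following Python sketch. It is an illustration under assumptions, not a design for Envoy: `HashPoller` and `file_digest` are made-up names, SHA-256 is an arbitrary choice of hash, and the timer that drives `poll_once` is left to the caller.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def file_digest(path: str) -> Optional[str]:
    """Return the SHA-256 of the file, or None if it cannot be read right now."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except OSError:
        return None


class HashPoller:
    """Remembers one digest per path and reports which paths changed."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._digests: Dict[str, Optional[str]] = {p: file_digest(p) for p in paths}

    def poll_once(self) -> List[str]:
        changed = []
        for path, old in self._digests.items():
            new = file_digest(path)
            if new != old:
                # Only values of existing keys are updated during iteration.
                self._digests[path] = new
                changed.append(path)
        return changed
```

Because the files are opened by their user-visible names on every poll, the symlink swap is followed automatically and no watch has to be re-armed; the trade-off is the delay of up to one polling interval and the cost of hashing on every tick.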
I wonder if there are any opinions about the above? I would be very much interested in implementing support for certificate rotation. |
@tsaarni from a quick read IMO we should do some variant of A probably, but I'm just skimming this. At a high level I think we should just make the file watcher do what it needs to do to support the K8s use case. It's possible that this new behavior could be opt-in only for watchers that need it (maybe just for SDS for example, not sure). |
Hey, Also to avoid symlinks for mounts we could simply specify exact filepath in mountPath:
|
@povils That is correct, only support for certificate rotation is missing. |
Thanks @tsaarni ! So what is an advantage of using sds config in this case instead of just static resources? Also do we have any other envoy built-in solution to support cert rotation for mounted kubernetes secrets? |
The advantage is that SDS already has the capability to hot-reload certificates and keys without impacting ongoing TLS sessions. Currently this capability cannot be used for path based SDS resources since there is no trigger to reload configuration when files change. This feature is currently being added in #10163.
There is one: Envoy's hot-restart feature. In this alternative a second instance of Envoy is started, it loads the new configuration, including the new certificates. The listening server sockets are passed from old instance to new without ever closing them, and existing client sessions are drained before closing the old instance. |
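As a sketch of how a supervisor drives hot restart: each new Envoy generation is started with the next restart epoch. The Python below only builds the command lines and does not launch anything; `HotRestartPlanner` is a hypothetical helper, while `-c` and `--restart-epoch` are Envoy command-line options.

```python
from typing import List


class HotRestartPlanner:
    """Hands out the command line for each successive Envoy generation."""

    def __init__(self, config_path: str, binary: str = "envoy") -> None:
        self.config_path = config_path
        self.binary = binary
        self.next_epoch = 0

    def next_command(self) -> List[str]:
        # The first process uses epoch 0; every hot restart increments it.
        argv = [self.binary, "-c", self.config_path,
                "--restart-epoch", str(self.next_epoch)]
        self.next_epoch += 1
        return argv
```

A supervisor would spawn `next_command()` each time the certificates change and leave the previous process running until it has drained, which is the "two Envoys during the draining period" cost discussed earlier in the thread.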
Create watches for secrets pointed to by path based SDS resources and trigger hot-reload when the files change. This allows rotation of the TLS certificate and key for the xDS gRPC connection without a restart.
Risk Level: Medium
Testing: unittest
Docs Changes: To be defined: add information about the feature somewhere in the documentation
Release Notes:
Fixes #9359
Signed-off-by: Tero Saarni <[email protected]>
Problem description:
The official docker container is used by e.g. Contour ingress controller.
While Contour is able to replace certificates for user plane by using SDS, it is not currently possible to rotate control plane certificates (xDS gRPC interface) without traffic interruption.
Proposal for a new feature:
Include a new binary in official Envoy container images at https://hub.docker.com/r/envoyproxy/. This new binary can be used by deployments to
envoy
inotify can be used to watch the file updates, which works with e.g. Kubernetes secret volume mounts.
Implementation alternatives:
With your guidance, I'd be interested in implementing the above feature and submitting a PR.
The default entrypoint can still remain as it is currently. The new binary could be "opt-in" for deployments that require xDS with TLS and hot-reload.
One implementation alternative would be to extend hot-restarter.py with inotify, but alternatively a new version could be implemented in C++ in order to avoid bringing Python as a dependency into all Envoy images.