
Provide support for certificate rotation for xDS connection in Envoy container images #9359

Closed
tsaarni opened this issue Dec 16, 2019 · 17 comments · Fixed by #10163
Labels: area/xds · design proposal (Needs design doc/proposal before implementation) · help wanted (Needs help!)

@tsaarni
Member

tsaarni commented Dec 16, 2019

Problem description:

The official Docker container is used by e.g. the Contour ingress controller.

While Contour is able to replace user plane certificates by using SDS, it is currently not possible to rotate control plane certificates (the xDS gRPC interface) without traffic interruption.

Proposal for a new feature:

Include a new binary in the official Envoy container images at https://hub.docker.com/r/envoyproxy/. Deployments could use this new binary for:

  1. acting as PID 1 in the container and forking & exec'ing Envoy
  2. triggering Envoy's hot restart procedure when certificate or key files are updated

inotify can be used to watch the file updates, which works with e.g. Kubernetes secret volume mounts.

Implementation alternatives:

With your guidance, I'd be interested in implementing the above feature and submitting a PR.

The default entrypoint can remain as it is currently. The new binary could be "opt-in" for deployments that require xDS with TLS and hot-reload.

One implementation alternative would be to extend hot-restarter.py with inotify; alternatively, a new version could be implemented in C++ to avoid bringing Python as a dependency into all Envoy images.

@tsaarni tsaarni changed the title Provide support for xDS certificate rotation in Envoy container images Provide support for certificate rotation for xDS connection in Envoy container images Dec 16, 2019
@mattklein123 mattklein123 added the design proposal Needs design doc/proposal before implementation label Dec 19, 2019
@mattklein123
Member

@htuch @lizan any thoughts on ^? I understand why this is being requested, but I also wonder whether there should be some built-in way for Envoy to rotate control plane certs?

@tsaarni
Member Author

tsaarni commented Dec 19, 2019

I was also thinking a bit about a built-in way. It could be more elegant: no need to have a parent process monitoring child processes, and no need to run two Envoys during the draining period.

The control plane connection is established rarely, so at its simplest Envoy could perhaps reload certificates unconditionally before connecting, in order to catch updates automatically without watching the files. But since Envoy's gRPC client seems to share the TLS-related code with the data plane, maybe this simple approach is not feasible without impacting data plane performance?

Furthermore, I saw that Istio uses an external approach (pilot-agent uses hot restart for control plane certificate reload).

@htuch
Member

htuch commented Dec 19, 2019

Is the root of the issue that the bootstrap has some certs that are necessary before we are even able to do SDS? If so, why not do SDS with a filesystem subscription? You can update this with a move operation, and Envoy picks up the inotify event.
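For reference, the "update with a move operation" described above can be sketched like this (a minimal Python sketch, not Envoy code; the path is illustrative):

```python
import os
import tempfile

def atomic_replace(path: str, contents: str) -> None:
    """Write contents to a temp file in the same directory, then
    rename() it over the target. rename() is atomic on POSIX, so a
    watcher never sees a partially written file, and the directory
    receives an IN_MOVED_TO inotify event, which is the kind of event
    a directory-level filesystem watch reacts to."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
        os.rename(tmp, path)  # atomic swap; fires IN_MOVED_TO on the directory
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing the new content to a sibling temp file first is what makes the swap atomic; writing into the watched file in place would emit partial-write events instead.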

@tsaarni
Member Author

tsaarni commented Dec 20, 2019

Cool idea! I tried SDS with a filesystem subscription but there are a couple of problems:

  1. When Kubernetes updates the certificate files, the only inotify event that fires is for a hidden directory ..data. There are no events for the individual updated certificate files, since Kubernetes has set them up as symbolic links that do not change. Since ..data does not match any of the file watches, WatcherImpl::onInotifyEvent() will just ignore the event and the changes will go unnoticed.
  2. How can I do an SDS subscription for the certificate and key files themselves? I could only configure watches for the DiscoveryRequest files containing the Secrets that point to the certificate files, but not for the certificate files themselves.

Here is my cluster for the control plane using SDS config:

static_resources:
  clusters:
  - name: control_plane
    type: LOGICAL_DNS
    connect_timeout: 1s
    load_assignment:
      cluster_name: control_plane
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: controlplane
                port_value: 8080
    http2_protocol_options: {}
    transport_socket:
      name: "envoy.transport_sockets.tls"
      typed_config:
        "@type": "type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext"
        common_tls_context:
          tls_certificate_sds_secret_configs:
            sds_config:
              path: /etc/envoy/tls-certificate.yaml
          validation_context_sds_secret_config:
            sds_config:
              path: /etc/envoy/validation-context.yaml

and here are the DiscoveryRequest files:

/etc/envoy/tls-certificate.yaml:

resources:
  - "@type": "type.googleapis.com/envoy.api.v2.auth.Secret"
    tls_certificate:
      certificate_chain:
        filename: /etc/envoy/envoy.pem
      private_key:
        filename: /etc/envoy/envoy-key.pem

/etc/envoy/validation-context.yaml:

resources:
  - "@type": "type.googleapis.com/envoy.api.v2.auth.Secret"
    validation_context:
      trusted_ca:
        filename: /etc/envoy/control-plane-root.pem

To overcome problem 2, I could use inline_string instead of filename to inline the certificates and key into the DiscoveryRequest files. But I think the user should only have to configure PEM files, so I would then need to run some software that actively monitors updates and converts the PEM files into DiscoveryRequest files.
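The pem-to-DiscoveryRequest conversion mentioned above could look roughly like this (a hypothetical helper, not existing tooling; paths and output layout are illustrative):

```python
def secret_with_inline_cert(cert_path: str, key_path: str) -> str:
    """Render an SDS DiscoveryRequest file that inlines the PEM files
    with inline_string instead of referring to them by filename, so a
    rotation changes the DiscoveryRequest file itself and is visible
    to a watch on that file."""
    def read(path):
        with open(path) as f:
            return f.read()

    def block(text):
        # Indent the PEM body as a YAML literal block under inline_string.
        return "\n".join("          " + line for line in text.splitlines())

    lines = [
        "resources:",
        '  - "@type": "type.googleapis.com/envoy.api.v2.auth.Secret"',
        "    tls_certificate:",
        "      certificate_chain:",
        "        inline_string: |",
        block(read(cert_path)),
        "      private_key:",
        "        inline_string: |",
        block(read(key_path)),
    ]
    return "\n".join(lines) + "\n"
```

A sidecar or cron-like process would run this on every change and atomically rename the result over the watched DiscoveryRequest file.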

Any ideas / recommendations more than welcome!

@htuch
Member

htuch commented Dec 22, 2019

@tsaarni in older versions of Envoy, we only responded to move inotify events. It looks like ba1ecbb#diff-fa160b55c5f1fd25e87dfd27cdb98646 added support for watching modification events though.

For (2), yeah, inlining could work. I think you could also either write fresh files and update the reference in the DiscoveryResponse file, or maybe SDS should check for modification on each update. I don't know what the behavior is here; you'll have to ascertain this experimentally or via code audit.

@tsaarni
Member Author

tsaarni commented Dec 23, 2019

Thanks for the tips! I did some experiments to find out more.

@htuch: I think solving (1) is problematic due to the way Kubernetes does file updates with symlinking (details follow below). I wonder whether it would be acceptable to somehow change WatcherImpl::onInotifyEvent() so that it would work with the Kubernetes update logic?

Yet another approach would be to not solve this in the Envoy binary, but instead develop a sidecar container that watches a Kubernetes Secret with certificate and key files, converts it into an inlined DiscoveryResponse, and finally writes it to a shared volume. This should work with WatcherImpl and SdsApi as they are today - though it would be nice to offer a built-in solution: the problem is not unique, and developing adapter/operator logic like that is complicated too.

Problem 1: SDS file subscription is never fired on Kubernetes

Here is how WatcherImpl interacts with a Kubernetes Secret update:

  1. A filesystem subscription is added for e.g. /etc/envoy/tls-certificate.yaml, so WatcherImpl::addWatch() adds an inotify watch for the directory /etc/envoy [*]
  2. Kubernetes has created the watched file as a symlink tls-certificate.yaml -> ..data/tls-certificate.yaml. This symlink remains untouched by any update.
  3. The directory is also a symlink, e.g. ..data -> ..2019_12_20_15_00_12.489612214
  4. When a file update happens, Kubernetes creates a completely new directory with a new copy of the file, e.g. ..2019_12_23_10_01_42.912205642/tls-certificate.yaml, and updates the symlink ..data to point to this new directory: ..data -> ..2019_12_23_10_01_42.912205642
  5. WatcherImpl::onInotifyEvent() gets an inotify event for ..data. It compares the watched filename with the event filename: tls-certificate.yaml != ..data. The event is ignored.

[*] I see that the watch is always added for the directory according to this comment, which is good, since otherwise this would not work at all, or the watches would need to be re-added at every update cycle that Kubernetes does.
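For illustration, the update sequence above can be reproduced in a scratch directory (a Python sketch with made-up names mirroring the layout described; it shows why the symlink Envoy watches never changes even though its resolved content does):

```python
import os
import tempfile

# Initial state: watched path tls-certificate.yaml is a symlink into
# ..data, which is itself a symlink to a timestamped directory.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "..timestamp-1"))
with open(os.path.join(root, "..timestamp-1", "tls-certificate.yaml"), "w") as f:
    f.write("v1")
os.symlink("..timestamp-1", os.path.join(root, "..data"))
os.symlink(os.path.join("..data", "tls-certificate.yaml"),
           os.path.join(root, "tls-certificate.yaml"))

# Update: new timestamped directory, then an atomic rename of the
# ..data symlink. Only "..data" appears in the resulting inotify event,
# so a watch matching on "tls-certificate.yaml" never fires.
os.mkdir(os.path.join(root, "..timestamp-2"))
with open(os.path.join(root, "..timestamp-2", "tls-certificate.yaml"), "w") as f:
    f.write("v2")
os.symlink("..timestamp-2", os.path.join(root, "..data_tmp"))
os.rename(os.path.join(root, "..data_tmp"), os.path.join(root, "..data"))

# The top-level symlink is untouched, yet now resolves to the new content.
with open(os.path.join(root, "tls-certificate.yaml")) as f:
    print(f.read())  # -> v2
```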

I cannot figure out how to solve this, so I temporarily just removed the condition:

diff --git a/source/common/filesystem/inotify/watcher_impl.cc b/source/common/filesystem/inotify/watcher_impl.cc
index 9b4c67372..7ad7f0096 100644
--- a/source/common/filesystem/inotify/watcher_impl.cc
+++ b/source/common/filesystem/inotify/watcher_impl.cc
@@ -84,10 +84,8 @@ void WatcherImpl::onInotifyEvent() {
       }

       for (FileWatch& watch : callback_map_[file_event->wd].watches_) {
-        if (watch.file_ == file && (watch.events_ & events)) {
-          ENVOY_LOG(debug, "matched callback: file: {}", file);
-          watch.cb_(events);
-        }
+        ENVOY_LOG(debug, "matched callback: file: {}", file);
+        watch.cb_(events);
       }

       index += sizeof(inotify_event) + file_event->len;

This is of course not acceptable for the general use case. But with this I got Kubernetes Secret updates to trigger the callback logic, and I could investigate the second problem. When Envoy is used with Contour, I believe file watches are not used for anything else, so there probably would not be any side effects.

Problem 2: SDS only notices changes in the DiscoveryResponse and ignores changes in the files referenced by it

I made the following change in SdsApi::onConfigUpdate() to check whether the DataSource fields refer to files. If they do, I run the update callbacks unconditionally:

diff --git a/source/common/secret/sds_api.cc b/source/common/secret/sds_api.cc
index 4514f5ce0..156898fb3 100644
--- a/source/common/secret/sds_api.cc
+++ b/source/common/secret/sds_api.cc
@@ -38,8 +38,30 @@ void SdsApi::onConfigUpdate(const Protobuf::RepeatedPtrField<ProtobufWkt::Any>&
         fmt::format("Unexpected SDS secret (expecting {}): {}", sds_config_name_, secret.name()));
   }

+  bool secret_refers_to_local_file = false;
+
+  switch (secret.type_case()) {
+  case envoy::api::v2::auth::Secret::TypeCase::kTlsCertificate: {
+    auto tls_certificate = secret.tls_certificate();
+    if ((tls_certificate.has_certificate_chain() && tls_certificate.certificate_chain().specifier_case() == envoy::api::v2::core::DataSource::kFilename) ||
+        (tls_certificate.has_private_key() && tls_certificate.private_key().specifier_case() == envoy::api::v2::core::DataSource::kFilename)) {
+      secret_refers_to_local_file = true;
+    }
+    break;
+  }
+  case envoy::api::v2::auth::Secret::TypeCase::kValidationContext: {
+    auto validation_context = secret.validation_context();
+    if (validation_context.has_trusted_ca() && validation_context.trusted_ca().specifier_case() == envoy::api::v2::core::DataSource::kFilename) {
+      secret_refers_to_local_file = true;
+    }
+    break;
+  }
+  default:
+    break;
+  }
+
   const uint64_t new_hash = MessageUtil::hash(secret);
-  if (new_hash != secret_hash_) {
+  if ((new_hash != secret_hash_) || secret_refers_to_local_file) {
     validateConfig(secret);
     secret_hash_ = new_hash;
     setSecret(secret);

This worked: the certificates and key were now reloaded.

Alternatively I could calculate a hash of the certificate and key files instead of the hash of the auth.Secret message itself, which is currently used as a filter to remove redundant callbacks.
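That alternative might look roughly like this (a hedged sketch, not the actual SdsApi code; files_hash is a made-up name):

```python
import hashlib

def files_hash(paths):
    """Combined digest of the certificate/key files a Secret refers to.
    If kubelet rotates the files on disk, this digest changes even
    though the Secret proto, which only carries the filenames, is
    byte-for-byte identical on the wire."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.encode())  # bind each content to its path
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()
```

Comparing this digest instead of (or in addition to) the hash of the auth.Secret message would let the existing "skip redundant callbacks" filter notice on-disk rotations.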

I also considered adding file watches for the referenced certificate and key files, but the content of auth.Secret is not yet loaded at the time the subscriptions are set up in SdsApi::initialize(). I guess that is solvable, but I think problem (1) is a blocker. Even inlining will not work because of it.

By the way, I can convert this into a PR if wanted? Though I'm very much in the exploration phase at this point...

@htuch
Member

htuch commented Dec 26, 2019

For (1), I think trying to figure out which inotify events the k8s update actions imply would be a good start. I think it should be reasonable to have Envoy's inotify watches support these, but it depends on the details.

For (2), it looks like the issue we were hitting before is that SDS (and other APIs) don't refresh a resource if it appears to be identical on the wire. This is WAI, but it's clear that something is missing for SDS. Either we need to have it also consider file contents, or have SDS take out an additional inotify watch on the local file. Either could work.

@tsaarni
Member Author

tsaarni commented Jan 21, 2020

The details of the inotify events on K8s when kubelet updates a secret volume mount are as follows.

Here I have mounted a secret volume at /run/secrets/certs/. Kubernetes sets up the contents of the directory like this:

root@shell-7747b58c9f-nqpld:/# ls -laR /run/secrets/certs/
/run/secrets/certs/:
total 4
drwxrwxrwt 3 root root  140 Jan 17 10:23 .
drwxr-xr-x 4 root root 4096 Jan 17 10:24 ..
drwxr-xr-x 2 root root  100 Jan 17 10:23 ..2020_01_17_10_23_43.853725759
lrwxrwxrwx 1 root root   31 Jan 17 10:23 ..data -> ..2020_01_17_10_23_43.853725759
lrwxrwxrwx 1 root root   20 Jan 17 10:23 envoy-key.pem -> ..data/envoy-key.pem
lrwxrwxrwx 1 root root   16 Jan 17 10:23 envoy.pem -> ..data/envoy.pem
lrwxrwxrwx 1 root root   27 Jan 17 10:23 root-ca.pem -> ..data/root-ca.pem

/run/secrets/certs/..2020_01_17_10_23_43.853725759:
total 12
drwxr-xr-x 2 root root  100 Jan 17 10:23 .
drwxrwxrwt 3 root root  140 Jan 17 10:23 ..
-rw-r--r-- 1 root root 1675 Jan 17 10:23 envoy-key.pem
-rw-r--r-- 1 root root 1155 Jan 17 10:23 envoy.pem
-rw-r--r-- 1 root root 1050 Jan 17 10:23 root-ca.pem

So an actual file such as /run/secrets/certs/envoy.pem is a symbolic link to /run/secrets/certs/..data/envoy.pem. Note that /run/secrets/certs/..data/ itself is a symbolic link to a directory with a timestamp in its name: /run/secrets/certs/..2020_01_17_10_23_43.853725759

The following happens when the secret content is updated:

  1. Kubernetes creates a new "timestamped" directory ..2020_01_17_10_28_07.657980562. A CREATE inotify event can be observed for this.
  2. Kubernetes writes the new certificate and key files under this new directory.
  3. Kubernetes creates a new symlink ..data_tmp pointing to the new directory created in step 1. A CREATE inotify event can be observed for this too.
  4. Kubernetes moves ..data_tmp to ..data. The following inotify events can be observed: MOVE_FROM ..data_tmp and MOVE_TO ..data.
  5. Kubernetes deletes the files from the old "timestamped" directory and then the directory itself. DELETE events can be observed.

Currently the only event Envoy gets is MOVE_TO ..data in step 4.

All the actual changes happen in the "timestamped" subdirectories. If I follow the symlink and watch a file in that directory, I need to set up new inotify watches at every file update, since the directory will be deleted by Kubernetes at the next update.

So I experimented a little with the code, with the following assumptions and code changes:

  a. Envoy adds a MOVE_TO watch for the directory where the DiscoveryResponse file is. If we also store the certificates in that same directory, I get notifications "for free" without needing to change the filesystem subscription logic.
  b. When calculating the hash that guards against unnecessary config updates, I now also consider the contents of the referenced files.
  c. In WatcherImpl::onInotifyEvent() I skip the filename check when the inotify descriptor has only a single callback registered. I assume it must be a match since it is the only watch registered for that descriptor (as described above, the actual filename from inotify is always ..data, which would never match otherwise).

Here (a) and (c) are more like hacks that happen to work. However, this way I'm fairly sure I'm not breaking any existing functionality. And I got certificate hot-reloading working.

I have these changes for viewing in my fork here: master...Nordix:issue-9359

@htuch
Member

htuch commented Jan 21, 2020

Cool. I think (a) shouldn't be necessary. We should have the inotify watch on the filesystem subscription as one thing, and the inotify watch on the secrets as another.

With that in mind, we should be able to just watch for the events on the data dir. Ideally there is no need to modify the filesystem subscription file for that to take effect, since the filenames are always the same.

Can we just refer to the resources by their ..data path?

@tsaarni
Member Author

tsaarni commented Jan 24, 2020

Setting up separate inotify watch for the certificate and key files is a great idea! I should have thought of that myself :) Thanks!

I added a new watch to the SdsApi class, which is used for the file resources: master...Nordix:issue-9359

But the inotify target is still an unresolved pain point. It did not help to refer to the files by their ..data path. The reason is that the target is deleted at every update as part of the atomic directory swap.

root@envoy-57454b4598-2h7cw:/# inotifywait -m /certs/..data
Setting up watches.
Watches established.
/certs/..data OPEN,ISDIR
/certs/..data ACCESS,ISDIR
/certs/..data CLOSE_NOWRITE,CLOSE,ISDIR
/certs/..data OPEN envoy.pem
/certs/..data ACCESS envoy.pem
/certs/..data CLOSE_NOWRITE,CLOSE envoy.pem
/certs/..data OPEN,ISDIR
/certs/..data ACCESS,ISDIR
/certs/..data DELETE internal-root-ca.pem
/certs/..data DELETE envoy.pem
/certs/..data DELETE envoy-key.pem
/certs/..data CLOSE_NOWRITE,CLOSE,ISDIR
/certs/..data DELETE_SELF

Inotify's behavior is that the watch itself gets removed when the target being watched is deleted. One would need to re-arm inotify at every change to work around this type of use case.

The symlink & subdirectory swap trick aims to atomically update a bunch of files. When this trick is used, inotify watches at file level just stop working.

Users should know whether they use this trick on their system. So I wonder: would it be acceptable to have some kind of switch to change the behavior of WatcherImpl::onInotifyEvent() to support this? Since WatcherImpl for inotify already watches directories, it would just mean making an if statement optional.

@tsaarni
Member Author

tsaarni commented Jan 28, 2020

@htuch I'm very grateful for all your guidance so far! I understand this takes time and effort.

I've prepared some more material with a few alternative proposals. This is about the problem with the inotify watches that I'm currently blocked on. I'm sorry that this inevitably becomes a really long and complicated discussion.

Visualization of inotify events during secret update

Initial state: during pod startup, kubelet has created a ramdisk volume and set up an initial directory structure that looks like this:

/secret-mountpoint/file1                # symbolic link to ..data/file1
/secret-mountpoint/file2                # symbolic link to ..data/file2
/secret-mountpoint/..data               # symbolic link to ..timestamp-1
/secret-mountpoint/..timestamp-1        # directory
/secret-mountpoint/..timestamp-1/file1  # initial version of file1
/secret-mountpoint/..timestamp-1/file2  # initial version of file2

The purpose of this layout is to allow atomic renames: it lets kubelet prepare a number of files under ..timestampX and release them into use with a single rename operation.

Update procedure is shown in the figure below. The callouts depict the inotify events emitted by each path component if we had inotify watches added to each of them.

(figure: atomic-directory-rename)

❗️ DELETE_SELF events are "terminal": inotify automatically deletes the watch when the watched file is deleted.

After the update, the directory looks like this:

/secret-mountpoint/file1                # symbolic link to ..data/file1
/secret-mountpoint/file2                # symbolic link to ..data/file2
/secret-mountpoint/..data               # symbolic link to ..timestamp-2
/secret-mountpoint/..timestamp-2        # new directory
/secret-mountpoint/..timestamp-2/file1  # new version of file1
/secret-mountpoint/..timestamp-2/file2  # new version of file2

Proposals

Here are a few potential approaches to the problem. Based on my experiments so far, I'd prefer A or B.

Proposal A: use watches only as information - some files may have changed

Watch for any MOVED_TO events emitted from /secret-mountpoint and fire all callbacks regardless of filename. Upper-layer logic would filter out spurious events by comparing a hash of the file content. This is already partly done; e.g. SdsApi calculates a hash to filter out spurious events.

Known problems:

  1. All users of WatcherImpl (also indirectly through FilesystemSubscriptionImpl) would need to implement file content hashing. I'm not sure whether all of them already do, but I can investigate.

Proposal B: use timer based polling

Do not use inotify events for secrets at all; instead implement new timer-based polling. Compare files by their hash to find out whether they have changed.

Known problems:

  1. Additional filesystem access is required every couple of seconds.
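Proposal B could be sketched roughly like this (illustrative only; the class name, interval, and callback are made up, and the hashing mirrors the upper-layer filtering described in Proposal A):

```python
import hashlib
import threading

class SecretPoller:
    """Timer-based alternative to inotify (Proposal B): every
    `interval` seconds, rehash the watched files and invoke the
    callback when the combined digest changes."""

    def __init__(self, paths, on_change, interval=2.0):
        self.paths = paths
        self.on_change = on_change
        self.interval = interval
        self.last = self._digest()
        self._timer = None

    def _digest(self):
        h = hashlib.sha256()
        for path in sorted(self.paths):
            try:
                with open(path, "rb") as f:
                    h.update(f.read())
            except FileNotFoundError:
                pass  # mid-rotation; the next poll sees the new file
        return h.hexdigest()

    def poll_once(self):
        current = self._digest()
        if current != self.last:
            self.last = current
            self.on_change()

    def start(self):
        # Re-arming a daemon Timer after each poll approximates a
        # periodic dispatcher timer.
        self.poll_once()
        self._timer = threading.Timer(self.interval, self.start)
        self._timer.daemon = True
        self._timer.start()
```

Because the comparison is by content hash, this is immune to the symlink-swap trick: it does not matter how the files were replaced, only that their bytes changed.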

Proposal C: add watches to leaf paths

This proposal discusses an alternative where a watch is added for paths like /secret-mountpoint/..timestamp-2. I've explored this but bumped into several problems, so I do not recommend it.

Known problems:

  1. It will still not be possible to receive MOVED_TO events for files from a watch on /secret-mountpoint/..timestampX: kubelet writes the files in place during the update.
  2. Kubernetes signals the consistent state of the files with the atomic move /secret-mountpoint/..data_tmp -> /secret-mountpoint/..data. Watching the temporary /secret-mountpoint/..timestampX leaf path cannot give information about the consistency of the files.

A variation of this approach would be to watch /secret-mountpoint/..data. This makes it possible to receive a DELETE_SELF event when the atomic rename is done. The event is sent for /secret-mountpoint/..data and not for the files, so it would not help with the primary problem of WatcherImpl expecting to get MOVE_TO events for the watched files. Additionally, it is necessary to add the inotify flag IN_DONT_FOLLOW ("don't dereference pathname if it is a symbolic link"); otherwise this is the same as watching /secret-mountpoint/..timestampX.

Known problems:

  1. It will still not be possible to receive MOVED_TO events for files.
  2. Adding the inotify flag may change the behavior of existing uses of watches.
  3. As far as I know, ..data is an implementation detail of Kubernetes and not part of the public API. In theory the path could change on a Kubernetes upgrade.

It is not clear to me whether anything can be achieved by watching leaves in the case of Kubernetes.

@tsaarni
Member Author

tsaarni commented Feb 17, 2020

I wonder if there are any opinions about the above? I would be very much interested in implementing support for certificate rotation.

@mattklein123
Member

@tsaarni from a quick read, IMO we should probably do some variant of A, but I'm just skimming this. At a high level I think we should just make the file watcher do what it needs to do to support the K8s use case. It's possible that this new behavior could be opt-in, only for watchers that need it (maybe just for SDS, for example; not sure).

@povils

povils commented Mar 31, 2020

Hey,
So just to clarify: the SDS config for TLS with Kubernetes secrets is working, just without rotation?

Also, to avoid symlinks for mounts, we could simply specify the exact file path with subPath in the volume mount:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mypod
    image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: foo
      mountPath: "/certs/envoy.pem"
      subPath: envoy.pem
  volumes:
  - name: foo
    secret:
      secretName: mysecret

@tsaarni
Member Author

tsaarni commented Mar 31, 2020

@povils That is correct, only support for certificate rotation is missing.

@povils

povils commented Mar 31, 2020

Thanks @tsaarni !

So what is the advantage of using SDS config in this case instead of just static resources?

Also, do we have any other Envoy built-in solution to support cert rotation for mounted Kubernetes secrets?

@tsaarni
Member Author

tsaarni commented Mar 31, 2020

So what is the advantage of using SDS config in this case instead of just static resources?

The advantage is that SDS already has the capability to hot-reload certificates and keys without impacting ongoing TLS sessions. Currently this capability cannot be used for path-based SDS resources, since there is no trigger to reload the configuration when the files change. This feature is currently being added in #10163.

Also, do we have any other Envoy built-in solution to support cert rotation for mounted Kubernetes secrets?

There is one: Envoy's hot restart feature. In this alternative, a second instance of Envoy is started; it loads the new configuration, including the new certificates. The listening server sockets are passed from the old instance to the new one without ever closing them, and existing client sessions are drained before the old instance is closed.
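The socket handover at the heart of hot restart can be illustrated in miniature (this is not Envoy's actual hot-restart protocol, which exchanges fds between instances over a unix domain socket; it only shows that a listening socket fd can be inherited by a new process without the port ever closing):

```python
import socket
import subprocess
import sys

# Parent creates a listening socket once; a freshly exec'd process
# inherits the fd, so the port is never closed during the handover.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
port = listener.getsockname()[1]

child_code = (
    "import socket, sys\n"
    "fd = int(sys.argv[1])\n"
    "s = socket.socket(fileno=fd)\n"  # adopt the inherited listener fd
    "print(s.getsockname()[1])\n"
)
out = subprocess.run(
    [sys.executable, "-c", child_code, str(listener.fileno())],
    pass_fds=[listener.fileno()],  # keep the fd open across exec
    capture_output=True, text=True,
).stdout.strip()
assert int(out) == port  # the child holds the very same listening socket
```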

lizan pushed a commit that referenced this issue Apr 1, 2020
Create watches for secrets pointed by path based SDS resources and trigger
hot-reload when the files change.  This allows rotation of TLS certificates
and key for the xDS gRPC connection without restart.

Risk Level: Medium
Testing: unittest
Docs Changes: To be defined: add information about the feature somewhere in the documentation
Release Notes:

Fixes #9359

Signed-off-by: Tero Saarni <[email protected]>