Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sds: certificate hot-reload for xDS gRPC connection #10163

Merged
merged 3 commits into from
Apr 1, 2020

Conversation

tsaarni
Copy link
Member

@tsaarni tsaarni commented Feb 25, 2020

Create watches for secrets pointed by path based SDS resources and trigger
hot-reload when the files change. This allows rotation of TLS certificate
and key for the xDS gRPC connection without restart.

Signed-off-by: Tero Saarni [email protected]

Description:
This PR adds support for certificate rotation for xDS gRPC interface. The implementation approach has been discussed in #9359.

The PR consists of following parts:

  • Add capability to SDS to make filesystem subscriptions for the certificate and key files referred to from SDS resources.
  • Adds capability to inotify WatcherImpl to make subscriptions for directories instead of only files. The change is backwards compatible.

The latter is needed to support commonly used pattern of updating files with atomic directory swap. Watching individual files is not possible, since in this pattern the files are symbolic links which point to a directory. The target of the symbolic links is a directory, which is atomically renamed (replaced) at update, not the files. Kubernetes uses this approach for secret volume mount. The details are covered in: #9359 (comment) and by the test case introduced in this PR.

The feature introduced by this PR is needed by Contour projectcontour/contour#2143 but it can be beneficial to any user of Envoy.

Risk Level: Medium
Testing: unittest
Docs Changes: To be defined: add information about the feature somewhere in the documentation
Release Notes:

Fixes #9359

@tsaarni
Copy link
Member Author

tsaarni commented Feb 25, 2020

I have added the new behavior to only inotify file watcher for Linux, but should this be added to kqueue watcher too - I guess the test case I added may fail on MacOS?

Edit: yes it did. I'll arrange compilation environment on my macbook and fix it!

@tsaarni
Copy link
Member Author

tsaarni commented Mar 4, 2020

After adding kqueue watcher implementation for macos, a different test case failed. I cannot find direct link between my change and the new failure. I see it occasionally failing on my machine too, ranging from 1 failure out of 1000 runs, to 3 out of 1000:

bazel test --runs_per_test 1000 //test/extensions/common/redis:cluster_refresh_manager_test`
...
//test/extensions/common/redis:cluster_refresh_manager_test              FAILED in 3 out of 1000 in 2.6s

This happens on master branch as well.

@stale
Copy link

stale bot commented Mar 11, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 11, 2020
@tsaarni
Copy link
Member Author

tsaarni commented Mar 11, 2020

I'd appreciate review comments on this PR, thank you!

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I'm back from OOO.

source/common/secret/sds_api.h Outdated Show resolved Hide resolved
source/common/secret/sds_api.h Outdated Show resolved Hide resolved
source/common/secret/sds_api.h Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Mar 23, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 23, 2020
@lizan
Copy link
Member

lizan commented Mar 23, 2020

Can you merge master? thanks

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Mar 23, 2020
Create watches for secrets pointed by path based SDS resources and trigger
hot-reload when the files change.  This allows rotation of TLS certificates
and key for the xDS gRPC connection without restart.

Signed-off-by: Tero Saarni <[email protected]>
@tsaarni
Copy link
Member Author

tsaarni commented Mar 24, 2020

I've rebased the PR.
There was some flakiness in test environment this time...

@stale
Copy link

stale bot commented Mar 31, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 31, 2020
@lizan
Copy link
Member

lizan commented Mar 31, 2020

Sorry for the delay, can you merge master again? not rebase but just merge.

@tsaarni
Copy link
Member Author

tsaarni commented Apr 1, 2020

I merged but got some new errors with the CI https://dev.azure.com/cncf/envoy/_build/results?buildId=36546&view=logs&j=3bf5ec37-bb6f-5fb6-2cf1-5134dc6106d6&t=41d571fe-3811-578d-07de-68468f467dae&l=1567

  1. variable generic_secret should be generic_secret_: this code existed there before
  2. sys/event.h file missing: this include is in macos specific code and should not exist on linux

@lizan lizan merged commit 5105d9d into envoyproxy:master Apr 1, 2020
@lizan
Copy link
Member

lizan commented Apr 1, 2020

@tsaarni I merged this ahead and will fix clang-tidy later. Thanks.

@mattklein123
Copy link
Member

@lizan @tsaarni this change should have a release note and documentation updates about the new behavior. @tsaarni can you please add some in a follow up? Thank you!

@tsaarni
Copy link
Member Author

tsaarni commented Apr 2, 2020

I propose to:

  1. Add new example in Security / Secret discovery service (SDS) to explain how to use SDS with filesystem based config resources to achieve hot-reload for xDS gRPC.
  2. Mention that filesystem subscriptions are also set up for referred certificate and key files in documentation of v2 core.ConfigSource and config.core.v3.ConfigSource

Though I'm bit unsure if the 2) is the best place in the API documentation for this information - at least those are the places where it already mentions about filesystem subscriptions in general.

I'm preparing documentation and will be submitting a PR soon for commenting.

@mattklein123
Copy link
Member

Thank you @tsaarni we can discuss the details in the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide support for certificate rotation for xDS connection in Envoy container images
3 participants