Implementation of SDS #4176

Closed
wants to merge 20 commits into from

JimmyCYJ
Member

@JimmyCYJ JimmyCYJ commented Aug 16, 2018

Description: Implement the SDS API, which fetches secrets from a remote SDS server. Secrets are stored in a secret provider. Listeners and clusters are updated when secrets are received.
Risk Level: Low
Testing: Unit tests and integration tests
Fixes #1194

Signed-off-by: Jimmy Chen [email protected]

@JimmyCYJ
Member Author

cc @lizan @qiwzhang @PiotrSikora

@lizan lizan requested a review from PiotrSikora August 16, 2018 00:20
@lizan lizan self-assigned this Aug 16, 2018
@@ -26,6 +26,7 @@
#include "envoy/upstream/upstream.h"

namespace Envoy {

Member

nit: revert

Secret::TlsCertificateConfigProviderSharedPtr
getTlsCertificateConfigProvider(const envoy::api::v2::auth::CommonTlsContext& config,
Secret::SecretManager& secret_manager) {
Secret::TlsCertificateConfigProviderSharedPtr ContextConfigImpl::getTlsCertificateConfigProvider(
Member

Why did this need to be a non-static private function? I don't think it uses any member variables.

* Add secret callback into context config.
* @param callback callback that is executed by context config.
*/
virtual void setUpdateCallback(Secret::SecretCallbacks& callback) PURE;
Contributor

setSecretUpdateCallback

Member Author

Changed.

/* rest_legacy_constructor */ nullptr,
"envoy.service.discovery.v2.SecretDiscoveryService.FetchSecrets",
"envoy.service.discovery.v2.SecretDiscoveryService.StreamSecrets");
Config::Utility::checkLocalInfo("sds", local_info_);
Contributor

You can move the creation of subscription_ into the constructor so we don't need to store so many object references.

Then only call start in the initialize() function.

Member Author

We cannot move subscription_ into the constructor. Envoy::Config::SubscriptionFactory::subscriptionFromConfigSource<envoy::api::v2::auth::Secret> will access members of those objects, which are not ready when the SdsApi constructor is called. That causes a segmentation fault in test runs.

if (!secret_provider) {
ASSERT(secret_provider_context.initManager() != nullptr);

std::function<void()> unregister_secret_provider = [map_key, config_name, sds_config_source,
Contributor

For readability, maybe create a member function removeDynamicSecretProvider() and have the lambda just call it.

Member Author

Done. Thanks.

@@ -18,6 +19,13 @@ static const std::string INLINE_STRING = "<inline>";

class ContextConfigImpl : public virtual Ssl::ContextConfig {
public:
~ContextConfigImpl() override {
Contributor

Move it to the .cc file.

return !tls_certficate_provider_ || tls_certficate_provider_->secret();
}

void setUpdateCallback(Secret::SecretCallbacks& callback) override {
Contributor

Move it to the .cc file.

tls_certficate_provider_->removeUpdateCallback(*secret_callback_);
}
secret_callback_ = &callback;
if (tls_certficate_provider_.get() != nullptr) {
Contributor

Only save the callback when the provider is not null.

You can do:

if (tls_certficate_provider_) {
  if (secret_callback_) {
    tls_certficate_provider_->removeUpdateCallback(*secret_callback_);
  }
  secret_callback_ = &callback;
  // then register secret_callback_ with the provider
}

Member Author

Done.

@@ -17,6 +17,38 @@ using Envoy::Network::PostIoAction;
namespace Envoy {
namespace Ssl {

namespace {

class NotReadySslSocket : public Network::TransportSocket, public Connection {
Contributor

Please add a comment for this class:
// This SslSocket will be used when SSL secret is not fetched from SDS server.

Member Author

Done.

ENVOY_LOG(debug, "Unregister secret provider. hash key: {}", map_key);
auto secret_provider = dynamic_secret_providers_.find(map_key);
if (secret_provider != dynamic_secret_providers_.end()) {
dynamic_secret_providers_.erase(map_key);
Contributor

You can call erase() directly, then check the return value. If 0, log an error.

Member Author

Nice advice. Thanks!
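
A minimal sketch of the erase-then-check pattern suggested in this thread, assuming a plain `std::unordered_map` as a stand-in for `dynamic_secret_providers_`. The `ProviderMap` alias, the `int` value type, and the `removeProvider` helper are hypothetical names used only for illustration; they are not taken from the PR.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the provider map; the value type is a placeholder.
using ProviderMap = std::unordered_map<std::string, int>;

// Erases map_key in a single lookup and reports whether an entry was removed.
// unordered_map::erase(key) returns the number of elements erased (0 or 1).
bool removeProvider(ProviderMap& providers, const std::string& map_key) {
  const auto erased = providers.erase(map_key);
  if (erased == 0) {
    std::cerr << "no secret provider registered for key: " << map_key << "\n";
    return false;
  }
  return true;
}
```

Compared with find() followed by erase(), this performs one lookup instead of two and still lets the caller detect a missing key.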

sds_config_source.DebugString());
}
std::function<void()> unregister_secret_provider = [map_key, this]() {
this->removeDynamicSecretProvider(map_key);
Contributor

You may not need this variable; just use the lambda directly as the function argument, such as:

xxx(....,
[map_key, this] () { removeDynamicSecretProvider(map_key); }
);

@lizan
Member

lizan commented Aug 16, 2018

Can you merge master and run clang-format on 6.0 per #4168?

@JimmyCYJ
Member Author

@lizan I have run clang-format on 6.0

@@ -0,0 +1,22 @@
#pragma once

#include <memory>
Member

nit: unused include

@@ -0,0 +1,21 @@
#pragma once

#include <string>
Member

string is not used either...

Member

@mattklein123 mattklein123 left a comment

Thanks for the hard work here and extra thanks for the integration tests. It gives me good confidence this change works. Nice! I have some high level comments to get started with. FYI, @htuch is going to take over senior maintainer review on this since I am out for a month starting tomorrow. Thank you!


namespace Server {
namespace Configuration {
class TransportSocketFactoryContext;
Member

nit: It would probably be better to just move this interface into the secret or SSL/TLS namespace but not a big deal.

Member Author

Thanks, I will keep this.


namespace Envoy {
namespace Secret {
class SecretCallbacks;
Member

Do we need the forward declare here?

Member Author

I have removed this forward declaration and included secret_callbacks.h.

@@ -95,6 +98,17 @@ class ContextConfig {
* @return The maximum TLS protocol version to negotiate.
*/
virtual unsigned maxProtocolVersion() const PURE;

/**
* @return true if the ssl config is ready.
Member

Can we add more information about what ready means?

Member Author

Comment is updated.


/**
* Add secret callback into context config.
* @param callback callback that is executed by context config.
Member

Can you add more information about when callbacks will be invoked? It's not clear at the interface level why/when this happens.

Member Author

More information is added into comment. Thanks.

* Pass an init manager to register dynamic secret provider.
* @param init_manager instance of init manager.
*/
virtual void setInitManager(Init::Manager& init_manager) PURE;
Member

It's not optimal that this interface has a setter and a getter for the init manager. Is there any way to simplify this so that the init manager is known ahead of time and we only need a getter?

Member Author

Ideally the init manager should exist when we create TransportSocketFactoryContext, so that we only need a getter. But we create TransportSocketFactoryContext in ClusterManagerFactory and only then create an init manager per cluster, so we have to add it to the factory context afterwards. @qiwzhang proposed #3831; I think once we are working on that, we can get rid of this setter.

Member

Shouldn't the init manager then be part of the factory context?

Member Author

For now, the init manager is per cluster. When TransportSocketFactoryContext is constructed, the per-cluster init manager is not yet available, so we have to set it once it becomes available.

@@ -17,6 +17,38 @@ using Envoy::Network::PostIoAction;
namespace Envoy {
namespace Ssl {

namespace {

// This SslSocket will be used when SSL secret is not fetched from SDS server.
Member

Do we really need this? What causes us to require an instantiated connection? Can't we cause the control flow to return in such a way that there is no socket and we just fail whatever we are doing?

Contributor

@qiwzhang qiwzhang Aug 17, 2018

Our first attempt was to return nullptr from socket creation when ssl_context is not ready (the SDS client failed to get the secret). This works for ListenerImpl, but did not work for ClusterImpl. After looking at the ClusterImpl code, most of it does not expect a nullptr socket or a nullptr connection. It would require a lot of code changes to make that code handle a null connection.

To limit the code changes, we decided to return a dummy socket that just resets the connection.

Member

I don't have an opinion on whether this is the best course of action or not. @htuch hopefully can have a look. In general I would prefer that we don't do this but if it's the best way I can accept that.
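
The trade-off in this thread (return nullptr versus return a dummy socket) can be sketched as a null-object pattern. This is a simplified illustration under stated assumptions: the `TransportSocket` interface, its `onConnected()` method, and the `ReadySocket`, `NotReadySocket`, and `createSocket` names are hypothetical and far smaller than Envoy's real `Network::TransportSocket` interface.

```cpp
#include <memory>

// Hypothetical, heavily simplified transport socket interface.
class TransportSocket {
public:
  virtual ~TransportSocket() = default;
  // Returns true if the connection may proceed, false if it must be closed.
  virtual bool onConnected() = 0;
};

// Used when the secret is available and a real handshake can start.
class ReadySocket : public TransportSocket {
public:
  bool onConnected() override { return true; }
};

// Null object: used while the secret has not been fetched yet.
// It never performs I/O and always asks the caller to close the connection.
class NotReadySocket : public TransportSocket {
public:
  bool onConnected() override { return false; }
};

// The factory never returns nullptr, so callers need no null checks.
std::unique_ptr<TransportSocket> createSocket(bool secret_ready) {
  if (secret_ready) {
    return std::make_unique<ReadySocket>();
  }
  return std::make_unique<NotReadySocket>();
}
```

The design choice is that callers written against a non-null socket keep working unchanged; the cost is an extra class whose only job is to fail cleanly.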

}

bool ClientSslSocketFactory::implementsSecureTransport() const { return true; }

void ClientSslSocketFactory::onAddOrUpdateSecret() {
ENVOY_LOG(debug, "Secret is updated.");
ssl_ctx_ = manager_.createSslClientContext(stats_scope_, *config_);
Member

Pretty positive this needs locking. I think you will be accessing ssl_ctx_ across threads to make transport sockets. Thus, you will need a R/W lock here and ultimately should move to using TLS (thread-local storage), but a R/W lock is OK for now. I think at a high level having a locking/threading analysis of this change would be useful. Can you add that to the description somewhere?

Member Author

Good point! Thanks! Will add R/W lock.

Member Author

I have added a lock to protect read/write to ssl_ctx_, and added comments.

Member

I don't think you will need a lock beyond what shared_ptr already provides; it seems unnecessary. Though you might want to take a local copy of the shared_ptr when creating a socket, so the access to the member variable is always atomic.

Member Author

@lizan Thanks for pointing this out. Yes the lock is not necessary as we already use shared_ptr. They are removed.
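
A sketch of the "local copy" idea from this thread, under assumptions that go beyond what the PR states. The reference count of a `std::shared_ptr` is thread-safe, but reading and writing the same `shared_ptr` object from different threads is only race-free when done through atomic operations, so this example uses the standard `std::atomic_load` and `std::atomic_store` overloads for `shared_ptr` (available since C++11, deprecated in C++20). The `Context` and `SocketFactory` types are hypothetical stand-ins, not Envoy classes, and this is not necessarily what the PR ended up doing.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an SSL context.
struct Context {
  std::string cert;
};

class SocketFactory {
public:
  explicit SocketFactory(std::string cert)
      : ctx_(std::make_shared<Context>(Context{std::move(cert)})) {}

  // Called when a new secret arrives: build a fresh context and publish it.
  void onAddOrUpdateSecret(std::string cert) {
    std::shared_ptr<const Context> fresh =
        std::make_shared<Context>(Context{std::move(cert)});
    std::atomic_store(&ctx_, fresh);
  }

  // Take one local snapshot; everything after this line uses only the
  // snapshot, so a concurrent update cannot change it mid-call.
  std::string createSocket() const {
    std::shared_ptr<const Context> snapshot = std::atomic_load(&ctx_);
    return snapshot ? "socket:" + snapshot->cert : "not-ready";
  }

private:
  std::shared_ptr<const Context> ctx_;
};
```

The snapshot also keeps the old context alive until the socket creation finishes, even if an update replaces `ctx_` in the meantime.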

}

bool ServerSslSocketFactory::implementsSecureTransport() const { return true; }

void ServerSslSocketFactory::onAddOrUpdateSecret() {
ENVOY_LOG(debug, "Secret is updated.");
ssl_ctx_ = manager_.createSslServerContext(stats_scope_, *config_, server_names_);
Member

Locking. Is there a way to share this code in a base class somehow? Would prefer to not implement a bunch of this twice.

Member Author

Lock is added, thanks! ServerSslSocketFactory::onAddOrUpdateSecret() and ClientSslSocketFactory::onAddOrUpdateSecret() are different: one owns ServerContextSharedPtr ssl_ctx_ and the other owns ClientContextSharedPtr ssl_ctx_, and the contexts are created by calling different methods on the context manager. I would like to keep this method in the separate classes.

@@ -40,6 +40,8 @@
#include "common/upstream/outlier_detection_impl.h"
#include "common/upstream/resource_manager_impl.h"

#include "server/init_manager_impl.h"
Member

Please move init_manager_impl out of server/ and into a new subdirectory in source/ called init/. Feel free to do this in a followup but please add a TODO.

Member Author

Added TODO in init_manager_impl.h. Thanks.

*/
void onPreInitComplete();

/**
* Called by every concrete cluster after all Sds api targets registered at SDS init manager are
Member

Remove references to SDS or make it clear that SDS is one of the things that init manager is used for. We are likely to add other things in the future.

Member Author

Removed references to SDS from comment. Thanks.

Signed-off-by: JimmyCYJ <[email protected]>
Member

@htuch htuch left a comment

Some comments to get started. This is a pretty huge PR; FWIW, I strongly encourage shorter PRs for reviewability and velocity.

* Finds and returns a dynamic secret provider associated to SDS config. Create
* a new one if such provider does not exist.
*
* @param config_source a protobuf message object contains SDS config source.
Member

Nit: s/contains/containing a/

Member Author

Done. Thanks.

* a new one if such provider does not exist.
*
* @param config_source a protobuf message object contains SDS config source.
* @param config_name a name that uniquely refers to the SDS config source
Member

Nit: full stop at end of sentence.

Member Author

Done. Thanks.

* @param config_name a name that uniquely refers to the SDS config source
* @param secret_provider_context context that provides components for creating and initializing
* secret provider.
* @return the dynamic TLS secret provider.
Member

Nit: @return TlsCertificateConfigProviderSharedPtr the dynamic TLS secret provider.

Member Author

Done. Thanks.

@@ -18,7 +19,17 @@ template <class SecretType> class SecretProvider {
*/
virtual const SecretType* secret() const PURE;

// TODO(lizan): Add more methods for dynamic secret provider.
/**
* Add secret callback into secret provider.
Member

Can you add some comments on thread safety? E.g. from which threads is it safe to call this, on which thread will the callback be invoked?

Member Author

Comments are added. Thanks.

}
const auto& secret = resources[0];
MessageUtil::validate(secret);
if (!(secret.name() == sds_config_name_)) {
Member

Nit: !=

Member Author

Done.

void SecretManagerImpl::removeDynamicSecretProvider(const std::string& map_key) {
ENVOY_LOG(debug, "Unregister secret provider. hash key: {}", map_key);

ASSERT(dynamic_secret_providers_.erase(map_key) == 1);
Member

This should be RELEASE_ASSERT; otherwise in opt builds, this entire line disappears.

Member Author

Thanks for pointing that out. Fixed.
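
The hazard raised in this thread can be shown with the standard `assert` macro, which compiles to nothing when `NDEBUG` is defined, much like a debug-only assertion in an optimized build. This is a minimal sketch: `alwaysCheck` and `removeKey` are hypothetical helpers, not Envoy's `ASSERT` or `RELEASE_ASSERT` macros.

```cpp
#include <cstdlib>
#include <string>
#include <unordered_set>

// Hypothetical always-on check: aborts in every build type.
inline void alwaysCheck(bool ok) {
  if (!ok) {
    std::abort();
  }
}

// Risky pattern (shown as a comment only):
//   assert(keys.erase(key) == 1);
// With NDEBUG defined the whole expression vanishes, so nothing is erased.

// Safer pattern: perform the side effect unconditionally, then check it.
void removeKey(std::unordered_set<std::string>& keys, const std::string& key) {
  const auto erased = keys.erase(key); // always executed
  alwaysCheck(erased == 1);
}
```

Keeping the side effect on its own line means the choice between a debug-only and an always-on check affects only the verification, never the behavior.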

TlsCertificateConfigProviderSharedPtr SecretManagerImpl::findOrCreateDynamicSecretProvider(
const envoy::api::v2::core::ConfigSource& sds_config_source, const std::string& config_name,
Server::Configuration::TransportSocketFactoryContext& secret_provider_context) {
std::string map_key = std::to_string(MessageUtil::hash(sds_config_source)) + config_name;
Member

Nit: const std::string

Member Author

Done.

ASSERT(secret_provider_context.initManager() != nullptr);

std::function<void()> unregister_secret_provider = [map_key, this]() {
removeDynamicSecretProvider(map_key);
Member

Maybe add some lifetime comments here. @ambuc recently hit issues where these kinds of callbacks were invoked and the equivalent of SdsApi outlived the equivalent of SecretManagerImpl (this was in ListenerManager).

Member Author

The lifetime issue has been captured by integration tests. We have adjusted the order of secret manager and other components to make sure SdsApi objects are destroyed before SecretManagerImpl. Comments are added. Thanks.
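
One way to get the ordering described here is to rely on the C++ rule that non-static members are destroyed in reverse order of declaration. The sketch below is an illustration only: `Manager`, `Subscriber`, and `Server` are hypothetical stand-ins for the secret manager, an SdsApi-like object holding a cleanup callback, and their owner. It does not reproduce how Envoy actually arranges these objects.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the secret manager; records events in a shared log.
class Manager {
public:
  explicit Manager(std::vector<std::string>& log) : log_(log) {}
  ~Manager() { log_.push_back("manager destroyed"); }
  void remove(const std::string& key) { log_.push_back("remove " + key); }

private:
  std::vector<std::string>& log_;
};

// Stand-in for an object that runs a cleanup callback on destruction.
class Subscriber {
public:
  explicit Subscriber(std::function<void()> on_destroy)
      : on_destroy_(std::move(on_destroy)) {}
  ~Subscriber() { on_destroy_(); }

private:
  std::function<void()> on_destroy_;
};

// manager_ is declared first, so it is destroyed last. The callback held by
// subscriber_ therefore always runs while manager_ is still alive.
class Server {
public:
  explicit Server(std::vector<std::string>& log)
      : manager_(log), subscriber_([this] { manager_.remove("k"); }) {}
  Server(const Server&) = delete;
  Server& operator=(const Server&) = delete;

private:
  Manager manager_;
  Subscriber subscriber_;
};
```

Swapping the two member declarations would make the callback touch an already destroyed `Manager`, which is the kind of bug the lifetime comment is meant to warn about.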

return std::make_unique<Ssl::SslSocket>(ssl_ctx_, Ssl::InitialState::Client);
// SDS would update ssl_ctx_ when Envoy is running.
// Need a read lock to let multiple threads gain read access to ssl_ctx_.
std::shared_lock<std::shared_timed_mutex> lock(ssl_ctx_mutex_);
Member

I haven't looked into this yet, but this is something I'd like to understand the necessity for better; usually in Envoy, shared-memory concurrency is not needed, and other mechanisms like TLS posting are the right solution.

Member Author

The lock is not necessary, as @lizan points out here; I have removed it.

@htuch
Member

htuch commented Aug 21, 2018

@JimmyCYJ also, personal plea to avoid force push; general GH etiquette avoids this to make reviewer lives easier (so they can just look at the delta between each PR).

Signed-off-by: JimmyCYJ <[email protected]>
This was referenced Aug 22, 2018
@JimmyCYJ
Member Author

@htuch Thanks for reviewing this PR. I have created PR #4231, which contains the SDS API and the dummy socket; neither is in use yet.

htuch pushed a commit that referenced this pull request Aug 24, 2018
Implement SDS API and dummy socket, and they are not in use. This is split from PR #4176.

Risk Level: Low
Testing: Unit tests
Docs Changes: None

Fixes #1194

Signed-off-by: JimmyCYJ <[email protected]>
@JimmyCYJ
Member Author

This PR didn't update correctly. I am going to close this one and create a new PR.

@JimmyCYJ JimmyCYJ closed this Aug 24, 2018
@JimmyCYJ JimmyCYJ deleted the secret_provider_context branch August 24, 2018 21:55
@JimmyCYJ
Member Author

I have created PR #4256, please take a look. Thanks!

5 participants