Reload TLS certificates on change #2389

Merged
merged 1 commit into from
Aug 21, 2020

Member

@pavolloffay pavolloffay commented Aug 13, 2020

Resolves jaegertracing/jaeger-operator#1099

This PR enables reloading of TLS certificates in the Elasticsearch client. The same approach can be used for other clients that use our tlscfg package, or we can enable it by default.

I have tested this on OCP 4.4 with the Jaeger Operator and a self-provisioned ES. To trigger the certificate change, I removed the master certs and wiped the tmp dir in the operator pod.

@pavolloffay pavolloffay requested a review from jpkrohling August 13, 2020 15:21
@pavolloffay pavolloffay requested a review from a team as a code owner August 13, 2020 15:21
pkg/es/config/config.go (outdated, resolved)
pkg/config/tlscfg/reload_test.go (outdated, resolved)
pkg/es/config/config.go (outdated, resolved)
Config: &tls.Config{
    ServerName: c.TLS.ServerName,
},
CertPath: c.TLS.CertPath,
Member Author

@yurishkuro instead of relying on the driver to load the certs, I have changed it to use our own TLS loading.

Contributor

@jpkrohling jpkrohling left a comment

Looks really good. I have a few questions, and the only place that deserves some attention is the commented-out tests.

cmd/ingester/main.go (outdated, resolved)
cmd/ingester/main.go (outdated, resolved)
cmd/opentelemetry/cmd/all-in-one/main.go (outdated, resolved)
pkg/config/tlscfg/reload_test.go (outdated, resolved)
pkg/config/tlscfg/reload.go (outdated, resolved)
var err error
switch event.Name {
case w.opts.CAPath:
    err = addCertToPool(w.opts.CAPath, rootCAs)
Contributor

Does this mean that if the contents of the file at CAPath have changed, the new CA is added to the pool, but the old one isn't removed?

Member Author

Correct, the pool does not expose an API to remove old certificates.
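
To illustrate the trade-off discussed in this thread: Go's `x509.CertPool` only has additive methods such as `AddCert` and `AppendCertsFromPEM`, so the only way to drop an old CA is to build a fresh pool. The sketch below contrasts the two outcomes. It is not what this PR does (the PR appends to the existing pool); `newPoolFromPEM` and `demoCA` are hypothetical helpers, and `CertPool.Equal` requires Go 1.19 or later.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newPoolFromPEM (hypothetical) builds a brand-new pool from the given CA
// bundles. Replacing the pool is the only way to stop trusting an old CA,
// because CertPool has no method for removing a certificate.
func newPoolFromPEM(bundles ...[]byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	for _, b := range bundles {
		if !pool.AppendCertsFromPEM(b) {
			return nil, errors.New("no certificate could be parsed from CA bundle")
		}
	}
	return pool, nil
}

// demoCA generates a throwaway self-signed CA certificate in PEM form.
func demoCA(cn string) []byte {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
}

func main() {
	oldCA, newCA := demoCA("old-ca"), demoCA("new-ca")

	// Appending on change: the pool ends up trusting both CAs.
	appended, err := newPoolFromPEM(oldCA, newCA)
	if err != nil {
		panic(err)
	}
	// Rebuilding on change: only the CA currently on disk is trusted.
	rebuilt, err := newPoolFromPEM(newCA)
	if err != nil {
		panic(err)
	}
	fmt.Println("appended pool equals rebuilt pool:", appended.Equal(rebuilt))
}
```

Swapping a rebuilt pool into a `tls.Config` that is already in use needs some extra indirection, which is outside the scope of this sketch.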

pkg/config/tlscfg/reload.go (outdated, resolved)
if !ok {
    return
}
w.logger.Error("Watcher got error", zap.Error(err))
Contributor

Also not sure what kind of errors it would get. If those are high-frequency as well, it would probably be better to have them at debug level.

Member Author

I would keep this at error level for now until we find out that these are high-freq (I haven't seen this being logged yet).

codecov bot commented Aug 19, 2020

Codecov Report

Merging #2389 into master will decrease coverage by 0.01%.
The diff coverage is 94.26%.

@@            Coverage Diff             @@
##           master    #2389      +/-   ##
==========================================
- Coverage   95.60%   95.58%   -0.02%     
==========================================
  Files         206      208       +2     
  Lines       10549    10676     +127     
==========================================
+ Hits        10085    10205     +120     
- Misses        396      398       +2     
- Partials       68       73       +5     
| Impacted Files | Coverage Δ |
|---|---|
| cmd/collector/app/server/grpc.go | 65.38% <0.00%> (ø) |
| plugin/storage/badger/factory.go | 96.39% <0.00%> (-1.77%) ⬇️ |
| cmd/collector/app/collector.go | 68.05% <33.33%> (-1.51%) ⬇️ |
| pkg/config/tlscfg/cert_watcher.go | 94.87% <94.87%> (ø) |
| cmd/agent/app/reporter/client_metrics.go | 100.00% <100.00%> (ø) |
| cmd/agent/app/reporter/grpc/builder.go | 100.00% <100.00%> (ø) |
| cmd/agent/app/reporter/grpc/collector_proxy.go | 100.00% <100.00%> (ø) |
| cmd/query/app/server.go | 93.33% <100.00%> (+0.15%) ⬆️ |
| pkg/config/tlscfg/options.go | 100.00% <100.00%> (ø) |
| pkg/multicloser/multicloser.go | 100.00% <100.00%> (ø) |
| ... and 8 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3eeedf0...88e5c65.

Contributor

@jpkrohling jpkrohling left a comment

LGTM, but still need to review the reload_test

    ServerName     string       `mapstructure:"server_name"`      // only for client-side TLS config
    ClientCAPath   string       `mapstructure:"client_ca"`        // only for server-side TLS config for client auth
    SkipHostVerify bool         `mapstructure:"skip_host_verify"`
    watcher        *certWatcher `mapstructure:"-"`
Contributor

I like the new name!

@@ -76,6 +76,9 @@ func main() {
    consumer.Start()

    svc.RunAndThen(func() {
        if err := options.TLS.Close(); err != nil {
Member

will this not be already included in storageFactory.Close()?

Member Author

The storage factory closes the TLS config for the producer. This call closes the TLS config for the consumer.

@@ -70,5 +74,6 @@ func (b ProxyBuilder) GetManager() configmanager.ClientConfigManager {
// Close closes connections used by proxy.
func (b ProxyBuilder) Close() error {
    b.reporter.Close()
    b.tlsCloser.Close()
    return b.conn.Close()
Member

could use multicloser.Wrap(a, b, c).Close()

Member Author

Done. The reporter didn't implement io.Closer, so I changed its Close method to return an error.

Member

@yurishkuro yurishkuro left a comment

lgtm

    if err := c.tlsCloser.Close(); err != nil {
        c.logger.Error("failed to close TLS certificate watcher", zap.Error(err))
    }

    return nil
Member

Not sure why this function keeps logging the errors; it would be cleaner to return them via multierr.Wrap, but that doesn't need to be in this PR.

Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay pavolloffay merged commit 09fde54 into jaegertracing:master Aug 21, 2020

Successfully merging this pull request may close these issues.

Restart Jaeger collector and query pods when Elasticsearch secrets are changed
3 participants