Make node TLS certificates totally ephemeral #2675

Merged
merged 8 commits into from
Apr 4, 2020

Conversation

Contributor

@abukosek abukosek commented Feb 14, 2020

Closes #2098.

TODO:

  • Add a flag to set node TLS certificate rotation interval.
  • Figure out why the storage client unit tests fail with storage client: failed to write to any storage node (I used GetCertificate where I should have used GetClientCertificate 🤦‍♂️ ).
  • Fix sentry nodes (these need to have persistent TLS certs, so no rotation for you).
  • Fix e2e tests (need to take NextCertificate into account when comparing nodes in the registry_cli test).
  • Enable cert rotation in e2e tests.
  • Fix e2e tests again (tests using the IAS proxy should also not do rotation).
  • Put the TLSSigner stuff back in (I temporarily removed this before to make debugging easier).
  • Write changelog fragment.
  • Fix the gRPC proxy so it dials connections itself rather than relying on an externally-established connection.
  • Upstream node should send its TLS certs to the sentry node on init and on every rotation.
  • Enable TLS cert rotation in e2e tests that use sentry nodes.
  • Make the IAS proxy not panic when the connection is dropped; it should just reconnect.
  • Address review comments.
  • Add check to gRPC proxy to make sure that upstream connection is in a good state before forwarding calls (and re-dial upstream if the connection has gone bad).
  • Add separate long-term TLS certificates for the sentry node's control connection client.
  • Address review comments.

go/common/grpc/grpc.go (outdated, resolved)
go/common/identity/identity.go (resolved)
}
nextCert, err := tlsCert.Generate(CommonName)
Member

Also make sure to reduce certificate expiration once this works.

Contributor Author

Sentry nodes and IAS proxies require non-rotating certificates, so I've left this as-is for now.

go/runtime/committee/client.go (outdated, resolved)
go/worker/registration/worker.go (resolved)
go/registry/api/api.go (outdated, resolved)
go/registry/api/api.go (resolved)
@abukosek abukosek force-pushed the andrej/feature/ephemeral-node-tls-certs branch 26 times, most recently from 413c698 to 5e69dc3 Compare February 20, 2020 14:55
codecov bot commented Feb 20, 2020

Codecov Report

Merging #2675 into master will increase coverage by 0.02%.
The diff coverage is 50.89%.

@@            Coverage Diff             @@
##           master    #2675      +/-   ##
==========================================
+ Coverage   65.67%   65.69%   +0.02%     
==========================================
  Files         342      342              
  Lines       32579    32866     +287     
==========================================
+ Hits        21395    21591     +196     
- Misses       8483     8581      +98     
+ Partials     2701     2694       -7     
| Impacted Files | Coverage Δ |
| --- | --- |
| go/common/node/node.go | 64.54% <0.00%> (-0.55%) ⬇️ |
| go/oasis-node/cmd/ias/auth_registry.go | 0.00% <0.00%> (ø) |
| go/oasis-node/cmd/identity/identity.go | 37.50% <0.00%> (ø) |
| go/oasis-node/cmd/storage/benchmark/benchmark.go | 3.92% <0.00%> (ø) |
| go/worker/sentry/grpc/worker.go | 10.56% <0.00%> (-1.25%) ⬇️ |
| go/worker/sentry/grpc/init.go | 25.24% <6.66%> (-6.01%) ⬇️ |
| go/registry/api/api.go | 37.57% <30.30%> (-0.56%) ⬇️ |
| go/oasis-node/cmd/node/node.go | 53.95% <33.33%> (ø) |
| go/sentry/sentry.go | 52.38% <33.33%> (-8.34%) ⬇️ |
| go/sentry/api/grpc.go | 54.16% <40.00%> (-23.62%) ⬇️ |
| ... and 40 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4aca538...ee4d9fc.

@abukosek abukosek force-pushed the andrej/feature/ephemeral-node-tls-certs branch 2 times, most recently from a0e883a to f9a5dbe Compare February 20, 2020 16:09
@@ -50,7 +50,7 @@ func doNodeInit(cmd *cobra.Command, args []string) {
 )
 os.Exit(1)
 }
-if _, err = identity.LoadOrGenerate(dataDir, nodeSignerFactory); err != nil {
+if _, err = identity.LoadOrGenerate(dataDir, nodeSignerFactory, true); err != nil {
Contributor

Does this mean that nodes provisioned this way won't have ephemeral TLS certs unless they obliterate the persisted cert?

Contributor Author

Yes. As rotation is off by default for now, I thought this would make the most sense.
Once we enable rotation by default, we can also disable persistent TLS certs here or add a flag to control this behavior.

@abukosek abukosek force-pushed the andrej/feature/ephemeral-node-tls-certs branch 2 times, most recently from 57581bb to 6e591d1 Compare March 31, 2020 06:11
.buildkite/code.pipeline.yml (outdated, resolved)
@@ -0,0 +1,6 @@
node: Add automatic TLS certificate rotation support
Member

Probably open an issue in https://github.com/oasislabs/docs once this is merged, so we won't forget to add/update the docs.

Contributor Author

Will do.

go/common/identity/identity.go (resolved)
ch, sub, err := auth.client.WatchRuntimes(ctx)
var redialAttempts uint

Redial:
Member

Could use the cenkalti/backoff package we use in other similar cases?

Contributor Author

I think that would be overkill for this simple use case. The only time this connection is dropped is when the upstream node rotates its TLS certificates, so a constant 2s retry is good enough.

go/worker/sentry/grpc/init.go (resolved)
go/worker/sentry/worker.go (outdated, resolved)
@abukosek abukosek added the c:breaking/cfg Category: breaks configuration label Mar 31, 2020
@abukosek abukosek force-pushed the andrej/feature/ephemeral-node-tls-certs branch 3 times, most recently from 1216289 to 36e985d Compare April 3, 2020 10:10
Contributor Author

abukosek commented Apr 3, 2020

Thanks to everyone for your previous review comments. This PR is now ready for re-review :)

// TLSCertificate is a certificate that can be used for TLS.
TLSCertificate *tls.Certificate
// tlsSigner is a node TLS certificate signer.
tlsSigner signature.Signer
Contributor

As a matter of style I prefer private struct members to be separated from public ones (and not interleaved like it is here).

@abukosek abukosek force-pushed the andrej/feature/ephemeral-node-tls-certs branch from 36e985d to acdb3c9 Compare April 3, 2020 13:45
dialUpstream := func() error {
_, err := g.upstreamDialer(g.ctx)
if err != nil {
if numRetries < 60 {
Member

@ptrus ptrus Apr 3, 2020

Could instead wrap the existing constant backoff with WithMaxRetries

-upstreamConn, err := initConnection(identity)
-if err != nil {
-return nil, fmt.Errorf("gRPC sentry worker initializing upstream connection failure: %w", err)
+g.upstreamDialer = func(ctx context.Context) (*grpc.ClientConn, error) {
Member

@ptrus ptrus Apr 3, 2020

Could the mutex be removed if the dialer just called initConnection and returned the result, without storing it in the sentry worker struct?

It might also be a bit cleaner, since in that case there will always be two connections: one used by the proxy and one by the policy watcher. Each would keep track of its own connection, as both already do, and initialize more connections if needed. I think the only difference from the current implementation is that the connection is currently sometimes shared: whether it is shared seems to depend on whether the proxy or the sentry invokes the dialer first, since upstreamDialer updates the sentry's internal connection state but not the proxy's.

On the other hand, if we want to keep using one connection, I think it could be done more cleanly if upstreamDialer were its own struct (not part of the sentry worker) that maintains a connection (or a pool of connections) and exposes a dial method that returns it, redialing if necessary.

Contributor Author

The idea is to replace the policy watcher with a new sentry command (e.g. UpdatePolicies or something) that is called by the upstream node in a similar way to SetUpstreamTLSCertificates added in this PR. This will allow further simplifications of the sentry code, but that should be done in a separate PR.
I'll make an issue for this after I merge the PR.

@@ -0,0 +1,5 @@
Sentry nodes no longer require TLS certificate file of the upstream node
Member

@ptrus ptrus Apr 3, 2020

Nice 👍 🎉

Member

@ptrus ptrus left a comment

left 2 minor comments, otherwise looks good! 👍

@abukosek abukosek force-pushed the andrej/feature/ephemeral-node-tls-certs branch from acdb3c9 to ee4d9fc Compare April 4, 2020 07:38
@abukosek abukosek merged commit 076f06f into master Apr 4, 2020
@abukosek abukosek deleted the andrej/feature/ephemeral-node-tls-certs branch April 4, 2020 08:06
Contributor Author

abukosek commented Apr 4, 2020

Thanks!

Labels
c:breaking/cfg Category: breaks configuration
Development

Successfully merging this pull request may close these issues.

Make node TLS certificates totally ephemeral
4 participants