security: add CA and client certs for tenant usage #49864

tbg · 2020-06-04T09:33:15Z

This commit introduces methods that create a CA certificate for use with
the auth broker envisioned in #49105 and, from that CA, client
certificates for use in #47898. They are not exposed through the cli
(yet), though the certificate loader already supports them.

Touches #49105.
Touches #47898.

Release note: None

cockroach-teamcity · 2020-06-04T09:33:21Z

This change is

knz

Reviewed 6 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @bdarnell)

bdarnell

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @tbg)

pkg/security/certs.go, line 505 at r1 (raw file):

}

// WriteTenantClientPair writes a TenantClientPair into certsDir.

Where (on which kind of server) do tenant certs get written to disk? In my understanding of the design these certs are all ephemeral and can be kept in memory.

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @bdarnell)

pkg/security/certs.go, line 505 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Where (on which kind of server) do tenant certs get written to disk? In my understanding of the design these certs are all ephemeral and can be kept in memory.

Your understanding is correct. In production a client pair either comes from the auth broker (either directly or injected at pod startup, likely through a file then?). I could move this into _test.go to keep the public API clean for now, though I anticipated using this for a first plumbing pass of the security work that switches to these certificates (without using the auth broker yet) as well.

bdarnell

Reviewed 4 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @tbg)

pkg/security/certificate_loader.go, line 90 at r1 (raw file):

	ClientPem
	// TenantClientPem describes a SQL tenant client certificate.
	TenantClientPem

How does TenantClientPem relate to NodePem (the "node client" cert)? Wouldn't a SQL tenant pod always use a tenant cert instead of a node cert? I'm tempted to say we shouldn't have separate variables for TenantClientPem and should just load the tenant cert as the node client cert.

At first I wanted to make this argument for TenantCAPem as well, but this gets trickier - KV servers need to be able to accept connections from both the system tenant and other tenants, and they must be able to distinguish between them (since the tenant CA is online in the auth broker, it's important that it not be able to mint omnipotent node certs). So maybe if we keep that we might as well keep TenantClientPem too.

pkg/security/certificate_manager.go, line 198 at r1 (raw file):

func CACertFilename() string { return "ca" + certExtension }

// TenantCACertPath returns the expected file path for the Tenant CA

FYI I have a branch I'm working on that unexports all the "Path" methods from CertificateManager.

pkg/security/certificate_manager.go, line 269 at r1 (raw file):

// TenantClientKeyFilename returns the expected file name for the user's key.
func TenantClientKeyFilename(tenantIdentifier string) string {
	return "tenant." + tenantIdentifier + keyExtension

This makes me nervous because it implies we might have keys for multiple clients written to disk in the same place. As I said in certs.go, I'd just avoid doing anything with naming conventions for these certs.

pkg/security/certs.go, line 443 at r1 (raw file):

}

// TenantClientPair are client certs for use with multi-tenancy.

Why a specific type instead of KeyPair? Why hard-code RSA for the private key ?(instead of crypto.PrivateKey. For that matter, we should get @aaron-crl 's opinion on whether we should be using RSA, ECDSA, or something else here. This is a purely internal part of the system so we don't have to worry as much about rest-of-world compatibility) What format are those bytes in? Why isn't this just a tls.Certificate (which despite the name can carry a private key)?

(for a lot of my comments the answer could be "because that's what this file already does elsewhere". It's accumulated a lot of tech debt but we don't have to fix it all right now)

pkg/security/certs.go, line 505 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Your understanding is correct. In production a client pair either comes from the auth broker (either directly or injected at pod startup, likely through a file then?). I could move this into _test.go to keep the public API clean for now, though I anticipated using this for a first plumbing pass of the security work that switches to these certificates (without using the auth broker yet) as well.

OK. For some additional context, I'm working on some cleanups here to make the certificate manager less file-centric (anticipating future usage of HSM/KMS), and to split it up so we don't have all the different use cases sharing one cert manager object with a bunch of possibly-nil fields.

So I don't think following the current cert patterns is a requirement or even necessarily a good idea. If you'll just be loading certs from disk in testing scenarios, don't worry about enforcing conventions and require the path to the files to be passed on the command line.

pkg/security/x509.go, line 198 at r1 (raw file):

	lifetime time.Duration,
	user string,
	hosts ...string,

Why would client certs have hosts? Are you trying to make them non-transferable and restricted to a single source pod? Does that work? It kind of makes sense but I can't find any reference to this being done. (and in particular verifying clients with DNS lookups is problematic; it might only make sense for it to use IP addresses)

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, and @tbg)

pkg/security/certificate_loader.go, line 90 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

How does TenantClientPem relate to NodePem (the "node client" cert)? Wouldn't a SQL tenant pod always use a tenant cert instead of a node cert? I'm tempted to say we shouldn't have separate variables for TenantClientPem and should just load the tenant cert as the node client cert.

At first I wanted to make this argument for TenantCAPem as well, but this gets trickier - KV servers need to be able to accept connections from both the system tenant and other tenants, and they must be able to distinguish between them (since the tenant CA is online in the auth broker, it's important that it not be able to mint omnipotent node certs). So maybe if we keep that we might as well keep TenantClientPem too.

Yes, tenants have no use for node certs. They use their tenant client cert to talk to KV and they also have some additional cert they use for their Postgres-TLS which doesn't factor in here (yet, though we'll need one). Effectively, we'll set up an rpc.Context for them that uses the tenant client cert instead of the node client cert but otherwise looks mostly the same.

For practical reasons, I would prefer to just "do how this package does" at the time of writing and (i.e. this PR mostly as is) - if you're amenable to it - rely on you to revisit when you're moving away from the file-centric view (which implies that you'll be figuring out the certs-dir complications across the codebase that would otherwise be front-loaded on me).
I'm mostly trying to move through quickly to get these certs into the right places in the SQL tenant cli using the standard --certs-dir flag to build out the tenant server endpoint in the KV layer, and as of this PR I'm there (mod the auth broker, i.e. for testing with filesystem certs, and possibly multiple different tenant's processes at once). If I do anything differently, I have to rework the flags, how the RPC context loads the certs, etc, I think there's just going to be a lot of fallout that I'm not the right person for handling; I think this is better addressed in the work that you've set to do, if I understand it correctly.

I also think there might be a good case for loading tenant cli certs from disk in prod: we will need to inject an initial tenant cert at boot time; naively the filesystem is the right way to do it (or is an env var better?).

Let me know which changes you want me to make before merging.

pkg/security/certificate_manager.go, line 269 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This makes me nervous because it implies we might have keys for multiple clients written to disk in the same place. As I said in certs.go, I'd just avoid doing anything with naming conventions for these certs.

👍 but see my other comment.

pkg/security/certs.go, line 505 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

OK. For some additional context, I'm working on some cleanups here to make the certificate manager less file-centric (anticipating future usage of HSM/KMS), and to split it up so we don't have all the different use cases sharing one cert manager object with a bunch of possibly-nil fields.

So I don't think following the current cert patterns is a requirement or even necessarily a good idea. If you'll just be loading certs from disk in testing scenarios, don't worry about enforcing conventions and require the path to the files to be passed on the command line.

I agree in principal, but see my other comment.

pkg/security/x509.go, line 198 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why would client certs have hosts? Are you trying to make them non-transferable and restricted to a single source pod? Does that work? It kind of makes sense but I can't find any reference to this being done. (and in particular verifying clients with DNS lookups is problematic; it might only make sense for it to use IP addresses)

I just saw that it seemed possible, and so it seemed desirable to try to do it. If you consider this exotic/not worth it I can take it out (otherwise I'd just see whether it works once I have the auth broker). It does seem nice to not have the "token" be transferable to another host. OTOH maybe this is over the top and just causes headaches with pod migrations etc if that is a thing.

bdarnell

No changes required at this time except removing the host option when generating client certs.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, and @tbg)

pkg/security/x509.go, line 198 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I just saw that it seemed possible, and so it seemed desirable to try to do it. If you consider this exotic/not worth it I can take it out (otherwise I'd just see whether it works once I have the auth broker). It does seem nice to not have the "token" be transferable to another host. OTOH maybe this is over the top and just causes headaches with pod migrations etc if that is a thing.

I'd leave it out - in our standard setup, we already have client certs with IPs and hostnames set in SANs but they're not used for client cert verification (this is because our node certs are hybrid client/server). We wouldn't want to add enforcement that would suddenly change the behavior of existing certs.

This commit introduces methods that create a CA certificate for use with the auth broker envisioned in cockroachdb#49105 and, from that CA, client certificates for use in cockroachdb#47898. They are not exposed through the cli (yet), though the certificate loader already supports them. Touches cockroachdb#49105. Touches cockroachdb#47898. Release note: None

tbg

TFTR! I removed the host option.

bors r=bdarnell

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, and @tbg)

craig · 2020-06-10T08:25:03Z

Build succeeded

GitHub CI (Cockroach)

This is an addendum to cockroachdb#49864. I had added a tenant CA and tenant client certs but did not add tenant server CA and certs. However, this is necessary:a KV server run in a multi-tenancy environment will use - tenant server certs (issued by tenant server CA) to host TLS-enabled endpoint (for use by SQL tenant processes) - tenant client CA (to validate incoming connections) - node server certs (for intra-KV traffic). The expectation is that the tenant server certs/CA will be using external PKI (i.e. publicly trusted certs), but I learned quickly while prototyping cockroachdb#47898 that it will well be worth it to generate these certs internally as well. Release note: None

tbg added the A-multitenancy Related to multi-tenancy label Jun 4, 2020

tbg requested review from bdarnell, knz and aaron-crl June 4, 2020 09:33

tbg force-pushed the mt-authtoken branch from 6b91729 to ad18650 Compare June 4, 2020 09:33

knz approved these changes Jun 4, 2020

View reviewed changes

bdarnell reviewed Jun 5, 2020

View reviewed changes

tbg requested a review from bdarnell June 5, 2020 17:47

tbg commented Jun 5, 2020

View reviewed changes

bdarnell reviewed Jun 5, 2020

View reviewed changes

tbg commented Jun 5, 2020

View reviewed changes

bdarnell approved these changes Jun 8, 2020

View reviewed changes

tbg force-pushed the mt-authtoken branch from ad18650 to 4f0b45f Compare June 10, 2020 07:39

tbg requested review from bdarnell and knz June 10, 2020 07:40

tbg commented Jun 10, 2020

View reviewed changes

craig bot merged commit 7cf408b into cockroachdb:master Jun 10, 2020

tbg deleted the mt-authtoken branch June 10, 2020 15:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: add CA and client certs for tenant usage #49864

security: add CA and client certs for tenant usage #49864

tbg commented Jun 4, 2020

cockroach-teamcity commented Jun 4, 2020

knz left a comment

bdarnell left a comment

tbg left a comment

bdarnell left a comment

tbg left a comment

bdarnell left a comment

tbg left a comment

craig bot commented Jun 10, 2020

security: add CA and client certs for tenant usage #49864

security: add CA and client certs for tenant usage #49864

Conversation

tbg commented Jun 4, 2020

cockroach-teamcity commented Jun 4, 2020

knz left a comment

Choose a reason for hiding this comment

bdarnell left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

bdarnell left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

bdarnell left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

craig bot commented Jun 10, 2020

Build succeeded