rpc: use tenant client/server certs #51503
Very nice! I don't see anything major here that needs to be discussed further, though I'm ignoring the last commit.
Reviewed 24 of 24 files at r1, 1 of 1 files at r2, 7 of 7 files at r3, 29 of 29 files at r4, 28 of 28 files at r5, 2 of 2 files at r6, 5 of 5 files at r7, 1 of 1 files at r8.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/rpc/context.go, line 239 at r7 (raw file):
} // TODO(tbg): do something else if `o.tenant`.
This is the same TODO as the one five lines up, right? There's not multiple "something else"s that you want to do?
pkg/rpc/context.go, line 770 at r7 (raw file):
// TODO(tbg): remove this override when the KV layer can authenticate tenant
// client certs.
const override = false
Why keep the code?
pkg/rpc/tls.go, line 85 at r3 (raw file):
ctx.lazy.certificateManager.Do(func() {
	var opts []security.Option
	if !roachpb.IsSystemTenantID(ctx.tenID.ToUint64()) {
You can just do `ctx.tenID != roachpb.SystemTenantID`.
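To illustrate the suggestion, here is a minimal sketch showing that the two checks agree. It assumes the tenant ID is a small comparable struct wrapping a `uint64`; `tenantID`, `systemTenantID`, and `isSystemTenantID` are hypothetical stand-ins, not the real `roachpb` definitions.

```go
package main

import "fmt"

// tenantID is a hypothetical stand-in for roachpb.TenantID, assumed here to
// be a comparable struct wrapping a uint64.
type tenantID struct{ id uint64 }

func (t tenantID) toUint64() uint64 { return t.id }

// systemTenantID is a stand-in for roachpb.SystemTenantID (value assumed).
var systemTenantID = tenantID{id: 1}

// isSystemTenantID mirrors the helper-based check quoted above.
func isSystemTenantID(id uint64) bool {
	return id == systemTenantID.toUint64()
}

func main() {
	for _, id := range []tenantID{systemTenantID, {id: 10}} {
		viaHelper := !isSystemTenantID(id.toUint64())
		direct := id != systemTenantID // structs with comparable fields support != directly
		fmt.Println(id.toUint64(), viaHelper, direct)
	}
}
```

The direct comparison avoids the round trip through `uint64` and reads closer to the intent.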
pkg/rpc/tls.go, line 86 at r3 (raw file):
var opts []security.Option
if !roachpb.IsSystemTenantID(ctx.tenID.ToUint64()) {
	opts = append(opts, security.ForTenant(ctx.tenID.String()))
Any reason to lose type safety here with the `tenantIdentifier`?
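A rough sketch of the trade-off being asked about, using functional options. Every name here (`tenantID`, `managerOptions`, `forTenantString`, `forTenant`) is hypothetical and only mimics the shape of `security.ForTenant`; the real option and its fields may look different.

```go
package main

import (
	"fmt"
	"strconv"
)

// tenantID is a hypothetical stand-in for roachpb.TenantID.
type tenantID struct{ id uint64 }

func (t tenantID) String() string { return strconv.FormatUint(t.id, 10) }

// managerOptions is a hypothetical stand-in for the cert manager's settings.
type managerOptions struct{ tenantIdentifier string }

type option func(*managerOptions)

// forTenantString mirrors the string-based shape: any string is accepted.
func forTenantString(tenant string) option {
	return func(o *managerOptions) { o.tenantIdentifier = tenant }
}

// forTenant is the typed variant: the conversion to a string happens once,
// inside the option, and callers cannot pass an arbitrary string.
func forTenant(id tenantID) option {
	return func(o *managerOptions) { o.tenantIdentifier = id.String() }
}

func newManagerOptions(opts ...option) managerOptions {
	var o managerOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	id := tenantID{id: 10}
	fmt.Println(newManagerOptions(forTenantString(id.String())).tenantIdentifier)
	fmt.Println(newManagerOptions(forTenant(id)).tenantIdentifier)
	// forTenantString("not-a-tenant") compiles; forTenant("not-a-tenant") does not.
}
```

The typed variant does require the option's package to know about the tenant ID type, which is the string/tenantID conversion concern raised in the reply further down.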
pkg/rpc/tls.go, line 177 at r7 (raw file):
// certificate for the configured tenant from the cert manager.
func (ctx *SecurityContext) GetTenantClientTLSConfig() (*tls.Config, error) {
	// Early out.
We have almost the exact same logic in each of these methods. Think it's worth generalizing? Something like:
func (ctx *SecurityContext) getTLSConfig(fn func(*security.CertificateManager) (*tls.Config, error)) (*tls.Config, error) {
...
}
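One way the proposed helper could look, as a self-contained sketch. `certManager` and `securityContext` are hypothetical stand-ins for `security.CertificateManager` and `SecurityContext`, and the shared checks (an insecure early-out and a nil-manager error) are assumptions about what the duplicated logic contains, not a copy of the real methods.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
)

// certManager is a hypothetical stand-in for security.CertificateManager.
type certManager struct{ serverName string }

func (cm *certManager) clientTLSConfig() (*tls.Config, error) {
	return &tls.Config{ServerName: cm.serverName, MinVersion: tls.VersionTLS12}, nil
}

func (cm *certManager) tenantClientTLSConfig() (*tls.Config, error) {
	return &tls.Config{ServerName: "tenant." + cm.serverName, MinVersion: tls.VersionTLS12}, nil
}

// securityContext is a hypothetical stand-in for SecurityContext.
type securityContext struct {
	insecure bool
	cm       *certManager
}

var errNoCertManager = errors.New("certificate manager not initialized")

// getTLSConfig holds the checks shared by every getter and delegates the one
// differing step, picking a config from the cert manager, to fn.
func (ctx *securityContext) getTLSConfig(
	fn func(*certManager) (*tls.Config, error),
) (*tls.Config, error) {
	// Early out (assumed behavior: no TLS config in insecure mode).
	if ctx.insecure {
		return nil, nil
	}
	if ctx.cm == nil {
		return nil, errNoCertManager
	}
	return fn(ctx.cm)
}

func (ctx *securityContext) GetClientTLSConfig() (*tls.Config, error) {
	return ctx.getTLSConfig((*certManager).clientTLSConfig)
}

func (ctx *securityContext) GetTenantClientTLSConfig() (*tls.Config, error) {
	return ctx.getTLSConfig((*certManager).tenantClientTLSConfig)
}

func main() {
	ctx := &securityContext{cm: &certManager{serverName: "node"}}
	cfg, err := ctx.GetTenantClientTLSConfig()
	fmt.Println(cfg.ServerName, err)
}
```

Method expressions such as `(*certManager).tenantClientTLSConfig` already have the `func(*certManager) (*tls.Config, error)` type, so each public getter shrinks to a single line.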
pkg/server/testserver.go, line 638 at r7 (raw file):
	}
}
sqlCfg.TenantKVAddrs = []string{ts.TenantAddr()}
Good catch. Should this be `ServingTenantAddr` though?
TFTR! I'll see about this wip commit and try to merge this.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/rpc/tls.go, line 86 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Any reason to lose type safety here with the `tenantIdentifier`?
It didn't seem useful to have to deal with string/tenantID conversions inside of the `security` package. Open to revisiting this after this PR.
pkg/rpc/tls.go, line 177 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We have almost the exact same logic in each of these methods. Think it's worth generalizing? Something like:
func (ctx *SecurityContext) getTLSConfig(fn func(*security.CertificateManager) (*tls.Config, error)) (*tls.Config, error) { ... }
Yes, that might be a good idea. I would like to get this PR in first, though.
Fixes cockroachdb#47898. Rebased on cockroachdb#51503 and cockroachdb#52034. Ignore all but the last 3 commits. This commit adds a collection of access control policies for the newly exposed tenant RPC server. These authorization policies ensure that an authenticated tenant is only able to access keys within its keyspace and that no tenant is able to access data from another tenant's keyspace through the tenant RPC server. This is a major step in providing crypto-backed logical isolation between tenants in a multi-tenant cluster. The existing auth mechanism is retained on the standard RPC server, which means that the system tenant is still able to access any key in the system.
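As a rough illustration of the kind of policy described above, the sketch below rejects any requested span that is not fully contained in the authenticated tenant's keyspace. The `/tenant/<id>/` key prefix, the `span` type, and `authorize` are invented for this example; the real key encoding and the policy code in the referenced commit are not shown in this thread.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical half-open key range [start, end).
type span struct{ start, end []byte }

// tenantPrefix uses a made-up, human-readable encoding for illustration only.
func tenantPrefix(id uint64) []byte {
	return []byte(fmt.Sprintf("/tenant/%d/", id))
}

// prefixEnd returns the smallest key greater than every key that has the
// given prefix. It assumes the prefix is non-empty and its last byte is not 0xff.
func prefixEnd(prefix []byte) []byte {
	end := append([]byte(nil), prefix...)
	end[len(end)-1]++
	return end
}

// authorize allows a request only if its span is non-empty and lies entirely
// within the tenant's keyspace.
func authorize(id uint64, s span) error {
	prefix := tenantPrefix(id)
	if bytes.Compare(s.start, s.end) >= 0 ||
		bytes.Compare(s.start, prefix) < 0 ||
		bytes.Compare(s.end, prefixEnd(prefix)) > 0 {
		return fmt.Errorf("tenant %d may not access [%q, %q)", id, s.start, s.end)
	}
	return nil
}

func main() {
	own := span{start: []byte("/tenant/5/a"), end: []byte("/tenant/5/b")}
	other := span{start: []byte("/tenant/50/a"), end: []byte("/tenant/50/b")}
	fmt.Println(authorize(5, own))
	fmt.Println(authorize(5, other))
}
```

In this sketch the check would run only on the tenant RPC server; the standard RPC server would keep its existing auth path, which is how the system tenant retains access to every key.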
I rebased onto #52034. This should turn green now.
This is currently unused, but already initialized correctly: it's always SystemTenantID except on SQL tenant servers, where it's the tenant's ID. Release note: None
I worried for a short while that this would always return nil, but it does not. Release note: None
This makes sure that a certificate manager for an `rpc.Context` for a given tenant is aware of the tenant ID. This is not used yet, but a new TODO (to be addressed shortly) hints at where this will be used during dialing: when we're a tenant, use the tenant client certs instead of the client certs. Release note: None
I had previously made sure that the multi-tenancy certs were in a different subdirectory, but in hindsight this is not worth the hassle. Release note: None
We already have tests that want to use more than one tenant. These will need to have certs for each of the tenants that they use soon, so pave the way for that by adding a few hard-coded tenant IDs that have certs embedded. Release note: None
As of this commit, tenants would use their proper tenant client certs if it weren't for a manual override that was added. This override exists because the KV layer can not yet authenticate tenant client certs (this will change soon, in a follow-up to cockroachdb#50503). However, uncommenting both the override and the hack in `pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant client certs match those used by the KV nodes gives early validation that this "will work" once the KV side plays ball. Touches cockroachdb#47898. Release note: None
With this PR, tenant SQL servers use tenant client certificates to connect to the tenant server endpoint at the KV layer. They were previously using the KV-internal node certs. No validation is performed as of this PR, but this is the obvious next step. Follow-up work will add assertions that make sure that we don't see tenants "accidentally" use the node certs for some operations when they are available (as is typically the case during testing). Finally, there will be some work on the heartbeats exchanged by the RPC context. We don't want a SQL tenant's time signal to ever trigger KV nodes to crash, for example. Touches cockroachdb#47898. Release note: None
bors r=nvanbenschoten
Build failed:
Failed on bors r=nvanbenschoten
Build failed:
Sigh
bors r=nvanbenschoten
I have skipped the Example-ORMs target so this will be able to go in now
Build failed:
ok I thought I had skipped example-orms and it seems that I did not.
oh that's because your bors run started before my conf change. Let's try that again. bors r=nvanbenschoten
51503: rpc: use tenant client/server certs r=nvanbenschoten a=tbg
With this PR, tenant SQL servers use tenant client certificates to connect to the tenant server endpoint at the KV layer. They were previously using the KV-internal node certs.
No validation is performed as of this PR, but this is the obvious next step.
Follow-up work will add assertions that make sure that we don't see tenants "accidentally" use the node certs for some operations when they are available (as is typically the case during testing).
Finally, there will be some work on the heartbeats exchanged by the RPC context. We don't want a SQL tenant's time signal to ever trigger KV nodes to crash, for example.
Touches #47898.
Release note: None
- cli/flags,config: new flag for tenant KV listen addr
- sql: route tenant KV traffic to tenant KV address
- roachtest: configure --tenant-addr flag in acceptance/multitenant
- rpc: add TenantID to rpc.ContextOptions
- security: slight test improvement
- rpc: pass TenantID to SecurityContext to CertManager
- security: use a single test_certs dir
- security: embed certs for a few hard-coded tenants
- rpc: *almost* use tenant client certs (on tenants)
- rpc: use tenant client/server certs where appropriate
52281: bulkio: Correctly handle exhausting retries when reading from HTTP. r=rohany a=miretskiy
Fixes #52279
Return an error if we exhaust the retry budget when reading from HTTP.
Release Notes: None
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Build failed (retrying...):
Build succeeded:
51803: cmd/docgen: add HTTP extractor r=mjibson a=mjibson
Add a way to extract docs from the status.proto HTTP endpoint. These can be imported into the docs project as needed.
Release note: None
52083: roachtest: small misc r=andreimatei a=andreimatei
See individual commits.
52094: rpc: implement tenant access control policies at KV RPC boundary r=nvanbenschoten a=nvanbenschoten
Fixes #47898.
Rebased on #51503 and #52034. Ignore all but the last 3 commits.
This commit adds a collection of access control policies for the newly exposed tenant RPC server. These authorization policies ensure that an authenticated tenant is only able to access keys within its keyspace and that no tenant is able to access data from another tenant's keyspace through the tenant RPC server. This is a major step in providing crypto-backed logical isolation between tenants in a multi-tenant cluster.
The existing auth mechanism is retained on the standard RPC server, which means that the system tenant is still able to access any key in the system.
52352: sql/pgwire: add regression test for varchar OIDs in RowDescription r=jordanlewis a=rafiss
See issue #51360. The bug described in it was fixed somewhat accidentally, so this test will verify that we don't regress again.
Release note: None
52386: opt: add SerializingProject exec primitive r=RaduBerinde a=RaduBerinde
The top-level projection of a query has a special property - it can project away columns that we want an ordering on (e.g. `SELECT a FROM t ORDER BY b`).
The distsql physical planner was designed to tolerate such cases, as they were much more common with the heuristic planner. But the new distsql exec factory does not; it currently relies on a hack: it detects this case by checking if the required output ordering is `nil`. This is fragile and doesn't work in all cases.
This change adds a `SerializingProject` primitive which is like a SimpleProject but it forces serialization of all parallel streams into one. The new primitive is used to enforce the final query presentation. We only need to pass column names for the presentation, so we remove `RenameColumns` and remove the column names argument from `SimpleProject` (simplifying some execbuilder code).
We also fix a bug in `ConstructSimpleProject` where we weren't taking the `PlanToStreamColMap` into account when building the projection.
Release note: None
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>